Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?
TL;DR: A simulation study to examine the normality assumption for various performance measures relating to a logistic regression prediction model found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects.
Abstract: If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution of the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and of possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and we therefore recommend using these scales for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
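The recommended scales can be illustrated with a minimal sketch. It assumes study-level C-statistics and E/O ratios with standard errors are already available; all numbers below are made up for illustration and do not come from the paper. Standard errors are moved to the transformed scale with the delta method, and pooling uses the DerSimonian-Laird estimator, chosen here only because it is simple to write out (the abstract does not state which estimator the authors used). The function and variable names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps the real line back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling (DerSimonian-Laird, an assumed choice).

    Returns (pooled estimate, its standard error, between-study variance tau2),
    all on the scale of the inputs.
    """
    k = len(estimates)
    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical study-level results (illustrative values only).
c_stats = [0.72, 0.78, 0.81, 0.69, 0.75]
c_ses = [0.020, 0.015, 0.025, 0.030, 0.018]
eo_ratios = [0.95, 1.10, 1.30, 0.88, 1.05]
eo_ses = [0.05, 0.06, 0.09, 0.07, 0.04]

# C-statistic: pool on the logit scale; delta method gives SE / (c * (1 - c)).
logit_c = [logit(c) for c in c_stats]
logit_c_se = [se / (c * (1.0 - c)) for c, se in zip(c_stats, c_ses)]
mu_c, se_c, tau2_c = dersimonian_laird(logit_c, logit_c_se)
summary_c = inv_logit(mu_c)
ci_c = (inv_logit(mu_c - 1.96 * se_c), inv_logit(mu_c + 1.96 * se_c))

# E/O: pool on the log scale; delta method gives SE / (E/O).
log_eo = [math.log(r) for r in eo_ratios]
log_eo_se = [se / r for r, se in zip(eo_ratios, eo_ses)]
mu_eo, se_eo, tau2_eo = dersimonian_laird(log_eo, log_eo_se)
summary_eo = math.exp(mu_eo)
ci_eo = (math.exp(mu_eo - 1.96 * se_eo), math.exp(mu_eo + 1.96 * se_eo))

print(f"Summary C-statistic: {summary_c:.3f} (95% CI {ci_c[0]:.3f} to {ci_c[1]:.3f})")
print(f"Summary E/O: {summary_eo:.3f} (95% CI {ci_eo[0]:.3f} to {ci_eo[1]:.3f})")
```

Back-transforming the interval limits rather than computing them on the original scale keeps the C-statistic interval inside (0, 1) and the E/O interval above zero, which is one practical benefit of pooling on these scales.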
Citations
PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration
Karel G.M. Moons, Robert Wolff, Richard D Riley, Penny Whiting, Marie Westwood, Gary S. Collins, Johannes B. Reitsma, Jos Kleijnen, Susan Mallett
TL;DR: The rationale behind the domains and signaling questions, how to use them, and how to reach domain-level and overall judgments about ROB and applicability of primary studies to a review question are described.
Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: a systematic review and meta-analysis
Johanna A A G Damen, Romin Pajouheshnia, Pauline Heus, Karel G.M. Moons, Johannes B. Reitsma, Rob J P M Scholten, Lotty Hooft, Thomas P. A. Debray
TL;DR: The Framingham Wilson, ATP III and PCE discriminate comparably well but all overestimate the risk of developing CVD, especially in higher risk populations, and it is highly recommended that researchers further explore reasons for overprediction and that the models be updated for specific populations.
A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes.
Thomas P. A. Debray, Johanna A A G Damen, Richard D Riley, Kym I E Snell, Johannes B. Reitsma, Lotty Hooft, Gary S. Collins, Karel G.M. Moons
TL;DR: This work discusses how to undertake meta-analysis of the performance of prediction models with either a binary or a time-to-event outcome, and addresses how to deal with incomplete availability of study-specific results and summary estimates of the c-statistic, the observed:expected ratio and the calibration slope.
Validation of models to diagnose ovarian cancer in patients managed surgically or conservatively: multicentre cohort study.
Ben Van Calster, Lil Valentin, Wouter Froyman, Chiara Landolfo, Jolien Ceusters, Antonia Carla Testa, Laure Wynants, Povilas Sladkevicius, Caroline Van Holsbeke, Ekaterini Domali, Robert Fruscio, Elisabeth Epstein, Dorella Franchi, Marek Kudla, Valentina Chiappa, Juan Luis Alcázar, F. Leone, F. Buonomo, Maria Elisabetta Coccia, Stefano Guerriero, Nandita Deo, Ligita Jokubkiene, Luca Savelli, Daniela Fischerova, Artur Czekierdowski, J. Kaijser, An Coosemans, Giovanni Scambia, Ignace Vergote, Tom Bourne, Dirk Timmerman
TL;DR: The study found the ADNEX models and SRRisk are the best models to distinguish between benign and malignant masses in all patients presenting with an adnexal mass, including those managed conservatively.
Discovery and validation of a personalized risk predictor for incident tuberculosis in low transmission settings.
Rishi K Gupta, Claire J. Calderwood, Alexei Yavlinsky, Maria Krutikov, Matteo Quartagno, Maximilian C. Aichelburg, Neus Altet, Roland Diel, Claudia C. Dobler, José Domínguez, Joseph Doyle, Connie Erkens, Steffen Geis, Pranabashis Haldar, Anja M. Hauri, Thomas Stig Hermansen, James C. Johnston, Christoph Lange, Berit Lange, Frank van Leth, Laura Muñoz, Christine Roder, Kamila Romanowski, David Roth, Martina Sester, Rosa Sloot, Giovanni Sotgiu, Gerrit Woltmann, Takashi Yoshiyama, Jean-Pierre Zellweger, Dominik Zenner, Robert W Aldridge, Andrew Copas, Molebogeng X Rangaka, Marc Lipman, Mahdad Noursadeghi, Ibrahim Abubakar
TL;DR: A personalized risk predictor was developed to better target preventative treatment to individuals at greatest risk, supporting evidence-based clinical decision-making for latent TB.
References
A new approach to outliers in meta-analysis
Rose Baker, Dan Jackson
TL;DR: A model is proposed that allows a long-tailed distribution for the random effect, which removes the need for an arbitrary decision to include or exclude outliers; outliers are retained but with a reduced weight.
Assessing discriminative ability of risk models in clustered data.
TL;DR: It is argued that the within-cluster concordance probability is most relevant when a risk model supports decisions within clusters (e.g. who should be treated in a particular center) and meta-analysis of cluster-specific c-indexes is recommended.
Geographic and temporal validity of prediction models: different approaches were useful to examine model performance
Peter C. Austin, David van Klaveren, Yvonne Vergouwe, Daan Nieboer, Douglas S. Lee, Ewout W. Steyerberg
TL;DR: This study illustrates how the performance of prediction models can be assessed with multicenter data from different time periods, by comparing approaches to geographic and temporal validation using multicenter data from two time periods.
Multivariate meta-analysis of individual participant data helped externally validate the performance and implementation of a prediction model
Kym I E Snell, Harry Hua, Thomas P. A. Debray, Joie Ensor, Maxime P. Look, Karel G.M. Moons, Richard D Riley
TL;DR: Multivariate meta-analysis can be used to externally validate a prediction model's calibration and discrimination performance across multiple populations and to evaluate different implementation strategies.