Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?
TL;DR: A simulation study to examine the normality assumption for various performance measures relating to a logistic regression prediction model found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects.
Abstract: If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
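The recommended scales can be illustrated with a minimal sketch, under several assumptions not stated in the abstract. Standard errors on the transformed scales are approximated with the delta method. Between-study variance is estimated with the DerSimonian-Laird method, chosen here only for brevity and not necessarily the estimator the authors used. The function names and the five C-statistics are hypothetical and are not QRISK2 results.

```python
import numpy as np

def logit_c(c, se_c):
    """Logit-transform C-statistics; SEs follow from the delta method."""
    c = np.asarray(c, dtype=float)
    se_c = np.asarray(se_c, dtype=float)
    return np.log(c / (1 - c)), se_c / (c * (1 - c))

def log_eo(eo, se_eo):
    """Log-transform E/O ratios; SEs follow from the delta method."""
    eo = np.asarray(eo, dtype=float)
    se_eo = np.asarray(se_eo, dtype=float)
    return np.log(eo), se_eo / eo

def random_effects_dl(y, se):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate.

    Assumption: DL is used for simplicity; REML or Bayesian estimation
    could be substituted without changing the choice of scale.
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    w = 1.0 / v                                   # fixed-effect weights
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fixed) ** 2)           # Cochran's Q
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / denom)   # truncated at zero
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)
    se_mu = np.sqrt(1.0 / np.sum(w_re))
    return mu, se_mu, tau2

def expit(x):
    """Inverse logit, used to back-transform to the C-statistic scale."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical study-level C-statistics and standard errors (illustrative only).
c_stats = [0.72, 0.78, 0.81, 0.69, 0.75]
c_ses = [0.02, 0.03, 0.025, 0.04, 0.02]

y, se = logit_c(c_stats, c_ses)
mu, se_mu, tau2 = random_effects_dl(y, se)
pooled_c = expit(mu)
ci = (expit(mu - 1.96 * se_mu), expit(mu + 1.96 * se_mu))
print(f"pooled C = {pooled_c:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), tau^2 = {tau2:.4f}")
```

Pooling on the logit scale and back-transforming keeps the summary C-statistic and its interval inside (0, 1). The same `random_effects_dl` call can be applied to the output of `log_eo`, with `np.exp` as the back-transformation.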
Citations
Benign descriptors and ADNEX in two‐step strategy to estimate risk of malignancy in ovarian tumors: retrospective validation in IOTA5 multicenter cohort
Chiara Landolfo, Tom Bourne, Wouter Froyman, Ben Van Calster, Jolien Ceusters, A. C. Testa, Laure Wynants, Povilas Sladkevicius, C. Van Holsbeke, Ekaterini Domali, Robert Fruscio, Elisabeth Epstein, Dorella Franchi, Marek Kudla, Valentina Chiappa, J. L. Alcazar, Francesco Leone, Frances C. Buonomo, Maria Elisabetta Coccia, Stefano Guerriero, N.D. Deo, Ligita Jokubkiene, Luca Savelli, Daniela Fischerova, Artur Czekierdowski, J. Kaijser, An Coosemans, Giovanni Scambia, Ignace Vergote, Dirk Timmerman, Lil Valentin
TL;DR: A large proportion of adnexal masses can be classified as benign by the Benign Simple Descriptors, and for the remaining masses the ADNEX model can be used to estimate the risk of malignancy.
Prognostic models for predicting relapse or recurrence of major depressive disorder in adults
Andrew S Moriarty, Nicholas Meader, Kym I E Snell, Richard D Riley, Lewis William Paton, Carolyn Chew-Graham, Simon Gilbody, Rachel Churchill, Robert S. Phillips, Shehzad Ali, Dean McMillan
TL;DR: This article reviews the predictive performance of prognostic models developed to predict the risk of relapse, recurrence, sustained remission, or recovery in adults with major depressive disorder who meet criteria for remission or recovery.
Transparent reporting of multivariable prediction models developed or validated using clustered data (TRIPOD-Cluster): explanation and elaboration
Thomas P. A. Debray, Gary S. Collins, Richard D Riley, Kym I E Snell, Ben Van Calster, Johannes B. Reitsma, Karel G.M. Moons
TL;DR: The TRIPOD-Cluster (transparent reporting of multivariable prediction models developed or validated using clustered data) statement comprises a 19-item checklist that aims to improve the reporting of studies developing or validating a prediction model in clustered data, such as individual participant data meta-analyses and electronic health records (clustering by practice or hospital).
External validation, update and development of prediction models for pre-eclampsia using an Individual Participant Data (IPD) meta-analysis: the International Prediction of Pregnancy Complication Network (IPPIC pre-eclampsia) protocol
John Allotey, Kym I E Snell, Claire L Chan, Richard Hooper, Julie Dodds, Ewelina Rogozińska, Khalid S. Khan, Lucilla Poston, Louise C. Kenny, Jenny Myers, Basky Thilaganathan, Lucy C Chappell, Ben W.J. Mol, Peter von Dadelszen, Asif Ahmed, Marcus Green, Liona C. Poon, Asma Khalil, Karel G.M. Moons, Richard D Riley, Shakila Thangaratinam
- 03 Oct 2017
TL;DR: This large-scale collaborative IPD approach encourages consensus towards well-developed and validated prognostic models, rather than a number of competing non-validated ones, and will allow development and validation of a multivariable prediction model for the relatively rare outcome of early-onset pre-eclampsia.
Risk models for recurrence and survival after kidney cancer: a systematic review
Omar Salim Akhtar
- 11 Jan 2022
TL;DR: A systematic review was conducted of the performance of prognostic models for predicting recurrence-free survival (RFS), cancer-specific survival (CSS) or overall survival (OS) in patients who have undergone surgical resection for localized renal cell cancer (RCC).