Minimum sample size for developing a multivariable prediction model: Part II - binary and time-to-event outcomes
Authors
Richard D. Riley
Kym I. E. Snell
Joie Ensor
Danielle L. Burke
Frank E. Harrell
Karel G. M. Moons
Gary S. Collins
Abstract
When designing a study to develop a new prediction model with binary or time-to-event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥ 0.9, (ii) small absolute difference of ≤ 0.05 in the model's apparent and adjusted Nagelkerke's R², and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox-Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8 and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (eg, 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
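The three criteria can be illustrated with a short calculation for a binary outcome. The abstract does not state the formulas, so the closed-form expressions below are an assumption: they follow the expressions commonly given for this approach, and the function name `min_sample_size_binary`, its arguments, and the input values are hypothetical. The inputs are illustrative only and are not those of the Chagas disease or venous thromboembolism examples.

```python
import math


def min_sample_size_binary(p, r2_cs, prevalence,
                           shrinkage=0.9, r2_diff=0.05, margin=0.05):
    """Sketch of the three sample size criteria for a binary outcome.

    p          : number of candidate predictor parameters
    r2_cs      : anticipated Cox-Snell R-squared of the new model
    prevalence : anticipated outcome proportion
    All formulas are assumed closed forms, not taken from the abstract.
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    # Maximum attainable Cox-Snell R-squared for this outcome proportion,
    # from the log-likelihood of the intercept-only model (per participant).
    ln_l_null = (prevalence * math.log(prevalence)
                 + (1 - prevalence) * math.log(1 - prevalence))
    max_r2_cs = 1 - math.exp(2 * ln_l_null)
    if not 0 < r2_cs < max_r2_cs:
        raise ValueError("r2_cs must lie between 0 and its maximum value")

    def n_for_shrinkage(s):
        # Sample size giving an expected global shrinkage factor of s.
        return p / ((s - 1) * math.log(1 - r2_cs / s))

    # Criterion (i): global shrinkage factor of at least `shrinkage`.
    n1 = math.ceil(n_for_shrinkage(shrinkage))

    # Criterion (ii): apparent minus adjusted Nagelkerke R-squared <= r2_diff,
    # expressed as the shrinkage factor that achieves this difference.
    s2 = r2_cs / (r2_cs + r2_diff * max_r2_cs)
    n2 = math.ceil(n_for_shrinkage(s2))

    # Criterion (iii): overall risk estimated to within +/- `margin`
    # (normal approximation, 95% confidence interval).
    n3 = math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))

    n = max(n1, n2, n3)
    events = n * prevalence
    return {"n1": n1, "n2": n2, "n3": n3, "n": n,
            "events": events, "epp": events / p}


# Hypothetical inputs: 20 parameters, anticipated R2_CS of 0.2, 10% prevalence.
result = min_sample_size_binary(p=20, r2_cs=0.2, prevalence=0.1)
print(result)
```

The required sample size is the largest of the three values, and the EPP then follows from the expected number of events divided by p. Because the result depends on the anticipated Cox-Snell R² and the outcome proportion, the implied EPP varies from one setting to another rather than being fixed at a value such as 10.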
Journal Article Type | Article |
---|---|
Acceptance Date | Sep 13, 2018 |
Online Publication Date | Oct 24, 2018 |
Publication Date | Mar 30, 2019 |
Publicly Available Date | May 26, 2023 |
Journal | Statistics in Medicine |
Print ISSN | 0277-6715 |
Publisher | Wiley |
Peer Reviewed | Peer Reviewed |
Volume | 38 |
Issue | 7 |
Pages | 1276-1296 |
DOI | https://doi.org/10.1002/sim.7992 |
Keywords | binary and time-to-event outcomes, logistic and Cox regression, multivariable prediction model, pseudo R-squared, sample size, shrinkage |
Publisher URL | https://doi.org/10.1002/sim.7992 |
Files
Riley_et_al-2019-Statistics_in_Medicine.pdf
(2.1 Mb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by-nc/4.0/