On the optimism correction of the area under the receiver operating characteristic curve in logistic prediction models

  1. Amaia Iparragirre
  2. Irantzu Barrio
  3. María Xosé Rodríguez-Álvarez
Journal: SORT: Statistics and Operations Research Transactions

ISSN: 1696-2281

Year of publication: 2019

Volume: 43

Issue: 1

Pages: 145–162

Type: Article

DOI: 10.2436/20.8080.02.82 (open access, publisher version)

When the same data are used both to fit a model and to estimate its predictive performance, the estimate may be optimistic and needs to be corrected. The aim of this work is to compare the behaviour of different methods proposed in the literature for correcting the optimism of the estimated area under the receiver operating characteristic curve (AUC) in logistic regression models. A simulation study (in which the theoretical model is known) is conducted, varying the number of covariates, the sample size, the prevalence and the correlation among covariates. The results favour k-fold cross-validation with replication and the bootstrap.
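The two strategies the results favour can be illustrated with a small, dependency-free Python sketch. It is not the authors' simulation code, and every concrete choice in it is a hypothetical assumption: the data-generating model (`true_beta`, n = 150, two informative and three pure-noise covariates), B = 50 bootstrap resamples, 5 replications of stratified 5-fold cross-validation, and averaging the per-fold AUCs. The bootstrap follows Efron's optimism scheme as popularized by Harrell: refit the model on each resample, take its AUC on that resample minus its AUC on the original data, and subtract the mean of these differences from the apparent AUC. The AUC itself is computed in its Mann-Whitney form, i.e. the proportion of (positive, negative) pairs in which the positive gets the higher score, with ties counted as one half.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson; returns [b0, b1, ...]."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    d = len(rows[0])
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        # Information matrix X'WX plus a tiny ridge on the diagonal for numerical safety.
        info = [[1e-8 if i == j else 0.0 for j in range(d)] for i in range(d)]
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            for i in range(d):
                grad[i] += (yi - p) * r[i]
                for j in range(d):
                    info[i][j] += p * (1.0 - p) * r[i] * r[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta

def linear_predictor(beta, X):
    # Risk scores on the logit scale; the AUC depends only on their ranking.
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in X]

def auc(scores, labels):
    """Empirical AUC (Mann-Whitney form): P(positive outscores negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_corrected_auc(X, y, B=50, seed=1):
    """Apparent AUC, and apparent AUC minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    apparent = auc(linear_predictor(fit_logistic(X, y), X), y)
    optimism = 0.0
    for _ in range(B):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # resample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        beta_b = fit_logistic(Xb, yb)
        # Optimism = AUC on the resample the model was fitted to minus AUC on the original data.
        optimism += auc(linear_predictor(beta_b, Xb), yb) - auc(linear_predictor(beta_b, X), y)
    return apparent, apparent - optimism / B

def repeated_kfold_auc(X, y, k=5, reps=5, seed=1):
    """Mean held-out AUC over `reps` random stratified k-fold partitions."""
    rng = random.Random(seed)
    fold_aucs = []
    for _ in range(reps):
        folds = [[] for _ in range(k)]
        for cls in (0, 1):  # stratify: deal each class round-robin so every fold gets both
            idx = [i for i, t in enumerate(y) if t == cls]
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[j % k].append(i)
        for test in folds:
            held = set(test)
            train = [i for i in range(len(y)) if i not in held]
            beta = fit_logistic([X[i] for i in train], [y[i] for i in train])
            fold_aucs.append(auc(linear_predictor(beta, [X[i] for i in test]), [y[i] for i in test]))
    return sum(fold_aucs) / len(fold_aucs)

# Hypothetical data: intercept, two informative and three pure-noise covariates.
rng = random.Random(2019)
true_beta = [-0.5, 0.8, 0.5, 0.0, 0.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(150)]
y = [1 if rng.random() < sigmoid(s) else 0 for s in linear_predictor(true_beta, X)]
apparent, corrected = bootstrap_corrected_auc(X, y)
cv = repeated_kfold_auc(X, y)
print(f"apparent {apparent:.3f}, bootstrap-corrected {corrected:.3f}, repeated CV {cv:.3f}")
```

Averaging per-fold AUCs is only one design choice; pooling the out-of-fold risk scores into a single AUC per replication is another, and the two can give different answers, particularly in small samples. The sketch also assumes that every bootstrap resample and every fold contains both outcome classes, which holds comfortably at this sample size and prevalence but would need explicit handling for rare outcomes.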

Funding information

This study was partially supported by the Severo Ochoa Program grant SEV-2013-0323, the Basque Government BERC Program 2018-2021, grant IT620-13 from the Departamento de Educación, Política Lingüística y Cultura del Gobierno Vasco, project MTM2017-82379-R (acronym “AFTERAM”) funded by AEI/FEDER, UE, and projects MTM2014-55966-P and MTM2016-74931-P from the Ministerio de Economía y Competitividad and FEDER. Amaia Iparragirre was partially supported by an internship position at BCAM - Basque Centre for Applied Mathematics.
