The Evolution of Shrinkage Estimators between Statistical Theory and Modern Intelligent Applications

Aisha Mohamed Abutartour

doi:10.65540/jar.v30i1.1402

Authors

Aisha Mohamed Abutartour, Statistics Department, Faculty of Science, University of Tripoli, Tripoli, Libya

Keywords:

Bias and Variance, Model Stability, Model Regularization, Overfitting, Shrinkage Estimator

Abstract

Mathematical statistics underwent a major shift in the twentieth century with the discovery of shrinkage estimators, which challenged the classical tradition built on unbiased estimation. Since Stein's inadmissibility result of 1956 and the James-Stein estimator of 1961, accepting a small amount of bias in exchange for lower variance has become an effective way to reduce overall estimation error and produce more stable estimators. This paper reviews the historical and theoretical development of shrinkage estimators, from the James-Stein estimator to modern methods such as ridge regression, the Least Absolute Shrinkage and Selection Operator (LASSO), and the elastic net. It also examines their statistical foundations and their growing role in artificial intelligence (AI) and deep learning models. The paper concludes with a unified view that links statistical theory to modern intelligent applications, treating shrinkage estimation as a single framework that combines mathematical rigor with applied innovation in complex data environments and showing how shrinkage has become a cornerstone of intelligent predictive modeling.
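The bias-for-variance trade described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes independent unit-variance normal observations, uses the plain James-Stein estimator that shrinks toward the origin, and the true mean vector, trial count, and seed are hypothetical values chosen only for illustration.

```python
import random


def james_stein(x, sigma2=1.0):
    """Shrink the observation vector x toward the origin (plain James-Stein form)."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (p - 2) * sigma2 / norm2
    return [factor * v for v in x]


def simulate(theta, trials=2000, seed=0):
    """Monte Carlo average of squared error for the raw observation and for James-Stein."""
    rng = random.Random(seed)
    mle_loss = 0.0
    js_loss = 0.0
    for _ in range(trials):
        # One noisy observation per coordinate, X_i ~ N(theta_i, 1).
        x = [rng.gauss(t, 1.0) for t in theta]
        js = james_stein(x)
        mle_loss += sum((a - t) ** 2 for a, t in zip(x, theta))
        js_loss += sum((a - t) ** 2 for a, t in zip(js, theta))
    return mle_loss / trials, js_loss / trials


# Hypothetical true means (assumption for illustration only), dimension p = 10.
theta = [0.5] * 10
mle_risk, js_risk = simulate(theta)
print(f"unbiased estimator risk ~ {mle_risk:.2f}, James-Stein risk ~ {js_risk:.2f}")
```

Under these assumptions the unbiased estimator's average squared error should sit near the dimension p, while the shrunken estimate, although biased, should come out lower. Ridge regression, LASSO, and the elastic net apply the same idea to regression coefficients through a penalty term.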
