The Evolution of Shrinkage Estimators between Statistical Theory and Modern Intelligent Applications
DOI:
https://doi.org/10.65540/jar.v30i1.1402
Keywords:
Bias and Variance, Model Stability, Model Regularization, Overfitting, Shrinkage Estimator
Abstract
Mathematical statistics underwent a major transformation during the last century with the discovery of shrinkage estimators, which improved on the classical tradition built around unbiased estimators. Since Stein's inadmissibility result (1956) and the James-Stein estimator (1961), accepting a degree of bias has become an effective tool for reducing overall error and producing more stable estimators. This paper provides a comprehensive review of the historical and theoretical development of shrinkage estimators, from the James-Stein estimator to modern methods such as ridge regression, the Least Absolute Shrinkage and Selection Operator (LASSO), and elastic net regression. It also examines their statistical basis and their growing role in the development of artificial intelligence (AI) and deep learning models. The paper concludes by presenting a unified view that links statistical theory and modern intelligent applications, treating shrinkage estimators as an integrated framework that combines mathematical precision with applied innovation in complex data environments. This framework illustrates how shrinkage estimators have evolved into a cornerstone of intelligent predictive modeling.
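The claim that a small bias can lower overall error can be illustrated with a short simulation. The sketch below is not taken from the paper: it assumes a single observation x ~ N(theta, sigma2 * I) in p >= 3 dimensions, uses the positive-part James-Stein estimator shrinking toward the origin, and compares its average squared error with that of the unbiased estimate x. The function names `james_stein` and `simulate`, the choice of p, the number of trials, and the randomly drawn "true" mean are all illustrative assumptions.

```python
import numpy as np


def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate, shrinking each row of x toward 0.

    Assumes each row is one observation x ~ N(theta, sigma2 * I) with p >= 3.
    """
    x = np.asarray(x, dtype=float)
    p = x.shape[-1]
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 dimensions")
    norm2 = np.sum(x ** 2, axis=-1, keepdims=True)
    # Shrinkage factor 1 - (p - 2) * sigma2 / ||x||^2, clipped at 0.
    factor = np.maximum(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return factor * x


def simulate(p=10, trials=5000, seed=0):
    """Return (average squared error of x, average squared error of James-Stein)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=p)                 # hypothetical true mean vector
    x = theta + rng.normal(size=(trials, p))   # one noisy observation per trial
    err_unbiased = np.mean(np.sum((x - theta) ** 2, axis=1))
    err_shrunk = np.mean(np.sum((james_stein(x) - theta) ** 2, axis=1))
    return err_unbiased, err_shrunk


if __name__ == "__main__":
    err_unbiased, err_shrunk = simulate()
    print(f"unbiased estimate: {err_unbiased:.3f}")
    print(f"James-Stein:       {err_shrunk:.3f}")
```

Under these assumptions the shrunken estimate is biased toward zero, yet its total squared error summed over coordinates is lower on average. The size of the gain depends on p and on how far the true mean lies from the shrinkage target.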
References
Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912-1922. Statistical Science, 12(3), 162-176. DOI: https://doi.org/10.1214/ss/1030037906
Casella, G., & Berger, R. (2024). Statistical inference. Chapman and Hall/CRC. DOI: https://doi.org/10.1201/9781003456285
Çınar, Z. M., Abdussalam Nuhu, A., Zeeshan, Q., Korhan, O., Asmael, M., & Safaei, B. (2020). Machine learning in predictive maintenance towards sustainable smart manufacturing in industry 4.0. Sustainability, 12(19), 8211. DOI: https://doi.org/10.3390/su12198211
D'Angelo, F., Andriushchenko, M., Varre, A. V., & Flammarion, N. (2023). Why do we need weight decay in modern deep learning?. Advances in Neural Information Processing Systems, 37, 23191-23223. DOI: https://doi.org/10.52202/079017-0730
Donoho, D. L., & Johnstone, I. M. (1994). Minimax risk over lp-balls for lp-error. Probability Theory and Related Fields, 99(2), 277-303. DOI: https://doi.org/10.1007/BF01199026
Efron, B., & Morris, C. (1977). Stein's paradox in statistics. Scientific American, 236(5), 119-127. DOI: https://doi.org/10.1038/scientificamerican0577-119
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, 222(594-604), 309-368. DOI: https://doi.org/10.1098/rsta.1922.0009
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge: MIT Press.
Gui, J., & Li, H. (2005). Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics, 21(13), 3001-3008. DOI: https://doi.org/10.1093/bioinformatics/bti422
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical learning with sparsity: The lasso and generalizations. Monographs on Statistics and Applied Probability, 143. CRC Press. DOI: https://doi.org/10.1201/b18401
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67. DOI: https://doi.org/10.1080/00401706.1970.10488634
Jacob, L., Obozinski, G., & Vert, J. P. (2009, June). Group lasso with overlap and graph lasso. In Proceedings of the 26th annual international conference on machine learning (pp. 433-440). DOI: https://doi.org/10.1145/1553374.1553431
James, W., & Stein, C. (1961, June). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361-379).
Korobilis, D., & Shimizu, K. (2022). Bayesian approaches to shrinkage and sparse estimation. Foundations and Trends® in Econometrics, 11(4), 230-354. DOI: https://doi.org/10.1561/0800000041
Krogh, A., & Hertz, J. (1991). A simple weight decay can improve generalization. Advances in neural information processing systems, 4.
Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of multivariate analysis, 88(2), 365-411. DOI: https://doi.org/10.1016/S0047-259X(03)00096-4
Lehmann, E. L., & Casella, G. (1998). Theory of point estimation. New York, NY: Springer New York.
Salerno, S., & Li, Y. (2023). High-dimensional survival analysis: Methods and applications. Annual review of statistics and its application, 10(1), 25-49. DOI: https://doi.org/10.1146/annurev-statistics-032921-022127
Schäfer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical applications in genetics and molecular biology, 4(1). DOI: https://doi.org/10.2202/1544-6115.1175
Stein, C. (1956, January). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the third Berkeley symposium on mathematical statistics and probability, volume 1: Contributions to the theory of statistics (Vol. 3, pp. 197-207). University of California Press. DOI: https://doi.org/10.1525/9780520313880-018
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267-288. DOI: https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
Vilhjálmsson, B. J., Yang, J., Finucane, H. K., Gusev, A., Lindström, S., Ripke, S., ... & Marsal, S. (2015). Modeling linkage disequilibrium increases accuracy of polygenic risk scores. The american journal of human genetics, 97(4), 576-592. DOI: https://doi.org/10.1101/015859
Wettstein, A., Jenni, G., Schneider, I., Kühne, F., grosse Holtforth, M., & La Marca, R. (2023). Predictors of psychological strain and allostatic load in teachers: Examining the long-term effects of biopsychosocial risk and protective factors using a LASSO regression approach. International Journal of Environmental Research and Public Health, 20(10), 5760. DOI: https://doi.org/10.3390/ijerph20105760
You, C., Su, G. H., Zhang, X., Xiao, Y., Zheng, R. C., Sun, S. Y., ... & Gu, Y. J. (2024). Multicenter radio-multiomic analysis for predicting breast cancer outcome and unravelling imaging-biological connection. NPJ Precision Oncology, 8(1), 193. DOI: https://doi.org/10.1038/s41698-024-00666-y
Yu, W., & Zhao, C. (2019). Robust monitoring and fault isolation of nonlinear industrial processes using denoising autoencoder and elastic net. IEEE Transactions on Control Systems Technology, 28(3), 1083-1091. DOI: https://doi.org/10.1109/TCST.2019.2897946
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2), 301-320. DOI: https://doi.org/10.1111/j.1467-9868.2005.00503.x
License
Copyright (c) 2026 Aisha Mohamed Abutartour

This work is licensed under a Creative Commons Attribution 4.0 International License.
