Models are the most important component of estimation. However, each dataset contains different types of variables whose relationships must be described in different ways, and each model type imposes its own constraints on the data. It is therefore important to select a forecasting model that can properly describe the relationship being sought. Studies on model selection have shown that many factors, such as sample size, model structure, the distribution of the data, the estimation method, and the number of variables in the model, affect the results of model selection criteria. This leads researchers to ask which model selection criteria are best and what properties they should have. Although it is recommended to choose indices that best assess how well the model fits the data, rather than indices that merely confirm the model, doing so is difficult and complex in practice. Some studies do recommend particular criteria for evaluating a model, but because the data used in each study differ from those of every other study, these recommendations cannot be generalized. Model evaluation criteria, which are quantitative measures, address descriptive adequacy (whether the model fits the observed data), simplicity (whether the model describes the observed data in the simplest possible way), and generalizability (whether the model predicts future observations well). A complete assessment of a model's adequacy requires evaluating all three properties together. The aim of this study is to provide an overview of the various performance criteria and their classification.
Keywords: Performance metrics; modeling; model selection
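The three properties named in the abstract (descriptive adequacy, simplicity, generalizability) can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions, not the study's method: it assumes polynomial regression with Gaussian errors on hypothetical synthetic data, uses training RMSE as a measure of descriptive adequacy, AIC and BIC (which add a penalty for the number of parameters) as criteria that trade fit against simplicity, and RMSE on a fresh sample as a proxy for generalizability. The helpers `fit_polynomial`, `aic_bic`, `make_data` and `evaluate` are hypothetical names written only for this example.

```python
import math
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [c0, c1, ..., c_degree] for c0 + c1*x + ... + c_degree*x**degree.
    Adequate for small degrees and x scaled to [0, 1]; not a robust solver.
    """
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y*x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for j in range(col, m):
                a[row][j] -= factor * a[col][j]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = s / a[i][i]
    return coeffs


def predict(coeffs, xs):
    return [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def aic_bic(obs, pred, k):
    """Gaussian-error AIC and BIC, dropping additive constants shared by all models.

    k counts the regression coefficients; counting the error variance as well
    would add the same amount to every model and leave the ranking unchanged.
    """
    n = len(obs)
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    fit_term = n * math.log(rss / n)
    return fit_term + 2 * k, fit_term + k * math.log(n)


def make_data(n, seed):
    # Hypothetical data: a quadratic signal plus Gaussian noise (sd = 0.1).
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ys = [1.0 + 2.0 * x - 3.0 * x ** 2 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def evaluate(max_degree=5, seed=0):
    train_x, train_y = make_data(30, seed)
    test_x, test_y = make_data(30, seed + 1)  # new observations, same design points
    results = []
    for degree in range(max_degree + 1):
        coeffs = fit_polynomial(train_x, train_y, degree)
        fitted = predict(coeffs, train_x)
        aic, bic = aic_bic(train_y, fitted, k=degree + 1)
        results.append({
            "degree": degree,
            "train_rmse": rmse(train_y, fitted),                 # descriptive adequacy
            "aic": aic,                                          # fit vs. simplicity
            "bic": bic,                                          # stronger simplicity penalty
            "test_rmse": rmse(test_y, predict(coeffs, test_x)),  # generalizability
        })
    return results


if __name__ == "__main__":
    for row in evaluate():
        print("degree={degree} train_rmse={train_rmse:.4f} aic={aic:.2f} "
              "bic={bic:.2f} test_rmse={test_rmse:.4f}".format(**row))
```

Because the candidate models are nested least-squares fits, training RMSE can only decrease as the degree grows, so descriptive adequacy on its own always favors the most complex model. AIC, BIC and holdout RMSE need not decrease, which illustrates why the abstract argues that all three properties must be evaluated together.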