Objective: Health expenditure is not normally distributed and is extremely right-skewed; this leads to departures from linearity and weakens regression models. Prediction performance can be improved by using the Copas test together with statistical shrinkage models. In this study, the Copas test and shrinkage models were applied to predict out-of-pocket health expenditure, and model performance was compared.
Material and Methods: Data from the Turkish Statistical Institute (TurkStat) 2012 Household Budget Survey were used. Out-of-pocket health expenditure was predicted from capacity to pay, which is a kind of adjusted income variable, together with the household head's gender, being older than 65 years, health insurance status, and rural or urban residence. Ten different training and test data sets were created, in which the training data set represented 50% to 95% of the original data set, and the best-performing data set was selected using the Copas test. Least squares regression (LSR) was applied alongside Lasso and Ridge regression, which are also known as shrinkage methods.
Results: The LSR model had the lowest mean squared error and the best performance. Capacity to pay was the most influential variable in predicting out-of-pocket health expenditure, and Lasso regression had better predictive power than Ridge regression. The optimal shrinkage parameter for Lasso regression was λ=0.0158 for the training data set and λ=0.0630 for the test data set.
Conclusion: The Copas test is a useful technique for identifying the best model performance for variables that are difficult to model, such as health expenditure. LSR outperformed the shrinkage models, with Lasso regression also predicting well, and capacity to pay was the variable that contributed most to the model. Further analyses with different regression models are recommended to extend these findings.
Keywords: Out-of-pocket health expenditure; Copas test; Least squares regression; Lasso regression; Ridge regression