Objective: Breast cancer is a leading cause of cancer-related death among women worldwide, with approximately 2.3 million new cases and 685,000 deaths reported in 2020 alone. A critical step in developing effective classification and prediction models is variable selection, which identifies a subset of relevant variables from a larger set of potential predictors. Accurate variable selection is crucial for building interpretable, robust models that do not overfit to noise, which in turn improves model performance and generalization. In this paper, we propose an alternative, objective approach for comparing the Akaike Information Criterion (AIC) values of two competing models, in which the magnitude of their difference is subjected to a statistical test of significance. Material and Methods: We developed a new backward elimination variable selection procedure, similar in spirit to the existing stepAIC, within the R statistical software environment. We used both simulated data and the Wisconsin breast cancer diagnostic dataset to compare the variable selection and predictive performance of the proposed method with those of stepAIC and LASSO. Results: In the simulation, the proposed AIC procedure achieved higher variable selection sensitivity, specificity and accuracy than stepAIC and LASSO. Its predictive performance was comparable to that of stepAIC and LASSO across the simulated data dimensions. Similar advantages were observed on the breast cancer dataset. Conclusion: The proposed AIC-based variable selection approach is a promising method that integrates AIC with statistical testing for improved variable selection in breast cancer classification and prediction.
Keywords: Breast cancer; Akaike Information Criteria; variable selection; backward selection; LASSO
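The idea of testing an AIC difference inside a backward elimination loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the test, so the sketch assumes nested models that differ by a single coefficient, for which AIC_reduced - AIC_full = LR - 2 (LR being the likelihood-ratio statistic), and refers LR to a chi-square distribution with 1 degree of freedom. The names `aic_drop_p_value`, `backward_select`, `fit_aic` and `toy_aic` are hypothetical; `fit_aic` stands in for whatever routine fits a model on a variable subset and returns its AIC.

```python
import math
from typing import Callable, List, Sequence


def aic_drop_p_value(aic_full: float, aic_reduced: float) -> float:
    """P-value for the AIC change caused by dropping one coefficient.

    Assumption (not from the paper): the models are nested and differ by
    one parameter, so AIC_reduced - AIC_full = LR - 2, and LR is compared
    with a chi-square distribution with 1 degree of freedom.
    """
    lr = max(aic_reduced - aic_full + 2.0, 0.0)
    # Survival function of chi-square(1): P(X > lr) = erfc(sqrt(lr / 2))
    return math.erfc(math.sqrt(lr / 2.0))


def backward_select(
    variables: Sequence[str],
    fit_aic: Callable[[List[str]], float],
    alpha: float = 0.05,
) -> List[str]:
    """Backward elimination that stops when a removal is significant.

    At each step the variable whose removal gives the lowest AIC is the
    candidate; it is dropped unless the tested AIC change is significant.
    """
    selected = list(variables)
    while selected:
        aic_full = fit_aic(selected)
        candidates = [
            (fit_aic([v for v in selected if v != drop]), drop)
            for drop in selected
        ]
        aic_reduced, drop = min(candidates)
        if aic_drop_p_value(aic_full, aic_reduced) < alpha:
            break  # even the cheapest removal significantly worsens the fit
        selected.remove(drop)
    return selected


def toy_aic(subset: List[str]) -> float:
    """Hypothetical stand-in for a model fit: AIC = deviance + 2 * k."""
    gains = {"x1": 30.0, "x2": 12.0, "noise": 0.5}  # made-up deviance gains
    deviance = 100.0 - sum(gains[v] for v in subset)
    return deviance + 2 * len(subset)
```

Calling `backward_select(["x1", "x2", "noise"], toy_aic)` runs the loop on the made-up example. In practice `fit_aic` would wrap a real model fit, for example a logistic regression on the chosen columns, and a term carrying more than one coefficient would need a chi-square reference with more degrees of freedom than this sketch handles.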