Objective: This study aimed to identify the best-performing method among support vector machines (SVM), random forest and artificial neural networks by comparing these supervised machine learning methods on classification performance. Material and Methods: A real data set obtained from 302 patients with invasive ductal carcinoma and 24 different data sets obtained by simulation were used to compare the three methods. On the breast cancer data, the methods were compared by overall accuracy, F-measure, Matthews correlation coefficient, area under the curve (AUC) and discriminant power. On the simulated data, the difference between training and test accuracy and the significance of that difference were also evaluated. Results: For the test set of stage III patients with invasive ductal carcinoma, the highest survival classification accuracy (80%) was obtained with SVM using the radial basis function kernel. The highest values of the other performance metrics (F-measure=0.87, Matthews correlation coefficient=0.22, AUC=0.89 and discriminant power=0.52) and the most successful results on the simulated data were also generally obtained with SVM. Conclusion: SVM achieved higher accuracy than random forest and artificial neural networks on both the real data set and the simulated data.
Keywords: Machine learning; classification; breast cancer; support vector machines; random forest
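The performance metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the study's own code, and the abstract does not state which formulas were used. The sketch assumes the commonly cited definitions, including discriminant power as (sqrt(3)/pi) * (ln(sens/(1-sens)) + ln(spec/(1-spec))). The function names, confusion-matrix counts and scores are hypothetical and are not taken from the patient data.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, F-measure, MCC and discriminant power from 2x2 counts.

    Assumes the commonly cited textbook definitions; the study's exact
    formulas are not given in the abstract.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Discriminant power is undefined when sensitivity or specificity is 0 or 1.
    dp = (math.sqrt(3) / math.pi) * (
        math.log(sensitivity / (1 - sensitivity))
        + math.log(specificity / (1 - specificity))
    )
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
        "discriminant_power": dp,
    }


def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = confusion_metrics(tp=40, fp=10, tn=30, fn=20)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(metrics, auc)
```

Reporting several metrics side by side is useful because accuracy alone can look favorable on imbalanced classes, whereas the Matthews correlation coefficient uses all four cells of the confusion matrix.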