Objective: Breast cancer is known to arise in the cells of the milk ducts within the breast tissue. Uncontrolled proliferation of the cells that form the milk ducts is called ductal hyperplasia. A woman's lifetime risk of developing invasive breast cancer (cancer with a tendency to spread) is known to be 13.3%, and the risk of breast cancer increases with age. The Gail model is a widely accepted risk assessment model that evaluates the main risk factors for breast cancer. The aim of this study is to compare machine learning methods for breast cancer risk assessment, using the Gail model as the basis. Material and Methods: First, the Gail model was applied to the data set to determine the risk factor, and the data were split into separate training (80%) and test (20%) sets. Next, k-nearest neighbor, artificial neural network (ANN), support vector machine (SVM) and naive Bayes (NB) algorithms were applied to these sets, and their risk estimation results were compared. Results: For the 80% training / 20% test split, classification performance ranked from lowest to highest as follows: SVM [area under the curve (AUC)=0.911], NB (AUC=0.939) and ANN (AUC=0.949). Conclusion: Early diagnosis of breast cancer increases the number of available treatment options, the treatment success rate and the chance of survival. Machine learning methods were found to be effective in calculating breast cancer risk.
Keywords: Gail model; breast cancer; machine learning; artificial neural network; support vector machine
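The evaluation protocol summarized in the abstract (an 80%/20% train/test split, a classifier that outputs a risk score, and comparison by AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy two-feature data, the seeds and k=5 are all hypothetical, and k-nearest neighbor is used only because it is the simplest of the four classifiers named. The AUC is computed in its pairwise (Mann-Whitney) form, which equals the area under the empirical ROC curve when ties are counted as one half.

```python
import random
from typing import List, Sequence, Tuple

# A labelled example: (feature vector, 0/1 label). The features here are
# hypothetical numeric stand-ins, not the actual Gail-model inputs.
Example = Tuple[List[float], int]


def train_test_split(rows: Sequence, train_frac: float = 0.8, seed: int = 0) -> Tuple[list, list]:
    """Shuffle a copy of the rows and cut it into train/test parts (80%/20% by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_score(train: Sequence[Example], x: List[float], k: int = 5) -> float:
    """Risk score = fraction of the k nearest training examples (squared Euclidean
    distance, which gives the same ranking as Euclidean) whose label is 1."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(features, x)), label)
        for features, label in train
    )[:k]
    return sum(label for _, label in nearest) / len(nearest)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive gets a higher score
    than a randomly chosen negative; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: the label is 1 when the two features sum to more than 1.
    rng = random.Random(1)
    data: List[Example] = []
    for _ in range(200):
        features = [rng.random(), rng.random()]
        data.append((features, 1 if sum(features) > 1.0 else 0))

    train, test = train_test_split(data, train_frac=0.8, seed=42)
    scores = [knn_score(train, features, k=5) for features, _ in test]
    print("k-NN AUC on the 20% test split:", round(roc_auc([y for _, y in test], scores), 3))
```

To compare several models in the way the abstract describes, one would score the same held-out 20% with each model (here only the k-NN scorer is shown; ANN, SVM and NB scorers would take its place) and rank the resulting AUC values.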