Objective: Classification, one of the most important research topics in machine learning, aims to correctly predict the target class of each case in the data. This study compared the classification performance of fuzzy inference systems, which learn from experts, with that of machine learning methods, which learn from data, at different sample sizes. Material and Methods: This study was planned as methodological research. The machine learning algorithms used in the comparison were Multilayer Perceptron, Random Forest, and Support Vector Machine, all of which are classifiers frequently encountered in the literature. The datasets were generated for 6 independent variables selected by variable importance (sex, chest pain type, maximum heart rate, exercise-induced angina, oldpeak, and number of major vessels), preserving the characteristics of the heart disease data in the University of California Irvine (UCI) machine learning database. Four sample sizes were used: 100, 250, 500, and 1,000. Each dataset was randomly divided into a training set (70%) and a test set (30%). In the fuzzy inference systems, the fuzzy rules were generated automatically using Chi's technique. Accuracy, precision, sensitivity, and F-measure were used as classification performance metrics to compare the four methods. Results: Fuzzy inference systems were more sensitive to sample size than the other methods, and their classification performance was better than that of the other methods as the sample size increased. Conclusion: In general, the classification performance of the methods increased as the sample size increased.
Keywords: Fuzzy inference system; classification; machine learning
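The evaluation protocol described in the abstract (a random 70%/30% split at each sample size, scored with accuracy, precision, sensitivity, and F-measure) can be illustrated with a minimal sketch. This is not the study's own code: the function names `split_70_30` and `binary_metrics`, the fixed random seed, and the choice of label `1` as the positive (disease) class are assumptions made for illustration, and only the Python standard library is used.

```python
import random

# Sample sizes stated in the abstract.
SAMPLE_SIZES = (100, 250, 500, 1000)


def split_70_30(rows, seed=0):
    """Randomly split rows into a 70% training set and a 30% test set.

    The seed is a hypothetical choice so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, sensitivity (recall), and F-measure.

    Treating `positive=1` as the disease class is an assumption.
    Ratios with a zero denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


# Show the train/test sizes that the 70/30 split yields at each sample size.
for n in SAMPLE_SIZES:
    train, test = split_70_30(range(n))
    print(n, len(train), len(test))
```

In a full comparison, each classifier would be fitted on the training part and `binary_metrics` would be applied to its predictions on the test part, once per sample size.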