Objective: The aim of this article is to present a new method that we propose for categorical agreement in nominal data, to provide an overview of the concept of agreement, and to emphasize its importance in scientific research. Material and Methods: In the social, behavioral, physical, biological, and medical sciences, reliable and accurate measurements form the basis of evaluation. In method comparison and reliability studies, it is important to assess the agreement between multiple measurements made by different observers or instruments. Numerous indices have been developed in the literature to summarize the association or agreement between two measurements. In this study, the methods used to compare measurements or readings made on the same subject or sample by different observers, methods, instruments, laboratories, assays, devices, etc. were considered, and the statistical approaches used to assess agreement for continuous and categorical data were reviewed. The data used as examples in the study were generated entirely at random. Results: Both the method proposed by Helldén and the method proposed by us guarantee that the agreement values remain between 0 and 1. Moreover, it can be said that both methods are less affected by disagreement than the other methods and give results closer to the truth. Conclusion: According to the 'agreement ratio' criterion for category agreement, it was concluded that the method proposed by Helldén is appropriate for two decision makers, while the method proposed by us is appropriate for three or more decision makers. It should be kept in mind that, in some cases, alternative methods may be more appropriate for determining agreement between raters.
Keywords: Agreement; categorical agreement; method comparison; reliability
ABSTRACT Objective: The aim of the article is to present a new method proposed by us for categorical agreement in nominal data, to provide an overview of the concept of agreement, and to emphasize its importance in scientific research. Material and Methods: In the social, behavioral, physical, biological, and medical sciences, reliable and accurate measurements serve as the basis for evaluation. In method comparison and reliability studies, it is often important to assess the agreement between multiple measurements made by different observers or devices. The literature contains a vast number of coefficients for summarizing association or agreement between two measurements. In this study, the comparison of measurements or readings made by different observers, methods, instruments, laboratories, tests, devices, etc. on the same subject or sample is dealt with. In addition, the statistical approaches used to evaluate agreement for continuous and categorical responses were reviewed. The data used as examples in the study were generated entirely at random. Results: The methods suggested by both Helldén and us ensure that the agreement values remain between 0 and 1. It can be said that both methods are less affected by disagreements than other methods and give results closer to the truth. Conclusion: According to the 'agreement ratio' criterion for category agreement, it was concluded that the method suggested by Helldén would be appropriate for two decision makers and the method suggested by us would be appropriate for three or more decision makers. However, alternative measures of interrater agreement may be more appropriate in certain instances.
Keywords: Agreement; categorical agreement; method comparison; reliability
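To illustrate the kind of behavior the abstract refers to, the minimal sketch below contrasts a chance-corrected coefficient (Cohen's kappa) with a simple [0, 1]-bounded ratio-type index on a toy 2x2 table of two raters. The toy counts, the function names, and the identification of the Dice-type form 2a/((a+b)+(a+c)) with Helldén's mean accuracy index are illustrative assumptions; the abstract does not give the formulas of the compared methods, and this is not the article's actual procedure.

```python
# Minimal illustrative sketch (assumptions noted above): two raters assign the
# same n subjects to one of two nominal categories.
#
#                 rater B: "+"   rater B: "-"
# rater A: "+"        a              b
# rater A: "-"        c              d

def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table; chance correction can push it below 0."""
    n = a + b + c + d
    p_obs = (a + d) / n                                      # observed agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

def ratio_agreement(a: int, b: int, c: int, d: int) -> float:
    """Dice-type per-category agreement, 2a / ((a+b) + (a+c)), bounded in [0, 1].
    Treating this form as Helldén's mean accuracy index is an assumption made
    only for illustration."""
    return 2 * a / ((a + b) + (a + c))

if __name__ == "__main__":
    # Toy counts with a strong prevalence imbalance: 80% raw agreement.
    a, b, c, d = 40, 5, 5, 0
    print(f"Cohen's kappa:        {cohen_kappa(a, b, c, d):.3f}")    # ~ -0.111
    print(f"ratio-type agreement: {ratio_agreement(a, b, c, d):.3f}")  # ~ 0.889
```

Even with 80% raw agreement, the chance-corrected kappa turns negative here because one category dominates the margins, while the ratio-type index remains within [0, 1] and stays close to the observed agreement; this is the sort of contrast the Results section alludes to.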
- Lin L, Hedayat AS, Sinha B, Yang M. Statistical methods in assessing agreement: models, issues and tools. J Am Stat Assoc. 2002;97(457):257-70.[Crossref]
- Zapf A, Castell S, Morawietz L, Karch A. Measuring inter-rater reliability for nominal data - which coefficients and confidence intervals are appropriate? BMC Med Res Methodol. 2016;16:93.[Crossref] [PubMed] [PMC]
- Bryington AA, Palmer DJ, Watkins MW. The estimation of interobserver agreement in behavioral assessment. The Behavior Analyst Today. 2002;3(3):323-8.[Crossref]
- Ato M, López JJ, Benavente A. A simulation study of rater agreement measures with 2x2 contingency tables. Psicológica. 2011;32(2):385-402.
- Zhao X, Liu JS, Deng K. Assumptions behind intercoder reliability indices. In: Salmon CT, ed. Communication Yearbook 36. New York: Routledge; 2013. p.419-80.[Crossref]
- Adejumo AO, Heumann C, Toutenburg H. A review of agreement measure as a subset of association measure between raters. SFB386 Discussion Paper 385. München: Ludwig-Maximilians-Universität; 2004.[Link]
- Bloch DA, Kraemer HC. 2 x 2 kappa coefficients: measures of agreement or association. Biometrics. 1989;45(1):269-87.[Crossref] [PubMed]
- Aickin M. Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen's kappa. Biometrics. 1990;46(2):293-302.[Crossref] [PubMed]
- Berry KJ, Mielke PW. A generalization of Cohen's kappa agreement measure to interval measurement and multiple raters. Educ Psychol Meas. 1988;48(4):921-33.[Crossref]
- Krippendorff K. Reliability in content analysis: some common misconceptions and recommendations. Hum Commun Res. 2004;30(3):411-33.[Crossref]
- Benini R. Principii di Demografia. Manuali Barbèra di Scienze Giuridiche, Sociali e Politiche, No. 29. Firenze: G. Barbèra; 1901.
- Yule GU. On the methods of measuring the association between two attributes. J R Stat Soc. 1912;75(6):579-652.[Crossref]
- Gini C. Nuovi contributi alla teoria delle relazioni statistiche. Atti del Reale Istituto Veneto di Scienze, Lettere ed Arti. 1914-1915;74:1903-42.
- Guttman L. The test-retest reliability of qualitative data. Psychometrika. 1946;11:81-95.[Crossref] [PubMed]
- Goodman LA, Kruskal WH. Measures of association for cross classifications. J Am Stat Assoc. 1954;49(268):732-64.[Crossref]
- Fisher RA. Statistical Methods for Research Workers. 12th ed. Edinburgh: Oliver and Boyd; 1954.
- Bennett EM, Alpert R, Goldstein AC. Communications through limited-response questioning. Public Opin Q. 1954;18:303-8.[Crossref]
- Scott WA. Reliability of content analysis: the case of nominal scale coding. Public Opin Q. 1955;19(3):321-5.[Crossref]
- Osgood CE. Representational model and relevant research methods. In: Pool I, ed. Trends in Content Analysis. Urbana: Illinois Press; 1959. p.33-88.
- Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37-46.[Crossref]
- Holley JW, Guilford JP. A note on the G-index of agreement. Educ Psychol Meas. 1964;24(4):749-53.[Crossref]
- Rogot E, Goldberg ID. A proposed index for measuring agreement in test-retest studies. J Chronic Dis. 1966;19(9):991-1006.[Crossref] [PubMed]
- Holsti OR. Content Analysis for the Social Sciences and Humanities. Reading, MA: Addison-Wesley; 1969.
- Krippendorff K. Bivariate agreement coefficients for reliability data. In: Borgatta ER, Bohrnstedt GW, eds. Sociological Methodology. San Francisco, CA: Jossey-Bass; 1970. p.139-50.[Crossref]
- Cicchetti DV. Assessing inter-rater reliability for rating scales: resolving some basic issues. Br J Psychiatry. 1976;129:452-6.[Crossref] [PubMed]
- Maxwell AE. Coefficients of agreement between observers and their interpretation. Br J Psychiatry. 1977;130:79-83.[Crossref] [PubMed]
- Janson S, Vegelius J. On generalizations of the G index and the phi coefficient to nominal scales. Multivariate Behav Res. 1979;14(2):255-69.[Crossref] [PubMed]
- Brennan RL, Prediger DJ. Coefficient kappa: some uses, misuses, and alternatives. Educ Psychol Meas. 1981;41(3):687-99.[Crossref]
- Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. The Statistician. 1983;32(3):307-17.[Crossref]
- Popping R. Traces of agreement: on the dot-product as a coefficient of agreement. Qual Quant. 1983;17:1-18.[Crossref]
- Bangdiwala S. The agreement chart. University of North Carolina Institute of Statistics, Mimeo Series No. 1859, 1988.
- Perreault WD, Leigh LE. Reliability of nominal data based on qualitative judgments. J Mark Res. 1989;26(2):135-48.[Crossref]
- Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45(1):255-68.[Crossref] [PubMed]
- Kupper LL, Hafner KB. On assessing interrater agreement for multiple attribute responses. Biometrics. 1989;45(3):957-67.[Crossref] [PubMed]
- Byrt T, Bishop J, Carlin JB. Bias, prevalence and kappa. J Clin Epidemiol. 1993;46(5):423-9.[Crossref] [PubMed]
- Donner A, Eliasziw M. A hierarchical approach to inferences concerning interobserver agreement for multinomial data. Stat Med. 1997;16(10):1097-106.[Crossref] [PubMed]
- Svensson E. A coefficient of agreement adjusted for bias in paired ordered categorical data. Biom J. 1997;39:643-57.[Crossref]
- Andrés AM, Marzo PF. Delta: a new measure of agreement between two raters. Br J Math Stat Psychol. 2004;57(Pt 1):1-19.[Crossref] [PubMed]
- von Eye A. An alternative to Cohen's κ. Eur Psychol. 2006;11(1):12-24.[Crossref]
- Barnhart HX, Kosinski AS, Haber MJ. Assessing individual agreement. J Biopharm Stat. 2007;17(4):697-719.[Crossref] [PubMed]
- Gwet KL. Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol. 2008;61(Pt 1):29-48.[Crossref] [PubMed]
- Gwet KL. Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters. 3rd ed. Maryland: Advanced Analytics, LLC; 2012.
- Warrens MJ. Weighted kappas for 3×3 tables. Journal of Probability and Statistics. 2013;1:1-9.[Crossref]
- Kendall MG. Rank Correlation Methods. 3rd ed. London: Griffin; 1962.
- Maxwell AE, Pilliner AEG. Deriving coefficients of reliability and agreement for ratings. Br J Math Stat Psychol. 1968;21(1):105-16.[Crossref] [PubMed]
- Light RJ. Measures of response agreement for qualitative data: some generalizations and alternatives. Psychol Bull. 1971;76(5):365-77.[Crossref]
- Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76(5):378-82.[Crossref]
- Tinsley HE, Weiss DJ. Interrater reliability and agreement of subjective judgments. Journal of Counseling Psychology. 1975;22(4):358-76.[Crossref]
- Hubert L. Kappa revisited. Psychol Bull. 1977;84(2):289-97.[Crossref]
- Landis JR, Koch GG. A one-way components of variance model for categorical data. Biometrics. 1977;33(4):671-9.[Crossref]
- Fleiss JL, Cuzick J. The reliability of dichotomous judgments: unequal numbers of judges per subject. Appl Psychol Meas. 1979;3(4):537-42.[Crossref]
- Conger AJ. Integration and generalization of kappas for multiple raters. Psychol Bull. 1980;88(2):322-8.[Crossref]
- Kraemer HC. Extension of the kappa coefficient. Biometrics. 1980;36(2):207-16.[Crossref] [PubMed]
- Schouten HJA. Measuring pairwise agreement among many observers. Biom J. 1980;22(6):497-504.[Crossref]
- Craig RT. Generalization of Scott's index of intercoder agreement. Public Opin Q. 1981;45(2):260-4.[Crossref]
- Davies M, Fleiss JL. Measuring agreement for multinomial data. Biometrics. 1982;38(4):1047-51.[Crossref]
- O'Connell DL, Dobson AJ. General observer-agreement measures on individual subjects and groups of subjects. Biometrics. 1984;40(4):973-83.[Crossref]
- Siegel S, Castellan NJ. Nonparametric Statistics for the Behavioral Sciences. 2nd ed. New York: McGraw-Hill; 1988. p.284-90.
- Potter WJ, Levine-Donnerstein D. Rethinking validity and reliability in content analysis. J Appl Commun Res. 1999;27(3):258-84.[Crossref]
- Randolph JJ. Free-marginal multirater Kappa (multirater K[free]): an alternative to Fleiss' fixed-marginal multirater kappa. Joensuu Learning and Instruction Symposium. Joensuu, Finland, Oct 14-15, 2005.
- Mielke PW, Berry KJ, Johnston JE. Unweighted and weighted kappa as measures of agreement for multiple judges. Int J Manag. 2009;26(2):213-23.[Link]
- Hughes J. sklarsomega: an R package for measuring agreement using Sklar's Omega coefficient. 2018; arXiv:1809.10728v1.[Link]
- van Oest R. A new coefficient of interrater agreement: The challenge of highly unequal category proportions. Psychol Methods. 2019;24(4):439-51.[Crossref] [PubMed]
- Haber M, Barnhart HX. Coefficients of agreement for fixed observers. Stat Methods Med Res. 2006;15(3):255-71.[Crossref] [PubMed]
- Barnhart HX, Haber MJ, Lin LI. An overview on assessing agreement with continuous measurements. J Biopharm Stat. 2007;17(4):529-69.[Crossref] [PubMed]
- Örekici Temel G, Erdoğan S, Selvi H, Kaya Ersöz İ. Investigation of coefficient of individual agreement in terms of sample size, random and monotone missing ratio, and number of repeated measures. KUYEB Dergisi. 2016;16(4):1381-95.[Crossref]
- Gwet KL. Handbook of Inter-rater Reliability. 4th ed. Gaithersburg: Advanced Analytics, LLC; 2014. p.3-25.
- Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol. 2011;64(1):96-106.[Crossref] [PubMed]
- Bishop YMM, Fienberg SE, Holland PW. Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: The MIT Press; 1975.
- Türk G. GT index: a measure of the success of prediction. Remote Sens Environ. 1979;8:65-75.[Crossref]
- Helldén U. A test of Landsat-2 imagery and digital data for thematic mapping illustrated by an environmental study in northern Kenya. Lund: Lund University; 1980.[Link]
- Short NM. The Landsat Tutorial Workbook: Basics of Satellite Remote Sensing. NASA Reference Publication 1078. 1982. p.553.
- Rosenfield GH, Fitzpatrick-Lins K. A coefficient of agreement as a measure of thematic classification accuracy. Photogramm Eng Rem S. 1986;52(2):223-7.[Link]
- Barnhart HX, Yow E, Crowley AL, Daubert MA, Rabineau D, Bigelow R, et al. Choice of agreement indices for assessing and improving measurement reproducibility in a core laboratory setting. Stat Methods Med Res. 2016;25(6):2939-58.[Crossref] [PubMed]
- Lin HM, Kim HY, Williamson JM, Lesser VM. Estimating agreement coefficients from sample survey data. Surv Methodol. 2012;38(1):63-72.[Link]