{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T17:05:30Z","timestamp":1770915930612,"version":"3.50.1"},"reference-count":62,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,5,18]],"date-time":"2021-05-18T00:00:00Z","timestamp":1621296000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Animals"],"abstract":"<jats:p>This study focuses on the problem of assessing inter-observer reliability (IOR) in the case of dichotomous categorical animal-based welfare indicators and the presence of two observers. Based on observations obtained from Animal Welfare Indicators (AWIN) project surveys conducted on nine dairy goat farms, and using udder asymmetry as an indicator, we compared the performance of the most popular agreement indexes available in the literature: Scott\u2019s \u03c0, Cohen\u2019s k, kPABAK, Holsti\u2019s H, Krippendorff\u2019s \u03b1, Hubert\u2019s \u0393, Janson and Vegelius\u2019 J, Bangdiwala\u2019s B, Andr\u00e9s and Marzo\u2019s \u2206, and Gwet\u2019s \u03b3(AC1). Confidence intervals were calculated using closed formulas of variance estimates for \u03c0, k, kPABAK,\u00a0H,\u00a0\u03b1,\u00a0\u0393, J, \u2206, and \u03b3(AC1), while the bootstrap and exact bootstrap methods were used for all the indexes. All the indexes and closed formulas of variance estimates were calculated using Microsoft Excel. The bootstrap method was performed with R software, while the exact bootstrap method was performed with SAS software. k, \u03c0, and \u03b1 exhibited a paradoxical behavior, showing unacceptably low values even in the presence of very high concordance rates. B and \u03b3(AC1) showed values very close to the concordance rate, independently of its value. Both bootstrap and exact bootstrap methods turned out to be simpler compared to the implementation of closed variance formulas and provided effective confidence intervals for all the considered indexes. The best approach for measuring IOR in these cases is the use of B or \u03b3(AC1), with bootstrap or exact bootstrap methods for confidence interval calculation.<\/jats:p>","DOI":"10.3390\/ani11051445","type":"journal-article","created":{"date-parts":[[2021,5,18]],"date-time":"2021-05-18T12:17:16Z","timestamp":1621340236000},"page":"1445","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Evaluation of Inter-Observer Reliability of Animal Welfare Indicators: Which Is the Best Index to Use?"],"prefix":"10.3390","volume":"11","author":[{"given":"Mauro","family":"Giammarino","sequence":"first","affiliation":[{"name":"Department of Prevention, Asl TO3, Veterinary Service, Area Animal Sanity, 10045 Piossasco, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1885-949X","authenticated-orcid":false,"given":"Silvana","family":"Mattiello","sequence":"additional","affiliation":[{"name":"Department of Agricultural and Environmental Sciences\u2014Production, Landscape, Agroenergy, University of Milan, 20133 Milan, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6134-7759","authenticated-orcid":false,"given":"Monica","family":"Battini","sequence":"additional","affiliation":[{"name":"Department of Agricultural and Environmental Sciences\u2014Production, Landscape, Agroenergy, University of Milan, 20133 Milan, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6679-7169","authenticated-orcid":false,"given":"Piero","family":"Quatto","sequence":"additional","affiliation":[{"name":"Department of Economics, Management and Statistics, University of Milan-Bicocca, 20126 Milan, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2136-3826","authenticated-orcid":false,"given":"Luca Maria","family":"Battaglini","sequence":"additional","affiliation":[{"name":"Department of Agricultural, Forest and Food Sciences, University of Turin, 10095 Grugliasco, Italy"}]},{"given":"Ana C. L.","family":"Vieira","sequence":"additional","affiliation":[{"name":"Centre for Management Studies of Instituto Superior T\u00e9cnico (CEG-IST), University of Lisbon, 1049-001 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3733-3223","authenticated-orcid":false,"given":"George","family":"Stilwell","sequence":"additional","affiliation":[{"name":"Department of Veterinary Medicine, University of Lisbon, 1300-477 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4296-7589","authenticated-orcid":false,"given":"Manuela","family":"Renna","sequence":"additional","affiliation":[{"name":"Department of Veterinary Sciences, University of Turin, 10095 Grugliasco, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2021,5,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"6625","DOI":"10.3168\/jds.2013-7493","article-title":"Invited review: Animal-based indicators for on-farm welfare assessment for dairy goats","volume":"97","author":"Battini","year":"2014","journal-title":"J. Dairy Sci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.applanim.2009.02.026","article-title":"Observer ratings: Validity and value as a tool for animal welfare research","volume":"119","author":"Meagher","year":"2009","journal-title":"Appl. Anim. Behav. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1016\/j.anbehav.2009.09.014","article-title":"Can you believe my eyes? The importance of interobserver reliability statistics in observations of animal behavior","volume":"78","author":"Kaufman","year":"2009","journal-title":"Anim. Behav."},{"key":"ref_4","first-page":"411","article-title":"Reliability in content analysis: Some common misconceptions and recommendations","volume":"30","author":"Krippendorff","year":"2004","journal-title":"Hum. Commun. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educ. Psychol. Meas."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"103","DOI":"10.4081\/ijas.2009.s1.103","article-title":"The welfare of dairy buffalo","volume":"8","author":"Grasso","year":"2009","journal-title":"Ital. J. Anim. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2611","DOI":"10.1177\/0962280214529560","article-title":"Assessing the inter-rater agreement for ordinal data through weighted indexes","volume":"25","author":"Marasini","year":"2016","journal-title":"Stat. Methods Med. Res."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1080\/1828051X.2020.1816509","article-title":"Inter-rater reliability of welfare outcome assessment by an expert and farmers of South Tyrolean dairy farming","volume":"19","author":"Katzenberger","year":"2020","journal-title":"Ital. J. Anim. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1016\/j.jevs.2019.02.005","article-title":"Interobserver reliability of the animal welfare indicators welfare assessment protocol for horses","volume":"75","author":"Czycholl","year":"2019","journal-title":"J. Equine Vet. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.applanim.2019.02.004","article-title":"Reliability of different behavioral tests for growing pigs on-farm","volume":"213","author":"Czycholl","year":"2019","journal-title":"Appl. Anim. Behav. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1712","DOI":"10.1017\/S1751731118003701","article-title":"Inter- and intra-observer reliability of animal welfare indicators for the on-farm self-assessment of fattening pigs","volume":"13","author":"Pfeifer","year":"2019","journal-title":"Animal"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1942","DOI":"10.1017\/S1751731117003597","article-title":"Inter-observer reliability of animal-based welfare indicators included in the Animal Welfare Indicators welfare assessment protocol for dairy goats","volume":"12","author":"Vieira","year":"2018","journal-title":"Animal"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"6886","DOI":"10.3168\/jds.2015-9350","article-title":"Application of the Welfare Quality protocol to dairy buffalo farms: Prevalence and reliability of selected measures","volume":"98","author":"Grasso","year":"2015","journal-title":"J. Dairy Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"e100","DOI":"10.1016\/j.tvjl.2011.01.012","article-title":"Inter-observer reliability testing of pig welfare outcome measures proposed for inclusion within farm assurance schemes","volume":"190","author":"Mullan","year":"2011","journal-title":"Vet. J."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Mattiello, S., Battini, M., De Rosa, G., Napolitano, F., and Dwyer, C. (2019). How Can We Assess Positive Welfare in Ruminants?. Animals, 9.","DOI":"10.3390\/ani9100758"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Spigarelli, C., Zuliani, A., Battini, M., Mattiello, S., and Bovolenta, S. (2020). Welfare Assessment on Pasture: A Review on Animal-Based Measures for Ruminants. Animals, 10.","DOI":"10.3390\/ani10040609"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"e651","DOI":"10.7717\/peerj.651","article-title":"Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department","volume":"2","author":"Walsh","year":"2014","journal-title":"PeerJ"},{"key":"ref_18","first-page":"385","article-title":"A simulation study of rater agreement measures with 2x2 contingency tables","volume":"32","author":"Ato","year":"2011","journal-title":"Psicol\u00f3gica"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1086\/266577","article-title":"Reliability of content analysis: The case of nominal scale coding","volume":"19","author":"Scott","year":"1955","journal-title":"Public Opin. Q."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1086\/266520","article-title":"Communications through limited response questioning","volume":"18","author":"Bennett","year":"1954","journal-title":"Public Opin. Q."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1348\/000711006X126600","article-title":"Computing inter-rater reliability and its variance in presence of high agreement","volume":"61","author":"Gwet","year":"2008","journal-title":"Br. J. Math. Stat. Psychol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/01621459.1985.10477157","article-title":"Modeling agreement among raters","volume":"80","author":"Tanner","year":"1985","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"293","DOI":"10.2307\/2531434","article-title":"Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen\u2019s kappa","volume":"46","author":"Aickin","year":"1990","journal-title":"Biometrics"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1348\/000711004849268","article-title":"Delta: A new measure of agreement between two raters","volume":"57","author":"Marzo","year":"2004","journal-title":"Br. J. Math. Stat. Psychol."},{"key":"ref_25","unstructured":"AWIN (Animal Welfare Indicators) (2021, May 03). AWIN Welfare Assessment Protocol for Goats. Available online: https:\/\/air.unimi.it\/retrieve\/handle\/2434\/269102\/384790\/AWINProtocolGoats.pdf."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"934","DOI":"10.3390\/ani5040393","article-title":"On-farm welfare assessment protocol for adult dairy goats in intensive production systems","volume":"5","author":"Battini","year":"2015","journal-title":"Animals"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1177\/001316446402400402","article-title":"A note on the G index of agreement","volume":"34","author":"Holley","year":"1964","journal-title":"Educ. Psychol. Meas."},{"key":"ref_28","first-page":"145","article-title":"Un test di concordanza tra pi\u00f9 esaminatori","volume":"64","author":"Quatto","year":"2004","journal-title":"Statistica"},{"key":"ref_29","unstructured":"Holsti, O.R. (1969). Content Analysis for the Social Sciences and Humanities, Addison-Wesley."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1177\/001316447003000105","article-title":"Estimating the reliability, systematic error and random error of interval data","volume":"30","author":"Krippendorff","year":"1970","journal-title":"Educ. Psychol. Meas."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1111\/j.2044-8317.1977.tb00728.x","article-title":"Nominal scale response agreement as a generalized correlation","volume":"30","author":"Hubert","year":"1977","journal-title":"Br. J. Math. Stat. Psychol."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1177\/014662167800200113","article-title":"On the applicability of truncated component analysis based on correlation coefficients for nominal scales","volume":"2","author":"Janson","year":"1978","journal-title":"Appl. Psychol. Meas."},{"key":"ref_33","unstructured":"Bishop, Y.M.M., Fienberg, S.E., and Holland, P.W. (1985, January 12\u201322). A graphical test for observer agreement. Proceedings of the 45th International Statistical Institute Meeting, Amsterdam, The Netherlands."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1214\/aos\/1176344552","article-title":"Bootstrap methods: Another look at the jackknife","volume":"7","author":"Efron","year":"1979","journal-title":"Ann. Stat."},{"key":"ref_35","first-page":"467","article-title":"An exact bootstrap confidence interval for k in small samples","volume":"51","author":"Klar","year":"2002","journal-title":"J. R. Stat. Soc. Ser. D-Stat."},{"key":"ref_36","first-page":"345","article-title":"The \u2018exact\u2019 bootstrap approach to confidence intervals for the relative difference statistic","volume":"36","author":"Kinsella","year":"1987","journal-title":"J. R. Stat. Soc. Ser. D-Stat."},{"key":"ref_37","unstructured":"Quatto, P., and Ripamonti, E. (2021, May 05). Raters: A Modification of Fleiss\u2019 Kappa in Case of Nominal and Ordinal Variables. R Package Version 2.0.1. Available online: https:\/\/CRAN.R-project.org\/package=raters."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v017.i03","article-title":"The Strucplot Framework: Visualizing Multi-Way contingency Table with vcd","volume":"17","author":"Meyer","year":"2006","journal-title":"J. Stat. Softw."},{"key":"ref_39","unstructured":"S Original, from StatLib and by Tibshirani, R. R Port by Friedrich Leisch (2021, May 05). Bootstrap: Functions for the Book \u201dAn Introduction to the Bootstrap\u201d. R Package Version 2019.6. Available online: https:\/\/CRAN.R-project.org\/packages=bootstrap."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"3","DOI":"10.2307\/3315487","article-title":"Beyond kappa: A review of interrater agreement measures","volume":"27","author":"Banerjee","year":"1999","journal-title":"Can. J. Stat.-Rev. Can. Stat."},{"key":"ref_41","unstructured":"Wang, W. (2011). A Content Analysis of Reliability in Advertising Content Analysis Studies. [Master\u2019s Thesis, Department of Communication, East Tennessee State Univ.]. Available online: https:\/\/dc.etsu.edu\/etd\/1375."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1111\/j.1468-2958.2002.tb00826.x","article-title":"Content analysis in mass communication: Assessment and reporting of intercoder reliability","volume":"28","author":"Lombard","year":"2002","journal-title":"Hum. Commun. Res."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1093\/swr\/35.3.185","article-title":"A Kappa-related decision: K, Y, G, or AC1","volume":"35","author":"Kuppens","year":"2011","journal-title":"Soc. Work Res."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"543","DOI":"10.1016\/0895-4356(90)90158-L","article-title":"High agreement but low kappa: I. The problem of two paradoxes","volume":"43","author":"Feinstein","year":"1990","journal-title":"J. Clin. Epidemiol."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1016\/0895-4356(95)00571-4","article-title":"Behavior and interpretation of the \u03ba statistics: Resolution of the two paradoxes","volume":"49","author":"Lantz","year":"1996","journal-title":"J. Clin. Epidemiol."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1016\/0895-4356(93)90018-V","article-title":"Bias, prevalence and kappa","volume":"46","author":"Byrt","year":"1993","journal-title":"J. Clin. Epidemiol."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Shankar, V., and Bangdiwala, S.I. (2014). Observer agreement paradoxes in 2 \u00d7 2 tables: Comparison of agreement measures. BMC Med. Res. Methodol., 14.","DOI":"10.1186\/1471-2288-14-100"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1037\/h0031619","article-title":"Measuring nominal scale agreement among many raters","volume":"76","author":"Fleiss","year":"1981","journal-title":"Psychol. Bull."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1016\/0895-4356(90)90159-M","article-title":"High agreement but low kappa: II. Resolving the paradoxes","volume":"43","author":"Cicchetti","year":"1990","journal-title":"J. Clin. Epidemiol."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1177\/001316448104100307","article-title":"Coefficient kappa: Some uses, misuses, and alternatives","volume":"41","author":"Brennan","year":"1981","journal-title":"Educ. Psychol. Meas."},{"key":"ref_52","unstructured":"Zhao, X. (2011, January 10\u201313). When to Use Scott\u2019s \u03c0 or Krippendorff\u2019s \u03b1, If Ever?. Presented at the Annual Conference of Association for Education in Journalism and Mass Communication, St. Louis, MO, USA. Available online: https:\/\/repository.hkbu.edu.hk\/cgi\/viewcontent.cgi?referer=&httpsredir=1&article=1002&context=coms_conf."},{"key":"ref_53","unstructured":"Gwet, K.L. (2021, March 22). On Krippendorff\u2019s Alpha Coefficient. Available online: http:\/\/www.bwgriffin.com\/gsu\/courses\/edur9131\/content\/onkrippendorffalpha.pdf."},{"key":"ref_54","first-page":"151","article-title":"On avoiding paradoxes in assessing inter-rater agreement","volume":"22","author":"Falotico","year":"2010","journal-title":"Ital. J. Appl. Stat."},{"key":"ref_55","unstructured":"Friendly, M. (2000). Visualizing Categorical Data, SAS Institute."},{"key":"ref_56","unstructured":"McCray, G. (2013, January 15\u201317). Assessing Inter-Rater Agreement for Nominal Judgement Variables. Presented at the Language Testing Forum, University of Lancaster, Nottingham, UK. Available online: https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.725.8104&rep=rep1&type=pdf."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Wongpakaran, N., Wongpakaran, T., Wedding, D., and Gwet, K.L. (2013). A comparison of Cohen\u2019s Kappa and Gwet\u2019s AC1 when calculating inter-rater reliability coefficients: A study conducted with personality disorder samples. BMC Med. Res. Methodol., 13.","DOI":"10.1186\/1471-2288-13-61"},{"key":"ref_58","unstructured":"Kendall, M.G. (1955). Rank Correlation Methods, Hafner Publishing Co."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1177\/014662168200600111","article-title":"The J-index as a measure of nominal scale response agreement","volume":"6","author":"Janson","year":"1982","journal-title":"Appl. Psychol. Meas."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1037\/h0028106","article-title":"Large-sample standard errors of kappa and weighted kappa","volume":"72","author":"Fleiss","year":"1969","journal-title":"Psychol. Bull."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1111\/j.2044-8317.1968.tb00400.x","article-title":"Moments of the statistics kappa and weighted kappa","volume":"21","author":"Everitt","year":"1968","journal-title":"Br. J. Math. Stat. Psychol."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"3275","DOI":"10.1002\/1097-0258(20001215)19:23<3275::AID-SIM626>3.0.CO;2-M","article-title":"Statistics in medical journals: Some recent trends","volume":"19","author":"Altman","year":"2000","journal-title":"Stat. Med."}],"container-title":["Animals"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2076-2615\/11\/5\/1445\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:03:15Z","timestamp":1760162595000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2076-2615\/11\/5\/1445"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,18]]},"references-count":62,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["ani11051445"],"URL":"https:\/\/doi.org\/10.3390\/ani11051445","relation":{},"ISSN":["2076-2615"],"issn-type":[{"value":"2076-2615","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,18]]}}}