{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T18:56:09Z","timestamp":1771700169891,"version":"3.50.1"},"reference-count":21,"publisher":"MIT Press","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["TACL"],"published-print":{"date-parts":[[2017,12]]},"abstract":"<jats:p> With the ever growing amount of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure a consistent performance across heterogeneous setups. However, such multiple comparisons pose significant challenges to traditional statistical analysis methods in NLP and can lead to erroneous conclusions. In this paper we propose a Replicability Analysis framework for a statistically sound analysis of multiple comparisons between algorithms for NLP tasks. We discuss the theoretical advantages of this framework over the current, statistically unjustified, practice in the NLP literature, and demonstrate its empirical value across four applications: multi-domain dependency parsing, multilingual POS tagging, cross-domain sentiment classification and word similarity prediction. 
<\/jats:p>","DOI":"10.1162\/tacl_a_00074","type":"journal-article","created":{"date-parts":[[2018,12,28]],"date-time":"2018-12-28T10:42:50Z","timestamp":1545993770000},"page":"471-486","source":"Crossref","is-referenced-by-count":22,"title":["Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets"],"prefix":"10.1162","volume":"5","author":[{"given":"Rotem","family":"Dror","sequence":"first","affiliation":[{"name":"Faculty of Industrial Engineering and Management, Technion, IIT,"}]},{"given":"Gili","family":"Baumer","sequence":"additional","affiliation":[{"name":"Faculty of Industrial Engineering and Management, Technion, IIT,"}]},{"given":"Marina","family":"Bogomolov","sequence":"additional","affiliation":[{"name":"Faculty of Industrial Engineering and Management, Technion, IIT,"}]},{"given":"Roi","family":"Reichart","sequence":"additional","affiliation":[{"name":"Faculty of Industrial Engineering and Management, Technion, IIT,"}]}],"member":"281","reference":[{"issue":"7391","key":"p_5","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1038\/483531a","volume":"483","author":"Glenn Begley C.","year":"2012","journal-title":"Nature"},{"issue":"4","key":"p_6","doi-asserted-by":"crossref","first-page":"1215","DOI":"10.1111\/j.1541-0420.2007.00984.x","volume":"64","author":"Benjamini Yoav","year":"2008","journal-title":"Biometrics"},{"key":"p_7","author":"Benjamini Yoav","year":"1995","journal-title":"Journal of the Royal Statistical Society. 
Series B (Methodological), pages 289- 300."},{"issue":"1906","key":"p_9","doi-asserted-by":"crossref","first-page":"4255","DOI":"10.1098\/rsta.2009.0127","volume":"367","author":"Benjamini Yoav","year":"2009","journal-title":"Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences"},{"key":"p_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1613\/jair.4135","volume":"49","author":"Bruni Elia","year":"2014","journal-title":"Journal of Artificial Intelligence Research (JAIR)"},{"issue":"6","key":"p_21","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1177\/1745691612462588","volume":"7","author":"Collaboration Open Science","year":"2012","journal-title":"Perspectives on Psychological Science"},{"key":"p_25","first-page":"1","volume":"7","author":"Dem\u0161ar Janez","year":"2006","journal-title":"Journal of Machine Learning Research"},{"issue":"46","key":"p_34","doi-asserted-by":"crossref","first-page":"16262","DOI":"10.1073\/pnas.1314814111","volume":"111","author":"Heller Ruth","year":"2014","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"2","key":"p_35","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1093\/cje\/bet075","volume":"38","author":"Herndon Thomas","year":"2014","journal-title":"Cambridge Journal of Economics"},{"key":"p_36","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00237"},{"issue":"4","key":"p_38","doi-asserted-by":"crossref","first-page":"800","DOI":"10.1093\/biomet\/75.4.800","volume":"75","author":"Hochberg Yosef","year":"1988","journal-title":"Biometrika"},{"issue":"2","key":"p_39","first-page":"65","volume":"6","author":"Holm Sture","year":"1979","journal-title":"Scandinavian Journal of Statistics"},{"issue":"2","key":"p_40","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1093\/biomet\/75.2.383","volume":"75","author":"Hommel 
Gerhard","year":"1988","journal-title":"Biometrika"},{"issue":"6","key":"p_45","doi-asserted-by":"crossref","first-page":"1645","DOI":"10.1073\/pnas.1421412111","volume":"112","author":"Leek Jeffrey T.","year":"2015","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"3","key":"p_48","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1016\/j.csda.2003.11.020","volume":"47","author":"Loughin Thomas M.","year":"2004","journal-title":"Computational Statistics & Data Analysis"},{"issue":"2","key":"p_52","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/BF02295996","volume":"12","author":"McNemar Quinn","year":"1947","journal-title":"Psychometrika"},{"issue":"2","key":"p_55","doi-asserted-by":"crossref","first-page":"e28","DOI":"10.1371\/journal.pmed.0040028","volume":"4","author":"Moonesinghe Ramal","year":"2007","journal-title":"PLoS Med"},{"issue":"3","key":"p_58","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1162\/COLI_a_00194","volume":"40","author":"S\u00e9aghdha Diarmuid","year":"2014","journal-title":"Computational Linguistics"},{"issue":"6060","key":"p_60","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1126\/science.1213847","volume":"334","author":"Peng Roger D.","year":"2011","journal-title":"Science"},{"issue":"10","key":"p_66","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1145\/365628.365657","volume":"8","author":"Rubenstein Herbert","year":"1965","journal-title":"Communications of the ACM"},{"issue":"2","key":"p_73","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1037\/0033-2909.87.2.245","volume":"87","author":"Steiger James H.","year":"1980","journal-title":"Psychological Bulletin"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/tacl_a_00074","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T16:38:15Z","timestamp":1615567095000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/43410"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12]]},"references-count":21,"alternative-id":["10.1162\/tacl_a_00074"],"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00074","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,12]]}}}