{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T16:19:41Z","timestamp":1778948381524,"version":"3.51.4"},"reference-count":100,"publisher":"MIT Press","issue":"1","license":[{"start":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T00:00:00Z","timestamp":1665100800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising number of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have been devised over the years. While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets. This raises strong concerns on methods\u2019 general performance and makes it difficult to assess their strengths and weaknesses. We therefore reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets for text classification as well as token and span labeling. In addition, we define a uniform evaluation setup including a new formalization of the annotation error detection task, evaluation protocol, and general best practices. 
To facilitate future research and reproducibility, we release our datasets and implementations in an easy-to-use and open source software package.<\/jats:p>","DOI":"10.1162\/coli_a_00464","type":"journal-article","created":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T13:46:37Z","timestamp":1665150397000},"page":"157-198","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":26,"title":["Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future"],"prefix":"10.1162","volume":"49","author":[{"given":"Jan-Christoph","family":"Klie","sequence":"first","affiliation":[{"name":"Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technical University of Darmstadt. www.ukp.tu-darmstadt.de"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bonnie","family":"Webber","sequence":"additional","affiliation":[{"name":"School of Informatics, University of Edinburgh"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Iryna","family":"Gurevych","sequence":"additional","affiliation":[{"name":"UKP Lab \/ TU Darmstadt"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2023,3,1]]},"reference":[{"key":"2023030119555669200_","volume-title":"How to Take Smart Notes: One Simple Technique to Boost Writing, Learning and Thinking: For Students, Academics and Nonfiction Book Writers","author":"Ahrens","year":"2017"},{"key":"2023030119555669200_","first-page":"54","article-title":"FLAIR: An easy-to-use framework for state-of-the-art NLP","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short 
Papers)","author":"Akbik","year":"2019"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"1558","DOI":"10.18653\/v1\/2020.acl-main.142","article-title":"TACRED revisited: A thorough evaluation of the TACRED relation extraction task","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Alt","year":"2020"},{"key":"2023030119555669200_","first-page":"23","article-title":"Error detection for treebank validation","volume-title":"Proceedings of the 9th Workshop on Asian Language Resources","author":"Ambati","year":"2011"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"2006","DOI":"10.18653\/v1\/N18-1182","article-title":"Spotting spurious data with neural networks","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume","author":"Amiri","year":"2018"},{"key":"2023030119555669200_","first-page":"11","article-title":"Automated error correction and validation for POS tagging of Hindi","volume-title":"Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation","author":"Angle","year":"2018"},{"issue":"1","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1609\/aimag.v36i1.2564","article-title":"Truth is a lie: Crowd truth and the seven myths of human annotation","volume":"36","author":"Aroyo","year":"2015","journal-title":"AI Magazine"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"12","DOI":"10.18653\/v1\/W19-4802","article-title":"Sentiment analysis is not solved! 
Assessing and probing sentiment classification","volume-title":"Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Barnes","year":"2019"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"15","DOI":"10.18653\/v1\/2021.bppf-1.3","article-title":"We need to consider disagreement in evaluation","volume-title":"Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future","author":"Basile","year":"2021"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","DOI":"10.1075\/tilar.6.03beh","volume-title":"Corpora in Language Acquisition Research: History, Methods, Perspectives, volume 6 of Trends in Language Acquisition Research","author":"Behrens","year":"2008"},{"issue":"2","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1007\/s11168-008-9051-9","article-title":"On detecting errors in dependency treebanks","volume":"6","author":"Boyd","year":"2008","journal-title":"Research on Language and Computation"},{"issue":"2","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1145\/335191.335388","article-title":"LOF: Identifying density-based local outliers","volume":"29","author":"Breunig","year":"2000","journal-title":"ACM SIGMOD Record"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972238","volume-title":"Assignment Problems: Revised Reprint","author":"Burkard","year":"2012"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"169","DOI":"10.18653\/v1\/D18-2029","article-title":"Universal sentence encoder for English","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Cer","year":"2018"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"69","DOI":"10.3115\/1072017.1072026","article-title":"MUC-5 evaluation 
metrics","volume-title":"Proceedings of the Fifth Message Understanding Conference (MUC-5)","author":"Chinchor","year":"1993"},{"key":"2023030119555669200_","first-page":"829","article-title":"Example-based robust outlier detection in high dimensional datasets","volume-title":"Fifth IEEE International Conference on Data Mining (ICDM\u201905)","author":"Zhu","year":"2005"},{"key":"2023030119555669200_","first-page":"233","article-title":"The relationship between precision-recall and ROC curves","volume-title":"Proceedings of the 23rd International Conference on Machine Learning - ICML \u201906","author":"Davis","year":"2006"},{"issue":"1","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"20","DOI":"10.2307\/2346806","article-title":"Maximum likelihood estimation of observer error-rates using the EM algorithm","volume":"28","author":"Dawid","year":"1979","journal-title":"Applied Statistics"},{"key":"2023030119555669200_","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2023030119555669200_","first-page":"265","article-title":"From detecting errors to automatically correcting them","volume-title":"11th Conference of the European Chapter of the Association for Computational Linguistics","author":"Dickinson","year":"2006"},{"issue":"3","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1111\/lnc3.12129","article-title":"Detection of annotation errors in Corpora","volume":"9","author":"Dickinson","year":"2015","journal-title":"Language and Linguistics Compass"},{"key":"2023030119555669200_","first-page":"605","article-title":"Detecting errors in semantic annotation","volume-title":"Proceedings of the Sixth 
International Conference on Language Resources and Evaluation (LREC\u201908)","author":"Dickinson","year":"2008"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"107","DOI":"10.3115\/1067807.1067823","article-title":"Detecting errors in part-of-speech annotation","volume-title":"Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics - Volume 1","author":"Dickinson","year":"2003"},{"key":"2023030119555669200_","first-page":"1","article-title":"Detecting inconsistencies in treebanks","volume-title":"Proceedings of the Second Workshop on Treebanks and Linguistic Theories","author":"Dickinson","year":"2003"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"322","DOI":"10.3115\/1219840.1219880","article-title":"Detecting errors in discontinuous structural annotation","volume-title":"Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics - ACL \u201905","author":"Dickinson","year":"2005"},{"key":"2023030119555669200_","first-page":"65","article-title":"Reducing the need for double annotation","volume-title":"Proceedings of the 5th Linguistic Annotation Workshop","author":"Dligach","year":"2011"},{"key":"2023030119555669200_","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1145\/371920.372165","article-title":"Rank aggregation methods for the Web","volume-title":"Proceedings of the Tenth International Conference on World Wide Web - WWW \u201901","author":"Dwork","year":"2001"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"2591","DOI":"10.18653\/v1\/2021.naacl-main.204","article-title":"Beyond black & white: Leveraging annotator disagreement via soft-label multi-task learning","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language 
Technologies","author":"Fornaciari","year":"2021"},{"key":"2023030119555669200_","first-page":"1050","article-title":"Dropout as a Bayesian approximation: Representing model uncertainty in deep learning","volume-title":"Proceedings of the 33rd International Conference on Machine Learning","author":"Gal","year":"2016"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"42","DOI":"10.21236\/ADA547371","article-title":"Part-of-speech tagging for Twitter: Annotation, features, and experiments","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies","author":"Gimpel","year":"2011"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"24","DOI":"10.18653\/v1\/2020.louhi-1.4","article-title":"Not a cute stroke: Analysis of Rule- and Neural Network-based Information Extraction Systems for Brain Radiology Reports","volume-title":"Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis","author":"Grivas","year":"2020"},{"key":"2023030119555669200_","first-page":"1321","article-title":"On calibration of modern neural networks","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Guo","year":"2017"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"8342","DOI":"10.18653\/v1\/2020.acl-main.740","article-title":"Don\u2019t stop pretraining: Adapt language models to domains and tasks","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Gururangan","year":"2020"},{"key":"2023030119555669200_","first-page":"1113","article-title":"Approximating theoretical linguistics classification in real data: The case of German \u201cnach\u201d particle verbs","volume-title":"Proceedings of COLING 
2012","author":"Haselbach","year":"2012"},{"issue":"9","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"7675","DOI":"10.1609\/aaai.v35i9.16938","article-title":"Analysing the noise model error for realistic noisy label data","volume":"35","author":"Hedderich","year":"2021","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"2023030119555669200_","first-page":"2989","article-title":"BPEmb: Tokenization-free pre-trained subword embeddings in 275 languages","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Heinzerling","year":"2018"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"96","DOI":"10.3115\/116580.116613","article-title":"The ATIS spoken language systems pilot corpus","volume-title":"Proceedings of the Workshop on Speech and Natural Language","author":"Hemphill","year":"1990"},{"key":"2023030119555669200_","first-page":"1","article-title":"A baseline for detecting misclassified and out-of-distribution examples in neural networks","volume-title":"Proceedings of International Conference on Learning Representations","author":"Hendrycks","year":"2017"},{"key":"2023030119555669200_","first-page":"3986","article-title":"Inconsistency detection in semantic annotation","volume-title":"Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Hollenstein","year":"2016"},{"key":"2023030119555669200_","first-page":"1120","article-title":"Learning whom to trust with MACE","volume-title":"Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Hovy","year":"2013"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"291","DOI":"10.18653\/v1\/D15-1035","article-title":"Noise or additional information? 
Leveraging crowdsource annotation item agreement for natural language tasks","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing","author":"Jamison","year":"2015"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"427","DOI":"10.18653\/v1\/E17-2068","article-title":"Bag of tricks for efficient text classification","volume-title":"Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers","author":"Joulin","year":"2017"},{"key":"2023030119555669200_","first-page":"1","article-title":"LightGBM: A highly efficient gradient boosting decision tree","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Ke","year":"2017"},{"issue":"1","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1093\/jos\/ffm018","article-title":"Coherence and coreference revisited","volume":"25","author":"Kehler","year":"2007","journal-title":"Journal of Semantics"},{"issue":"1\u20132","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1093\/biomet\/30.1-2.81","article-title":"A new measure of rank correlation","volume":"30","author":"Kendall","year":"1938","journal-title":"Biometrika"},{"key":"2023030119555669200_","first-page":"1","article-title":"Generalization through memorization: Nearest neighbor language models","volume-title":"International Conference on Learning Representations (ICLR)","author":"Khandelwal","year":"2020"},{"key":"2023030119555669200_","first-page":"1","article-title":"Multivariate confidence calibration for object detection","volume-title":"2nd Workshop on Safe Artificial Intelligence for Automated Driving (SAIAD)","author":"K\u00fcppers","year":"2020"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3115\/1072228.1072249","article-title":"(Semi-)automatic 
detection of errors in PoS-tagged corpora","volume-title":"COLING 2002: The 19th International Conference on Computational Linguistics","author":"Kve\u0306to\u0148","year":"2002"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"5035","DOI":"10.18653\/v1\/2020.coling-main.442","article-title":"Inconsistencies in crowdsourced slot-filling annotations: A typology and identification methods","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Larson","year":"2020"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"517","DOI":"10.18653\/v1\/N19-1051","article-title":"Outlier detection for improved data quality and diversity in dialog systems","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Larson","year":"2019"},{"key":"2023030119555669200_","volume-title":"So Lernt Man Leben [How to Learn to Live]","author":"Leitner","year":"1974"},{"key":"2023030119555669200_","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019","journal-title":"arxiv preprints 11692"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"523","DOI":"10.3115\/1609067.1609125","article-title":"Correcting a POS-Tagged corpus using three complementary methods","volume-title":"Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)","author":"Loftsson","year":"2009"},{"key":"2023030119555669200_","volume-title":"Statistical Theories of Mental Test Scores","author":"Lord","year":"1968"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/978-3-642-19400-9_14","article-title":"Part-of-speech tagging from 97% to 100%: Is it time for some linguistics?","volume-title":"Computational Linguistics and Intelligent Text 
Processing","author":"Manning","year":"2011"},{"key":"2023030119555669200_","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"Manning","year":"2008"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"758","DOI":"10.26615\/978-954-452-056-4_088","article-title":"Turning silver into gold: Error-focused corpus reannotation with active learning","volume-title":"Proceedings - Natural Language Processing in a Deep Learning World","author":"M\u00e9nard","year":"2019"},{"key":"2023030119555669200_","first-page":"2901","article-title":"Obtaining well calibrated probabilities using Bayesian binning","volume-title":"Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence","author":"Naeini","year":"2015"},{"key":"2023030119555669200_","first-page":"4034","article-title":"Universal Dependencies v2: An evergrowing multilingual treebank collection","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Nivre","year":"2020"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"1373","DOI":"10.1613\/jair.1.12125","article-title":"Confident learning: Estimating uncertainty in dataset labels","volume":"70","author":"Northcutt","year":"2021","journal-title":"Journal of Artificial Intelligence Research"},{"key":"2023030119555669200_","first-page":"1","article-title":"Pervasive label errors in test sets destabilize machine learning benchmarks","volume-title":"35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track","author":"Northcutt","year":"2021"},{"issue":"0","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"571","DOI":"10.1162\/tacl_a_00040","article-title":"Comparing Bayesian models of annotation","volume":"6","author":"Paun","year":"2018","journal-title":"Transactions of the Association for Computational 
Linguistics"},{"key":"2023030119555669200_","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1162\/tacl_a_00293","article-title":"Inherent disagreements in human textual inferences","volume":"7","author":"Pavlick","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"GloVe: Global vectors for word representation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Pennington","year":"2014"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"7","DOI":"10.18653\/v1\/W19-4302","article-title":"To tune or not to tune? Adapting pretrained representations to diverse tasks","volume-title":"Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)","author":"Peters","year":"2019"},{"key":"2023030119555669200_","doi-asserted-by":"crossref","first-page":"742","DOI":"10.3115\/v1\/E14-1078","article-title":"Learning part-of-speech taggers with inter-annotator agreement loss","volume-title":"Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics","author":"Plank","year":"2014"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"507","DOI":"10.3115\/v1\/P14-2083","article-title":"Linguistically debatable or just plain wrong?","volume-title":"Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Plank","year":"2014"},{"issue":"3","key":"2023030119555669200_","first-page":"1","article-title":"Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods","volume":"10","author":"Platt","year":"1999","journal-title":"Advances in Large Margin Classifiers"},{"key":"2023030119555669200_","volume-title":"Natural 
Language Annotation for Machine Learning","author":"Pustejovsky","year":"2013"},{"key":"2023030119555669200_","doi-asserted-by":"crossref","first-page":"326","DOI":"10.18653\/v1\/2021.sigdial-1.35","article-title":"Annotation inconsistency and entity bias in MultiWOZ","volume-title":"Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue","author":"Qian","year":"2021"},{"key":"2023030119555669200_","doi-asserted-by":"crossref","first-page":"20","DOI":"10.3115\/v1\/W14-4903","article-title":"POS error detection in automatically annotated corpora","volume-title":"Proceedings of LAW VIII - the 8th Linguistic Annotation Workshop","author":"Rehbein","year":"2014"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"3980","DOI":"10.18653\/v1\/D19-1410","article-title":"Sentence-BERT: Sentence embeddings using Siamese BERT-networks","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Reimers","year":"2019"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"215","DOI":"10.18653\/v1\/2020.conll-1.16","article-title":"Identifying incorrect labels in the CoNLL-2003 corpus","volume-title":"Proceedings of the 24th Conference on Computational Natural Language Learning","author":"Reiss","year":"2020"},{"key":"2023030119555669200_","first-page":"1611","article-title":"Deep learning from crowds","volume-title":"Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence","author":"Rodrigues","year":"2018"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"4486","DOI":"10.18653\/v1\/2021.acl-long.346","article-title":"Evaluation examples are not equally informative: How should that change NLP leaderboards?","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics 
and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Rodriguez","year":"2021"},{"issue":"3","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLOS ONE"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1007\/978-3-319-14206-7_3","article-title":"PartTUT: The Turin University Parallel Treebank","volume-title":"Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project","author":"Sanguinetti","year":"2015"},{"key":"2023030119555669200_","first-page":"1","article-title":"DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter","volume-title":"Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing","author":"Sanh","year":"2019"},{"key":"2023030119555669200_","volume-title":"A Companion to Digital Humanities","author":"Schreibman","year":"2004"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"1833","DOI":"10.18653\/v1\/2021.eacl-main.157","article-title":"How certain is your transformer?","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Shelmanov","year":"2021"},{"key":"2023030119555669200_","first-page":"1631","article-title":"Recursive deep models for semantic compositionality over a sentiment treebank","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Socher","year":"2013"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3152527","article-title":"Learning from noisy 
labels with deep neural networks: A survey","author":"Song","year":"2020","journal-title":"arxiv preprint, 2007.8199"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"13843","DOI":"10.1609\/aaai.v35i15.17631","article-title":"Re-TACRED: Addressing shortcomings of the TACRED dataset","volume-title":"Proceedings of the 35th AAAI Conference on Artificial Intelligence 2021","author":"Stoica","year":"2021"},{"key":"2023030119555669200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/ICCV.2017.97","article-title":"Revisiting unreasonable effectiveness of data in deep learning era","volume-title":"IEEE International Conference on Computer Vision (ICCV)","author":"Sun","year":"2017"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"9275","DOI":"10.18653\/v1\/2020.emnlp-main.746","article-title":"Dataset cartography: Mapping and diagnosing datasets with training dynamics","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Swayamdipta","year":"2020"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","DOI":"10.1515\/9781400834440","volume-title":"Numbers Rule: The Vexing Mathematics of Democracy, from Plato to the Present","author":"Szpiro","year":"2010"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"142","DOI":"10.3115\/1119176.1119195","article-title":"Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition","volume-title":"Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003","author":"SangErik","year":"2003"},{"key":"2023030119555669200_","first-page":"1","article-title":"A general-purpose crowdsourcing computational quality control toolkit for Python","volume-title":"Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration 
Track","author":"Ustalov","year":"2021"},{"key":"2023030119555669200_","first-page":"48","article-title":"The detection of inconsistency in manually tagged text","volume-title":"Proceedings of the COLING-2000 Workshop on Linguistically Interpreted Corpora","author":"van Halteren","year":"2000"},{"key":"2023030119555669200_","first-page":"64","article-title":"Active annotation","volume-title":"Proceedings of the Workshop on Adaptive Text Extraction and Mining (ATEM 2006)","author":"Vlachos","year":"2006"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"5153","DOI":"10.18653\/v1\/D19-1519","article-title":"CrossWeigh: Training named entity tagger from imperfect annotations","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Wang","year":"2019"},{"issue":"6","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"80","DOI":"10.2307\/3001968","article-title":"Individual comparisons by ranking methods","volume":"1","author":"Wilcoxon","year":"1945","journal-title":"Biometrics Bulletin"},{"key":"2023030119555669200_","first-page":"4489","article-title":"Errator: A tool to help detect annotation errors in the Universal Dependencies project","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Wisniewski","year":"2018"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"295","DOI":"10.18653\/v1\/N19-1026","article-title":"A study of incorrect paraphrases in crowdsourced user utterances","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume","author":"Yaghoub-Zadeh-Fard","year":"2019"},{"key":"2023030119555669200_","first-page":"609","article-title":"Obtaining calibrated 
probability estimates from decision trees and naive Bayesian classifiers","volume-title":"Proceedings of the Eighteenth International Conference on Machine Learning","author":"Zadrozny","year":"2001"},{"key":"2023030119555669200_","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1145\/775047.775151","article-title":"Transforming classifier scores into accurate multiclass probability estimates","volume-title":"Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Zadrozny","year":"2002"},{"issue":"3","key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1007\/s10579-016-9343-x","article-title":"The GUM corpus: Creating multilayer resources in the classroom","volume":"51","author":"Zeldes","year":"2017","journal-title":"Language Resources and Evaluation"},{"key":"2023030119555669200_","first-page":"649","article-title":"Character-level convolutional networks for text classification","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1","author":"Zhang","year":"2015"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"5558","DOI":"10.18653\/v1\/2021.acl-long.432","article-title":"Crowdsourcing learning as domain adaptation: A case study on named entity recognition","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Zhang","year":"2021"},{"key":"2023030119555669200_","doi-asserted-by":"publisher","first-page":"11053","DOI":"10.1609\/aaai.v35i12.17319","article-title":"Meta label correction for noisy label learning","volume-title":"Proceedings of the Thirty-fifth AAAI Conference on Artificial Intelligence 2021","author":"Zheng","year":"2021"}],"container-title":["Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/49\/1\/157\/2068980\/coli_a_00464.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/49\/1\/157\/2068980\/coli_a_00464.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,28]],"date-time":"2023-11-28T15:51:12Z","timestamp":1701186672000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/49\/1\/157\/113280\/Annotation-Error-Detection-Analyzing-the-Past-and"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":100,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,3,1]]},"published-print":{"date-parts":[[2023,3,1]]}},"URL":"https:\/\/doi.org\/10.1162\/coli_a_00464","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}