{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T11:37:38Z","timestamp":1774957058939,"version":"3.50.1"},"reference-count":97,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T00:00:00Z","timestamp":1626134400000},"content-version":"vor","delay-in-days":193,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,7,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity. We introduce NRB, a new testbed carefully designed to diagnose Name Regularity Bias of NER models. Our results indicate that all state-of-the-art models we tested show such a bias; BERT fine-tuned models significantly outperforming feature-based (LSTM-CRF) ones on NRB, despite having comparable (sometimes lower) performance on standard benchmarks.<\/jats:p><jats:p>To mitigate this bias, we propose a novel model-agnostic training method that adds learnable adversarial noise to some entity mentions, thus enforcing models to focus more strongly on the contextual signal, leading to significant gains on NRB. 
Combining it with two other training strategies, data augmentation and parameter freezing, leads to further gains.<\/jats:p>","DOI":"10.1162\/tacl_a_00386","type":"journal-article","created":{"date-parts":[[2021,9,20]],"date-time":"2021-09-20T20:08:41Z","timestamp":1632168521000},"page":"586-604","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":19,"title":["Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition"],"prefix":"10.1162","volume":"9","author":[{"given":"Abbas","family":"Ghaddar","sequence":"first","affiliation":[{"name":"Huawei Noah\u2019s Ark Lab, Montreal Research Center, Canada. abbas.ghaddar@huawei.com"}]},{"given":"Philippe","family":"Langlais","sequence":"additional","affiliation":[{"name":"RALI\/DIRO, Universit\u00e9 de Montr\u00e9al, Canada. felipe@iro.umontreal.ca"}]},{"given":"Ahmad","family":"Rashid","sequence":"additional","affiliation":[{"name":"Huawei Noah\u2019s Ark Lab, Montreal Research Center, Canada. ahmad.rashid@huawei.com"}]},{"given":"Mehdi","family":"Rezagholizadeh","sequence":"additional","affiliation":[{"name":"Huawei Noah\u2019s Ark Lab, Montreal Research Center, Canada. 
mehdi.rezagholizadeh@huawei.com"}]}],"member":"281","published-online":{"date-parts":[[2021,7,8]]},"reference":[{"key":"2021071519144018300_bib1","article-title":"Entity-switched datasets: an approach to auditing the in-domain robustness of named entity recognition models","author":"Agarwal","year":"2020","journal-title":"arXiv preprint arXiv:2004.04123"},{"key":"2021071519144018300_bib2","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00397","article-title":"Interpretability analysis for named entity recognition to understand system predictions and how they can improve","author":"Agarwal","year":"2020","journal-title":"arXiv preprint arXiv:2004.04564"},{"key":"2021071519144018300_bib3","first-page":"1638","article-title":"Contextual string embeddings for sequence labeling","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics","author":"Akbik","year":"2018"},{"key":"2021071519144018300_bib4","doi-asserted-by":"publisher","first-page":"586","DOI":"10.1137\/1.9781611974010.66","article-title":"Polyglot-ner: Massive multilingual named entity recognition","volume-title":"Proceedings of the 2015 SIAM International Conference on Data Mining","author":"Al-Rfou","year":"2015"},{"key":"2021071519144018300_bib5","first-page":"84","article-title":"Domain adaption of named entity recognition to support credit risk assessment","volume-title":"Proceedings of the Australasian Language Technology Association Workshop 2015","author":"Alvarado","year":"2015"},{"key":"2021071519144018300_bib6","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1016\/j.csl.2017.01.012","article-title":"Generalisation in named entity recognition: A quantitative analysis","volume":"44","author":"Augenstein","year":"2017","journal-title":"Computer Speech & Language"},{"key":"2021071519144018300_bib7","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.repl4nlp-1.24","article-title":"What\u2019s in a name? 
Are BERT named entity representations just as good for any other name?","author":"Balasubramanian","year":"2020","journal-title":"arXiv preprint arXiv:2007.06897"},{"key":"2021071519144018300_bib8","doi-asserted-by":"publisher","first-page":"2830","DOI":"10.18653\/v1\/D18-1307","article-title":"Adversarial training for multi-context joint entity and relation extraction","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Bekoulis","year":"2018"},{"key":"2021071519144018300_bib9","doi-asserted-by":"publisher","first-page":"256","DOI":"10.18653\/v1\/S19-1028","article-title":"On adversarial removal of hypothesis-only bias in natural language inference","volume-title":"Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (SEM 2019)","author":"Belinkov","year":"2019"},{"key":"2021071519144018300_bib10","first-page":"1697","article-title":"HardEval: Focusing on challenging tokens to assess robustness of NER","volume-title":"Proceedings of The 12th Language Resources and Evaluation Conference","author":"Bernier-Colborne","year":"2020"},{"key":"2021071519144018300_bib11","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching word vectors with subword information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021071519144018300_bib12","doi-asserted-by":"publisher","first-page":"1247","DOI":"10.1145\/1376616.1376746","article-title":"Freebase: A collaboratively created graph database for structuring human knowledge","volume-title":"Proceedings of the 2008 ACM SIGMOD international conference on Management of data","author":"Bollacker","year":"2008"},{"key":"2021071519144018300_bib13","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2020-1750","article-title":"Do end-to-end speech recognition models care about 
context?","volume-title":"Proceedings of Interspeech","author":"Borgholt","year":"2020"},{"key":"2021071519144018300_bib14","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2020-1750","article-title":"Spanish pre-trained bert model and evaluation data","volume":"2020","author":"Canete","year":"2020","journal-title":"PML4DC at ICLR"},{"key":"2021071519144018300_bib15","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.598","article-title":"German\u2019s next language model","author":"Chan","year":"2020","journal-title":"arXiv preprint arXiv:2010.10906"},{"key":"2021071519144018300_bib16","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2014-564","article-title":"One billion word benchmark for measuring progress in statistical language modeling","volume-title":"Fifteenth Annual Conference of the International Speech Communication Association","author":"Chelba","year":"2014"},{"key":"2021071519144018300_bib17","doi-asserted-by":"publisher","first-page":"4324","DOI":"10.18653\/v1\/P19-1425","article-title":"Robust neural machine translation with doubly adversarial inputs","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Cheng","year":"2019"},{"key":"2021071519144018300_bib18","doi-asserted-by":"crossref","first-page":"4060","DOI":"10.18653\/v1\/D19-1418","article-title":"Don\u2019t take the easy way out: Ensemble based methods for avoiding known dataset biases","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Clark","year":"2019"},{"key":"2021071519144018300_bib19","article-title":"Learning to model and ignore dataset bias with mixed capacity ensembles","author":"Clark","year":"2020","journal-title":"arXiv preprint arXiv:2011.03856"},{"key":"2021071519144018300_bib20","article-title":"Empirical methods for artificial 
intelligence","author":"Cohen","year":"1996","journal-title":"IEEE Intelligent Systems"},{"key":"2021071519144018300_bib21","doi-asserted-by":"crossref","first-page":"3861","DOI":"10.18653\/v1\/2020.coling-main.343","article-title":"An analysis of simple data augmentation for named entity recognition","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Dai","year":"2020"},{"key":"2021071519144018300_bib22","article-title":"Evaluating compositionality in sentence embeddings","author":"Dasgupta","year":"2018","journal-title":"arXiv preprint arXiv:1802.04302"},{"key":"2021071519144018300_bib23","doi-asserted-by":"publisher","first-page":"4385","DOI":"10.18653\/v1\/2020.acl-main.404","article-title":"Masking actor information leads to fairer political claims detection","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Dayanik","year":"2020"},{"key":"2021071519144018300_bib24","article-title":"Bertje: A Dutch BERT model","author":"de Vries","year":"2019","journal-title":"arXiv preprint arXiv:1912.09582"},{"key":"2021071519144018300_bib25","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.292","article-title":"Robbert: A Dutch roberta-based language model","author":"Delobelle","year":"2020","journal-title":"arXiv preprint arXiv:2001.06286"},{"key":"2021071519144018300_bib26","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2021071519144018300_bib27","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.488","article-title":"Daga: Data augmentation with a generation approach for low-resource tagging 
tasks","author":"Ding","year":"2020","journal-title":"arXiv preprint arXiv:2011.01549"},{"key":"2021071519144018300_bib28","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1162\/tacl_a_00197","article-title":"A joint model for entity analysis: Coreference, typing, and linking","volume":"2","author":"Durrett","year":"2014","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021071519144018300_bib29","doi-asserted-by":"publisher","first-page":"31","DOI":"10.18653\/v1\/P18-2006","article-title":"Hotflip: White-box adversarial examples for text classification","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Ebrahimi","year":"2018"},{"key":"2021071519144018300_bib30","first-page":"3344","article-title":"Government domain named entity recognition for south african languages","volume-title":"Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Eiselen","year":"2016"},{"key":"2021071519144018300_bib31","first-page":"1632","article-title":"Robustness of classifiers: from adversarial to random noise","volume-title":"Proceedings of the 30th International Conference on Neural Information Processing Systems","author":"Fawzi","year":"2016"},{"key":"2021071519144018300_bib32","doi-asserted-by":"publisher","first-page":"1161","DOI":"10.18653\/v1\/D19-1107","article-title":"Are we modeling the task or the annotator? 
An investigation of annotator bias in natural language understanding datasets","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Geva","year":"2019"},{"key":"2021071519144018300_bib33","doi-asserted-by":"publisher","first-page":"229","DOI":"10.18653\/v1\/K16-1023","article-title":"Coreference in Wikipedia: Main concept resolution","volume-title":"Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning","author":"Ghaddar","year":"2016"},{"key":"2021071519144018300_bib34","first-page":"413","article-title":"Winer: A Wikipedia annotated corpus for named entity recognition","volume-title":"Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Ghaddar","year":"2017"},{"key":"2021071519144018300_bib35","article-title":"Transforming Wikipedia into a large-scale fine-grained entity type corpus","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Ghaddar","year":"2018"},{"key":"2021071519144018300_bib36","article-title":"Explaining and harnessing adversarial examples","author":"Goodfellow","year":"2014","journal-title":"arXiv preprint arXiv:1412.6572"},{"key":"2021071519144018300_bib37","doi-asserted-by":"publisher","first-page":"107","DOI":"10.18653\/v1\/N18-2017","article-title":"Annotation artifacts in natural language inference data","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Gururangan","year":"2018"},{"key":"2021071519144018300_bib38","doi-asserted-by":"publisher","first-page":"132","DOI":"10.18653\/v1\/D19-6115","article-title":"Unlearn dataset bias in natural language inference by fitting the 
residual","author":"He","year":"2019","journal-title":"EMNLP-IJCNLP 2019"},{"key":"2021071519144018300_bib39","first-page":"187","article-title":"KenLM: Faster and smaller language model queries","volume-title":"Proceedings of the EMNLP 2011 Sixth Workshop on Statistical Machine Translation","author":"Heafield","year":"2011"},{"key":"2021071519144018300_bib40","article-title":"A baseline for detecting misclassified and out-of-distribution examples in neural networks","author":"Hendrycks","year":"2017","journal-title":"Proceedings of International Conference on Learning Representations"},{"key":"2021071519144018300_bib41","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.244","article-title":"Pretrained transformers improve out-of-distribution robustness","author":"Hendrycks","year":"2020","journal-title":"arXiv preprint arXiv:2004.06100"},{"issue":"8","key":"2021071519144018300_bib42","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2021071519144018300_bib43","first-page":"4597","article-title":"Dane: A named entity resource for Danish","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Hvingelby","year":"2020"},{"key":"2021071519144018300_bib44","doi-asserted-by":"publisher","first-page":"3651","DOI":"10.18653\/v1\/P19-1356","article-title":"What Does BERT Learn about the Structure of Language?","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Jawahar","year":"2019"},{"key":"2021071519144018300_bib45","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1162\/tacl_a_00300","article-title":"Spanbert: Improving pre-training by representing and predicting spans","volume":"8","author":"Joshi","year":"2020","journal-title":"Transactions of the Association for 
Computational Linguistics"},{"key":"2021071519144018300_bib46","first-page":"2479","article-title":"Flaubert: Unsupervised language model pre-training for French","volume-title":"Proceedings of The 12th Language Resources and Evaluation Conference","author":"Le","year":"2020"},{"key":"2021071519144018300_bib47","first-page":"1078","article-title":"Adversarial filters of dataset biases","volume-title":"International Conference on Machine Learning","author":"Le Bras","year":"2020"},{"key":"2021071519144018300_bib48","doi-asserted-by":"crossref","first-page":"5849","DOI":"10.18653\/v1\/2020.acl-main.519","article-title":"A unified MRC framework for named entity recognition","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Li","year":"2020"},{"key":"2021071519144018300_bib49","doi-asserted-by":"crossref","first-page":"7291","DOI":"10.18653\/v1\/2020.emnlp-main.592","article-title":"A rigorous study on named entity recognition: Can fine-tuning pretrained model lead to the promised land?","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Lin","year":"2020"},{"key":"2021071519144018300_bib50","article-title":"Roberta: A robustly optimized bert pretraining approach","author":"Liu","year":"2019","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"2021071519144018300_bib51","article-title":"Training corpus hr500k 1.0","author":"Ljube\u0161i\u0107","year":"2018"},{"key":"2021071519144018300_bib52","first-page":"4615","article-title":"A broad-coverage corpus for finnish named entity recognition","volume-title":"Proceedings of The 12th Language Resources and Evaluation Conference","author":"Luoma","year":"2020"},{"key":"2021071519144018300_bib53","doi-asserted-by":"publisher","first-page":"8706","DOI":"10.18653\/v1\/2020.acl-main.769","article-title":"End-to-end bias mitigation by modelling biases in 
corpora","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Mahabadi","year":"2020"},{"key":"2021071519144018300_bib54","doi-asserted-by":"publisher","first-page":"55","DOI":"10.3115\/v1\/P14-5010","article-title":"The Stanford CoreNLP Natural Language Processing Toolkit.","volume-title":"ACL (System Demonstrations)","author":"Manning","year":"2014"},{"key":"2021071519144018300_bib55","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.645","article-title":"Camembert: A tasty French language model","author":"Martin","year":"2019","journal-title":"arXiv preprint arXiv:1911.03894"},{"key":"2021071519144018300_bib56","doi-asserted-by":"publisher","first-page":"8480","DOI":"10.1609\/aaai.v34i05.6368","article-title":"Robust named entity recognition with truecasing pretraining","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Mayhew","year":"2020"},{"key":"2021071519144018300_bib57","doi-asserted-by":"publisher","first-page":"6257","DOI":"10.18653\/v1\/D19-1650","article-title":"ner and pos when nothing is capitalized","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Mayhew","year":"2019"},{"key":"2021071519144018300_bib58","doi-asserted-by":"publisher","first-page":"3428","DOI":"10.18653\/v1\/P19-1334","article-title":"Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"McCoy","year":"2019"},{"key":"2021071519144018300_bib59","doi-asserted-by":"crossref","first-page":"2339","DOI":"10.18653\/v1\/2020.acl-main.212","article-title":"Syntactic data augmentation increases robustness to inference heuristics","volume-title":"Proceedings of the 58th Annual 
Meeting of the Association for Computational Linguistics","author":"Min","year":"2020"},{"key":"2021071519144018300_bib60","article-title":"Adversarial training methods for semi-supervised text classification","author":"Miyato","year":"2016","journal-title":"arXiv preprint arXiv:1605.07725"},{"key":"2021071519144018300_bib61","article-title":"Improving robustness by augmenting training sentences with predicate-argument structures","author":"Moosavi","year":"2020","journal-title":"arXiv preprint arXiv:2010.12510"},{"key":"2021071519144018300_bib62","article-title":"Adaptive Name Entity Recognition under highly unbalanced data","author":"Nguyen","year":"2020","journal-title":"arXiv preprint arXiv:2003.10296"},{"key":"2021071519144018300_bib63","first-page":"1659","article-title":"Universal dependencies v1: A multilingual treebank collection","volume-title":"Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Nivre","year":"2016"},{"key":"2021071519144018300_bib64","doi-asserted-by":"crossref","first-page":"2841","DOI":"10.18653\/v1\/P19-1273","article-title":"Who sides with whom? 
Towards computational construction of discourse networks for political debates","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Pad\u00f3","year":"2019"},{"key":"2021071519144018300_bib65","doi-asserted-by":"crossref","first-page":"1946","DOI":"10.18653\/v1\/P17-1178","article-title":"Cross-lingual name tagging and linking for 282 languages","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Pan","year":"2017"},{"key":"2021071519144018300_bib66","doi-asserted-by":"publisher","first-page":"311","DOI":"10.3115\/1073083.1073135","article-title":"Bleu: A method for automatic evaluation of machine translation","volume-title":"Proceedings of the 40th annual meeting of the Association for Computational Linguistics","author":"Papineni","year":"2002"},{"key":"2021071519144018300_bib67","doi-asserted-by":"publisher","first-page":"2227","DOI":"10.18653\/v1\/N18-1202","article-title":"Deep Contextualized Word Representations","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Peters","year":"2018"},{"key":"2021071519144018300_bib68","doi-asserted-by":"publisher","first-page":"180","DOI":"10.18653\/v1\/S18-2023","article-title":"Hypothesis only baselines in natural language inference","volume-title":"Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics","author":"Poliak","year":"2018"},{"key":"2021071519144018300_bib69","first-page":"1","article-title":"CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes","volume-title":"Joint Conference on EMNLP and CoNLL-Shared 
Task","author":"Pradhan","year":"2012"},{"key":"2021071519144018300_bib70","doi-asserted-by":"publisher","first-page":"147","DOI":"10.3115\/1596374.1596399","article-title":"Design challenges and misconceptions in named entity recognition","volume-title":"Proceedings of the Thirteenth Conference on Computational Natural Language Learning","author":"Ratinov","year":"2009"},{"key":"2021071519144018300_bib71","first-page":"5389","article-title":"Do imagenet classifiers generalize to imagenet?","volume-title":"International Conference on Machine Learning","author":"Recht","year":"2019"},{"key":"2021071519144018300_bib72","article-title":"Learning from others\u2019 mistakes: Avoiding dataset biases without modeling them","author":"Sanh","year":"2020","journal-title":"arXiv preprint arXiv:2012.01300"},{"key":"2021071519144018300_bib73","doi-asserted-by":"publisher","first-page":"3410","DOI":"10.18653\/v1\/D19-1341","article-title":"Towards debiasing fact verification models","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Schuster","year":"2019"},{"key":"2021071519144018300_bib74","doi-asserted-by":"publisher","first-page":"7398","DOI":"10.1109\/ICASSP.2013.6639100","article-title":"An investigation of deep neural networks for noise robust speech recognition","volume-title":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Seltzer","year":"2013"},{"key":"2021071519144018300_bib75","doi-asserted-by":"crossref","first-page":"5248","DOI":"10.18653\/v1\/2020.acl-main.468","article-title":"Predictive biases in natural language processing models: A conceptual framework and overview","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational 
Linguistics","author":"Shah","year":"2020"},{"key":"2021071519144018300_bib76","first-page":"640","article-title":"Part-of-speech tagging with antagonistic adversaries","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"S\u00f8gaard","year":"2013"},{"key":"2021071519144018300_bib77","first-page":"138","article-title":"Results of the WNUT16 named entity recognition shared task","volume-title":"Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)","author":"Strauss","year":"2016"},{"key":"2021071519144018300_bib78","article-title":"Intriguing properties of neural networks","author":"Szegedy","year":"2013","journal-title":"arXiv preprint arXiv:1312.6199"},{"key":"2021071519144018300_bib79","doi-asserted-by":"publisher","first-page":"809","DOI":"10.18653\/v1\/N18-1074","article-title":"Fever: A large-scale dataset for fact extraction and verification","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Thorne","year":"2018"},{"key":"2021071519144018300_bib80","doi-asserted-by":"publisher","DOI":"10.3115\/1118853.1118877","article-title":"Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition","volume-title":"COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)","author":"Tjong Kim Sang","year":"2002"},{"key":"2021071519144018300_bib81","doi-asserted-by":"publisher","first-page":"142","DOI":"10.3115\/1119176.1119195","article-title":"Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition","volume-title":"Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4","author":"Tjong Kim 
Sang","year":"2003"},{"key":"2021071519144018300_bib82","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58323-1_11","article-title":"Finest BERT and crosloengual BERT: Less is more in multilingual models","author":"Ul\u010dar","year":"2020","journal-title":"arXiv preprint arXiv:2006.07890"},{"key":"2021071519144018300_bib83","article-title":"Mind the trade-off: Debiasing NLU models without degrading the in-distribution performance","author":"Utama","year":"2020","journal-title":"arXiv preprint arXiv:2005.00315"},{"key":"2021071519144018300_bib84","doi-asserted-by":"crossref","first-page":"7597","DOI":"10.18653\/v1\/2020.emnlp-main.613","article-title":"Towards debiasing NLU models from unknown biases","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Utama","year":"2020"},{"key":"2021071519144018300_bib85","first-page":"5998","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2021071519144018300_bib86","article-title":"Multilingual is not enough: BERT for Finnish","author":"Virtanen","year":"2019","journal-title":"arXiv preprint arXiv:1912.07076"},{"key":"2021071519144018300_bib87","doi-asserted-by":"publisher","first-page":"353","DOI":"10.18653\/v1\/W18-5446","article-title":"Glue: A multi-task benchmark and analysis platform for natural language understanding","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Wang","year":"2018"},{"key":"2021071519144018300_bib88","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1101","article-title":"A broad-coverage challenge corpus for sentence understanding through inference","author":"Williams","year":"2017","journal-title":"arXiv preprint 
arXiv:1704.05426"},{"key":"2021071519144018300_bib89","doi-asserted-by":"publisher","first-page":"2246","DOI":"10.18653\/v1\/2020.acl-main.204","article-title":"DeeBERT: Dynamic early exiting for accelerating BERT inference","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Ji","year":"2020"},{"key":"2021071519144018300_bib90","article-title":"Robust natural language inference models with example forgetting","author":"Yaghoobzadeh","year":"2019","journal-title":"arXiv preprint arXiv:1911.03861"},{"key":"2021071519144018300_bib91","article-title":"Named entity recognition as dependency parsing","author":"Juntao","year":"2020","journal-title":"arXiv preprint arXiv:2005.07150"},{"key":"2021071519144018300_bib92","doi-asserted-by":"publisher","first-page":"93","DOI":"10.18653\/v1\/D18-1009","article-title":"SWAG: A large-scale adversarial dataset for grounded commonsense inference","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Zellers","year":"2018"},{"key":"2021071519144018300_bib93","doi-asserted-by":"publisher","first-page":"7270","DOI":"10.18653\/v1\/2020.emnlp-main.590","article-title":"Counterfactual generator: A weakly-supervised method for named entity recognition","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Zeng","year":"2020"},{"key":"2021071519144018300_bib94","first-page":"1298","article-title":"PAWS: Paraphrase adversaries from word scrambling","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Zhang","year":"2019"},{"key":"2021071519144018300_bib95","doi-asserted-by":"publisher","first-page":"357","DOI":"10.18653\/v1\/D19-1034","article-title":"A boundary-aware neural model for nested named entity 
recognition","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Zheng","year":"2019"},{"key":"2021071519144018300_bib96","doi-asserted-by":"crossref","first-page":"3461","DOI":"10.18653\/v1\/P19-1336","article-title":"Dual adversarial neural transfer for low-resource named entity recognition","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zhou","year":"2019"},{"key":"2021071519144018300_bib97","article-title":"Freelb: Enhanced adversarial training for language understanding","author":"Zhu","year":"2019","journal-title":"arXiv preprint arXiv:1909.11764"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00386\/1929691\/tacl_a_00386.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00386\/1929691\/tacl_a_00386.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,9]],"date-time":"2023-01-09T19:32:04Z","timestamp":1673292724000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00386\/102846\/Context-aware-Adversarial-Training-for-Name"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":97,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00386","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}