{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T03:29:54Z","timestamp":1768706994531,"version":"3.49.0"},"reference-count":36,"publisher":"MIT Press - Journals","issue":"1","license":[{"start":{"date-parts":[[2021,3,6]],"date-time":"2021-03-06T00:00:00Z","timestamp":1614988800000},"content-version":"vor","delay-in-days":5,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,4,21]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Named entity recognition systems achieve remarkable performance on domains such as English news. It is natural to ask: What are these models actually learning to achieve this? Are they merely memorizing the names themselves? Or are they capable of interpreting the text and inferring the correct entity type from the linguistic context? We examine these questions by contrasting the performance of several variants of architectures for named entity recognition, with some provided only representations of the context as features. We experiment with GloVe-based BiLSTM-CRF as well as BERT. We find that context does influence predictions, but the main factor driving high performance is learning the named tokens themselves. Furthermore, we find that BERT is not always better at recognizing predictive contexts compared to a BiLSTM-CRF model. We enlist human annotators to evaluate the feasibility of inferring entity types from context alone and find that humans are also mostly unable to infer entity types for the majority of examples on which the context-only system made errors. However, there is room for improvement: A system should be able to recognize any named entity in a predictive context correctly and our experiments indicate that current systems may be improved by such capability. Our human study also revealed that systems and humans do not always learn the same contextual clues, and context-only systems are sometimes correct even when humans fail to recognize the entity type from the context. Finally, we find that one issue contributing to model errors is the use of \u201centangled\u201d representations that encode both contextual and local token information into a single vector, which can obscure clues. Our results suggest that designing models that explicitly operate over representations of local inputs and context, respectively, may in some cases improve performance. In light of these and related findings, we highlight directions for future work.<\/jats:p>","DOI":"10.1162\/coli_a_00397","type":"journal-article","created":{"date-parts":[[2021,3,5]],"date-time":"2021-03-05T18:59:47Z","timestamp":1614970787000},"page":"117-140","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":17,"title":["Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve"],"prefix":"10.1162","volume":"47","author":[{"given":"Oshin","family":"Agarwal","sequence":"first","affiliation":[{"name":"University of Pennsylvania, Department of Computer and Information Science. oagarwal@seas.upenn.edu"}]},{"given":"Yinfei","family":"Yang","sequence":"additional","affiliation":[{"name":"Google Research. yinfeiy@google.com"}]},{"given":"Byron C.","family":"Wallace","sequence":"additional","affiliation":[{"name":"Northeastern University, Khoury College of Computer Sciences. b.wallace@northeastern.edu"}]},{"given":"Ani","family":"Nenkova","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, Department of Computer and Information Science. nenkova@seas.upenn.edu"}]}],"member":"281","published-online":{"date-parts":[[2021,4,21]]},"reference":[{"key":"2021042218045099600_bib1","article-title":"Entity-switched data sets: An approach to auditing the in-domain robustness of named entity recognition models","author":"Agarwal","year":"2020"},{"key":"2021042218045099600_bib2","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1145\/336597.336644","article-title":"Snowball: Extracting relations from large plain-text collections","volume-title":"Proceedings of the fifth ACM Conference on Digital Libraries","author":"Agichtein","year":"2000"},{"key":"2021042218045099600_bib3","first-page":"1312","article-title":"Effective selectional restrictions for unsupervised relation extraction","volume-title":"Proceedings of the Sixth International Joint Conference on Natural Language Processing","author":"Akbik","year":"2013"},{"key":"2021042218045099600_bib4","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.csl.2017.01.012","article-title":"Generalisation in named entity recognition: A quantitative analysis","volume":"44","author":"Augenstein","year":"2017","journal-title":"Computer Speech & Language"},{"key":"2021042218045099600_bib5","doi-asserted-by":"crossref","first-page":"10","DOI":"10.3115\/1699765.1699767","article-title":"Named entity recognition in Wikipedia","volume-title":"Proceedings of the 2009 Workshop on The People\u2019s Web Meets NLP: Collaboratively Constructed Semantic Resources (People\u2019s Web)","author":"Balasuriya","year":"2009"},{"key":"2021042218045099600_bib6","first-page":"2670","article-title":"Open information extraction from the web","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence","author":"Banko","year":"2007"},{"issue":"1\u20133","key":"2021042218045099600_bib7","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1023\/A:1007558221122","article-title":"An algorithm that learns what\u2019s in a name","volume":"34","author":"Bikel","year":"1999","journal-title":"Machine Learning"},{"issue":"4","key":"2021042218045099600_bib8","first-page":"467","article-title":"Class-based n-gram models of natural language","volume":"18","author":"Brown","year":"1992","journal-title":"Computational Linguistics"},{"key":"2021042218045099600_bib9","first-page":"20","article-title":"Modeling violations of selectional restrictions with distributional semantics","volume-title":"Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing","author":"Chersoni","year":"2018"},{"key":"2021042218045099600_bib10","first-page":"100","article-title":"Unsupervised models for named entity classification","volume-title":"1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora","author":"Collins","year":"1999"},{"issue":"Aug","key":"2021042218045099600_bib11","first-page":"2493","article-title":"Natural language processing (almost) from scratch","volume":"12","author":"Collobert","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"2021042218045099600_bib12","doi-asserted-by":"crossref","DOI":"10.3115\/1118853.1118860","article-title":"Language independent NER using a unified model of internal and contextual evidence","volume-title":"COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)","author":"Cucerzan","year":"2002"},{"key":"2021042218045099600_bib13","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"issue":"1","key":"2021042218045099600_bib14","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.artint.2005.03.001","article-title":"Unsupervised named-entity extraction from the web: An experimental study","volume":"165","author":"Etzioni","year":"2005","journal-title":"Artificial Intelligence"},{"key":"2021042218045099600_bib15","first-page":"769","article-title":"An experiment on learning appropriate selectional restrictions from a parsed corpus","volume-title":"COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics","author":"Framis","year":"1994"},{"key":"2021042218045099600_bib16","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v34i05.6276","article-title":"Rethinking generalization of neural models: A named entity recognition case study","author":"Fu","year":"2020"},{"key":"2021042218045099600_bib17","doi-asserted-by":"crossref","first-page":"2047","DOI":"10.1109\/IJCNN.2005.1556215","article-title":"Framewise phoneme classification with bidirectional LSTM networks","volume-title":"Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005","author":"Graves","year":"2005"},{"key":"2021042218045099600_bib18","first-page":"466","article-title":"Message Understanding Conference-6: A brief history","volume-title":"COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics","author":"Grishman","year":"1996"},{"issue":"8","key":"2021042218045099600_bib19","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2021042218045099600_bib20","article-title":"Bidirectional LSTM-CRF models for sequence tagging","author":"Huang","year":"2015","journal-title":"arXiv preprint arXiv: 1508.01991"},{"key":"2021042218045099600_bib21","first-page":"698","article-title":"Exploiting Wikipedia as external knowledge for named entity recognition","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)","author":"Kazama","year":"2007"},{"key":"2021042218045099600_bib22","first-page":"282","article-title":"Conditional random fields: Probabilistic models for segmenting and labeling sequence data","volume-title":"Proceedings of the Eighteenth International Conference on Machine Learning, ICML \u201901","author":"Lafferty","year":"2001"},{"key":"2021042218045099600_bib23","first-page":"260","article-title":"Neural architectures for named entity recognition","volume-title":"Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Lample","year":"2016"},{"key":"2021042218045099600_bib24","doi-asserted-by":"crossref","first-page":"58","DOI":"10.3115\/1621829.1621837","article-title":"A simple semi-supervised algorithm for named entity recognition","volume-title":"Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing, SemiSupLearn \u201909","author":"Liao","year":"2009"},{"key":"2021042218045099600_bib25","first-page":"32","article-title":"Internal and external evidence in the identification and semantic categorization of proper names","volume-title":"Acquisition of Lexical Knowledge from Text","author":"McDonald","year":"1993"},{"key":"2021042218045099600_bib26","first-page":"337","article-title":"Name tagging with word clusters and discriminative training","volume-title":"Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004","author":"Miller","year":"2004"},{"key":"2021042218045099600_bib27","first-page":"266","article-title":"Unsupervised named-entity recognition: Generating gazetteers and resolving ambiguity","volume-title":"Conference of the Canadian Society for Computational Studies of Intelligence","author":"Nadeau","year":"2006"},{"key":"2021042218045099600_bib28","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"GloVe: Global vectors for word representation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Pennington","year":"2014"},{"key":"2021042218045099600_bib29","first-page":"2227","article-title":"Deep contextualized word representations","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Peters","year":"2018"},{"key":"2021042218045099600_bib30","first-page":"11","article-title":"OntoNotes: The 90% solution","volume-title":"Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts","author":"Pradhan","year":"2009"},{"key":"2021042218045099600_bib31","doi-asserted-by":"crossref","first-page":"147","DOI":"10.3115\/1596374.1596399","article-title":"Design challenges and misconceptions in named entity recognition","volume-title":"Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)","author":"Ratinov","year":"2009"},{"key":"2021042218045099600_bib32","first-page":"474","article-title":"Learning dictionaries for information extraction by multi-level bootstrapping","volume-title":"Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence, AAAI \u201999\/IAAI \u201999","author":"Riloff","year":"1999"},{"key":"2021042218045099600_bib33","doi-asserted-by":"crossref","first-page":"141","DOI":"10.3115\/1596276.1596303","article-title":"A context pattern induction method for named entity extraction","volume-title":"Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)","author":"Talukdar","year":"2006"},{"key":"2021042218045099600_bib34","doi-asserted-by":"crossref","first-page":"142","DOI":"10.3115\/1119176.1119195","article-title":"Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition","volume-title":"Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003","author":"Tjong Kim Sang","year":"2003"},{"key":"2021042218045099600_bib35","doi-asserted-by":"crossref","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"},{"key":"2021042218045099600_bib36","first-page":"2145","article-title":"A survey on recent advances in named entity recognition from deep learning models","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics","author":"Yadav","year":"2018"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/coli\/article-pdf\/47\/1\/117\/1911479\/coli_a_00397.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/coli\/article-pdf\/47\/1\/117\/1911479\/coli_a_00397.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,4,22]],"date-time":"2021-04-22T23:09:21Z","timestamp":1619132961000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/47\/1\/117\/97335\/Interpretability-Analysis-for-Named-Entity"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,4,21]]},"published-print":{"date-parts":[[2021,4,21]]}},"URL":"https:\/\/doi.org\/10.1162\/coli_a_00397","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,3]]},"published":{"date-parts":[[2021,3]]}}}