{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T10:05:44Z","timestamp":1773655544070,"version":"3.50.1"},"reference-count":55,"publisher":"MIT Press - Journals","license":[{"start":{"date-parts":[[2021,9,10]],"date-time":"2021-09-10T00:00:00Z","timestamp":1631232000000},"content-version":"vor","delay-in-days":252,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,9,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens. Morphologically rich languages (MRLs) pose a challenge to this basic formulation, as the boundaries of named entities do not necessarily coincide with token boundaries, rather, they respect morphological boundaries. To address NER in MRLs we then need to answer two fundamental questions, namely, what are the basic units to be labeled, and how can these units be detected and classified in realistic settings (i.e., where no gold morphology is available). We empirically investigate these questions on a novel NER benchmark, with parallel token-level and morpheme-level NER annotations, which we develop for Modern Hebrew, a morphologically rich-and-ambiguous language. 
Our results show that explicitly modeling morphological boundaries leads to improved NER performance, and that a novel hybrid architecture, in which NER precedes and prunes morphological decomposition, greatly outperforms the standard pipeline, where morphological decomposition strictly precedes NER, setting a new performance bar for both Hebrew NER and Hebrew morphological decomposition tasks.<\/jats:p>","DOI":"10.1162\/tacl_a_00404","type":"journal-article","created":{"date-parts":[[2021,9,20]],"date-time":"2021-09-20T19:23:48Z","timestamp":1632165828000},"page":"909-928","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":9,"title":["Neural Modeling for Named Entities and Morphology (NEMO2)"],"prefix":"10.1162","volume":"9","author":[{"given":"Dan","family":"Bareket","sequence":"first","affiliation":[{"name":"Bar Ilan University, Ramat-Gan, Israel"},{"name":"Open Media and Information Lab (OMILab), The Open University of Israel, Israel. dbareket@gmail.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Reut","family":"Tsarfaty","sequence":"additional","affiliation":[{"name":"Bar Ilan University, Ramat-Gan, Israel. reut.tsarfaty@biu.ac.il"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2021,9,8]]},"reference":[{"key":"2021091013024063000_bib1","first-page":"29","article-title":"Agile corpus annotation in practice: An overview of manual and automatic annotation of CVs","volume-title":"Proceedings of the Fourth Linguistic Annotation Workshop","author":"Alex","year":"2010"},{"key":"2021091013024063000_bib2","doi-asserted-by":"crossref","unstructured":"Naama\n              Ben-Mordecai\n            \n          . 2005. Hebrew Named Entity Recognition. Master\u2019s thesis, Department of Computer Science, Ben-Gurion University. 
https:\/\/doi.org\/10.1007\/978-3-540-70939-8_13","DOI":"10.1007\/978-3-540-70939-8_13"},{"key":"2021091013024063000_bib3","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1007\/978-3-540-70939-8_13","article-title":"Anersys: An Arabic named entity recognition system based on maximum entropy","volume-title":"Computational Linguistics and Intelligent Text Processing","author":"Benajiba","year":"2007"},{"key":"2021091013024063000_bib4","first-page":"2524","article-title":"NoSta-D named entity annotation for German: Guidelines and dataset","volume-title":"Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014)","author":"Benikova","year":"2014"},{"key":"2021091013024063000_bib5","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching word vectors with subword information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021091013024063000_bib6","unstructured":"N.\n              Chinchor\n            , E.Brown, L.Ferro, and P.Robinson. 1999. Named entity recognition task definition. The MITRE Corporation and SAIC."},{"key":"2021091013024063000_bib7","article-title":"Named entity recognition with bidirectional LSTM-CNNS","author":"Chiu","year":"2015","journal-title":"CoRR"},{"key":"2021091013024063000_bib8","article-title":"The Leipzig glossing rules: Conventions for interlinear morpheme-by-morpheme glosses","author":"Comrie","year":"2008","journal-title":"Department of Linguistics of the Max Planck Institute for Evolutionary Anthropology & the Department of Linguistics of the University of Leipzig"},{"key":"2021091013024063000_bib9","doi-asserted-by":"publisher","first-page":"974","DOI":"10.18653\/v1\/D19-1090","article-title":"Don\u2019t forget the long tail! 
A comprehensive analysis of morphological generalization in bilingual lexicon induction","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Czarnowska","year":"2019"},{"key":"2021091013024063000_bib10","first-page":"1558","article-title":"Named entity recognition using cross-lingual resources: Arabic as an example","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Darwish","year":"2013"},{"key":"2021091013024063000_bib11","doi-asserted-by":"publisher","first-page":"140","DOI":"10.18653\/v1\/W17-4418","article-title":"Results of the WNUT2017 shared task on novel and emerging entity recognition","volume-title":"Proceedings of the 3rd Workshop on Noisy User-generated Text","author":"Derczynski","year":"2017"},{"key":"2021091013024063000_bib12","doi-asserted-by":"publisher","first-page":"141","DOI":"10.3115\/1699510.1699529","article-title":"Nested named entity recognition","volume-title":"Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1","author":"Finkel","year":"2009"},{"key":"2021091013024063000_bib13","doi-asserted-by":"publisher","first-page":"142","DOI":"10.3115\/1698381.1698406","article-title":"Towards a methodology for named entities annotation","volume-title":"Proceedings of the Third Linguistic Annotation Workshop (LAW III)","author":"Fort","year":"2009"},{"key":"2021091013024063000_bib14","unstructured":"Yoav\n              Goldberg\n            \n          . 2014. 
Hebrew Wikipedia dependency parsed corpus, v.1.0."},{"key":"2021091013024063000_bib15","first-page":"371","article-title":"A single generative model for joint morphological segmentation and syntactic parsing","volume-title":"Proceedings of ACL-08: HLT","author":"Goldberg","year":"2008"},{"key":"2021091013024063000_bib16","first-page":"394","article-title":"Better Arabic parsing: Baselines, evaluations, and analysis","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)","author":"Green","year":"2010"},{"key":"2021091013024063000_bib17","article-title":"Improving named entity recognition by jointly learning to disambiguate morphological tags","author":"G\u00fcng\u00f6r","year":"2018","journal-title":"CoRR"},{"key":"2021091013024063000_bib18","doi-asserted-by":"publisher","first-page":"573","DOI":"10.3115\/1219840.1219911","article-title":"Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop","volume-title":"Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL\u201905)","author":"Habash","year":"2005"},{"key":"2021091013024063000_bib19","article-title":"Bidirectional LSTM-CRF models for sequence tagging","author":"Huang","year":"2015","journal-title":"CoRR"},{"key":"2021091013024063000_bib20","doi-asserted-by":"crossref","first-page":"961","DOI":"10.18653\/v1\/D16-1097","article-title":"Neural morphological analysis: Encoding-decoding canonical segments","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing","author":"Kann","year":"2016"},{"key":"2021091013024063000_bib21","first-page":"2736","article-title":"Neural semi-Markov conditional random fields for robust character-based part-of-speech tagging","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short 
Papers)","author":"Kemos","year":"2019"},{"key":"2021091013024063000_bib22","doi-asserted-by":"publisher","first-page":"204","DOI":"10.18653\/v1\/2020.sigmorphon-1.24","article-title":"Getting the ##life out of living: How adequate are word-pieces for modelling complex morphology?","volume-title":"Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology","author":"Klein","year":"2020"},{"key":"2021091013024063000_bib23","article-title":"Segmental recurrent neural networks","author":"Kong","year":"2015","journal-title":"arXiv preprint arXiv:1511.06018"},{"key":"2021091013024063000_bib24","article-title":"A tweet dataset annotated for named entity recognition and stance detection","author":"K\u00fc\u00e7\u00fck","year":"2019","journal-title":"CoRR"},{"key":"2021091013024063000_bib25","article-title":"Neural architectures for named entity recognition","author":"Lample","year":"2016","journal-title":"CoRR"},{"key":"2021091013024063000_bib26","unstructured":"LDC\n          . 2008. 
ACE (automatic content extraction) english annotation guidelines for entities version 6.6."},{"key":"2021091013024063000_bib27","doi-asserted-by":"publisher","first-page":"3219","DOI":"10.18653\/v1\/D18-1360","article-title":"Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Yi","year":"2018"},{"key":"2021091013024063000_bib28","article-title":"End-to-end sequence labeling via bi-directional LSTM-CNNS-CRF","author":"Ma","year":"2016","journal-title":"CoRR"},{"key":"2021091013024063000_bib29","first-page":"162","article-title":"Recall-oriented learning of named entities in Arabic Wikipedia","volume-title":"Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics","author":"Mohit","year":"2012"},{"key":"2021091013024063000_bib30","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1162\/tacl_a_00253","article-title":"Joint transition-based models for morpho-syntactic parsing: Parsing strategies for MRLs and a case study from modern Hebrew","volume":"7","author":"More","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021091013024063000_bib31","first-page":"915","article-title":"The CoNLL 2007 shared task on dependency parsing","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)","author":"Nivre","year":"2007"},{"key":"2021091013024063000_bib32","doi-asserted-by":"publisher","first-page":"532","DOI":"10.3115\/v1\/D14-1162","article-title":"GloVe: Global vectors for word representation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing 
(EMNLP)","author":"Pennington","year":"2014"},{"key":"2021091013024063000_bib33","doi-asserted-by":"publisher","first-page":"742","DOI":"10.3115\/v1\/E14-1078","article-title":"Learning part-of-speech taggers with inter-annotator agreement loss","volume-title":"Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics","author":"Plank","year":"2014"},{"key":"2021091013024063000_bib34","first-page":"1","article-title":"CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes","volume-title":"Joint Conference on EMNLP and CoNLL - Shared Task","author":"Pradhan","year":"2012"},{"issue":"1","key":"2021091013024063000_bib35","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1186\/1471-2105-8-50","article-title":"Bioinfer: A corpus for information extraction in the biomedical domain","volume":"8","author":"Pyysalo","year":"2007","journal-title":"BMC bioinformatics"},{"key":"2021091013024063000_bib36","doi-asserted-by":"publisher","first-page":"147","DOI":"10.3115\/1596374.1596399","article-title":"Design challenges and misconceptions in named entity recognition","volume-title":"Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)","author":"Ratinov","year":"2009"},{"key":"2021091013024063000_bib37","article-title":"Optimal hyperparameters for deep LSTM-networks for sequence labeling tasks","author":"Reimers","year":"2017","journal-title":"CoRR"},{"key":"2021091013024063000_bib38","first-page":"146","article-title":"Overview of the SPMRL 2013 shared task: A cross-framework evaluation of parsing morphologically rich languages","volume-title":"Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages","author":"Seddah","year":"2013"},{"key":"2021091013024063000_bib39","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1162\/tacl_a_00144","article-title":"A graph-based lattice dependency parser for joint 
morphological segmentation and syntactic analysis","volume":"3","author":"Seeker","year":"2015","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021091013024063000_bib40","doi-asserted-by":"publisher","first-page":"4368","DOI":"10.18653\/v1\/2020.findings-emnlp.391","article-title":"A pointer network architecture for joint morphological segmentation and tagging","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Seker","year":"2020"},{"key":"2021091013024063000_bib41","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1162\/tacl_a_00033","article-title":"Universal word segmentation: Implementation and interpretation","volume":"6","author":"Shao","year":"2018","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021091013024063000_bib42","first-page":"173","article-title":"Character-based joint segmentation and POS tagging for Chinese using bidirectional RNN-CRF","volume-title":"Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Shao","year":"2017"},{"issue":"2","key":"2021091013024063000_bib43","first-page":"347","article-title":"Building a tree-bank of modern Hebrew text","volume":"42","author":"Sima\u2019an","year":"2001","journal-title":"Traitement Automatique des Langues"},{"issue":"2","key":"2021091013024063000_bib44","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1186\/s12938-018-0573-6","article-title":"Comparison of named entity recognition methodologies in biomedical documents","volume":"17","author":"Song","year":"2018","journal-title":"Biomedical Engineering Online"},{"key":"2021091013024063000_bib45","doi-asserted-by":"publisher","first-page":"142","DOI":"10.3115\/1119176.1119195","article-title":"Introduction to the conll-2003 shared task: Language-independent named entity recognition","volume-title":"Proceedings of the Seventh Conference 
on Natural Language Learning at HLT-NAACL 2003 - Volume 4","author":"Sang","year":"2003"},{"key":"2021091013024063000_bib46","doi-asserted-by":"publisher","first-page":"7396","DOI":"10.18653\/v1\/2020.acl-main.660","article-title":"From SPMRL to NMRL: What did we learn (and unlearn) in a decade of parsing morphologically-rich languages (MRLs)?","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Tsarfaty","year":"2020"},{"key":"2021091013024063000_bib47","doi-asserted-by":"publisher","first-page":"259","DOI":"10.18653\/v1\/D19-3044","article-title":"What\u2019s wrong with Hebrew NLP? And how to make it right","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations","author":"Tsarfaty","year":"2019"},{"key":"2021091013024063000_bib48","first-page":"1","article-title":"Statistical parsing of morphologically rich languages (SPMRL) what, how and whither","volume-title":"Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages","author":"Tsarfaty","year":"2010"},{"issue":"2","key":"2021091013024063000_bib49","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1017\/S135132490200284X","article-title":"A statistical information extraction system for turkish","volume":"9","author":"T\u00fcr","year":"2003","journal-title":"Natural Language Engineering"},{"key":"2021091013024063000_bib50","doi-asserted-by":"publisher","first-page":"2573","DOI":"10.18653\/v1\/D18-1278","article-title":"What do character-level models learn about morphology? 
The case of dependency parsing","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Vania","year":"2018"},{"key":"2021091013024063000_bib51","article-title":"Ontonotes release 5.0","volume-title":"Linguistic Data Consortium","author":"Weischedel","year":"2013"},{"key":"2021091013024063000_bib52","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-4013","article-title":"Design challenges and misconceptions in neural sequence labeling","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics (COLING)","author":"Yang","year":"2018"},{"key":"2021091013024063000_bib53","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/P18-4013","article-title":"NCRF++: An open-source neural sequence labeling toolkit","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics","author":"Yang","year":"2018"},{"key":"2021091013024063000_bib54","first-page":"1","article-title":"WebAnno: A flexible, Web-based and visually supported system for distributed annotations","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations","author":"Yimam","year":"2013"},{"key":"2021091013024063000_bib55","unstructured":"Ziqi\n              Zhang\n            \n          . 2013. Named Entity Recognition: Challenges in Document Annotation, Gazetteer Construction and Disambiguation. Ph.D. 
thesis, University of Sheffield."}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00404\/1962472\/tacl_a_00404.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00404\/1962472\/tacl_a_00404.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,20]],"date-time":"2021-09-20T19:24:28Z","timestamp":1632165868000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00404\/107206\/Neural-Modeling-for-Named-Entities-and-Morphology"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":55,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00404","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}