{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T22:13:25Z","timestamp":1780352005718,"version":"3.54.1"},"reference-count":131,"publisher":"MIT Press - Journals","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Linguistics"],"published-print":{"date-parts":[[2020,3]]},"abstract":"<jats:p> Despite the recent success of deep neural networks in natural language processing and other spheres of artificial intelligence, their interpretability remains a challenge. We analyze the representations learned by neural machine translation (NMT) models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word structure captured within the learned representations, which is an important aspect in translating morphologically rich languages? (ii) Do the representations capture long-range dependencies, and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena; (ii) How does the choice of translation unit (word, character, or subword unit) impact the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects in NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models trained in an end-to-end fashion, without being provided any direct supervision during the training process, learn a non-trivial amount of linguistic information. Notable findings include the following observations: (i) Word morphology and part-of-speech information are captured at the lower layers of the model; (ii) In contrast, lexical semantics or non-local syntactic and semantic dependencies are better represented at the higher layers of the model; (iii) Representations learned using characters are more informed about word-morphology compared to those learned using subword units; and (iv) Representations learned by multilingual models are richer compared to bilingual models. <\/jats:p>","DOI":"10.1162\/coli_a_00367","type":"journal-article","created":{"date-parts":[[2020,1,2]],"date-time":"2020-01-02T18:51:49Z","timestamp":1577991109000},"page":"1-52","source":"Crossref","is-referenced-by-count":28,"title":["On the Linguistic Representational Power of Neural Machine Translation Models"],"prefix":"10.1162","volume":"46","author":[{"given":"Yonatan","family":"Belinkov","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Harvard University, John F. Paulson, School of Engineering and Applied Sciences."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nadir","family":"Durrani","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute, HBKU Research Complex."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fahim","family":"Dalvi","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute, HBKU Research Complex."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hassan","family":"Sajjad","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute, HBKU Research Complex."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"James","family":"Glass","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory."}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","reference":[{"key":"bib1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-3003"},{"key":"bib2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2039"},{"key":"bib3","volume-title":"International Conference on Learning Representations (ICLR)","author":"Adi Yossi","year":"2017"},{"key":"bib4","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2021"},{"key":"bib5","author":"Arivazhagan Naveen","year":"2019","journal-title":"arXiv preprint arXiv:1903.07091"},{"key":"bib6","author":"Bahdanau Dzmitry","year":"2014","journal-title":"arXiv preprint arXiv:1409.0473"},{"issue":"2","key":"bib7","volume":"25","author":"Bangalore Srinivas","year":"1999","journal-title":"Computational Linguistics"},{"key":"bib8","volume-title":"International Conference on Learning Representations (ICLR)","author":"Bau D. Anthony","year":"2019"},{"key":"bib9","volume-title":"International Conference on Learning Representations (ICLR)","author":"Bau D. Anthony","year":"2019"},{"key":"bib10","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1118"},{"key":"bib11","unstructured":"Belinkov, Yonatan. 2018. On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology."},{"key":"bib12","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1080"},{"key":"bib13","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00254"},{"key":"bib14","volume-title":"Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP)","author":"Belinkov Yonatan","year":"2017"},{"key":"bib15","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1025"},{"key":"bib16","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1313"},{"key":"bib17","first-page":"3531","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers","author":"Bjerva Johannes","year":"2016"},{"key":"bib18","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4717"},{"key":"bib19","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2301"},{"key":"bib20","author":"Bowman Samuel R.","year":"2018","journal-title":"ArXiv:1812.10860"},{"key":"bib21","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2308"},{"key":"bib22","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4705"},{"key":"bib23","volume-title":"Proceedings of the 13th International Workshop on Spoken Language Translation (IWSLT 2016)","author":"Cettolo Mauro","year":"2016"},{"key":"bib24","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1304"},{"key":"bib25","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1008"},{"key":"bib26","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219873"},{"key":"bib27","doi-asserted-by":"publisher","DOI":"10.3115\/1620754.1620786"},{"key":"bib28","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1160"},{"key":"bib29","unstructured":"Cinkov\u00e1, Silvie, Jan Haji\u010d, Marie Mikulov\u00e1, Lucie Mladov\u00e1, Anja Nedolu\u017eko, Petr Pajas, Jarmila Panevov\u00e1, Ji\u0159\u00ed Semeck\u1ef3, Jana \u0160indlerov\u00e1, Josef Toman, Zde\u0148ka Ure\u0161ov\u00e1, and Zden\u011bk \u017eabokrtsk\u00fd. 2004. Annotation of English on the tectogrammatical level: Reference book. Technical report, \u00daFAL\/CKL, Prague, Czech Republic."},{"issue":"92","key":"bib30","first-page":"85","author":"Cinkov\u00e1 Silvie","year":"2009","journal-title":"The Prague Bulletin of Mathematical Linguistics"},{"key":"bib31","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220396"},{"key":"bib32","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1198"},{"key":"bib33","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-2058"},{"key":"bib34","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016309"},{"key":"bib35","volume-title":"Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP)","author":"Dalvi Fahim","year":"2017"},{"key":"bib36","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019851"},{"key":"bib37","volume-title":"Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)","author":"Devlin Jacob","year":"2019"},{"key":"bib38","first-page":"465","volume-title":"Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics","author":"Durrani Nadir","year":"2010"},{"key":"bib39","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/E14-4029"},{"key":"bib40","volume-title":"Proceedings of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT\u201911)","author":"Durrani Nadir","year":"2011"},{"key":"bib41","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114844"},{"key":"bib42","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1078"},{"key":"bib43","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220296"},{"key":"bib44","doi-asserted-by":"publisher","DOI":"10.3115\/1613715.1613824"},{"key":"bib45","doi-asserted-by":"publisher","DOI":"10.1145\/3110025.3110083"},{"key":"bib46","first-page":"1243","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Gehring Jonas","year":"2017"},{"key":"bib47","volume-title":"International Conference on Learning Representations","author":"Gu Jiatao","year":"2018"},{"key":"bib48","volume-title":"Proceedings of the Workshop on Statistical Machine Translation (WMT\u201911)","author":"Heafield Kenneth","year":"2011"},{"key":"bib49","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"bib50","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220239"},{"key":"bib51","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2007.33.3.355"},{"key":"bib52","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2059"},{"key":"bib53","author":"Hupkes Dieuwke","year":"2017","journal-title":"arXiv preprint arXiv:1711.10203"},{"key":"bib54","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00065"},{"key":"bib55","first-page":"1359","volume-title":"Proceedings of COLING 2012","author":"Jones Bevan","year":"2012"},{"key":"bib56","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00300"},{"key":"bib57","author":"Karpathy Andrej","year":"2015","journal-title":"arXiv preprint arXiv:1506.02078"},{"key":"bib59","author":"Kim Yoon","year":"2015","journal-title":"arXiv preprint arXiv:1508.06615"},{"key":"bib60","doi-asserted-by":"publisher","DOI":"10.3115\/997939.997976"},{"key":"bib61","volume-title":"arXiv preprint arXiv:1412.6980","author":"Kingma Diederik","year":"2014"},{"key":"bib62","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning","author":"Koehn Philipp","year":"2007"},{"key":"bib63","doi-asserted-by":"publisher","DOI":"10.3115\/1557769.1557821"},{"key":"bib64","doi-asserted-by":"publisher","DOI":"10.3115\/1067807.1067833"},{"key":"bib65","doi-asserted-by":"publisher","DOI":"10.21236\/ADA461156"},{"key":"bib66","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1246"},{"key":"bib67","first-page":"77","volume-title":"International Workshop on Spoken Language Translation (IWSLT)","author":"Komachi Mamoru","year":"2006"},{"key":"bib68","volume-title":"Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)","author":"Lakretz Yair","year":"2019"},{"key":"bib69","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00067"},{"key":"bib70","first-page":"540","volume-title":"Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Li Junhui","year":"2013"},{"key":"bib71","volume":"1511","author":"Ling Wang","year":"2015","journal-title":"CoRR"},{"key":"bib72","volume-title":"ICML Workshop on Human Interpretability in Machine Learning (WHI)","author":"Lipton Zachary C","year":"2016"},{"key":"bib73","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1121"},{"key":"bib74","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1112"},{"key":"bib75","author":"Luong Minh-Thang","year":"2016","journal-title":"arXiv preprint arXiv:1604.00788"},{"key":"bib76","first-page":"148","volume-title":"Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing","author":"Luong Minh-Thang","year":"2010"},{"key":"bib77","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5444"},{"key":"bib78","first-page":"6294","volume-title":"Advances in Neural Information Processing Systems 30","author":"McCann Bryan","year":"2017"},{"key":"bib79","doi-asserted-by":"publisher","DOI":"10.3115\/1596276.1596317"},{"key":"bib80","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00039"},{"key":"bib81","volume-title":"Dependency Syntax: Theory and Practice","author":"Mel\u2019\u010duk Igor Aleksandrovi\u010d","year":"1988"},{"key":"bib82","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4707"},{"key":"bib83","first-page":"301","volume-title":"Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Nakov Preslav","year":"2012"},{"key":"bib84","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-2024"},{"key":"bib85","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/E14-2005"},{"key":"bib86","unstructured":"Nivre, Joakim. 2005. Dependency Grammar and Dependency Parsing. Technical Report MSI 015133, V\u00e4xj\u00f6 University, School of Mathematics and Systems Engineering."},{"key":"bib88","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S15-2153"},{"key":"bib89","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/S14-2008"},{"key":"bib91","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002"},{"key":"bib92","first-page":"1094","volume-title":"Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC\u201914)","author":"Pasha Arfath","year":"2014"},{"key":"bib93","author":"Peters Matthew","year":"2019","journal-title":"arXiv preprint arXiv:1903.05987"},{"key":"bib94","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"bib95","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1179"},{"key":"bib96","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4737"},{"key":"bib97","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1079"},{"key":"bib98","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1140"},{"key":"bib99","unstructured":"Radford, Alec, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. Technical report, OpenAI. https:\/\/s3-us-west-2.amazonaws.com\/openai-assets\/research-covers\/language-unsupervised\/language_understanding_paper.pdf."},{"key":"bib100","unstructured":"Radford, Alec, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Technical report, OpenAI. https:\/\/d4mucfpksywv.cloudfront.net\/better-language-models\/language_models_are_unsupervised_multitask_learners.pdf"},{"key":"bib101","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5431"},{"key":"bib102","author":"Renduchintala Adithya","year":"2018","journal-title":"arXiv preprint arXiv:1809.02223"},{"key":"bib103","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4702"},{"key":"bib104","doi-asserted-by":"publisher","DOI":"10.3115\/991886.991915"},{"key":"bib105","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2060"},{"key":"bib106","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4739"},{"key":"bib107","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"bib108","author":"Shapiro Pamela","year":"2018","journal-title":"arXiv preprint arXiv:1809.01301"},{"key":"bib109","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00015"},{"key":"bib110","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1159"},{"key":"bib111","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/E14-2006"},{"key":"bib112","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-2049"},{"key":"bib113","volume-title":"Combinatory Categorial Grammar","author":"Steedman Mark","year":"2011"},{"key":"bib114","first-page":"3104","volume-title":"Advances in Neural Information Processing Systems","author":"Sutskever Ilya","year":"2014"},{"key":"bib115","volume-title":"Proceedings of ICLR","author":"Tenney Ian","year":"2019"},{"key":"bib116","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-1100"},{"key":"bib117","author":"Tran Ke","year":"2018","journal-title":"arXiv preprint arXiv:1803.03585"},{"key":"bib118","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems 30","author":"Vaswani Ashish","year":"2017"},{"key":"bib119","author":"Vylomova Ekaterina","year":"2016","journal-title":"arXiv preprint arXiv:1606.04217"},{"key":"bib120","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017-877"},{"key":"bib121","first-page":"15","volume":"14","author":"Weaver Warren","year":"1955","journal-title":"Machine Translation of Languages"},{"key":"bib122","doi-asserted-by":"publisher","DOI":"10.2200\/S00716ED1V04Y201604HLT033"},{"key":"bib123","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1065"},{"key":"bib124","first-page":"29","volume-title":"Proceedings of 5th International Joint Conference on Natural Language Processing","author":"Wu Xianchao","year":"2011"},{"key":"bib125","volume":"1609","author":"Wu Yonghui","year":"2016","journal-title":"CoRR"},{"key":"bib126","author":"Wu Yonghui","year":"2016","journal-title":"arXiv preprint arXiv:1609.08144"},{"key":"bib127","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472657"},{"key":"bib128","first-page":"902","volume-title":"Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Xiong Deyi","year":"2012"},{"key":"bib129","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-2041"},{"key":"bib130","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Yamada Kenji","year":"2002"},{"key":"bib131","volume-title":"Proceedings of BlackboxNLP","author":"Zhang Kelly W.","year":"2018"},{"key":"bib132","first-page":"535","author":"Zhang Min","year":"2007","journal-title":"MT-Summit-07"},{"key":"bib133","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00105"},{"key":"bib134","volume-title":"Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)","author":"Ziemski Michal","year":"2016"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/coli_a_00367","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:28:29Z","timestamp":1615584509000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/46\/1\/1-52\/93381"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3]]},"references-count":131,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,3]]}},"alternative-id":["10.1162\/coli_a_00367"],"URL":"https:\/\/doi.org\/10.1162\/coli_a_00367","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3]]}}}