{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:52:12Z","timestamp":1761897132364},"reference-count":82,"publisher":"MIT Press - Journals","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Linguistics"],"published-print":{"date-parts":[[2020,1]]},"abstract":"<jats:p> Language change across space and time is one of the main concerns in historical linguistics. In this article, we develop tools to assist researchers and domain experts in the study of language evolution. <\/jats:p><jats:p> First, we introduce a method to automatically determine whether two words are cognates. We propose an algorithm for extracting cognates from electronic dictionaries that contain etymological information. Having built a data set of related words, we further develop machine learning methods based on orthographic alignment for identifying cognates. We use aligned subsequences as features for classification algorithms in order to infer rules for linguistic changes undergone by words when entering new languages and to discriminate between cognates and non-cognates. <\/jats:p><jats:p> Second, we extend the method to a finer-grained level, to identify the type of relationship between words. Discriminating between cognates and borrowings provides a deeper insight into the history of a language and allows a better characterization of language relatedness. We show that orthographic features have discriminative power and we analyze the underlying linguistic factors that prove relevant in the classification task. To our knowledge, this is the first attempt of this kind. <\/jats:p><jats:p> Third, we develop a machine learning method for automatically producing related words. We focus on reconstructing proto-words, but we also address two related sub-problems, producing modern word forms and producing cognates. The task of reconstructing proto-words consists of recreating the words in an ancient language from its modern daughter languages. Having modern word forms in multiple Romance languages, we infer the form of their common Latin ancestors. Our approach relies on the regularities that occurred when words entered the modern languages. We leverage information from several modern languages, building an ensemble system for reconstructing proto-words. We apply our method to multiple data sets, showing that our approach improves on previous results, also having the advantage of requiring less input data, which is essential in historical linguistics, where resources are generally scarce. <\/jats:p>","DOI":"10.1162\/coli_a_00361","type":"journal-article","created":{"date-parts":[[2019,10,8]],"date-time":"2019-10-08T14:59:06Z","timestamp":1570546746000},"page":"667-704","source":"Crossref","is-referenced-by-count":4,"title":["Automatic Identification and Production of Related Words for Historical Linguistics"],"prefix":"10.1162","volume":"45","author":[{"given":"Alina Maria","family":"Ciobanu","sequence":"first","affiliation":[{"name":"University of Bucharest, Department of Computer Science, HLT Research Center."}]},{"given":"Liviu P.","family":"Dinu","sequence":"additional","affiliation":[{"name":"University of Bucharest, Department of Computer Science, HLT Research Center"}]}],"member":"281","reference":[{"key":"bib1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511845192"},{"key":"bib2","first-page":"66","volume-title":"Proceedings of the 4th Named Entity Workshop","author":"Ammar Waleed","year":"2012"},{"key":"bib3","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511808852"},{"key":"bib4","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1300397110"},{"key":"bib5","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-968X.2005.00151.x"},{"key":"bib6","first-page":"1937","volume-title":"Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008","author":"Barbu Ana Maria","year":"2008"},{"key":"bib7","first-page":"883","volume-title":"Proceedings of the 6th International Joint Conference on Natural Language Processing, IJCNLP 2013","author":"Beinborn Lisa","year":"2013"},{"key":"bib8","doi-asserted-by":"publisher","DOI":"10.3115\/1620932.1620940"},{"key":"bib9","volume-title":"M\u00e9moire sur les \u00c9lections au Scrutin","author":"de Borda Jean Charles","year":"1781"},{"key":"bib10","doi-asserted-by":"publisher","DOI":"10.3115\/1620754.1620764"},{"key":"bib11","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1204678110"},{"key":"bib12","first-page":"887","volume-title":"Proceedings of EMNLP-CoNLL 2007","author":"Bouchard-C\u00f4t\u00e9 Alexandre","year":"2007"},{"key":"bib13","first-page":"45","volume-title":"Proceedings of the 2nd International Conference on New Methods in Language Processing","author":"Brew Chris","year":"1996"},{"key":"bib14","volume-title":"Spanish Vocabulary: An Etymological Approach","author":"Brodsky David","year":"2009"},{"key":"bib15","first-page":"107","volume-title":"Proceedings of the 6th Text Retrieval Conference, TREC 1997","author":"Buckley Chris","year":"1997"},{"key":"bib16","volume-title":"Historical Linguistics. An Introduction","author":"Campbell Lyle","year":"1998"},{"key":"bib17","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"bib18","first-page":"311","volume-title":"The Oxford Handbook of Laboratory Phonology","author":"Chitoran Ioana","year":"2011"},{"key":"bib19","doi-asserted-by":"publisher","DOI":"10.3115\/981574.981575"},{"key":"bib20","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1112"},{"key":"bib21","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-2017"},{"key":"bib22","first-page":"1038","volume-title":"Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014","author":"Ciobanu Alina Maria","year":"2014"},{"key":"bib23","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-2071"},{"key":"bib24","first-page":"1604","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018","author":"Ciobanu Alina Maria","year":"2018"},{"key":"bib25","first-page":"275","volume-title":"Proceedings of ACL 1998, Volume 1","author":"Covington Michael A.","year":"1998"},{"issue":"2","key":"bib26","first-page":"71","volume":"12","author":"Delmestri Antonella","year":"2010","journal-title":"Bucharest Working Papers in Linguistics"},{"key":"bib27","doi-asserted-by":"publisher","DOI":"10.1080\/09296174.2012.659001"},{"key":"bib28","doi-asserted-by":"publisher","DOI":"10.1017\/S1366728910000623"},{"key":"bib29","volume-title":"Dictionar de Cuvinte Recente","author":"Dimitrescu Florica","year":"1997"},{"key":"bib30","first-page":"591","volume-title":"Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers, Part I","author":"Dinu Liviu P.","year":"2017"},{"key":"bib31","doi-asserted-by":"publisher","DOI":"10.1007\/BF02404005"},{"key":"bib32","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/7287.001.0001"},{"key":"bib33","first-page":"42","volume-title":"Proceedings of the 2nd Workshop on Cross Lingual Information Access, CLIA 2008","author":"Ganesh Surya","year":"2008"},{"key":"bib34","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24769-9_45"},{"key":"bib35","doi-asserted-by":"publisher","DOI":"10.3366\/E1753854809000317"},{"key":"bib36","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(82)90398-9"},{"key":"bib37","doi-asserted-by":"publisher","DOI":"10.1038\/nature02029"},{"key":"bib38","doi-asserted-by":"publisher","DOI":"10.4137\/EBO.S893"},{"key":"bib39","first-page":"1030","volume-title":"Proceedings of ACL 2010","author":"Hall David","year":"2010"},{"key":"bib40","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"bib41","volume-title":"Linguistics and Your Language","author":"Hall Robert Anderson","year":"1960"},{"key":"bib42","doi-asserted-by":"publisher","DOI":"10.1007\/BF02404201"},{"key":"bib43","doi-asserted-by":"publisher","DOI":"10.1075\/bct.46.07heg"},{"key":"bib44","first-page":"191","volume-title":"Proceedings of ICHL 1974","author":"Hewson John","year":"1974"},{"key":"bib45","first-page":"251","volume-title":"Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2005","author":"Inkpen Diana","year":"2005"},{"key":"bib46","author":"J\u00e4ger Gerhard","year":"2018","journal-title":"CoRR"},{"key":"bib47","first-page":"388","volume-title":"Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004","author":"Koehn Philipp","year":"2004"},{"key":"bib48","first-page":"711","volume-title":"Proceedings of the 17th National Conference on Artificial Intelligence and 12th Conference on Innovative Applications of Artificial Intelligence","author":"Koehn Philipp","year":"2000"},{"key":"bib49","first-page":"288","volume-title":"Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference, NAACL 2000","author":"Kondrak Grzegorz","year":"2000"},{"key":"bib50","unstructured":"Kondrak, Grzegorz. 2002. Algorithms for Language Reconstruction. Ph.D. thesis, University of Toronto."},{"key":"bib51","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24840-8_4"},{"key":"bib52","doi-asserted-by":"publisher","DOI":"10.3115\/1073483.1073499"},{"key":"bib53","first-page":"282","volume-title":"Proceedings of ICML 2001","author":"Lafferty John D.","year":"2001"},{"key":"bib54","first-page":"707","volume":"10","author":"Levenshtein Vladimir I.","year":"1965","journal-title":"Soviet Physics Doklady"},{"key":"bib55","first-page":"117","volume-title":"Proceedings of the EACL 2012 Joint Workshop of LINGVIS and UNCLH","author":"List Johann Mattis","year":"2012"},{"key":"bib56","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0170046"},{"key":"bib58","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1166"},{"key":"bib59","doi-asserted-by":"publisher","DOI":"10.3115\/1706543.1706551"},{"key":"bib61","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-968X.2005.00148.x"},{"key":"bib62","first-page":"184","volume-title":"Proceedings of the 3rd Workshop on Very Large Corpora","author":"Melamed Dan","year":"1995"},{"key":"bib63","doi-asserted-by":"publisher","DOI":"10.1075\/dia.20.2.04min"},{"key":"bib64","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-968X.2005.00147.x"},{"key":"bib65","doi-asserted-by":"publisher","DOI":"10.3115\/1557835.1557841"},{"key":"bib66","first-page":"2387","volume-title":"Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2006","author":"Mulloni Andrea","year":"2006"},{"key":"bib67","first-page":"247","volume-title":"Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2011","author":"Navlea Mirabela","year":"2011"},{"key":"bib68","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(70)90057-4"},{"issue":"2","key":"bib69","first-page":"43","volume":"20","author":"Ng Ee Lee","year":"2010","journal-title":"Int. J. of Asian Lang. Proc."},{"key":"bib70","doi-asserted-by":"publisher","DOI":"10.1076\/jqul.7.3.233.4105"},{"key":"bib71","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1218726110"},{"key":"bib72","first-page":"1018","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers","author":"Rama Taraka","year":"2016"},{"key":"bib73","first-page":"171","volume-title":"Sequences in Language and Text","author":"Rama Taraka","year":"2014"},{"key":"bib74","volume-title":"Lingvistica Romanica: Lexic, Morfologie, Fonetica","author":"Reinheimer Ripeanu Sanda","year":"2001"},{"key":"bib75","volume-title":"Formarea Cuvintelor \u00een Limba Rom\u00e2n\u0103","volume":"4","author":"R\u0103dulescu Sala Marina","year":"2015"},{"key":"bib76","doi-asserted-by":"publisher","DOI":"10.1002\/0471223921.ch8"},{"key":"bib77","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809682"},{"key":"bib78","first-page":"67","volume-title":"Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation","author":"Simard Michel","year":"1992"},{"key":"bib79","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(81)90087-5"},{"key":"bib80","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1267"},{"key":"bib81","doi-asserted-by":"publisher","DOI":"10.1163\/221058211X570358"},{"key":"bib82","doi-asserted-by":"publisher","DOI":"10.1080\/00437956.1954.11659530"},{"key":"bib83","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1062"},{"key":"bib84","doi-asserted-by":"publisher","DOI":"10.3115\/1610075.1610108"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/coli_a_00361","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:28:27Z","timestamp":1615584507000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/45\/4\/667-704\/93357"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1]]},"references-count":82,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,1]]}},"alternative-id":["10.1162\/coli_a_00361"],"URL":"https:\/\/doi.org\/10.1162\/coli_a_00361","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1]]}}}