{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T10:26:40Z","timestamp":1758709600920},"reference-count":47,"publisher":"Cambridge University Press (CUP)","issue":"2","license":[{"start":{"date-parts":[[2019,9,4]],"date-time":"2019-09-04T00:00:00Z","timestamp":1567555200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2020,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Some languages have very few NLP resources, while many of them are closely related to better-resourced languages. This paper explores how the similarity between the languages can be utilised by porting resources from better- to lesser-resourced languages. The paper introduces a way of building a representation shared across related languages by combining cross-lingual embedding methods with a lexical similarity measure which is based on the weighted Levenshtein distance. One of the outcomes of the experiments is a Panslavonic embedding space for nine Balto-Slavonic languages. The paper demonstrates that the resulting embedding space helps in such applications as morphological prediction, named-entity recognition and genre classification.<\/jats:p>","DOI":"10.1017\/s1351324919000354","type":"journal-article","created":{"date-parts":[[2019,9,4]],"date-time":"2019-09-04T12:41:56Z","timestamp":1567600916000},"page":"163-182","source":"Crossref","is-referenced-by-count":7,"title":["Finding next of kin: Cross-lingual embedding spaces for related languages"],"prefix":"10.1017","volume":"26","author":[{"given":"Serge","family":"Sharoff","sequence":"first","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2019,9,4]]},"reference":[{"key":"S1351324919000354_ref27","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1030"},{"key":"S1351324919000354_ref35","first-page":"2487","article-title":"Hubs in space: Popular nearest neighbors in high-dimensional data","volume":"11","author":"Radovanovi\u0107","year":"2010","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324919000354_ref33","volume-title":"Synthesis Lectures on Human Language Technologies","author":"Piotrowski","year":"2012"},{"key":"S1351324919000354_ref13","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2017.1421542"},{"key":"S1351324919000354_ref32","first-page":"285","article-title":"Stable classification of text genres","volume":"34","author":"Petrenz","year":"2010","journal-title":"Computational Linguistics"},{"key":"S1351324919000354_ref24","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1515\/9783110305258.375","volume-title":"Approaches to Measuring Linguistic Differences","author":"Kondrak","year":"2013"},{"key":"S1351324919000354_ref25","unstructured":"Krek, S. , Erjavec, T. , Dobrovoljc, K. , Holz, N. , Ledinek, N. and Mo\u017ee, S. (2012). U\u010dni korpus ssj500k kot podatkovna zbirka."},{"key":"S1351324919000354_ref46","doi-asserted-by":"publisher","DOI":"10.1613\/jair.4786"},{"key":"S1351324919000354_ref42","volume-title":"Technical report","author":"Sorower","year":"2010"},{"key":"S1351324919000354_ref7","first-page":"1137","article-title":"A neural probabilistic language model","volume":"3","author":"Bengio","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324919000354_ref28","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-1414"},{"key":"S1351324919000354_ref45","unstructured":"Tiedemann, J. (2014). Rediscovering annotation projection for cross-lingual parser induction. In Proceedings of COLING, Dublin, pp. 1854\u20131864."},{"key":"S1351324919000354_ref30","unstructured":"Mikolov, T. , Le, Q.V. and Sutskever, I. (2013). Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168."},{"key":"S1351324919000354_ref4","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-009-9081-4"},{"key":"S1351324919000354_ref15","unstructured":"Dyer, C. , Chahuneau, V. and Smith, N.A. (2013). A simple, fast, and effective reparameterization of IBM Model 2. In Proceedings of NAACL, Atlanta, Georgia."},{"key":"S1351324919000354_ref37","doi-asserted-by":"publisher","DOI":"10.1515\/pralin-2016-0017"},{"key":"S1351324919000354_ref1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1250"},{"key":"S1351324919000354_ref6","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1080"},{"key":"S1351324919000354_ref2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1172"},{"key":"S1351324919000354_ref12","unstructured":"Das, D. and Petrov, S. (2011). Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of ACL, Portland, Oregon."},{"key":"S1351324919000354_ref5","unstructured":"Bateman, J.A. , Kruijff, G.-J. , Kruijff-Korbayov\u00e1, I. , Skoumalov\u00e1, H. , Sharoff, S. and Teich, E. (2000). Resources for multilingual text generation in three slavic languages. In Proceedings of Second International Conference on Language Resources and Evaluation (LREC), Athens, Greece."},{"key":"S1351324919000354_ref3","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqi039"},{"key":"S1351324919000354_ref10","first-page":"2493","article-title":"Natural language processing (almost) from scratch","volume":"12","author":"Collobert","year":"2011","journal-title":"The Journal of Machine Learning Research"},{"key":"S1351324919000354_ref11","unstructured":"Conneau, A. , Lample, G. , Ranzato, M. , Denoyer, L. and J\u00e9gou, H. (2017). Word translation without parallel data. arXiv preprint arXiv:1710.04087."},{"key":"S1351324919000354_ref16","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/E14-1049"},{"key":"S1351324919000354_ref17","first-page":"50","article-title":"Best friends or just faking it? Corpus-based extraction of Slovene\u2013Croatian translation equivalents and false friends","volume":"1","author":"Fi\u0161er","year":"2013","journal-title":"Sloven\u0161\u010dina 2.0"},{"key":"S1351324919000354_ref18","doi-asserted-by":"publisher","DOI":"10.1007\/s10590-011-9090-0"},{"key":"S1351324919000354_ref20","unstructured":"Fung, P. (1995). Compiling bilingual lexicon entries from a non-parallel English\u2013Chinese corpus. In Proceedings of the Third Annual Workshop on Very Large Corpora, Boston, Massachusetts, pp. 173\u2013183."},{"key":"S1351324919000354_ref41","volume-title":"Ethnologue: Languages of the World","author":"Simons","year":"2017"},{"key":"S1351324919000354_ref8","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324919000354_ref21","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2068"},{"key":"S1351324919000354_ref40","doi-asserted-by":"publisher","DOI":"10.3366\/cor.2018.0136"},{"key":"S1351324919000354_ref22","unstructured":"Klementiev, A. , Titov, I. and Bhattarai, B. (2012). Inducing crosslingual distributed representations of words. In Proceedings of COLING, Mumbai, India."},{"key":"S1351324919000354_ref23","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1246"},{"key":"S1351324919000354_ref19","unstructured":"Frunza, O. and Inkpen, D. (2009). Identification and disambiguation of cognates, false friends, and partial cognates using machine learning techniques. International Journal of Linguistics 1(1)."},{"key":"S1351324919000354_ref26","volume-title":"Computational Analysis of Present-Day American English","author":"Ku\u010dera","year":"1967"},{"key":"S1351324919000354_ref29","unstructured":"Mikolov, T. , Grave, E. , Bojanowski, P. , Puhrsch, C. and Joulin, A. (2017). Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405."},{"key":"S1351324919000354_ref31","unstructured":"Nivre, J. , de Marneffe, M.-C. , Ginter, F. , Goldberg, Y. , Haji\u010d, J. , Manning, C.D. , McDonald, R. , Petrov, S. , Pyysalo, S. , Silveira, N. , Tsarfaty, R. and Zeman, D. (2016). Universal dependencies v1: A multilingual treebank collection. In Proceedings of LREC 2016, Portoro\u017e, Slovenia."},{"key":"S1351324919000354_ref34","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-1412"},{"key":"S1351324919000354_ref9","unstructured":"Bojanowski, P. , Grave, E. , Joulin, A. and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606."},{"key":"S1351324919000354_ref36","doi-asserted-by":"publisher","DOI":"10.3115\/981658.981709"},{"key":"S1351324919000354_ref38","volume-title":"Genres on the Web: Computational Models and Empirical Studies","author":"Santini","year":"2010"},{"key":"S1351324919000354_ref39","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-20128-8_6"},{"key":"S1351324919000354_ref43","unstructured":"Straka, M. , Haji\u010d, J. and Strakov\u00e1, J. (2016). UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. In Proceedings of LREC 2016, Portoro\u017e, Slovenia."},{"key":"S1351324919000354_ref44","unstructured":"T\u00e4ckstr\u00f6m, O. , McDonald, R. and Nivre, J. (2013). Target language adaptation of discriminative transfer parsers. In Proceedings of NAACL HLT, Atlanta, pp. 1061\u20131071."},{"key":"S1351324919000354_ref47","first-page":"377","article-title":"Stochastic inversion transduction grammars and bilingual parsing of parallel corpora","volume":"23","author":"Wu","year":"1997","journal-title":"Computational Linguistics"},{"key":"S1351324919000354_ref14","unstructured":"Dinu, G. , Lazaridou, A. and Baroni, M. (2014). Improving zero-shot learning by mitigating the hubness problem. arXiv preprint arXiv:1412.6568."}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324919000354","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T12:32:04Z","timestamp":1664281924000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324919000354\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,4]]},"references-count":47,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,3]]}},"alternative-id":["S1351324919000354"],"URL":"https:\/\/doi.org\/10.1017\/s1351324919000354","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,4]]}}}