{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T08:19:04Z","timestamp":1706170744377},"reference-count":44,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2016,6,15]],"date-time":"2016-06-15T00:00:00Z","timestamp":1465948800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2016,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper highlights some of the recent developments in the field of machine translation using comparable corpora. We start by updating previous definitions of comparable corpora and then look at bilingual versions of continuous vector space models. Recently, neural networks have been used to obtain latent context representations with only a few dimensions, which are often called word embeddings. These promising new techniques can be applied not only to parallel but also to comparable corpora. Subsequent sections of the paper discuss work specifically targeting machine translation using comparable corpora, as well as work dealing with the extraction of parallel segments from comparable corpora. 
Finally, we give an overview of the design and the results of a recent shared task on measuring document comparability across languages.<\/jats:p>","DOI":"10.1017\/s1351324916000115","type":"journal-article","created":{"date-parts":[[2016,6,15]],"date-time":"2016-06-15T18:25:18Z","timestamp":1466015118000},"page":"501-516","source":"Crossref","is-referenced-by-count":6,"title":["Recent advances in machine translation using comparable corpora"],"prefix":"10.1017","volume":"22","author":[{"given":"REINHARD","family":"RAPP","sequence":"first","affiliation":[]},{"given":"SERGE","family":"SHAROFF","sequence":"additional","affiliation":[]},{"given":"PIERRE","family":"ZWEIGENBAUM","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2016,6,15]]},"reference":[{"key":"S1351324916000115_ref021","doi-asserted-by":"crossref","unstructured":"Munteanu D. S. and Marcu D. 2002. Processing comparable corpora with bilingual suffix trees. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, PA, USA. Association for Computational Linguistics.","DOI":"10.3115\/1118693.1118730"},{"key":"S1351324916000115_ref016","doi-asserted-by":"crossref","unstructured":"Li B. and Gaussier E. 2013. Exploiting comparable corpora for lexicon extraction: measuring and improving corpus quality. In S. Sharoff, R. Rapp, P. Zweigenbaum and P. Fung (eds.), Building and Using Comparable Corpora, pp. 131\u2013149. Springer-Verlag.","DOI":"10.1007\/978-3-642-20128-8_7"},{"key":"S1351324916000115_ref004","unstructured":"Bouamor D., Popescu A., Semmar N. and Zweigenbaum P. 2013. Building specialized bilingual lexicons using large scale background knowledge. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, Association for Computational Linguistics, pp. 479\u2013489."},{"key":"S1351324916000115_ref013","unstructured":"Klementiev A., Titov I. 
and Bhattarai B. 2012a. Inducing crosslingual distributed representations of words. In Proceedings of COLING 2012, Mumbai, India, The COLING 2012 Organizing Committee, pp. 1459\u20131474."},{"key":"S1351324916000115_ref035","doi-asserted-by":"crossref","unstructured":"Sharoff S., Zweigenbaum P. and Rapp R. 2015. BUCC shared task: cross-language document similarity. In Proceedings of the 8th Workshop on Building and Using Comparable Corpora, Beijing, China, Association for Computational Linguistics, pp. 74\u201378.","DOI":"10.18653\/v1\/W15-3411"},{"key":"S1351324916000115_ref028","unstructured":"Rapp R., Zweigenbaum P. and Sharoff S. 2010. Preface. In Proceedings of the 3rd Workshop on Building and Using Comparable Corpora at LREC 2010, page V, Valletta, Malta. European Language Resources Association (ELRA)."},{"key":"S1351324916000115_ref003","unstructured":"Beigman Klebanov B. and Flor M. 2013. Associative texture is lost in translation. In Proceedings of the Workshop on Discourse in Machine Translation, Sofia, Bulgaria, Association for Computational Linguistics, pp. 27\u201332."},{"key":"S1351324916000115_ref025","doi-asserted-by":"crossref","unstructured":"Nuhn M., Schamper J. and Ney H. 2015. Unravel - a decipherment toolkit. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Beijing, China, pp. 549\u2013553.","DOI":"10.3115\/v1\/P15-2090"},{"key":"S1351324916000115_ref018","unstructured":"Mikolov T., Chen K., Corrado G. and Dean J. 2013a. Efficient estimation of word representations in vector space. In Proceedings of the Workshop at ICLR'13."},{"key":"S1351324916000115_ref009","unstructured":"Gabrilovich E. and Markovitch S. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. 
In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI'07, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., pp. 1606\u20131611."},{"key":"S1351324916000115_ref019","unstructured":"Mikolov T., Le Q. V. and Sutskever I. 2013b. Exploiting similarities among languages for machine translation. CoRR, abs\/1309.4168."},{"key":"S1351324916000115_ref024","doi-asserted-by":"crossref","unstructured":"Nuhn M. and Ney H. 2014. EM decipherment for large vocabularies. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, pp. 759\u2013764.","DOI":"10.3115\/v1\/P14-2123"},{"key":"S1351324916000115_ref015","unstructured":"Li B. and Gaussier E. 2010. Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China, Coling 2010 Organizing Committee, pp. 644\u2013652."},{"key":"S1351324916000115_ref036","unstructured":"Su F. and Babych B. 2012. Development and application of a cross-language document comparability metric. In N. Calzolari, K. Choukri, T. Declerck, M. U. Dogan, B. Maegaard, J. Mariani, J. Odijk and S. Piperidis (eds.), Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, European Language Resources Association (ELRA), pp. 3956\u20133962."},{"key":"S1351324916000115_ref011","doi-asserted-by":"crossref","unstructured":"Grefenstette G. 1992. SEXTANT: exploring unexplored contexts for semantic extraction from syntactic analysis. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, Newark, Delaware, USA, Association for Computational Linguistics, pp. 324\u2013326.","DOI":"10.3115\/981967.982020"},{"key":"S1351324916000115_ref026","doi-asserted-by":"crossref","unstructured":"Rapp R. 1995. 
Identifying word translations in non-parallel texts. In Proceedings of the 33rd ACL, Cambridge, MA, pp. 320\u2013322.","DOI":"10.3115\/981658.981709"},{"key":"S1351324916000115_ref001","doi-asserted-by":"crossref","unstructured":"Abdul-Rauf S. and Schwenk H. 2009. On the use of comparable corpora to improve SMT performance. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), Athens, Greece, Association for Computational Linguistics.","DOI":"10.3115\/1609067.1609068"},{"key":"S1351324916000115_ref044","unstructured":"Zou W. Y., Socher R., Cer D. and Manning C. D. 2013. Bilingual word embeddings for phrase-based machine translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, Association for Computational Linguistics, pp. 1393\u20131398."},{"key":"S1351324916000115_ref043","doi-asserted-by":"crossref","unstructured":"Zafarian A., Agha Sadeghi A. P., Azadi F., Ghiasifard S., Ali Panahloo Z., Bakhshaei S. and Mohammadzadeh Ziabary S. M. 2015. AUT document alignment framework for BUCC workshop shared task. In Proceedings of the Workshop on Building and Using Comparable Corpora at ACL 2015.","DOI":"10.18653\/v1\/W15-3412"},{"key":"S1351324916000115_ref031","doi-asserted-by":"crossref","unstructured":"Saluja A., Hassan H., Toutanova K. and Quirk C. 2014. Graph-based semi-supervised learning of translation models from monolingual data. In Proceedings of the 52nd ACL, Baltimore, MD, June.","DOI":"10.3115\/v1\/P14-1064"},{"key":"S1351324916000115_ref014","unstructured":"Klementiev A., Titov I. and Bhattarai B. 2012b. Inducing crosslingual distributed representations of words. In Proceedings of COLING 2012, Mumbai, India, The COLING 2012 Organizing Committee, pp. 1459\u20131474."},{"key":"S1351324916000115_ref007","doi-asserted-by":"crossref","unstructured":"Faruqui M. and Dyer C. 2014. 
Improving vector space word representations using multilingual correlation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, Association for Computational Linguistics, pp. 462\u2013471.","DOI":"10.3115\/v1\/E14-1049"},{"key":"S1351324916000115_ref020","doi-asserted-by":"crossref","unstructured":"Morin E., Hazem A., Boudin F. and Loginova-Clouet E. 2015. LINA: identifying comparable documents from Wikipedia. In Proceedings of the Workshop on Building and Using Comparable Corpora at ACL 2015.","DOI":"10.18653\/v1\/W15-3413"},{"key":"S1351324916000115_ref002","first-page":"193","article-title":"Vector disambiguation for translation extraction from comparable corpora","volume":"37","author":"Apidianaki","year":"2013","journal-title":"Informatica (Slovenia)"},{"key":"S1351324916000115_ref006","doi-asserted-by":"crossref","unstructured":"Dou Q., Vaswani A., Knight K. and Dyer C. 2015. Unifying Bayesian inference and vector space models for improved decipherment. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, Association for Computational Linguistics, pp. 836\u2013845.","DOI":"10.3115\/v1\/P15-1081"},{"key":"S1351324916000115_ref033","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-20128-8"},{"key":"S1351324916000115_ref039","unstructured":"Vulic I. and Moens M.-F. 2014a. Probabilistic models of cross-lingual semantic similarity in context based on latent cross-lingual concepts induced from comparable data. In EMNLP, pp. 349\u2013362."},{"key":"S1351324916000115_ref029","unstructured":"Ravi S. and Knight K. 2008. Attacking decipherment problems optimally with low-order n-gram models. 
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Honolulu, Hawaii, Association for Computational Linguistics, pp. 812\u2013819."},{"key":"S1351324916000115_ref017","unstructured":"Maia B. 2003. What are comparable corpora? In Multilingual Corpora: Linguistic Requirements and Technical Perspectives. Workshop at the Corpus Linguistics Conference, Lancaster, UK."},{"key":"S1351324916000115_ref012","unstructured":"Haghighi A., Liang P., Berg-Kirkpatrick T. and Klein D. 2008. Learning bilingual lexicons from monolingual corpora. In Proceedings of ACL-08: HLT, Columbus, Ohio, Association for Computational Linguistics, pp. 771\u2013779."},{"key":"S1351324916000115_ref037","unstructured":"T\u00e4ckstr\u00f6m O., McDonald R. and Uszkoreit J. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT '12, Stroudsburg, PA, USA, Association for Computational Linguistics, pp. 477\u2013487."},{"key":"S1351324916000115_ref005","unstructured":"Chandar A. P. S., Lauly S., Larochelle H., Khapra M. M., Ravindran B., Raykar V. C. and Saha A. 2014. An autoencoder approach to learning bilingual word representations. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pp. 1853\u20131861, Montreal, Quebec, Canada."},{"key":"S1351324916000115_ref041","first-page":"15","volume-title":"Machine Translation of Languages","author":"Weaver","year":"1955"},{"key":"S1351324916000115_ref008","unstructured":"Fung P. and McKeown K. 1997. Finding terminology translations from non-parallel corpora. In Proceedings of the 5th Annual Workshop on Very Large Corpora, pp. 
192\u2013202, see http:\/\/anthology.aclweb.org\/W\/W97\/W97-0100.pdf."},{"key":"S1351324916000115_ref010","unstructured":"Gouws S., Bengio Y. and Corrado G. 2015. BilBOWA: fast bilingual distributed representations without word alignments. In F. Bach and D. Blei (eds.), Proceedings of the 32nd International Conference on Machine Learning, JMLR Workshop and Conference Proceedings, vol. 37, Lille, France."},{"key":"S1351324916000115_ref042","doi-asserted-by":"crossref","unstructured":"Wu D. and Fung P. 2005. Inversion transduction grammar constraints for mining parallel sentences from quasi-comparable corpora. In Natural Language Processing\u2013IJCNLP 2005, Springer, pp. 257\u2013268.","DOI":"10.1007\/11562214_23"},{"key":"S1351324916000115_ref027","doi-asserted-by":"crossref","unstructured":"Rapp R. 1999. Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th ACL, Maryland, Association for Computational Linguistics, pp. 395\u2013398.","DOI":"10.3115\/1034678.1034756"},{"key":"S1351324916000115_ref034","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-20128-8"},{"key":"S1351324916000115_ref022","doi-asserted-by":"publisher","DOI":"10.1162\/089120105775299168"},{"key":"S1351324916000115_ref032","unstructured":"Seraj R. M. 2015. Paraphrases for Statistical Machine Translation. PhD Thesis, Simon Fraser University."},{"key":"S1351324916000115_ref038","doi-asserted-by":"crossref","unstructured":"Tang L., Wang T. and Chen Y. 2015. Problems of alignment in ParaConc for a case study. In Ally Hu (ed.): Computer Science and Applications. Proceedings of the 2014 Asia-Pacific Conference on Computer Science and Applications, Shanghai, China, Taylor & Francis, London, UK, pp. 57\u201362.","DOI":"10.1201\/b18508-12"},{"key":"S1351324916000115_ref030","unstructured":"Ravi S. and Knight K. 2011. Deciphering foreign language. 
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, Association for Computational Linguistics, pp. 812\u2013819."},{"key":"S1351324916000115_ref023","doi-asserted-by":"crossref","unstructured":"Munteanu D. S. and Marcu D. 2006. Extracting parallel sub-sentential fragments from non-parallel corpora. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING\/ACL 2006), Sydney, Australia, Association for Computational Linguistics.","DOI":"10.3115\/1220175.1220186"},{"key":"S1351324916000115_ref040","unstructured":"Vulic I. and Moens M.-F. 2014b. Probabilistic models of cross-lingual semantic similarity in context based on latent cross-lingual concepts induced from comparable data. In EMNLP, pp. 349\u2013362."}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324916000115","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,9,9]],"date-time":"2019-09-09T19:51:44Z","timestamp":1568058704000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324916000115\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,15]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,7]]}},"alternative-id":["S1351324916000115"],"URL":"https:\/\/doi.org\/10.1017\/s1351324916000115","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,6,15]]}}}