{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T06:24:29Z","timestamp":1771655069263,"version":"3.50.1"},"reference-count":71,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:00:00Z","timestamp":1643241600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:00:00Z","timestamp":1643241600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources &amp; Evaluation"],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Comparable corpora can benefit the development of Neural Machine Translation models, in particular for under-resourced languages. We present a case study centred on the exploitation of a large comparable corpus for Basque-Spanish, created from independently-produced news by the Basque public broadcaster<jats:sc>eitb<\/jats:sc>, where we evaluate the impact of different techniques to exploit the original data, in order to complement parallel datasets for this language pair in both translation directions. Two efficient methods for parallel sentence mining are explored, which identified a common core of approximately half of the total number of aligned sentences, each one uniquely identifying valid parallel sentences not captured by the other method. Filtering the data via identification of length-difference outliers proved highly effective to improve the models, as was the use of tags to discriminate between comparable and parallel data in the training corpora. The use of backtranslated data is also evaluated in this work, with results indicating that alignment-based datasets remain the most beneficial, although complementary backtranslations should also be included to fully exploit the available comparable data. Overall, the results in this work demonstrate that this type of data needs to be carefully analysed prior to its use as training data for Neural Machine Translation, since issues such as information imbalance between source and target data can lead to unoptimal results for a given translation pair.<\/jats:p>","DOI":"10.1007\/s10579-021-09572-2","type":"journal-article","created":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:03:17Z","timestamp":1643241797000},"page":"943-971","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Making the most of comparable corpora in Neural Machine Translation: a case study"],"prefix":"10.1007","volume":"56","author":[{"given":"Harritxu","family":"Gete","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7253-1693","authenticated-orcid":false,"given":"Thierry","family":"Etchegoyhen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,1,27]]},"reference":[{"key":"9572_CR1","doi-asserted-by":"crossref","unstructured":"Abdul-Rauf, S., & Schwenk, H. (2009). On the use of comparable corpora to improve SMT performance. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (pp. 16\u201323). Association for Computational Linguistics.","DOI":"10.3115\/1609067.1609068"},{"issue":"4","key":"9572_CR2","doi-asserted-by":"publisher","first-page":"341","DOI":"10.1007\/s10590-011-9114-9","volume":"25","author":"S Abdul-Rauf","year":"2011","unstructured":"Abdul-Rauf, S., & Schwenk, H. (2011). Parallel sentence generation from comparable corpora for improved SMT. Machine Translation, 25(4), 341\u2013375.","journal-title":"Machine Translation"},{"key":"9572_CR3","doi-asserted-by":"crossref","unstructured":"Artetxe, M., Labaka, G., Agirre, E., & Cho, K. (2018). Unsupervised neural machine translation. In International Conference on Learning Representations.","DOI":"10.18653\/v1\/D18-1399"},{"key":"9572_CR4","doi-asserted-by":"crossref","unstructured":"Artetxe, M., & Schwenk, H. (2019). Margin-based parallel corpus mining with multilingual sentence embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 3197\u20133203). Association for Computational Linguistics.","DOI":"10.18653\/v1\/P19-1309"},{"key":"9572_CR5","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1007\/s10590-019-09234-9","volume":"33","author":"A Azpeitia","year":"2019","unstructured":"Azpeitia, A., & Etchegoyhen, T. (2019). Efficient document alignment across scenarios. Machine Translation, 33, 205\u2013237.","journal-title":"Machine Translation"},{"key":"9572_CR6","doi-asserted-by":"crossref","unstructured":"Azpeitia, A., Etchegoyhen, T., & Mart\u00ednez\u00a0Garcia, E. (2017). Weighted set-theoretic alignment of comparable sentences. In Proceedings of the Tenth Workshop on Building and Using Comparable Corpora (pp. 41\u201345).","DOI":"10.18653\/v1\/W17-2508"},{"key":"9572_CR7","unstructured":"Azpeitia, A., Etchegoyhen, T., & Mart\u00ednez\u00a0Garcia, E. (2018). Extracting parallel sentences from comparable corpora with STACC variants. In Proceedings of the Eleventh Workshop on Building and Using Comparable Corpora (pp. 48\u201352)."},{"key":"9572_CR8","unstructured":"Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations."},{"key":"9572_CR9","unstructured":"Belinkov, Y., & Bisk, Y. (2018). Synthetic and natural noise both break neural machine translation. In Proceedings of the 6th International Conference on Learning Representations."},{"key":"9572_CR10","doi-asserted-by":"crossref","unstructured":"Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huang, S., Huck, M., Koehn, P., Liu, Q., Logacheva, V., Monz, C., Negri, M., Post, M., Rubino, R., Specia, L., & Turchi, M. (2017). Findings of the 2017 conference on machine translation. In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers (pp. 169\u2013214).","DOI":"10.18653\/v1\/W17-4717"},{"key":"9572_CR11","doi-asserted-by":"crossref","unstructured":"Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huck, M., Jimeno\u00a0Yepes, A., Koehn, P., Logacheva, V., Monz, C., Negri, M., Neveol, A., Neves, M., Popel, M., Post, M., Rubino, R., Scarton, C., Specia, L., Turchi, M., Verspoor, K., & Zampieri, M. (2016). Findings of the 2016 conference on machine translation. In Proceedings of the First Conference on Machine Translation (pp. 131\u2013198).","DOI":"10.18653\/v1\/W16-2301"},{"key":"9572_CR12","doi-asserted-by":"crossref","unstructured":"Buck, C., & Koehn, P. (2016a). Findings of the WMT 2016 bilingual document alignment shared task. In Proceedings of the First Conference on Machine Translation (pp. 554\u2013563). Association for Computational Linguistics.","DOI":"10.18653\/v1\/W16-2347"},{"key":"9572_CR13","doi-asserted-by":"crossref","unstructured":"Buck, C., & Koehn, P. (2016b). Quick and reliable document alignment via TF\/IDF-weighted cosine distance. In Proceedings of the First Conference on Machine Translation (pp. 672\u2013678). Association for Computational Linguistics.","DOI":"10.18653\/v1\/W16-2365"},{"key":"9572_CR14","doi-asserted-by":"crossref","unstructured":"Caswell, I., Chelba, C., & Grangier, D. (2019). Tagged back-translation. In Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers) (pp. 53\u201363). Association for Computational Linguistics.","DOI":"10.18653\/v1\/W19-5206"},{"key":"9572_CR15","unstructured":"Chen, J., Chau, R., & Yeh, C.-H. (2004). Discovering parallel text from the World Wide Web. In Proceedings of the Second Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32, ACSW Frontiers \u201904 (pp. 157\u2013161). Australian Computer Society, Inc."},{"key":"9572_CR16","unstructured":"Chen, J., & Nie, J.-Y. (2000). Parallel web text mining for cross-language IR. In Content-Based Multimedia Information Access - Volume 1 (pp. 62\u201377). Centre des hautes \u00e9tudes internationales d\u2019informatique documentaire."},{"key":"9572_CR17","doi-asserted-by":"crossref","unstructured":"Cheng, Y., Tu, Z., Meng, F., Zhai, J., & Liu, Y. (2018). Towards robust neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1756\u20131766). Association for Computational Linguistics.","DOI":"10.18653\/v1\/P18-1163"},{"key":"9572_CR18","unstructured":"Dyer, C., Chahuneau, V., & Smith, N.\u00a0A. (2013). A simple, fast, and effective reparameterization of IBM model 2. In Proceedings of The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies."},{"key":"9572_CR19","doi-asserted-by":"crossref","unstructured":"Edunov, S., Ott, M., Auli, M., & Grangier, D. (2018). Understanding back-translation at scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 489\u2013500). Association for Computational Linguistics.","DOI":"10.18653\/v1\/D18-1045"},{"key":"9572_CR20","doi-asserted-by":"crossref","unstructured":"Espl\u00e1-Gomis, M., Forcada, M.\u00a0L., Ortiz-Rojas, S., & Ferr\u00e1ndez-Tordera, J. (2016). Bitextor\u2019s participation in WMT\u201916: shared task on document alignment. In Proceedings of the First Conference on Machine Translation (pp. 685\u2013691).","DOI":"10.18653\/v1\/W16-2367"},{"issue":"2","key":"9572_CR21","first-page":"243","volume":"4","author":"T Etchegoyhen","year":"2016","unstructured":"Etchegoyhen, T., & Azpeitia, A. (2016a). A portable method for parallel and comparable document alignment. Baltic Journal of Modern Computing, 4(2), 243\u2013255.","journal-title":"Baltic Journal of Modern Computing"},{"key":"9572_CR22","doi-asserted-by":"crossref","unstructured":"Etchegoyhen, T., & Azpeitia, A. (2016b). Set-theoretic alignment for comparable corpora. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, volume 1: Long Papers (pp. 2009\u20132018).","DOI":"10.18653\/v1\/P16-1189"},{"key":"9572_CR23","unstructured":"Etchegoyhen, T., Azpeitia, A., & P\u00e9rez, N. (2016). Exploiting a large strongly comparable corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation."},{"key":"9572_CR24","unstructured":"Etchegoyhen, T., & Gete, H. (2020). Handle with care: A case study in comparable corpora exploitation for neural machine translation. In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 3792\u20133800). European Language Resources Association."},{"key":"9572_CR25","unstructured":"Etchegoyhen, T., Mart\u00ednez\u00a0Garcia, E., Azpeitia, A., Labaka, G., Alegria, I., Cortes\u00a0Etxabe, I., Jauregi\u00a0Carrera, A., Ellakuria\u00a0Santos, I., Martin, M., & Calonge, E. (2018). Neural machine translation of Basque. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation (pp. 139\u2013148)."},{"key":"9572_CR26","unstructured":"Fung, P., & Cheung, P. (2004). Mining very non-parallel corpora: Parallel sentence and lexicon extraction via bootstrapping and E.M. In Proceedings of Empirical Methods in Natural Language Processing (pp. 57\u201363)."},{"key":"9572_CR27","doi-asserted-by":"crossref","unstructured":"Germann, U. (2016). Bilingual document alignment with latent semantic indexing. In Proceedings of the First Conference on Machine Translation (pp. 692\u2013696). Association for Computational Linguistics.","DOI":"10.18653\/v1\/W16-2368"},{"key":"9572_CR28","doi-asserted-by":"crossref","unstructured":"Gomes, L., & Lopes, G.\u00a0P. (2016). First steps towards coverage-based document alignment. In Proceedings of the First Conference on Machine Translation (pp. 697\u2013702). Association for Computational Linguistics.","DOI":"10.18653\/v1\/W16-2369"},{"key":"9572_CR29","doi-asserted-by":"crossref","unstructured":"Gr\u00e9goire, F., & Langlais, P. (2017). BUCC 2017 shared task: a first attempt toward a deep learning framework for identifying parallel sentences in comparable corpora. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora (pp. 46\u201350).","DOI":"10.18653\/v1\/W17-2509"},{"key":"9572_CR30","unstructured":"Iglewicz, B., & Hoaglin, D. (1993). Volume 16: How to detect and handle outliers. The ASQC basic references in quality control: statistical techniques, 16."},{"key":"9572_CR31","unstructured":"Ion, R., Ceau\u015fu, A., & Irimia, E. (2011). An expectation maximization algorithm for textual unit alignment. In Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web (pp. 128\u2013135). Association for Computational Linguistics."},{"key":"9572_CR32","unstructured":"Irvine, A., & Callison-Burch, C. (2013). Combining bilingual and comparable corpora for low resource machine translation. In Proceedings of the Eighth Workshop on Statistical Machine Translation (pp. 262\u2013270)."},{"key":"9572_CR33","first-page":"241","volume":"37","author":"P Jaccard","year":"1901","unstructured":"Jaccard, P. (1901). Distribution de la flore alpine dans le bassin des Dranses et dans quelques r\u00e9gions voisines. Bulletin de la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles, 37, 241\u2013272.","journal-title":"Bulletin de la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles"},{"key":"9572_CR34","doi-asserted-by":"crossref","unstructured":"Junczys-Dowmunt, M., Grundkiewicz, R., Dwojak, T., Hoang, H., Heafield, K., Neckermann, T., Seide, F., Germann, U., Aji, A.\u00a0F., Bogoychev, N., Martins, A. F.\u00a0T., & Birch, A. (2018). Marian: Fast neural machine translation in C++. In Proceedings of 56th Annual Meeting of the Association for Computational Linguistics-System Demonstrations (pp. 116\u2013121).","DOI":"10.18653\/v1\/P18-4020"},{"key":"9572_CR35","doi-asserted-by":"crossref","unstructured":"Khayrallah, H., & Koehn, P. (2018). On the impact of various types of noise on neural machine translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation (pp. 74\u201383).","DOI":"10.18653\/v1\/W18-2709"},{"key":"9572_CR36","doi-asserted-by":"crossref","unstructured":"Kobus, C., Crego, J., & Senellart, J. (2017). Domain control for neural machine translation. In Proceedings of Recent Advances in Natural Language Processing (pp. 372\u2013378). INCOMA Ltd.","DOI":"10.26615\/978-954-452-049-6_049"},{"key":"9572_CR37","unstructured":"Koehn, P. (2004). Statistical significance tests for machine translation evaluation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 388\u2013395)."},{"key":"9572_CR38","doi-asserted-by":"crossref","unstructured":"Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et\u00a0al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (pp. 177\u2013180).","DOI":"10.3115\/1557769.1557821"},{"key":"9572_CR39","unstructured":"Lample, G., Conneau, A., Denoyer, L., & Ranzato, M. (2018). Unsupervised machine translation using monolingual corpora only. In International Conference on Learning Representations."},{"key":"9572_CR40","doi-asserted-by":"crossref","unstructured":"Li, B., & Gaussier, E. (2013). Exploiting comparable corpora for lexicon extraction: Measuring and improving corpus quality. In Building and Using Comparable Corpora (pp. 131\u2013149).","DOI":"10.1007\/978-3-642-20128-8_7"},{"key":"9572_CR41","unstructured":"Ma, X., & Liberman, M. (1999). Bits: A method for bilingual text search over the web. In Proceedings of Machine Translation Summit VII (pp. 538\u2013542)."},{"key":"9572_CR42","doi-asserted-by":"crossref","unstructured":"Morin, E., Hazem, A., Boudin, F., & Clouet, E.\u00a0L. (2015). Lina: Identifying comparable documents from Wikipedia. In Proceedings of the Eighth Workshop on Building and Using Comparable Corpora.","DOI":"10.18653\/v1\/W15-3413"},{"key":"9572_CR43","doi-asserted-by":"crossref","unstructured":"Munteanu, D.\u00a0S., & Marcu, D. (2002). Processing comparable corpora with bilingual suffix trees. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 289\u2013295).","DOI":"10.3115\/1118693.1118730"},{"issue":"4","key":"9572_CR44","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1162\/089120105775299168","volume":"31","author":"DS Munteanu","year":"2005","unstructured":"Munteanu, D. S., & Marcu, D. (2005). Improving machine translation performance by exploiting non-parallel corpora. Computational Linguistics, 31(4), 477\u2013504.","journal-title":"Computational Linguistics"},{"key":"9572_CR45","doi-asserted-by":"crossref","unstructured":"Papavassiliou, V., Prokopidis, P., & Piperidis, S. (2016). The ILSP\/ARC submission to the WMT 2016 bilingual document alignment shared task. In Proceedings of the First Conference on Machine Translation (pp. 733\u2013739).","DOI":"10.18653\/v1\/W16-2375"},{"key":"9572_CR46","doi-asserted-by":"crossref","unstructured":"Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (pp. 311\u2013318).","DOI":"10.3115\/1073083.1073135"},{"key":"9572_CR48","unstructured":"Poncelas, A., Shterionov, D.\u00a0S., Way, A., de\u00a0Buy\u00a0Wenniger, G.\u00a0M., & Passban, P. (2018). Investigating backtranslation in neural machine translation. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation (pp. 249\u2013258)."},{"key":"9572_CR49","doi-asserted-by":"crossref","unstructured":"Post, M. (2018). A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers (pp. 186\u2013191).","DOI":"10.18653\/v1\/W18-6319"},{"key":"9572_CR50","doi-asserted-by":"crossref","unstructured":"Rapp, R. (1995). Identifying word translations in non-parallel texts. In Proceedings of the 33rd annual meeting on Association for Computational Linguistics (pp. 320\u2013322).","DOI":"10.3115\/981658.981709"},{"key":"9572_CR51","doi-asserted-by":"crossref","unstructured":"Resnik, P. (1999). Mining the Web for bilingual text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (pp. 527\u2013534).","DOI":"10.3115\/1034678.1034757"},{"issue":"3","key":"9572_CR52","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1162\/089120103322711578","volume":"29","author":"P Resnik","year":"2003","unstructured":"Resnik, P., & Smith, N. A. (2003). The web as a parallel corpus. Computational Linguistics, 29(3), 349\u2013380.","journal-title":"Computational Linguistics"},{"key":"9572_CR53","doi-asserted-by":"crossref","unstructured":"Sarikaya, R., Maskey, S., Zhang, R., Jan, E.-E., Wang, D., Ramabhadran, B., & Roukos, S. (2009). Iterative sentence-pair extraction from quasi-parallel corpora for machine translation. In Proceedings of InterSpeech (pp. 432\u2013435).","DOI":"10.21437\/Interspeech.2009-156"},{"key":"9572_CR54","doi-asserted-by":"crossref","unstructured":"Schwenk, H. (2018). Filtering and mining parallel data in a joint multilingual space. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 228\u2013234).","DOI":"10.18653\/v1\/P18-2037"},{"key":"9572_CR55","unstructured":"Schwenk, H., Chaudhary, V., Sun, S., Gong, H., & Guzm\u00e1n, F. (2019). WikiMatrix: Mining 135m parallel sentences in 1620 language pairs from Wikipedia. CoRR, abs\/1907.05791."},{"key":"9572_CR56","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., & Birch, A. (2016a). Controlling politeness in neural machine translation via side constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 35\u201340). Association for Computational Linguistics.","DOI":"10.18653\/v1\/N16-1005"},{"key":"9572_CR57","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., & Birch, A. (2016b). Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 86\u201396).","DOI":"10.18653\/v1\/P16-1009"},{"key":"9572_CR58","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., & Birch, A. (2016c). Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers (pp. 1715\u20131725).","DOI":"10.18653\/v1\/P16-1162"},{"key":"9572_CR59","volume-title":"Building and using comparable corpora","author":"S Sharoff","year":"2016","unstructured":"Sharoff, S., Rapp, R., Zweigenbaum, P., & Fung, P. (2016). Building and using comparable corpora. Incorporated: Springer Publishing Company."},{"key":"9572_CR60","doi-asserted-by":"crossref","unstructured":"Sharoff, S., Zweigenbaum, P., & Rapp, R. (2015). BUCC shared task: Cross-language document similarity. In Proceedings of the 8th Workshop on Building and Using Comparable Corpora (pp. 74\u201378).","DOI":"10.18653\/v1\/W15-3411"},{"key":"9572_CR61","unstructured":"Skadi\u0146a, I., Aker, A., Mastropavlos, N., Su, F., Tufis, D., Verlic, M., Vasiljevs, A., Babych, B., Clough, P., Gaizauskas, R., et\u00a0al. (2012). Collecting and using comparable corpora for statistical machine translation. In Proceedings of the 8th International Conference on Language Resources and Evaluation."},{"key":"9572_CR62","unstructured":"Smith, J. R., Quirk, C., & Toutanova, K. (2010). Extracting parallel sentences from comparable corpora using document level alignment. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 403\u2013411)."},{"key":"9572_CR63","unstructured":"Sperber, M., Niehues, J., & Waibel, A. (2017). Toward robust neural machine translation for noisy input sequences. In Proceedings of the 14th International Workshop on Spoken Language Translation (pp. 90\u201396)."},{"key":"9572_CR64","unstructured":"Stef\u0103nescu, D., Ion, R., & Hunsicker, S. (2012). Hybrid parallel sentence mining from comparable corpora. In Proceedings of the 16th Conference of the European Association for Machine Translation (pp. 137\u2013144)."},{"key":"9572_CR65","unstructured":"Tiedemann, J. (2012). Parallel data, tools and interfaces in OPUS. In Proceedings of the 8th Language Resources and Evaluation Conference (pp. 2214\u20132218)."},{"key":"9572_CR66","unstructured":"Uszkoreit, J., Ponte, J.\u00a0M., Popat, A.\u00a0C., & Dubiner, M. (2010). Large scale parallel document mining for machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 1101\u20131109)."},{"key":"9572_CR67","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, \u0141, & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems 30 (NIPS 2017) (pp. 5998\u20136008)."},{"key":"9572_CR68","unstructured":"Yamagishi, H., Kanouchi, S., Sato, T., & Komachi, M. (2016). Controlling the voice of a sentence in Japanese-to-English neural machine translation. In Proceedings of the 3rd Workshop on Asian Translation (pp. 203\u2013210)."},{"key":"9572_CR69","doi-asserted-by":"crossref","unstructured":"Zafarian, A., Aghasadeghi, A., Azadi, F., Ghiasifard, S., Alipanahloo, Z., Bakhshaei, S., & Ziabary, S. M.\u00a0M. (2015). AUT document alignment framework for bucc workshop shared task. In Proceedings of the 8th Workshop on Building and Using Comparable Corpora (p.\u00a079).","DOI":"10.18653\/v1\/W15-3412"},{"key":"9572_CR70","doi-asserted-by":"crossref","unstructured":"Zhao, B., & Vogel, S. (2002). Adaptive parallel sentences mining from web bilingual news collection. In Proceedings of the 2002 IEEE International Conference on Data Mining (pp. 745\u2013748).","DOI":"10.1109\/ICDM.2002.1184044"},{"key":"9572_CR71","doi-asserted-by":"crossref","unstructured":"Zweigenbaum, P., Sharoff, S., & Rapp, R. (2017). Overview of the second BUCC shared task: Spotting parallel sentences in comparable corpora. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora (pp. 60\u201367). Association for Computational Linguistics.","DOI":"10.18653\/v1\/W17-2512"},{"key":"9572_CR47","doi-asserted-by":"crossref","unstructured":"Zweigenbaum, P., Sharoff, S., & Rapp, R. (2018). Overview of the third BUCC shared task: Spotting parallel sentences in comparable corpora. In Rapp, R., Zweigenbaum, P., and Sharoff, S. (Eds.) Proceedings of the Eleventh International Conference on Language Resources and Evaluation.","DOI":"10.18653\/v1\/W17-2512"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-021-09572-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-021-09572-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-021-09572-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,17]],"date-time":"2024-09-17T10:01:33Z","timestamp":1726567293000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-021-09572-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,27]]},"references-count":71,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["9572"],"URL":"https:\/\/doi.org\/10.1007\/s10579-021-09572-2","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"value":"1574-020X","type":"print"},{"value":"1574-0218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,27]]},"assertion":[{"value":"10 December 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 January 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}