{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T20:22:32Z","timestamp":1776975752687,"version":"3.51.4"},"reference-count":61,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2021,10,4]],"date-time":"2021-10-04T00:00:00Z","timestamp":1633305600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2023,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.<\/jats:p>","DOI":"10.1017\/s135132492100022x","type":"journal-article","created":{"date-parts":[[2021,10,4]],"date-time":"2021-10-04T08:24:51Z","timestamp":1633335891000},"page":"138-161","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":1,"title":["Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis"],"prefix":"10.1017","volume":"29","author":[{"given":"Mohamed","family":"Chebel","sequence":"first","affiliation":[]},{"given":"Chiraz","family":"Latiri","sequence":"additional","affiliation":[]},{"given":"Eric","family":"Gaussier","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2021,10,4]]},"reference":[{"key":"S135132492100022X_ref17","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007974605290"},{"key":"S135132492100022X_ref52","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"S135132492100022X_ref7","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-57454-7_46"},{"key":"S135132492100022X_ref59","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1268"},{"key":"S135132492100022X_ref61","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1179"},{"key":"S135132492100022X_ref3","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956891"},{"key":"S135132492100022X_ref10","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/677"},{"key":"S135132492100022X_ref4","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45486-1_4"},{"key":"S135132492100022X_ref56","unstructured":"Tamura, A. , Watanabe, T. and Sumita, E. (2012). Bilingual lexicon extraction from comparable corpora using label propagation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012. Association for Computational Linguistics, pp. 24\u201336."},{"key":"S135132492100022X_ref5","unstructured":"Chandar, A.P.S. , Lauly, S. , Larochelle, H. , Khapra, M.M. , Ravindran, B. , Raykar, V. and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS 2014, Cambridge, MA, USA: MIT Press, pp. 1853\u20131861."},{"key":"S135132492100022X_ref8","unstructured":"Chiao, Y. and Zweigenbaum, P. (2003). The effect of a general lexicon in corpus-based identification of French-English medical word translations. In The New Navigators: from Professionals to Patients - Proceedings of MIE2003, Saint Malo, France, pp. 397\u2013402."},{"key":"S135132492100022X_ref58","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2014.08.003"},{"key":"S135132492100022X_ref48","doi-asserted-by":"publisher","DOI":"10.3115\/981658.981709"},{"key":"S135132492100022X_ref21","unstructured":"Haghighi, A. , Liang, P. , Berg-Kirkpatrick, T. and Klein, D. (2008). Learning bilingual lexicons from monolingual corpora. In ACL 2008, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 15\u201320, 2008, Columbus, Ohio, USA, pp. 771\u2013779."},{"key":"S135132492100022X_ref26","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-018-2245-8"},{"key":"S135132492100022X_ref27","unstructured":"Irvine, A. and Callison-Burch, C. (2013). Supervised bilingual lexicon induction with multiple monolingual signals. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9\u201314, 2013, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pp. 518\u2013523."},{"key":"S135132492100022X_ref28","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00284"},{"key":"S135132492100022X_ref29","unstructured":"Jagarlamudi, J. , Udupa, R. , Daum\u00e9, H. III and Bhole A. (2011). Improving bilingual projections via sparse covariance matrices. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, 27\u201331 July 2011, John McIntyre Conference Centre, Edinburgh, UK, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 930\u2013940."},{"key":"S135132492100022X_ref44","doi-asserted-by":"crossref","unstructured":"Pennington, J. , Socher, R. and Manning, C. (2014), October. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 1532\u20131543.","DOI":"10.3115\/v1\/D14-1162"},{"key":"S135132492100022X_ref20","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1219022"},{"key":"S135132492100022X_ref9","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2004.07.015"},{"key":"S135132492100022X_ref14","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-49478-2_1"},{"key":"S135132492100022X_ref6","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.08.176"},{"key":"S135132492100022X_ref31","doi-asserted-by":"crossref","unstructured":"Langlais, P. and Jakubina, L. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3\u20137, 2017, Volume 2: Short Papers, pp. 605\u2013611.","DOI":"10.18653\/v1\/E17-2096"},{"key":"S135132492100022X_ref34","unstructured":"Li, B. and Gaussier, \u00c9. (2010). Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In COLING 2010, 23rd International Conference on Computational Linguistics, Proceedings of the Conference, 23\u201327 August 2010, Beijing, China, pp. 644\u2013652."},{"key":"S135132492100022X_ref13","doi-asserted-by":"publisher","DOI":"10.3115\/981658.981690"},{"key":"S135132492100022X_ref36","unstructured":"Li, B. , Gaussier, \u00c9. and Aizawa, A.N. (2011). Clustering comparable corpora for bilingual lexicon extraction. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA - Short Papers, pp. 473\u2013478."},{"key":"S135132492100022X_ref25","unstructured":"Hazem, A. and Morin, E. (2018). Leveraging meta-embeddings for bilingual lexicon extraction from specialized comparable corpora. In Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20\u201326, 2018, pp. 937\u2013949."},{"key":"S135132492100022X_ref40","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324916000140"},{"key":"S135132492100022X_ref47","unstructured":"Prochasson, E. , Morin, E. and Kageura, K. (2009, August). Anchor points for bilingual lexicon extraction from small comparable corpora. In Machine Translation Summit, France, pp. 8."},{"key":"S135132492100022X_ref38","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"S135132492100022X_ref37","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-3405"},{"key":"S135132492100022X_ref42","doi-asserted-by":"crossref","unstructured":"Otero, P.G. (2008). Comparing window and syntax based strategies for semantic extraction. In Computational Processing of the Portuguese Language, 8th International Conference, PROPOR 2008, Aveiro, Portugal, September 8\u201310, 2008, Proceedings, pp. 41\u201350.","DOI":"10.1007\/978-3-540-85980-2_5"},{"key":"S135132492100022X_ref23","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28601-8_8"},{"key":"S135132492100022X_ref49","doi-asserted-by":"publisher","DOI":"10.3115\/1034678.1034756"},{"key":"S135132492100022X_ref11","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/E14-1049"},{"key":"S135132492100022X_ref51","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10888-9_11"},{"key":"S135132492100022X_ref57","doi-asserted-by":"publisher","DOI":"10.1613\/jair.4986"},{"key":"S135132492100022X_ref50","first-page":"143","volume-title":"Chapter Relevance Weighting of Search Terms","author":"Robertson","year":"1988"},{"key":"S135132492100022X_ref46","unstructured":"Prochasson, E. and Fung, P. (2011). Rare word translation extraction from aligned comparable documents. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19\u201324 June, 2011, Portland, Oregon, USA, pp. 1327\u20131335."},{"key":"S135132492100022X_ref33","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2004.06.012"},{"key":"S135132492100022X_ref16","doi-asserted-by":"crossref","unstructured":"Fung, P. and Lo, Y.Y. (1998). An IR approach for translating new words from nonparallel, comparable texts. In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, COLING-ACL 1998, August 10\u201314, 1998, Universit\u00e9 de Montr\u00e9al, Montr\u00e9al, Quebec, Canada. Proceedings of the Conference, pp. 414\u2013420.","DOI":"10.3115\/980451.980916"},{"key":"S135132492100022X_ref18","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-59830-2"},{"key":"S135132492100022X_ref32","unstructured":"Laroche, A. and Langlais, P. (2010). Revisiting context-based projection methods for term-translation spotting in comparable corpora. In COLING 2010, 23rd International Conference on Computational Linguistics, Proceedings of the Conference, 23-27 August 2010, Beijing, China, pp. 617\u2013625."},{"key":"S135132492100022X_ref35","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28997-2_24"},{"key":"S135132492100022X_ref55","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1072"},{"key":"S135132492100022X_ref19","doi-asserted-by":"publisher","DOI":"10.3115\/1596374.1596397"},{"key":"S135132492100022X_ref24","unstructured":"Hazem, A. and Morin, E. (2017). Bilingual word embeddings for bilingual terminology extraction from specialized comparable corpora. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Asian Federation of Natural Language Processing, pp. 685\u2013693."},{"key":"S135132492100022X_ref43","doi-asserted-by":"publisher","DOI":"10.1007\/s10590-007-9029-7"},{"key":"S135132492100022X_ref15","unstructured":"Fung, P. and Cheung, P. (2004). Mining very-non-parallel corpora: Parallel sentence and lexicon extraction via bootstrapping and E. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004, A meeting of SIGDAT, a Special Interest Group of the ACL, held in conjunction with ACL 2004, 25\u201326 July 2004, Barcelona, Spain, pp. 57\u201363."},{"key":"S135132492100022X_ref12","unstructured":"Fayyad, U.M. and Irani, K.B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI, pp. 1022\u20131029."},{"key":"S135132492100022X_ref54","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220444"},{"key":"S135132492100022X_ref22","doi-asserted-by":"publisher","DOI":"10.1145\/335191.335372"},{"key":"S135132492100022X_ref39","unstructured":"Mikolov, T. , Sutskever, I. , Chen, K. , Corrado, G.S. and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a Meeting held December 5\u20138, 2013, Lake Tahoe, Nevada, United States, pp. 3111\u20133119."},{"key":"S135132492100022X_ref45","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290957"},{"key":"S135132492100022X_ref30","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.560"},{"key":"S135132492100022X_ref53","unstructured":"Savoy, J. (2003). Report on CLEF-2003 multilingual tracks. In Working Notes for CLEF 2003 Workshop co-located with the 7th European Conference on Digital Libraries (ECDL 2003), Trondheim, Norway, August 21\u201322, 2003."},{"key":"S135132492100022X_ref1","first-page":"80","volume-title":"Effective Use of Dependency Structure for Bilingual Lexicon Creation","author":"Andrade","year":"2011"},{"key":"S135132492100022X_ref60","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.60"},{"key":"S135132492100022X_ref2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1250"},{"key":"S135132492100022X_ref41","doi-asserted-by":"publisher","DOI":"10.1162\/0891201042544884"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S135132492100022X","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T09:05:42Z","timestamp":1675155942000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S135132492100022X\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,4]]},"references-count":61,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1]]}},"alternative-id":["S135132492100022X"],"URL":"https:\/\/doi.org\/10.1017\/s135132492100022x","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,4]]},"assertion":[{"value":"\u00a9 The Author(s), 2021. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}]}}