{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:09:45Z","timestamp":1760148585397,"version":"build-2065373602"},"reference-count":22,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,5,21]],"date-time":"2023-05-21T00:00:00Z","timestamp":1684627200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62266028","62241604","202001AT070046","202301AT070015","202202AE090008-3"],"award-info":[{"award-number":["62266028","62241604","202001AT070046","202301AT070015","202202AE090008-3"]}]},{"name":"Fundamental Research Project of Yunnan Province, China","award":["62266028","62241604","202001AT070046","202301AT070015","202202AE090008-3"],"award-info":[{"award-number":["62266028","62241604","202001AT070046","202301AT070015","202202AE090008-3"]}]},{"name":"Yunnan Key Research Projects","award":["62266028","62241604","202001AT070046","202301AT070015","202202AE090008-3"],"award-info":[{"award-number":["62266028","62241604","202001AT070046","202301AT070015","202202AE090008-3"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Parallel sentences play a crucial role in various NLP tasks, particularly for cross-lingual tasks such as machine translation. However, due to the time-consuming and laborious nature of manual construction, many low-resource languages still suffer from a lack of large-scale parallel data. The objective of pseudo-parallel sentence extraction is to automatically identify sentence pairs in different languages that convey similar meanings. Earlier methods heavily relied on parallel data, which is unsuitable for low-resource scenarios. 
The current mainstream research direction is to use transfer learning or unsupervised learning based on cross-lingual word embeddings and multilingual pre-trained models; however, these methods are ineffective for languages with substantial differences. To address this issue, we propose a sentence extraction method that leverages image information fusion to extract Chinese\u2013Vietnamese pseudo-parallel sentences from collections of bilingual texts. Our method first employs an adaptive image and text feature fusion strategy to efficiently extract the bilingual parallel sentence pair, and then, a multimodal fusion method is presented to balance the information between the image and text modalities. The experiments on multiple benchmarks show that our method achieves promising results compared to a competitive baseline by infusing additional external image information.<\/jats:p>","DOI":"10.3390\/info14050298","type":"journal-article","created":{"date-parts":[[2023,5,22]],"date-time":"2023-05-22T02:00:27Z","timestamp":1684720827000},"page":"298","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Chinese\u2013Vietnamese Pseudo-Parallel Sentences Extraction Based on Image Information Fusion"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1760-7231","authenticated-orcid":false,"given":"Yonghua","family":"Wen","sequence":"first","affiliation":[{"name":"Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China"},{"name":"School of mathematics and computer science, Yunnan Minzu University, Kunming 650500, China"}]},{"given":"Junjun","family":"Guo","sequence":"additional","affiliation":[{"name":"Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, 
China"}]},{"given":"Zhiqiang","family":"Yu","sequence":"additional","affiliation":[{"name":"Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China"},{"name":"School of mathematics and computer science, Yunnan Minzu University, Kunming 650500, China"}]},{"given":"Zhengtao","family":"Yu","sequence":"additional","affiliation":[{"name":"Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,5,21]]},"reference":[{"key":"ref_1","unstructured":"Koehn, P., and Knowles, R. (August, January 20). Six Challenges for Neural Machine Translation. Proceedings of the First Workshop on Neural Machine Translation, Vancouver, BC, Canada."},{"key":"ref_2","unstructured":"Gr\u00e9goire, F., and Langlais, P. (2018, January 20\u201326). Extracting Parallel Sentences with Bidirectional Recurrent Neural Networks to Improve Machine Translation. Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"78928","DOI":"10.1109\/ACCESS.2022.3186978","article-title":"A Method of Chinese-Vietnamese Bilingual Corpus Construction for Machine Translation","volume":"10","author":"Tran","year":"2022","journal-title":"IEEE Access"},{"key":"ref_4","unstructured":"Smith, J.R., Quirk, C., and Toutanova, K. (2010, January 2\u20134). Extracting parallel sentences from comparable corpora using document level alignment. Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technologies, Los Angeles, CA, USA."},{"key":"ref_5","unstructured":"Karimi, A., Ansari, E., and Bigham, B.S. (2018, January 7\u201312). 
Extracting an English-Persian parallel corpus from comparable corpora. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan."},{"key":"ref_6","unstructured":"Marie, B., and Fujita, A. (August, January 20). Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2): Short Papers, Vancouver, BC, Canada."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1340","DOI":"10.1109\/JSTSP.2017.2764273","article-title":"An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification","volume":"11","author":"Varga","year":"2017","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_8","unstructured":"Hangya, V., Braune, F., and Kalasouskaya, Y. (August, January 28). Unsupervised Parallel Sentence Extraction from Comparable Corpora. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_9","first-page":"5296946","article-title":"Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision","volume":"2022","author":"Xiayang","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_10","first-page":"64","article-title":"Unsupervised Parallel Sentences of Machine Translation for Asian Language Pairs","volume":"22","author":"Shaolin","year":"2023","journal-title":"ACM Trans. Asian Low Resour. Lang. Inf. Process."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sun, Y., Zhu, S., Yifan, F., and Mi, C. (2021, January 6\u201311). Parallel sentences mining with transfer learning in an unsupervised setting. 
Proceedings of the 2021 Conference of North American Chapter of the Association for Computational Linguistics (NAACL\u20192021), Mexico City, Mexico.","DOI":"10.18653\/v1\/2021.naacl-srw.17"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kvapil\u00edkov\u00e1, I., Artetxe, M., Labaka, G., Agirre, E., and Bojar, O. (2020, January 5\u201310). Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL\u20192020), Online.","DOI":"10.18653\/v1\/2020.acl-srw.34"},{"key":"ref_13","first-page":"2207","article-title":"Cross-lingual retrieval for iterative self-supervised training","volume":"33","author":"Tran","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_14","first-page":"423","article-title":"Multimodal Machine Learning: A Survey and Taxonomy","volume":"41","author":"Ahuja","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yao, S., and Wan, X. (2020, January 5\u201310). Multimodal Transformer for Multimodal Machine Translation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL\u20192020), Online.","DOI":"10.18653\/v1\/2020.acl-main.400"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Caglayan, O., Kuyu, M., Amac, M.S., Madhyastha, P., Erdem, E., Erdem, A., and Lucia, S. (2021, January 19\u201323). Cross-lingual Visual Pre-training for Multimodal Machine Translation. 
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online.","DOI":"10.18653\/v1\/2021.eacl-main.112"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1970","DOI":"10.1109\/TASLP.2019.2937190","article-title":"Neural machine translation with sentence-level topic context","volume":"27","author":"Chen","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_18","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), Long Beach, CA, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_20","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning Transferable Visual Models from Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning (ICML\u20192021), Online."},{"key":"ref_21","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (August, January 18). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_22","unstructured":"Kingma, D., and Ba, J. (2015, January 7\u20139). Adam: A method for stochastic optimization. 
Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/5\/298\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:39:33Z","timestamp":1760125173000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/5\/298"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,21]]},"references-count":22,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["info14050298"],"URL":"https:\/\/doi.org\/10.3390\/info14050298","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2023,5,21]]}}}