{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T03:19:36Z","timestamp":1761621576218,"version":"3.41.2"},"reference-count":37,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,4,12]],"date-time":"2021-04-12T00:00:00Z","timestamp":1618185600000},"content-version":"vor","delay-in-days":101,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Development of Shanghai Industrial Internet","award":["2019-GYHLW-01004","20200302075","2018HY020"],"award-info":[{"award-number":["2019-GYHLW-01004","20200302075","2018HY020"]}]},{"name":"Research on Social Sciences Development in Hebei Province","award":["2019-GYHLW-01004","20200302075","2018HY020"],"award-info":[{"award-number":["2019-GYHLW-01004","20200302075","2018HY020"]}]},{"name":"Marine Science Research Project of Hebei Normal University of Science & Technology","award":["2019-GYHLW-01004","20200302075","2018HY020"],"award-info":[{"award-number":["2019-GYHLW-01004","20200302075","2018HY020"]}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computational Intelligence and Neuroscience"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>Scaling natural language processing (NLP) to low\u2010resourced languages to improve machine translation (MT) performance remains enigmatic. This research contributes to the domain on a low\u2010resource English\u2010Twi translation based on filtered synthetic\u2010parallel corpora. It is often perplexing to learn and understand what a good\u2010quality corpus looks like in low\u2010resource conditions, mainly where the target corpus is the only sample text of the parallel language. 
To improve MT performance in such low\u2010resource language pairs, we propose expanding the training data by injecting a synthetic\u2010parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we perform unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we extensively use three different sentence\u2010level similarity metrics after round\u2010trip translation. Experimental results across varying amounts of available parallel corpora demonstrate that injecting a pseudoparallel corpus and extensive filtering with sentence\u2010level similarity metrics significantly improve the original out\u2010of\u2010the\u2010box MT systems for low\u2010resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach yields substantial gains in BLEU and TER scores.<\/jats:p>","DOI":"10.1155\/2021\/6682385","type":"journal-article","created":{"date-parts":[[2021,4,12]],"date-time":"2021-04-12T23:17:42Z","timestamp":1618269462000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Pseudotext Injection and Advance Filtering of Low\u2010Resource Corpus for Neural Machine Translation"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4426-6137","authenticated-orcid":false,"given":"Michael","family":"Adjeisah","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8461-9473","authenticated-orcid":false,"given":"Guohua","family":"Liu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1757-9162","authenticated-orcid":false,"given":"Douglas 
Omwenga","family":"Nyabuga","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5761-9515","authenticated-orcid":false,"given":"Richard Nuetey","family":"Nortey","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2323-4681","authenticated-orcid":false,"given":"Jinling","family":"Song","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2021,4,12]]},"reference":[{"key":"e_1_2_9_1_2","unstructured":"MartinusL.andAbbottZ. J. Benchmarking neural machine translation for southern African languages Proceedings of the 2019 Workshop on Widening NLP August 2019 Florence Italy 98\u2013101."},{"key":"e_1_2_9_2_2","doi-asserted-by":"crossref","unstructured":"KoehnP.andKnowlesR. Six challenges for neural machine translation Proceedings of the First Workshop on Neural Machine Translation August 2017 Vancouver Canada 28\u201339.","DOI":"10.18653\/v1\/W17-3204"},{"key":"e_1_2_9_3_2","doi-asserted-by":"crossref","unstructured":"ZhangJ.andZongC. Exploiting source-side monolingual data in neural machine translation Proceedings of the EMNLP 2016-Conference on Empirical Methods in Natural Language Processing November 2016 Austin TX USA 1535\u20131545.","DOI":"10.18653\/v1\/D16-1160"},{"key":"e_1_2_9_4_2","doi-asserted-by":"crossref","unstructured":"SennrichR. HaddowB. andBirchA. Improving neural machine translation models with monolingual data 1 Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics August 2016 Berlin Germany 86\u201396.","DOI":"10.18653\/v1\/P16-1009"},{"key":"e_1_2_9_5_2","doi-asserted-by":"crossref","unstructured":"LittellP. LarkinS. StewartD. SimardM. GoutteC. andLoC. 
Measuring sentence parallelism using Mahalanobis distances: the NRC unsupervised submissions to the WMT18 parallel corpus filtering shared task Proceedings of the Third Conference on Machine Translation: Shared Task Papers October 2018 Belgium Brussels 900\u2013907.","DOI":"10.18653\/v1\/W18-6480"},{"key":"e_1_2_9_6_2","doi-asserted-by":"crossref","unstructured":"ZhangZ. LiuS. LiM. ZhouM. andChenE. Joint training for neural machine translation models with monolingual data Proceedings of the 32nd AAAI Conference on Artificial Intelligence February 2018 New Orleans LA USA 555\u2013562.","DOI":"10.1609\/aaai.v32i1.11248"},{"key":"e_1_2_9_7_2","unstructured":"UeffingN. HaffariG. andSarkarA. Transductive learning for statistical machine translation Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics June 2007 Prague Czech Republic 25\u201332."},{"key":"e_1_2_9_8_2","unstructured":"SchwenkH. Investigations on large-scale lightly-supervised training for statistical machine translation Proceedings of the International Workshop on Spoken Language Translation October 2008 Honolulu HI USA 182\u2013189."},{"key":"e_1_2_9_9_2","doi-asserted-by":"crossref","unstructured":"BertoldiN.andFedericoM. Domain adaptation for statistical machine translation with monolingual resources Proceedings of the 4th Workshop on Statistical Machine Translation March 2009 Athens Greece 182\u2013189.","DOI":"10.3115\/1626431.1626468"},{"key":"e_1_2_9_10_2","unstructured":"HsiehA.-C. HuangH.-H. andChenH.-H. Uses of monolingual in-domain corpora for cross-domain adaptation with hybrid MT approaches Proceedings of the Second Workshop on Hybrid Approaches to Translation August 2013 Sofia Bulgaria 117\u2013122."},{"key":"e_1_2_9_11_2","unstructured":"CotterellR.andKreutzerJ. Explaining and generalizing back-translation through wake-sleep 2018 http:\/\/arxiv.org\/abs\/1806.04402."},{"key":"e_1_2_9_12_2","doi-asserted-by":"crossref","unstructured":"HoangV. C. D. KoehnP. 
HaffariG. andCohnT. Iterative back-translation for neural machine translation Proceedings of the 2nd Workshop on Neural Machine Translation and Generation July 2018 Melbourne Australia 18\u201324.","DOI":"10.18653\/v1\/W18-2703"},{"key":"e_1_2_9_13_2","doi-asserted-by":"crossref","unstructured":"EdunovS. OttM. AuliM. andGrangierD. Understanding back-translation at scale Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing October 2018 Brussels Belgium 489\u2013500.","DOI":"10.18653\/v1\/D18-1045"},{"key":"e_1_2_9_14_2","doi-asserted-by":"crossref","unstructured":"ImamuraK. FujitaA. andSumitaE. Enhancement of encoder and attention using target monolingual corpora in neural machine translation Proceedings of the 2nd Workshop on Neural Machine Translation and Generation July 2018 Melbourne Australia 55\u201363.","DOI":"10.18653\/v1\/W18-2707"},{"key":"e_1_2_9_15_2","doi-asserted-by":"crossref","unstructured":"NiuX. DenkowskiM. andCarpuatM. Bi-directional neural machine translation with synthetic parallel data Proceedings of the 2nd Workshop on Neural Machine Translation and Generation July 2018 Melbourne Australia 84\u201391.","DOI":"10.18653\/v1\/W18-2710"},{"key":"e_1_2_9_16_2","unstructured":"AxelrodA. HeX. andGaoJ. Domain adaptation via pseudo in-domain data selection Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing July 2011 Edinburgh UK 355\u2013362."},{"key":"e_1_2_9_17_2","unstructured":"MooreR. C.andLewisW. Intelligent selection of language model training data Proceedings of the ACL 2010 Conference Short Papers July 2010 Uppsala Sweden 220\u2013224."},{"key":"e_1_2_9_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3341726"},{"key":"e_1_2_9_19_2","doi-asserted-by":"publisher","DOI":"10.1155\/2014\/745485"},{"key":"e_1_2_9_20_2","doi-asserted-by":"crossref","unstructured":"Y\u0131ld\u0131zE. Tantu\u011fA. C. andDiriB. The effect of parallel corpus quality vs. 
size in English-to-Turkish SMT Proceedings of the 6th International Conference on Web Services and Semantic Technology July 2014 Chennai India 21\u201330.","DOI":"10.5121\/csit.2014.4710"},{"key":"e_1_2_9_21_2","doi-asserted-by":"crossref","unstructured":"van der WeesM. BisazzaA. andMonzC. Dynamic data selection for neural machine translation Proceedings of the EMNLP 2017-Conference on Empirical Methods in Natural Language Processing September 2017 Copenhagen Denmark 1400\u20131410.","DOI":"10.18653\/v1\/D17-1147"},{"key":"e_1_2_9_22_2","unstructured":"HeD. XiaY. QinT.et al. Dual learning for machine translation Proceedings of the Conference on Advances in Neural Information Processing Systems December 2016 Barcelona Spain 820\u2013828."},{"key":"e_1_2_9_23_2","unstructured":"ArtetxeM. LabakaG. AgirreE. andChoK. Unsupervised neural machine translation Proceedings of the International Conference on Learning Representations April 2018 Vancouver Canada."},{"key":"e_1_2_9_24_2","unstructured":"LampleG. ConneauA. DenoyerL. andRanzatoM. Unsupervised machine translation using monolingual corpora only Proceedings of the International Conference on Learning Representations April 2017 Toulon France."},{"key":"e_1_2_9_25_2","doi-asserted-by":"crossref","unstructured":"AdjeisahM. LiuG. NorteyN. R. SongJ. LampteyO. K. andFrimpongN. F. Twi corpus: a massively Twi-to-handful languages parallel bible corpus Proceedings of the 2020 IEEE International Conference on Parallel & Distributed Processing with Applications Big Data & Cloud Computing Sustainable Computing & Communications Social Computing & Networking (ISPA\/BDCloud\/SocialCom\/SustainCom) December 2020 Exeter UK 1043\u20131049.","DOI":"10.1109\/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00157"},{"key":"e_1_2_9_26_2","unstructured":"VaswaniA. ShazeerN. ParmarN.et al. 
Attention is all you need Proceedings of the Advances in Neural Information Processing Systems 30 December 2017 Long Beach CA USA 5998\u20136008."},{"key":"e_1_2_9_27_2","doi-asserted-by":"publisher","DOI":"10.1055\/s-0038-1623595"},{"key":"e_1_2_9_28_2","doi-asserted-by":"crossref","unstructured":"SongY.andRothD. Unsupervised sparse vector densification for short text similarity Proceedings of the NAACL HLT 2015-2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies May 2015 Denver CO USA 1275\u20131280.","DOI":"10.3115\/v1\/N15-1138"},{"key":"e_1_2_9_29_2","doi-asserted-by":"crossref","unstructured":"KleinG. KimY. DengY. SenellartJ. andRushA. M. OpenNMT: open-source toolkit for neural machine translation Proceedings of the ACL 2017-55th Annual Meeting of the Association for Computational Linguistics July 2017 Vancouver Canada 67\u201372.","DOI":"10.18653\/v1\/P17-4012"},{"key":"e_1_2_9_30_2","unstructured":"MartinusL. WebsterJ. MoonsamyJ. JnrM. S. MoosaR. andFaironR. Neural machine translation for South Africa\u2019s official languages 2020 http:\/\/arxiv.org\/abs\/2005.06609."},{"key":"e_1_2_9_31_2","doi-asserted-by":"crossref","unstructured":"Agi\u0107\u017d.andVuli\u0107I. JW300: a wide-coverage parallel corpus for low-resource languages Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics July 2019 Florence Italy 3204\u20133210.","DOI":"10.18653\/v1\/P19-1310"},{"key":"e_1_2_9_32_2","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Nitish S.","year":"2014","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_2_9_33_2","doi-asserted-by":"crossref","unstructured":"PapineniK. RoukosS. WardT. andZhuW.-J. 
BLEU: a method for automatic evaluation of machine translation Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL) July 2002 Philadelphia PA USA 311\u2013318.","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_2_9_34_2","unstructured":"SnoverM. DorrB. SchwartzR. MicciullaL. andWeischedelR. A study of translation error rate with targeted human annotation Proceedings of the Association for Machine Translation in the Americas (AMTA 2006) August 2006 Cambridge MA USA."},{"key":"e_1_2_9_35_2","unstructured":"NekotoW. MarivateV. MatsilaT.et al. Participatory research for low-resourced machine translation: a case study in African languages 2020 http:\/\/arxiv.org\/abs\/2010.02353."},{"key":"e_1_2_9_36_2","doi-asserted-by":"crossref","unstructured":"LampleG. OttM. ConneauA. DenoyerL. andRanzatoM. Phrase-based & neural unsupervised machine translation Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing October 2018 Brussels Belgium 5039\u20135049.","DOI":"10.18653\/v1\/D18-1549"},{"key":"e_1_2_9_37_2","doi-asserted-by":"crossref","unstructured":"ArtetxeM. LabakaG. andAgirreE. 
An effective approach to unsupervised machine translation Proceedings of the ACL 2019-57th Annual Meeting of the Association for Computational Linguistics July 2019 Florence Italy 194\u2013203.","DOI":"10.18653\/v1\/P19-1019"}],"container-title":["Computational Intelligence and Neuroscience"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/6682385.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/6682385.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/6682385","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T10:50:58Z","timestamp":1722941458000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/6682385"}},"subtitle":[],"editor":[{"given":"Qiangqiang","family":"Yuan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/6682385"],"URL":"https:\/\/doi.org\/10.1155\/2021\/6682385","archive":["Portico"],"relation":{},"ISSN":["1687-5265","1687-5273"],"issn-type":[{"type":"print","value":"1687-5265"},{"type":"electronic","value":"1687-5273"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2020-11-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-03-19","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"6682385"}}