{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T13:39:11Z","timestamp":1762177151550,"version":"build-2065373602"},"reference-count":46,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,1,19]],"date-time":"2019-01-19T00:00:00Z","timestamp":1547856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61876190"],"award-info":[{"award-number":["61876190"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>This study presents linguistically augmented models of phrase-based statistical machine translation (PBSMT) using different linguistic features (factors) on top of the source surface form. The architecture addresses two major problems occurring in machine translation, namely the poor performance of direct translation from a highly-inflected and morphologically complex language into morphologically poor languages, and the data sparseness issue, which becomes a significant challenge under low-resource conditions. We use three factors (lemma, part-of-speech tags, and morphological features) to enrich the input side with additional information to improve the quality of direct translation from Arabic to Chinese, considering the importance and global presence of this language pair as well as the limited amount of work on machine translation between these two languages. In an effort to deal with the issue of out-of-vocabulary (OOV) words and missing words, we propose the best combination of factors and models based on alternative paths. 
The proposed models were compared with the standard PBSMT model, which represents the baseline of this work, and two enhanced approaches tokenized by a state-of-the-art external tool that has been proven to be useful for Arabic as a morphologically rich and complex language. The experiment was performed with a Moses decoder on freely available data extracted from a multilingual corpus of United Nations documents (MultiUN). Results of a preliminary evaluation in terms of BLEU scores show that the use of linguistic features on the Arabic side considerably outperforms the baseline and tokenized approaches; the system can also consistently reduce the OOV rate.<\/jats:p>","DOI":"10.3390\/fi11010022","type":"journal-article","created":{"date-parts":[[2019,1,22]],"date-time":"2019-01-22T03:08:22Z","timestamp":1548126502000},"page":"22","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Improved Arabic\u2013Chinese Machine Translation with Linguistic Input Features"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3957-8622","authenticated-orcid":false,"given":"Fares","family":"Aqlan","sequence":"first","affiliation":[{"name":"School of Information Science and Engineering, Central South University, Changsha 410083, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0172-4070","authenticated-orcid":false,"given":"Xiaoping","family":"Fan","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Central South University, Changsha 410083, China"},{"name":"Academy of Financial and Economic Big Data, Hunan University of Finance and Economics (HUFE), Changsha 410205, China"}]},{"given":"Abdullah","family":"Alqwbani","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Central South University, Changsha 410083, 
China"}]},{"given":"Akram","family":"Al-Mansoub","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, South China University of Technology (SCUT), Guangzhou 510006, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1515\/pralin-2017-0013","article-title":"Is Neural Machine Translation the New State of the Art?","volume":"108","author":"Castilho","year":"2017","journal-title":"Prague Bull. Math. Linguist."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1162\/0891201042544884","article-title":"The Alignment Template Approach to Statistical Machine Translation","volume":"30","author":"Och","year":"2004","journal-title":"Comput. Linguist."},{"key":"ref_3","unstructured":"Mehay, D.N., and Brew, C. (2012, January 7\u20138). CCG syntactic reordering models for phrase-based machine translation. Proceedings of the Seventh Workshop on Statistical Machine Translation, Montreal, QC, Canada."},{"key":"ref_4","unstructured":"Koehn, P., and Hoang, H. (2007, January 28\u201330). Factored translation models. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Salerno, V.M., and Rabbeni, G. (2018). An Extreme Learning Machine Approach to Effective Energy Disaggregation. Electronics, 7.","DOI":"10.20944\/preprints201808.0551.v1"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1959","DOI":"10.1109\/TNNLS.2016.2550532","article-title":"Adaptation to New Microphones Using Artificial Neural Networks With Trainable Activation Functions","volume":"28","author":"Siniscalchi","year":"2017","journal-title":"IEEE Trans. Neural. Netw. Learn. 
Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, A., Wang, H., Li, S., Cui, Y., Liu, Z., Yang, G., and Hu, J. (2018). Transfer Learning with Deep Recurrent Neural Networks for Remaining Useful Life Estimation. Appl. Sci., 8.","DOI":"10.3390\/app8122416"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Anastasopoulos, A., and Chiang, D. (2017, January 6\u20137). A case study on using speech-to-translation alignments for language documentation. Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages, Honolulu, HI, USA.","DOI":"10.18653\/v1\/W17-0123"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Adda, G., St\u00fcker, S., Adda-Decker, M., Ambouroue, O., Besacier, L., Blachon, D., Bonneau-Maynard, H., Godard, P., Hamlaoui, F., and Idiatov, D. (2016, January 9\u201312). Breaking the unwritten language barrier: The BULB project. Proceedings of the SLTU (Spoken Language Technologies for Under-Resourced Languages), Yogyakarta, Indonesia.","DOI":"10.1016\/j.procs.2016.04.023"},{"key":"ref_10","unstructured":"Post, M., Kumar, G., Lopez, A., Karakos, D., Callison-Burch, C., and Khudanpur, S. (2013, January 5\u20136). Improved speech-to-text translation with the Fisher and Callhome Spanish\u2013English speech translation corpus. Proceedings of the IWSLT, Heidelberg, Germany."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1109\/TASLP.2016.2621659","article-title":"ASR for Under-Resourced Languages from Probabilistic Transcription","volume":"25","author":"Jyothi","year":"2017","journal-title":"IEEE\/ACM Trans. Audio Speech Lang."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sadat, F., and Habash, N. (2006, January 20). Combination of Arabic preprocessing schemes for statistical machine translation. 
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia.","DOI":"10.3115\/1220175.1220176"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Habash, N. (2008, January 16\u201317). Four techniques for online handling of out-of-vocabulary words in Arabic-English statistical machine translation. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, Columbus, OH, USA.","DOI":"10.3115\/1557690.1557706"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, S., Zhao, Z., Hu, R., Li, W., Liu, T., and Du, X. (arXiv, 2018). Analogical reasoning on Chinese morphological and semantic relations, arXiv.","DOI":"10.18653\/v1\/P18-2023"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Koehn, P., and Knowles, R. (arXiv, 2017). Six challenges for neural machine translation, arXiv.","DOI":"10.18653\/v1\/W17-3204"},{"key":"ref_16","unstructured":"Almahairi, A., Cho, K., Habash, N., and Courville, A. (arXiv, 2016). First result on Arabic neural machine translation, arXiv."},{"key":"ref_17","unstructured":"Belinkov, Y., and Glass, J. (2016, January 1). Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results. Proceedings of the Workshop on Semitic Machine Translation, Austin, TX, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Habash, N., and Hu, J. (2009, January 30\u201331). Improving Arabic-Chinese statistical machine translation using English as pivot language. 
Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece.","DOI":"10.3115\/1626431.1626467"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1007\/s10590-011-9103-z","article-title":"Machine Translation between Hebrew and Arabic","volume":"26","author":"Shilon","year":"2012","journal-title":"Mach. Transl."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"666","DOI":"10.3923\/itj.2010.666.672","article-title":"Arabic-Chinese and Chinese-Arabic Phrase-Based Statistical Machine Translation Systems","volume":"9","author":"Ghurab","year":"2010","journal-title":"Inf. Technol. J."},{"key":"ref_21","unstructured":"Junczys-Dowmunt, M., Dwojak, T., and Hoang, H. (2016, January 8\u20139). Is neural machine translation ready for deployment? A case study on 30 translation directions. Proceedings of the IWSLT, Seattle, WA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1515\/pralin-2017-0025","article-title":"Optimizing Tokenization Choice for Machine Translation across Multiple Target Languages","volume":"108","author":"Zalmout","year":"2017","journal-title":"Prague Bull. Math. Linguist."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1260","DOI":"10.1109\/TASL.2008.925870","article-title":"Syntactically Lexicalized Phrase-Based SMT","volume":"16","author":"Hassan","year":"2008","journal-title":"IEEE Trans. Audio Speech Lang."},{"key":"ref_24","unstructured":"Avramidis, E., and Koehn, P. (2008, January 15\u201320). Enriching morphologically poor languages for statistical machine translation. Proceedings of the ACL\/HLT, Columbus, OH, USA."},{"key":"ref_25","unstructured":"Bhattacharyya, P. (arXiv, 2017). 
Role of Morphology Injection in Statistical Machine Translation, arXiv."},{"key":"ref_26","first-page":"1","article-title":"Comparing Standard and Factored Models in Statistical Machine Translation from English to Slovene Using the Moses System","volume":"5","author":"Krek","year":"2018","journal-title":"Slov. 2.0 Empir. Appl. Interdiscip. Res."},{"key":"ref_27","unstructured":"Durrani, N., Haddow, B., Heafield, K., and Koehn, P. (2013, January 8\u20139). Edinburgh\u2019s machine translation systems for European language pairs. Proceedings of the Eighth Workshop on Statistical Machine Translation, Sofia, Bulgaria."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Badr, I., Zbib, R., and Glass, J. (2008, January 16\u201317). Segmentation for English-to-Arabic statistical machine translation. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, Columbus, OH, USA.","DOI":"10.3115\/1557690.1557732"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"3071","DOI":"10.1007\/s13369-016-2075-9","article-title":"A Novel Approach by Injecting CCG Supertags into an Arabic\u2013English Factored Translation Machine","volume":"41","author":"Rajeh","year":"2016","journal-title":"Arab. J. Sci. Eng."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, S., Wong, D.F., and Chao, L.S. (2012, January 15\u201317). Korean-Chinese statistical translation model. Proceedings of the 2012 International Conference on Machine Learning and Cybernetics (ICMLC), Xi\u2019an, China.","DOI":"10.1109\/ICMLC.2012.6359022"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sennrich, R., and Haddow, B. (2016, January 11\u201312). Linguistic input features improve neural machine translation. 
Proceedings of the First Conference on Machine Translation, Berlin, Germany.","DOI":"10.18653\/v1\/W16-2209"},{"key":"ref_32","unstructured":"Garc\u00eda-Mart\u00ednez, M., Barrault, L., and Bougares, F. (arXiv, 2016). Factored neural machine translation, arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1515\/pralin-2017-0018","article-title":"Pre-Reordering for Neural Machine Translation: Helpful or Harmful?","volume":"108","author":"Du","year":"2017","journal-title":"Prague Bull. Math. Linguist."},{"key":"ref_34","unstructured":"Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., and Zens, R. (2007, January 25\u201327). Moses: Open source toolkit for statistical machine translation. Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, Prague, Czech Republic."},{"key":"ref_35","unstructured":"Pasha, A., Al-Badrashiny, M., Diab, M.T., El Kholy, A., Eskander, R., Habash, N., Pooleery, M., Rambow, O., and Roth, R. (2014, January 26\u201331). MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic. Proceedings of the LREC, Reykjavik, Iceland."},{"key":"ref_36","unstructured":"Sajjad, H., Guzm\u00e1n, F., Nakov, P., Abdelali, A., Murray, K., Al Obaidli, F., and Vogel, S. (2013, January 5\u20136). QCRI at IWSLT 2013: Experiments in Arabic-English and English-Arabic spoken language translation. Proceedings of the 10th International Workshop on Spoken Language Technology (IWSLT), Heidelberg, Germany."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Habash, N., Soudi, A., and Buckwalter, T. (2007). On Arabic Transliteration. Arabic Computational Morphology, Springer.","DOI":"10.1007\/978-1-4020-6046-5_2"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Aldarmaki, H., and Diab, M. (2015, January 30). Robust part-of-speech tagging of Arabic text. 
Proceedings of the Second Workshop on Arabic Natural Language Processing, Beijing, China.","DOI":"10.18653\/v1\/W15-3222"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Birch, A., Osborne, M., and Koehn, P. (2007, January 23). CCG supertags in factored statistical machine translation. Proceedings of the Second Workshop on Statistical Machine Translation, Prague, Czech Republic.","DOI":"10.3115\/1626355.1626357"},{"key":"ref_40","unstructured":"Eisele, A., and Chen, Y. (2010, January 17\u201323). MultiUN: A Multilingual Corpus from United Nation Documents. Proceedings of the LREC, Valletta, Malta."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chang, P.-C., Galley, M., and Manning, C.D. (2008, January 19). Optimizing Chinese word segmentation for machine translation performance. Proceedings of the Third Workshop on Statistical Machine Translation, Columbus, OH, USA.","DOI":"10.3115\/1626394.1626430"},{"key":"ref_42","unstructured":"Tseng, H., Chang, P., Andrew, G., Jurafsky, D., and Manning, C. (2005, January 14\u201315). A conditional random field word segmenter for sighan bakeoff 2005. Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, Jeju Island, Korea."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1162\/089120103321337421","article-title":"A Systematic Comparison of Various Statistical Alignment Models","volume":"29","author":"Och","year":"2003","journal-title":"Comput. Linguist."},{"key":"ref_44","unstructured":"Heafield, K., Pouzyrevsky, I., Clark, J.H., and Koehn, P. (2013, January 4\u20139). Scalable modified Kneser-Ney language model estimation. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Sofia, Bulgaria."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002, January 7\u201312). BLEU: A method for automatic evaluation of machine translation. 
Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Stroudsburg, PA, USA.","DOI":"10.3115\/1073083.1073135"},{"key":"ref_46","unstructured":"Koehn, P. (2004, January 25\u201326). Statistical significance tests for machine translation evaluation. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), Barcelona, Spain."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/11\/1\/22\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:27:23Z","timestamp":1760185643000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/11\/1\/22"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,19]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,1]]}},"alternative-id":["fi11010022"],"URL":"https:\/\/doi.org\/10.3390\/fi11010022","relation":{},"ISSN":["1999-5903"],"issn-type":[{"type":"electronic","value":"1999-5903"}],"subject":[],"published":{"date-parts":[[2019,1,19]]}}}