{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,9,22]],"date-time":"2022-09-22T02:28:50Z","timestamp":1663813730148},"reference-count":20,"publisher":"World Scientific Pub Co Pte Ltd","issue":"04","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. As. Lang. Proc."],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:p> Neural machine translation (NMT) is a remarkable approach which performs much better than the Statistical machine translation (SMT) models when there is an abundance of parallel corpus. However, vanilla NMT is primarily based upon word-level with a fixed vocabulary. Therefore, low resource morphologically rich languages such as Sinhala are mostly affected by the out of vocabulary (OOV) and Rare word problems. Recent advancements in subword techniques have opened up opportunities for low resource communities by enabling open vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has a greater effect on English Sinhala language pair. Our models demonstrate that subword segmentation strategies along with the state-of-the-art NMT can perform remarkably when translating English sentences into a rich morphology language regardless of a large parallel corpus. <\/jats:p>","DOI":"10.1142\/s2717554520500174","type":"journal-article","created":{"date-parts":[[2021,5,18]],"date-time":"2021-05-18T11:32:37Z","timestamp":1621337557000},"page":"2050017","source":"Crossref","is-referenced-by-count":1,"title":["Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation"],"prefix":"10.1142","volume":"30","author":[{"given":"Rashmini","family":"Naranpanawa","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, University of Moratuwa, Katubedda 10400, Sri Lanka"}]},{"given":"Ravinga","family":"Perera","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Moratuwa, Katubedda 10400, Sri Lanka"}]},{"given":"Thilakshi","family":"Fonseka","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Moratuwa, Katubedda 10400, Sri Lanka"}]},{"given":"Uthayasanker","family":"Thayasivam","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Moratuwa, Katubedda 10400, Sri Lanka"}]}],"member":"219","published-online":{"date-parts":[[2021,5,17]]},"reference":[{"key":"S2717554520500174BIB002","doi-asserted-by":"publisher","DOI":"10.1109\/IALP.2018.8629113"},{"key":"S2717554520500174BIB003","first-page":"128","volume-title":"3rd Int. Conf. Linguistics in Sri Lanka (ICLSL)","author":"Shanmugarasa Y.","year":"2017"},{"key":"S2717554520500174BIB004","doi-asserted-by":"publisher","DOI":"10.1109\/NITC.2018.8550069"},{"key":"S2717554520500174BIB005","volume-title":"31st Conf. Neural Information Processing Systems (NIPS)","author":"Vaswani A.","year":"2017"},{"key":"S2717554520500174BIB006","first-page":"305","volume-title":"Int. Conf. on Asian Language Processing (IALP)","author":"Fonseka T.","year":"2020"},{"key":"S2717554520500174BIB007","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"S2717554520500174BIB008","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1007"},{"key":"S2717554520500174BIB009","first-page":"311","volume-title":"Proc. 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni K.","year":"2002"},{"key":"S2717554520500174BIB010","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"S2717554520500174BIB011","doi-asserted-by":"publisher","DOI":"10.1142\/S0218488598000094"},{"key":"S2717554520500174BIB013","first-page":"3104","volume-title":"Proc. 27th Int. Conf. on Neural Information Processing Systems","volume":"2","author":"Sutskever I.","year":"2014"},{"key":"S2717554520500174BIB014","volume-title":"3rd Int. Conf. Learning Representations (ICLR)","author":"Bahdanau D.","year":"2015"},{"key":"S2717554520500174BIB015","volume-title":"32nd Pacific Asia Conf. on Language, Information and Computation: 5th Workshop on Asian Translation","author":"Sen S.","year":"2018"},{"key":"S2717554520500174BIB016","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1632"},{"key":"S2717554520500174BIB019","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.170"},{"key":"S2717554520500174BIB020","doi-asserted-by":"publisher","DOI":"10.3115\/1557769.1557821"},{"key":"S2717554520500174BIB021","doi-asserted-by":"publisher","DOI":"10.1109\/MERCon.2018.8421901"},{"key":"S2717554520500174BIB022","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-4009"},{"key":"S2717554520500174BIB023","first-page":"66","volume-title":"Proc. 2018 Conf. Empirical Methods in Natural Language Processing (System Demonstrations)","author":"Kudo T."},{"key":"S2717554520500174BIB024","volume-title":"3rd Int. Conf. Learning Representations (ICLR)","author":"Kingma D. P.","year":"2015"}],"container-title":["International Journal of Asian Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S2717554520500174","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,6,10]],"date-time":"2021-06-10T04:27:36Z","timestamp":1623299256000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S2717554520500174"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":20,"journal-issue":{"issue":"04","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["10.1142\/S2717554520500174"],"URL":"https:\/\/doi.org\/10.1142\/s2717554520500174","relation":{},"ISSN":["2717-5545","2424-791X"],"issn-type":[{"value":"2717-5545","type":"print"},{"value":"2424-791X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12]]}}}