{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:00:12Z","timestamp":1757617212113,"version":"3.44.0"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"25","license":[{"start":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T00:00:00Z","timestamp":1729036800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T00:00:00Z","timestamp":1729036800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"DOI":"10.1007\/s11042-024-20277-w","type":"journal-article","created":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T05:03:04Z","timestamp":1729054984000},"page":"29715-29732","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["BPE and morphologically segmented phrase based statistical machine translation system for Indian languages to resource constrained language Bodo"],"prefix":"10.1007","volume":"84","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1669-1726","authenticated-orcid":false,"given":"Sanjib","family":"Narzary","sequence":"first","affiliation":[]},{"given":"Maharaj","family":"Brahma","sequence":"additional","affiliation":[]},{"given":"Sukumar","family":"Nandi","sequence":"additional","affiliation":[]},{"given":"Bidisha","family":"Som","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,10,16]]},"reference":[{"key":"20277_CR1","unstructured":"Census (2011) Comparative speakers\u2019 strength of languages and mother tongues - 1971, 1981, 1991, 2001 and 2011. Census of India 2011"},{"key":"20277_CR2","unstructured":"Narzary S, Brahma M, Narzary M, Muchahary G, Singh PK, Senapati A, Nandi S, Som B (2022) Generating monolingual dataset for low resource language Bodo from old books using Google keep. In: Proceedings of the thirteenth language resources and evaluation conference. European Language Resources Association, Marseille, France, pp 6563\u20136570. https:\/\/aclanthology.org\/2022.lrec-1.705"},{"key":"20277_CR3","unstructured":"Wary A (2017) Revisiting bodo medium secondary school education in udalguri. JTICI, 1\u201328"},{"key":"20277_CR4","unstructured":"Brahma B, Barman AK, Sarma SK, Boro B (2012) Corpus building of literary lesser rich language-Bodo: insights and challenges. In: Proceedings of the 10th workshop on asian language resources. The COLING 2012 Organizing Committee, Mumbai, India, pp 29\u201334. https:\/\/aclanthology.org\/W12-5204"},{"key":"20277_CR5","unstructured":"NLLB Team, Costa-juss\u00e1 MR, Cross J, \u00c7elebi O, Elbayad M, Heafield K, Heffernan K, Kalbassi E, Lam J, Licht D, Maillard J, Sun A, Wang S, Wenzek G, Youngblood A, Akula B, Barrault L, Mejia-Gonzalez G, Hansanti P, Hoffman J, Jarrett S, Sadagopan KR, Rowe D, Spruit S, Tran C, Andrews P, Ayan NF, Bhosale S, Edunov S, Fan A, Gao C, Goswami V, Guzm\u00e1n F, Koehn P, Mourachko A, Ropers C, Saleem S, Schwenk H, Wang J (2022) No language left behind: scaling human-centered machine translation"},{"key":"20277_CR6","unstructured":"Zhang Y, Han W, Qin J, Wang Y, Bapna A, Chen Z, Chen N, Li B, Axelrod V, Wang G, Meng Z, Hu K, Rosenberg A, Prabhavalkar R, Park DS, Haghani P, Riesa J, Perng G, Soltau H, Strohman T, Ramabhadran B, Sainath T, Moreno P, Chiu C-C, Schalkwyk J, Beaufays F, Wu Y (2023) Google USM: scaling automatic speech recognition beyond 100 languages"},{"key":"20277_CR7","doi-asserted-by":"publisher","unstructured":"Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the association for computational linguistics. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp 311\u2013318. https:\/\/doi.org\/10.3115\/1073083.1073135. https:\/\/aclanthology.org\/P02-1040","DOI":"10.3115\/1073083.1073135"},{"key":"20277_CR8","unstructured":"Ramanathan A, Hegde J, Shah R, Bhattacharyya P, Sasikumar M (2008) Simple syntactic and morphological processing can help english-hindi statistical machine translation. In: Proceedings of the third international joint conference on natural language processing: volume-I"},{"key":"20277_CR9","unstructured":"Bhattacharyya P (2010) IndoWordNet. In: Proceedings of the seventh international conference on language resources and evaluation (LREC\u201910). European Language Resources Association (ELRA), Valletta, Malta. http:\/\/www.lrec-conf.org\/proceedings\/lrec2010\/pdf\/939_Paper.pdf"},{"key":"20277_CR10","unstructured":"Bisazza A, Federico M (2009) Morphological pre-processing for turkish to english statistical machine translation. In: Proceedings of the 6th international workshop on spoken language translation: papers"},{"issue":"08","key":"20277_CR11","first-page":"2749","volume":"2","author":"P Unnikrishnan","year":"2010","unstructured":"Unnikrishnan P, Antony P, Soman K (2010) A novel approach for english to south dravidian language statistical machine translation system. Int J Comput Sci Eng (IJCSE) 2(08):2749\u20132759","journal-title":"Int J Comput Sci Eng (IJCSE)"},{"issue":"5","key":"20277_CR12","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1111\/j.1749-818X.2011.00274.x","volume":"5","author":"M Hearne","year":"2011","unstructured":"Hearne M, Way A (2011) Statistical machine translation: a guide for linguists and translators. Lang Linguist Compass 5(5):205\u2013226","journal-title":"Lang Linguist Compass"},{"key":"20277_CR13","unstructured":"Banerjee T, Kunchukuttan A, Bhattacharya P (2018) Multilingual Indian language translation system at WAT 2018: many-to-one phrase-based SMT. In: Proceedings of the 32nd Pacific Asia conference on language, information and computation: 5th workshop on Asian translation: 5th workshop on Asian translation. Association for Computational Linguistics, Hong Kong. https:\/\/aclanthology.org\/Y18-3013"},{"key":"20277_CR14","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1007\/978-981-10-4603-2_20","volume-title":"Advanced computing and communication technologies","author":"M Saiful Islam","year":"2018","unstructured":"Saiful Islam M, Purkayastha BS (2018) English to bodo phrase-based statistical machine translation. In: Choudhary RK, Mandal JK, Bhattacharyya D (eds) Advanced computing and communication technologies. Springer, Singapore, pp 207\u2013217"},{"key":"20277_CR15","doi-asserted-by":"publisher","unstructured":"Narzary S, Brahma M, Singha B, Brahma R, Dibragede B, Barman S, Nandi S, Som B (2019) Attention based englishbodo neural machine translation system for tourism domain. In: 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), pp 335\u2013343. https:\/\/doi.org\/10.1109\/ICCMC.2019.8819699","DOI":"10.1109\/ICCMC.2019.8819699"},{"key":"20277_CR16","doi-asserted-by":"publisher","unstructured":"Smit P, Virpioja S, Gr\u00f6nroos S-A, Kurimo M (2014) Morfessor 2.0: toolkit for statistical morphological segmentation. In: Proceedings of the demonstrations at the 14th conference of the European chapter of the association for computational linguistics. Association for Computational Linguistics, Gothenburg, Sweden, pp 21\u201324. https:\/\/doi.org\/10.3115\/v1\/E14-2006. https:\/\/aclanthology.org\/E14-2006","DOI":"10.3115\/v1\/E14-2006"},{"key":"20277_CR17","unstructured":"Shibata Y, Kida T, Fukamachi S, Takeda M, Shinohara A, Shinohara T, Arikawa S (1999) Byte pair encoding: a text compression scheme that accelerates pattern matching. Technical Report DOI-TR-161, Department of Informatics, Kyushu University"},{"key":"20277_CR18","unstructured":"Kunchukuttan A, Shah M, Prakash P, Bhattacharyya P (2017) Utilizing lexical similarity between related, low-resource languages for pivot-based SMT. In: Proceedings of the eighth international joint conference on natural language processing (volume 2: short papers). Asian Federation of Natural Language Processing, Taipei, Taiwan, pp 283\u2013289. https:\/\/aclanthology.org\/I17-2048"},{"issue":"2","key":"20277_CR19","first-page":"263","volume":"19","author":"PF Brown","year":"1993","unstructured":"Brown PF, Della Pietra SA, Della Pietra VJ, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263\u2013311","journal-title":"Comput Linguist"},{"key":"20277_CR20","doi-asserted-by":"crossref","unstructured":"Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R et al (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions, pp 177\u2013180","DOI":"10.3115\/1557769.1557821"},{"key":"20277_CR21","unstructured":"Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: Proceedings of machine translation summit X: papers, Phuket, Thailand, pp 79\u201386. https:\/\/aclanthology.org\/2005.mtsummit-papers.11"},{"key":"20277_CR22","doi-asserted-by":"crossref","unstructured":"W\u00e4schle K, Riezler S (2012) Analyzing parallelism and domain similarities in the marec patent corpus. http:\/\/www.cl.uni-heidelberg.de\/~riezler\/publications\/papers\/IRF2012.pdf","DOI":"10.1007\/978-3-642-31274-8_2"},{"key":"20277_CR23","unstructured":"AI4Bharat, Gala J, Chitale PA, AK R, Doddapaneni S, Gumma V, Kumar A, Nawale J, Sujatha A, Puduppully R, Raghavan V, Kumar P, Khapra MM, Dabre R, Kunchukuttan A (2023) Indictrans2: towards high-quality and accessible machine translation models for all 22 scheduled indian languages. arXiv preprint arXiv:2305.16307"},{"key":"20277_CR24","doi-asserted-by":"crossref","unstructured":"Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Mit CM, Zens R, Aachen R, Dyer C, Bojar O, Cornell EH (2007) Moses: open source toolkit for statistical machine translation itc-irst 2, pp 177\u2013180. http:\/\/www.statmt.org\/moses\/","DOI":"10.3115\/1557769.1557821"},{"key":"20277_CR25","unstructured":"Baro D (2017) Process of word formation in bodo. International Journal of Humanities and Social Science Invention, 6\u2013114658"},{"key":"20277_CR26","doi-asserted-by":"publisher","unstructured":"Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers). Association for Computational Linguistics, Berlin, Germany, pp 1715\u20131725. https:\/\/doi.org\/10.18653\/v1\/P16-1162. https:\/\/aclanthology.org\/P16-1162","DOI":"10.18653\/v1\/P16-1162"},{"key":"20277_CR27","unstructured":"Radford A, Narasimhan K, Salimans T, Sutskever I et al (2018) Improving language understanding by generative pre-training"},{"issue":"8","key":"20277_CR28","first-page":"9","volume":"1","author":"A Radford","year":"2019","unstructured":"Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I et al (2019) Language models are unsupervised multitask learners. OpenAI blog 1(8):9","journal-title":"OpenAI blog"},{"key":"20277_CR29","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877\u20131901","journal-title":"Adv Neural Inf Process Syst"},{"key":"20277_CR30","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805"},{"key":"20277_CR31","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692"},{"key":"20277_CR32","doi-asserted-by":"crossref","unstructured":"Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2019) BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"20277_CR33","unstructured":"He P, Liu X, Gao J, Chen W (2021) Deberta: decoding-enhanced bert with disentangled attention. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=XPZIaotutsD"},{"key":"20277_CR34","unstructured":"Kunchukuttan A (2020) IndoWordnet parallel corpus. https:\/\/github.com\/anoopkunchukuttan\/indowordnet_parallel"},{"key":"20277_CR35","unstructured":"Kunchukuttan A (2020) The IndicNLP Library. https:\/\/github.com\/anoopkunchukuttan\/indic_nlp_library\/blob\/master\/docs\/indicnlp.pdf"},{"key":"20277_CR36","doi-asserted-by":"publisher","unstructured":"Kunchukuttan A, Bhattacharyya P (2016) Orthographic syllable as basic unit for SMT between related languages. In: Su J, Duh K, Carreras X (eds) Proceedings of the 2016 conference on empirical methods in natural language processing. Association for Computational Linguistics, Austin, Texas, pp 1912\u20131917. https:\/\/doi.org\/10.18653\/v1\/D16-1196. https:\/\/aclanthology.org\/D16-1196","DOI":"10.18653\/v1\/D16-1196"}],"container-title":["Multimedia Tools and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-024-20277-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11042-024-20277-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-024-20277-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T23:31:07Z","timestamp":1757115067000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11042-024-20277-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,16]]},"references-count":36,"journal-issue":{"issue":"25","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["20277"],"URL":"https:\/\/doi.org\/10.1007\/s11042-024-20277-w","relation":{},"ISSN":["1573-7721"],"issn-type":[{"type":"electronic","value":"1573-7721"}],"subject":[],"published":{"date-parts":[[2024,10,16]]},"assertion":[{"value":"20 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 June 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 September 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 October 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We affirm that the works carried by us are, to the best of our knowledge, the original, unaltered works of the authors, and that when we have used the work of others, we have appropriately cited their original source. Other than the Multimedia Tools and Application journal, our work is not published in any other journals.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest\/Competing Interests"}}]}}