{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,15]],"date-time":"2025-11-15T10:35:34Z","timestamp":1763202934789,"version":"3.41.2"},"reference-count":31,"publisher":"World Scientific Pub Co Pte Ltd","issue":"01","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. As. Lang. Proc."],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:p> Automatic Image Captioning (AIC) refers to the process of synthesizing semantically and syntactically correct descriptions for images. Existing research on AIC has predominantly focused on the English language. Comparatively, lower numbers of works have focused on developing captioning systems for low-resource Indian languages like Assamese. This paper investigates AIC for the Assamese language using two distinct approaches. The first approach involves utilizing state-of-the-art AIC model pretrained on an English image-caption dataset to generate English captions for input images. Next, these English captions are translated to the Assamese language using a publicly available automatic translator. The second approach involves exclusively training the AIC model using an Assamese image-caption dataset to predict captions directly in Assamese. The experiments are performed on two types of state-of-art models, one which uses LSTM as a decoder and the other one uses a transformer. Through extensive experimentation, the performance of these approaches is evaluated both quantitatively and qualitatively. The quantitative results are obtained using automatic metrics such as BLEU-n and CIDEr. For qualitative analysis, human evaluation is performed. The comparative performances between the two approaches reveal that models trained exclusively on Assamese image-caption datasets achieve superior results both in terms of quantitative measures and qualitative assessment when compared to models pretrained on English and subsequently translated into Assamese. <\/jats:p>","DOI":"10.1142\/s2717554524500048","type":"journal-article","created":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T09:00:24Z","timestamp":1718614824000},"source":"Crossref","is-referenced-by-count":3,"title":["Impact of Language-Specific Training on Image Caption Synthesis: A Case Study on Low-Resource Assamese Language"],"prefix":"10.1142","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-1159-3118","authenticated-orcid":false,"given":"Pankaj","family":"Choudhury","sequence":"first","affiliation":[{"name":"Centre for Linguistic Science and Technology, Indian Institute of Technology Guwahati, Assam, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2885-0026","authenticated-orcid":false,"given":"Prithwijit","family":"Guha","sequence":"additional","affiliation":[{"name":"Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Assam, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5869-1057","authenticated-orcid":false,"given":"Sukumar","family":"Nandi","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Assam, India"}]}],"member":"219","published-online":{"date-parts":[[2024,7,29]]},"reference":[{"key":"S2717554524500048BIB001","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3148210"},{"key":"S2717554524500048BIB002","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15561-1_2"},{"key":"S2717554524500048BIB003","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.162"},{"key":"S2717554524500048BIB004","first-page":"220","volume-title":"Proc. Fifteenth Conf. Computational Natural Language Learning","author":"Li S.","year":"2011"},{"key":"S2717554524500048BIB005","first-page":"747","volume-title":"Proc. 13th Conf. Eur. Chapter of the Association for Computational Linguistics","author":"Mitchell M.","year":"2012"},{"key":"S2717554524500048BIB006","doi-asserted-by":"publisher","DOI":"10.1145\/3295748"},{"key":"S2717554524500048BIB007","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.05.080"},{"key":"S2717554524500048BIB008","first-page":"2048","volume-title":"Int. Conf. Machine Learning","author":"Xu K.","year":"2015"},{"key":"S2717554524500048BIB009","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"S2717554524500048BIB010","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01098"},{"key":"S2717554524500048BIB013","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-33-6881-1_36"},{"key":"S2717554524500048BIB015","doi-asserted-by":"publisher","DOI":"10.1109\/IALP61005.2023.10337310"},{"key":"S2717554524500048BIB016","first-page":"3104","volume":"27","author":"Sutskever I.","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"S2717554524500048BIB017","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"S2717554524500048BIB019","first-page":"91","volume-title":"Advances in Neural Information Processing Systems","volume":"28","author":"Ren S.","year":"2015"},{"key":"S2717554524500048BIB020","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00473"},{"key":"S2717554524500048BIB021","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1238"},{"key":"S2717554524500048BIB023","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","volume":"30","author":"Vaswani A.","year":"2017"},{"key":"S2717554524500048BIB024","doi-asserted-by":"publisher","DOI":"10.3390\/app8050739"},{"key":"S2717554524500048BIB025","first-page":"5987","volume-title":"2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR)","author":"Xie S.","year":"2016"},{"key":"S2717554524500048BIB026","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01059"},{"key":"S2717554524500048BIB027","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00902"},{"key":"S2717554524500048BIB028","first-page":"11137","volume-title":"Advances in Neural Information Processing Systems","volume":"32","author":"Herdade S.","year":"2019"},{"key":"S2717554524500048BIB029","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01034"},{"key":"S2717554524500048BIB030","first-page":"153","volume-title":"Proc. Asian Conf. Computer Vision","author":"He S.","year":"2020"},{"key":"S2717554524500048BIB031","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i1.19940"},{"key":"S2717554524500048BIB032","doi-asserted-by":"publisher","DOI":"10.1145\/3432246"},{"key":"S2717554524500048BIB033","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2019.06.100"},{"key":"S2717554524500048BIB034","doi-asserted-by":"publisher","DOI":"10.1007\/s42979-021-00975-0"},{"key":"S2717554524500048BIB035","first-page":"263","volume-title":"Proc. 34th Conf. Computational Linguistics and Speech Processing (ROCLING 2022)","author":"Nath P.","year":"2022"},{"key":"S2717554524500048BIB036","first-page":"743","volume-title":"Proc. 37th Pacific Asia Conf. Language, Information and Computation","author":"Choudhury P.","year":"2023"}],"container-title":["International Journal of Asian Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S2717554524500048","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T07:56:28Z","timestamp":1722930988000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S2717554524500048"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3]]},"references-count":31,"journal-issue":{"issue":"01","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["10.1142\/S2717554524500048"],"URL":"https:\/\/doi.org\/10.1142\/s2717554524500048","relation":{},"ISSN":["2717-5545","2424-791X"],"issn-type":[{"type":"print","value":"2717-5545"},{"type":"electronic","value":"2424-791X"}],"subject":[],"published":{"date-parts":[[2024,3]]},"article-number":"2450004"}}