{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T06:36:04Z","timestamp":1777703764838,"version":"3.51.4"},"reference-count":42,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2019,7,31]],"date-time":"2019-07-31T00:00:00Z","timestamp":1564531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Intelligent &amp; Fuzzy Systems"],"published-print":{"date-parts":[[2019,12,23]]},"abstract":"<jats:p>Understanding the context with generation of textual description from an input image is an active and challenging research topic in computer vision and natural language processing. However, in the case of the Bengali language, the problem is still unexplored. In this paper, we address a standard approach for Bengali image caption generation through subsampling the machine-translated dataset. Later, we use several pre-processing techniques with the state-of-the-art CNN-LSTM architecture-based models. The experiment is conducted on the standard Flickr-8K dataset, along with several modifications applied to adapt to the Bengali language. The training caption subsampled dataset is computed for both Bengali and English languages for further experiments with 16 distinct models developed in the entire training process. The trained models for both languages are analyzed with respect to several caption evaluation metrics. 
Further, we establish a baseline performance in Bengali image captioning defining the limitation of current word embedding approaches compared to internal local embedding.<\/jats:p>","DOI":"10.3233\/jifs-179351","type":"journal-article","created":{"date-parts":[[2019,8,2]],"date-time":"2019-08-02T12:17:09Z","timestamp":1564748229000},"page":"7427-7439","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":11,"title":["Oboyob: A sequential-semantic Bengali image captioning engine"],"prefix":"10.1177","volume":"37","author":[{"given":"Tonmoay","family":"Deb","sequence":"first","affiliation":[{"name":"Department of Electrical &amp; Computer Engineering, North South University, Bangladesh"}]},{"given":"Mohammad Zariff Ahsham","family":"Ali","sequence":"additional","affiliation":[{"name":"Department of Electrical &amp; Computer Engineering, North South University, Bangladesh"}]},{"given":"Sanchita","family":"Bhowmik","sequence":"additional","affiliation":[{"name":"Department of Electrical &amp; Computer Engineering, North South University, Bangladesh"}]},{"given":"Adnan","family":"Firoze","sequence":"additional","affiliation":[{"name":"Department of Electrical &amp; Computer Engineering, North South University, Bangladesh"}]},{"given":"Syed Shahir","family":"Ahmed","sequence":"additional","affiliation":[{"name":"Department of Electrical &amp; Computer Engineering, North South University, Bangladesh"}]},{"given":"Muhammad Abeer","family":"Tahmeed","sequence":"additional","affiliation":[{"name":"Department of Electrical &amp; Computer Engineering, North South University, Bangladesh"}]},{"given":"N.S.M. 
Rezaur","family":"Rahman","sequence":"additional","affiliation":[{"name":"Department of Electrical &amp; Computer Engineering, North South University, Bangladesh"}]},{"given":"Rashedur M.","family":"Rahman","sequence":"additional","affiliation":[{"name":"Department of Electrical &amp; Computer Engineering, North South University, Bangladesh"}]}],"member":"179","published-online":{"date-parts":[[2019,7,31]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.3994"},{"key":"e_1_3_1_3_2","unstructured":"JoulinA. GraveE. BojanowskiP. and MikolovT. Bag of tricks for efficient text classification arXiv preprint arXiv:1607.01759 2016."},{"key":"e_1_3_1_4_2","first-page":"3156","volume-title":"Show and tell: A neural image caption generator","author":"Vinyals O.","year":"2015","unstructured":"VinyalsO., ToshevA., BengioS. and ErhanD., Show and tell: A neural image caption generator, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3156\u20133164."},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"TantiM. GattA. and CamilleriK.P. What is the role of recurrent neural networks (rnns) in an image caption generator? arXiv preprint arXiv:1708.02043 2017.","DOI":"10.18653\/v1\/W17-3506"},{"key":"e_1_3_1_6_2","unstructured":"SimonyanK. and ZissermanA. Very deep convolutional networks for largescale image recognition arXiv preprint arXiv:1409.1556 2014."},{"key":"e_1_3_1_7_2","first-page":"1097","volume-title":"Imagenet classification with deep convolutional neural networks","author":"Krizhevsky A.","year":"2012","unstructured":"KrizhevskyA., SutskeverI. and HintonG.E., Imagenet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, 2012, pp. 
1097\u20131105."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"issue":"1","key":"e_1_3_1_9_2","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava N.","year":"2014","unstructured":"SrivastavaN., HintonG., KrizhevskyA., SutskeverI., SalakhutdinovR. Dropout: A simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research 15(1) (2014), 1929\u20131958.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_1_10_2","doi-asserted-by":"crossref","unstructured":"DebT. ArmanA. and FirozeA. Machine Cognition of Violence in Videos Using Novel Outlier-Resistant VLAD in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) 2018 pp. 989\u2013994.","DOI":"10.1109\/ICMLA.2018.00161"},{"key":"e_1_3_1_11_2","first-page":"311","volume-title":"Bleu: A method for automatic evaluation of machine translation","author":"Papineni K.","year":"2002","unstructured":"PapineniK., RoukosS., WardT. and ZhuW.-J. Bleu: A method for automatic evaluation of machine translation, in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2002, pp. 311\u2013318."},{"key":"e_1_3_1_12_2","first-page":"2422","volume-title":"Mind\u2019s eye: A recurrent visual representation for image caption generation","author":"Chen X.","year":"2015","unstructured":"ChenX. and Lawrence ZitnickC., Mind\u2019s eye: A recurrent visual representation for image caption generation, inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 
2422\u20132431."},{"key":"e_1_3_1_13_2","first-page":"247","volume-title":"Bilingual machine translation: English to bengali","author":"Bal S.","year":"2019","unstructured":"BalS., MahantaS., MandalL., ParekhR., Bilingual machine translation: English to bengali, in Proceedings of International Ethical Hacking Conference 2018, Springer, 2019, pp. 247\u2013259."},{"key":"e_1_3_1_14_2","unstructured":"RanzatoM. ChopraS. AuliM. and ZarembaW. Sequence level training with recurrent neural networks arXiv preprint arXiv:1511.06732 2015."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1168"},{"key":"e_1_3_1_16_2","volume-title":"Baby talk: Understanding and generating image descriptions","author":"Kulkarni G.","year":"2011","unstructured":"KulkarniG., PremrajV., DharS., LiS., ChoiY., BergA.C., BergT.L., Baby talk: Understanding and generating image descriptions, in Proceedings of the 24th CVPR, Citeseer, 2011."},{"key":"e_1_3_1_17_2","unstructured":"MaoJ. XuW. YangY. WangJ. HuangZ. and YuilleA. Deep captioning with multimodal recurrent neural networks (m-rnn) arXiv preprint arXiv:1412.6632 2014."},{"key":"e_1_3_1_18_2","unstructured":"XuK. BaJ. KirosR. ChoK. CourvilleA. SalakhudinovR. ZemelR. and BengioY. Show attend and tell: Neural image caption generation with visual attention in International Conference on Machine Learning 2015 pp. 2048\u20132057."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00110"},{"key":"e_1_3_1_20_2","unstructured":"MaoJ. XuW. YangY. WangJ. and YuilleA.L. Explain images with multimodal recurrent neural networks arXiv preprint arXiv:1410.1090 2014."},{"key":"e_1_3_1_21_2","first-page":"451","volume-title":"A bengali-sylheti rule-based dialect translation system: Proposal and preliminary system","author":"Chakraborty S.","year":"2018","unstructured":"ChakrabortyS., SinhaA. 
and NathS., A bengali-sylheti rule-based dialect translation system: Proposal and preliminary system, in Proceedings of the International Conference on Computing and Communication Systems, Springer 2018, pp. 451\u2013460."},{"key":"e_1_3_1_22_2","first-page":"12","article-title":"Inception-v4, inception-resnet and the impact of residual connections on learning","author":"Szegedy C.","year":"2017","unstructured":"SzegedyC., IoffeS., VanhouckeV., AlemiA.A., Inception-v4, inception-resnet and the impact of residual connections on learning, in AAAI, 42017, p. 12.","journal-title":"AAAI"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324918000098"},{"key":"e_1_3_1_24_2","unstructured":"DevlinJ. ChengH. FangH. GuptaS. DengL. HeX. ZweigG. and MitchellM. Language models for image captioning: The quirks and what works CoRR abs\/1505.01809 2015."},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"YouQ. JinH. WangZ. FangC. and LuoJ. Image captioning with semantic attention 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016.","DOI":"10.1109\/CVPR.2016.503"},{"key":"e_1_3_1_26_2","doi-asserted-by":"crossref","unstructured":"VedantamR. ZitnickC.L. and ParikhD. Cider: Consensus-based image description evaluation 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015.","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"e_1_3_1_27_2","unstructured":"MikolovT. ChenK. CorradoG. and DeanJ. Efficient estimation of word representations in vector space CoRR abs\/1301.3781 2013."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2911996.2912049"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123366"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-4020"},{"key":"e_1_3_1_31_2","unstructured":"ChenX. FangH. LinT. VedantamR. GuptaS. Doll\u00e1rP. and ZitnickC.L. 
Microsoft COCO captions: Data collection and evaluation server CoRR abs\/1504.00325 2015."},{"key":"e_1_3_1_32_2","unstructured":"GraveE. BojanowskiP. GuptaP. JoulinA. and MikolovT. Learning word vectors for 157 languages CoRR abs\/1802.06893 2018."},{"key":"e_1_3_1_33_2","volume-title":"Meteor universal: Language specific translation evaluation for any target language","author":"Denkowski M.","year":"2014","unstructured":"DenkowskiM. and LavieA., Meteor universal: Language specific translation evaluation for any target language, Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014."},{"key":"e_1_3_1_34_2","article-title":"Rouge: A package for automatic evaluation of summaries","author":"Lin C.-Y.","year":"2004","unstructured":"LinC.-Y., Rouge: A package for automatic evaluation of summaries, Text Summarization Branches Out, 2004.","journal-title":"Text Summarization Branches Out"},{"key":"e_1_3_1_35_2","unstructured":"ClevertD. UnterthinerT. and HochreiterS. Fast and accurate deep network learning by exponential linear units (elus) CoRR abs\/1511.07289 2015."},{"key":"e_1_3_1_36_2","doi-asserted-by":"crossref","unstructured":"DengJ. DongW. SocherR. LiL.-J. LiK. and Fei-FeiL. Imagenet: A large-scale hierarchical image database in Computer Vision and Pattern Recognition 2009 CVPR 2009 IEEE Conference on IEEE 2009 pp. 248\u2013255.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","unstructured":"ElliottD. FrankS. Sima\u2019anK. and SpeciaL. Multi30k: Multilingual english-german image descriptions CoRR abs\/1605.00459 2016.","DOI":"10.18653\/v1\/W16-3210"},{"key":"e_1_3_1_38_2","unstructured":"Callison-BurchC. OsborneM. and KoehnP. 
Re-evaluation the role of bleu in machine translation research in 11th Conference of the European Chapter of the Association for Computational Linguistics 2006."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D13-1128"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-007-9031-y"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICACCI.2014.6968484"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1103\/RevModPhys.89.015004"},{"key":"e_1_3_1_43_2","doi-asserted-by":"crossref","unstructured":"RahmanM. MohammedN. MansoorN. and MomenS. Chittron: An automatic bangla image captioning system arXiv preprint arXiv:1809.00339 2018.","DOI":"10.1016\/j.procs.2019.06.100"}],"container-title":["Journal of Intelligent &amp; Fuzzy Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-179351","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.3233\/JIFS-179351","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-179351","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:39:32Z","timestamp":1777455572000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.3233\/JIFS-179351"}},"subtitle":[],"editor":[{"given":"Ngoc Thanh","family":"Nguyen","sequence":"additional","affiliation":[]},{"given":"Edward","family":"Szczerbicki","sequence":"additional","affiliation":[]},{"given":"Bogdan","family":"Trawi\u0144ski","sequence":"additional","affiliation":[]},{"given":"Van 
Du","family":"Nguyen","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,7,31]]},"references-count":42,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2019,12,23]]}},"alternative-id":["10.3233\/JIFS-179351"],"URL":"https:\/\/doi.org\/10.3233\/jifs-179351","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7,31]]}}}