{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T02:47:56Z","timestamp":1747190876765,"version":"3.40.5"},"reference-count":31,"publisher":"Wiley","license":[{"start":{"date-parts":[[2022,9,17]],"date-time":"2022-09-17T00:00:00Z","timestamp":1663372800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Applied Computational Intelligence and Soft Computing"],"published-print":{"date-parts":[[2022,9,17]]},"abstract":"<jats:p>Automatic image caption generation is an intricate task of describing an image in natural language by gaining insights present in an image. Featuring facial expressions in the conventional image captioning system brings out new prospects to generate pertinent descriptions, revealing the emotional aspects of the image. The proposed work encapsulates the facial emotional features to produce more expressive captions similar to human-annotated ones with the help of Cross Stage Partial Dense Network (CSPDenseNet) and Self-attentive Bidirectional Long Short-Term Memory (BiLSTM) network. The encoding unit captures the facial expressions and dense image features using a Facial Expression Recognition (FER) model and CSPDense neural network, respectively. Further, the word embedding vectors of the ground truth image captions are created and learned using the Word2Vec embedding technique. Then, the extracted image feature vectors and word vectors are fused to form an encoding vector representing the rich image content. The decoding unit employs a self-attention mechanism encompassed with BiLSTM to create more descriptive and relevant captions in natural language. The Flickr11k dataset, a subset of the Flickr30k dataset is used to train, test, and evaluate the present model based on five benchmark image captioning metrics. They are BiLingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Recall-Oriented Understudy for Gisting Evaluation (ROGUE), Consensus-based Image Description Evaluation (CIDEr), and Semantic Propositional Image Caption Evaluation (SPICE). 
The experimental analysis indicates that the proposed model enhances the quality of captions with 0.6012(BLEU-1), 0.3992(BLEU-2), 0.2703(BLEU-3), 0.1921(BLEU-4), 0.1932(METEOR), 0.2617(CIDEr), 0.4793(ROUGE-L), and 0.1260(SPICE) scores, respectively, using additive emotional characteristics and behavioral components of the objects present in the image.<\/jats:p>","DOI":"10.1155\/2022\/2756396","type":"journal-article","created":{"date-parts":[[2022,9,17]],"date-time":"2022-09-17T15:20:10Z","timestamp":1663428010000},"page":"1-13","source":"Crossref","is-referenced-by-count":1,"title":["Caption Generation Based on Emotions Using CSPDenseNet and BiLSTM with Self-Attention"],"prefix":"10.1155","volume":"2022","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0173-7955","authenticated-orcid":true,"given":"Kavi Priya","family":"S","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Mepco Schlenk Engineering College (Autonomous), Sivakasi 626005, Tamil Nadu, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8135-9661","authenticated-orcid":true,"given":"Pon Karthika","family":"K","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Mepco Schlenk Engineering College (Autonomous), Sivakasi 626005, Tamil Nadu, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6044-6667","authenticated-orcid":true,"given":"Jayakumar","family":"Kaliappan","sequence":"additional","affiliation":[{"name":"Department of Analytics, School of Computer Science and Engineering, Vellore Institute of Technology (VIT), Vellore 632014, Tamil Nadu, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9994-9424","authenticated-orcid":true,"given":"Senthil Kumaran","family":"Selvaraj","sequence":"additional","affiliation":[{"name":"Department of Manufacturing Engineering, School of Mechanical Engineering (SMEC), Vellore Institute of Technology (VIT), Vellore 632014, Tamil Nadu, India"}]},{"given":"Nagalakshmi","family":"R","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Faculty of Engineering and Technology, Kalinga University, Raipur, Chhattisgarh, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2974-7805","authenticated-orcid":true,"given":"Baye","family":"Molla","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Engineering and Technology College, Dilla University, P.O.Box. 419, Dilla, Ethiopia"}]}],"member":"311","reference":[{"first-page":"1014","article-title":"IoT based automation of real time in-pipe contamination detection system in drinking water","author":"S. K. Priya","key":"1"},{"key":"2","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.asoc.2014.12.019","article-title":"Heuristic routing with bandwidth and energy constraints in sensor networks","volume":"29","author":"S. K. 
Priya","year":"2015","journal-title":"Applied Soft Computing"},{"key":"3","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2021.107918"},{"key":"4","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106983"},{"key":"5","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106198"},{"key":"6","doi-asserted-by":"publisher","DOI":"10.3390\/math9070730"},{"key":"7","doi-asserted-by":"publisher","DOI":"10.1145\/3295748"},{"key":"8","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.05.080"},{"key":"9","doi-asserted-by":"publisher","DOI":"10.1155\/2020\/3062706"},{"key":"10","article-title":"Face-cap: image captioning using facial expression analysis","volume-title":"Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science","author":"O. Nezami","year":"2019"},{"key":"11","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v30i1.10475","article-title":"SentiCap: generating image descriptions with sentiments","author":"A. Mathews","year":"2016"},{"volume-title":"CSPNet: A New Backbone that Can Enhance Learning Capability of CNN","author":"C. Wang","key":"12"},{"key":"13","first-page":"15","article-title":"Every picture tells a story: generating sentences from images","volume-title":"European Conference on Computer Vision","author":"A. Farhadi","year":"2010"},{"first-page":"2891","article-title":"Baby talk: understanding and generating image descriptions","author":"G. Kulkarni","key":"14"},{"key":"15","first-page":"1143","article-title":"Im2Text: describing images using 1 million captioned photographs","volume":"24","author":"V. Ordonez","year":"2011","journal-title":"Advances in Neural Information Processing Systems"},{"first-page":"2596","article-title":"Automatic concept discovery from parallel text and visual corpora","author":"C. Sun","key":"16"},{"first-page":"3156","article-title":"Show and tell: a neural image caption generator","author":"O. Vinyals","key":"17"},{"author":"K. Xu","key":"18","article-title":"Show, attend and tell: neural image caption generation with visual attention"},{"first-page":"1","article-title":"Bi-San-cap: bi-directional self-attention for image captioning","author":"M. Z. Hossain","key":"19"},{"key":"20","doi-asserted-by":"publisher","DOI":"10.1155\/2020\/8909458"},{"key":"21","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2020.115836"},{"key":"22","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/4001460"},{"key":"23","doi-asserted-by":"publisher","DOI":"10.1109\/tip.2022.3183434"},{"key":"24","doi-asserted-by":"crossref","DOI":"10.1109\/TPAMI.2022.3148210","article-title":"From show to tell: a survey on deep learning-based image captioning","author":"M. Stefanini","year":"2022"},{"key":"25","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.12025"},{"article-title":"Very deep convolutional networks for large-scale image recognition","year":"2014","author":"K. Simonyan","key":"26"},{"article-title":"BLEU: A method for automatic evaluation of machine translation","author":"K. Papineni","key":"27","doi-asserted-by":"crossref","DOI":"10.3115\/1073083.1073135"},{"first-page":"376","article-title":"Meteor universal: language specific translation evaluation for any target language","author":"M. Denkowski","key":"28"},{"author":"C. Lin","key":"29","article-title":"ROUGE: a package for automatic evaluation of summaries"},{"key":"30","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2015.7299087","article-title":"CIDEr: consensus-based image description evaluation","author":"R. 
Vedantam","year":"2015"},{"key":"31","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-46454-1_24","article-title":"SPICE: Semantic propositional image caption evaluation","author":"P. Anderson","year":"2016"}],"container-title":["Applied Computational Intelligence and Soft Computing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/acisc\/2022\/2756396.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/acisc\/2022\/2756396.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/acisc\/2022\/2756396.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,17]],"date-time":"2022-09-17T15:20:15Z","timestamp":1663428015000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/acisc\/2022\/2756396\/"}},"subtitle":[],"editor":[{"given":"Dimitrios A.","family":"Karras","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,9,17]]},"references-count":31,"alternative-id":["2756396","2756396"],"URL":"https:\/\/doi.org\/10.1155\/2022\/2756396","relation":{},"ISSN":["1687-9732","1687-9724"],"issn-type":[{"type":"electronic","value":"1687-9732"},{"type":"print","value":"1687-9724"}],"subject":[],"published":{"date-parts":[[2022,9,17]]}}}