{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,25]],"date-time":"2025-06-25T04:09:42Z","timestamp":1750824582927,"version":"3.41.0"},"reference-count":54,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T00:00:00Z","timestamp":1750723200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T00:00:00Z","timestamp":1750723200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["SN COMPUT. SCI."],"DOI":"10.1007\/s42979-025-04111-0","type":"journal-article","created":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T16:12:48Z","timestamp":1750781568000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Innovative Dual-Objective Framework for Image Captioning: Harmonizing Visual Analysis and Linguistic Precision"],"prefix":"10.1007","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5611-5219","authenticated-orcid":false,"given":"Ranjith Gnana Suthakar","family":"Alphonse Raj","sequence":"first","affiliation":[]},{"given":"B. J.","family":"Sandesh","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,24]]},"reference":[{"key":"4111_CR1","doi-asserted-by":"crossref","unstructured":"Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L. Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. p. 6077\u20136086.","DOI":"10.1109\/CVPR.2018.00636"},{"key":"4111_CR2","unstructured":"He K, Gkioxari G, Doll\u00b4ar P, Girshick R. Bounding boxes for weakly super-vised object localization. IEEE Trans. Pattern Anal. Mach. Intell. 2019."},{"key":"4111_CR3","unstructured":"Liu S, Zhu Z, Ye N, Guadarrama S, Murphy, K. Improving image captioning with better use of captions. In: Proceedings of the International Conference on Learning Representations (ICLR); 2021."},{"key":"4111_CR4","unstructured":"Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y. Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning (ICML); 2015."},{"key":"4111_CR5","doi-asserted-by":"crossref","unstructured":"Vinyals O, Toshev A, Bengio S, Erhan D. Show and tell: a neural image caption generator. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015.","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"4111_CR6","unstructured":"Ren S, He K, Girshick R, Sun J. Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017."},{"key":"4111_CR7","doi-asserted-by":"crossref","unstructured":"Fang H, Gupta S, Iandola F, Srivastava RK, Deng L, Doll\u00b4ar P, Gao J, He X, Mitchell M, Platt JC, Zitnick CL, Zweig G. From captions to visual concepts and back. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015.","DOI":"10.1109\/CVPR.2015.7298754"},{"key":"4111_CR8","doi-asserted-by":"crossref","unstructured":"Rennie SJ, Marcheret E, Mroueh Y, Ross J, Goel V. Self-critical sequence training for image captioning. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; 2017. p. 7008\u20137024.","DOI":"10.1109\/CVPR.2017.131"},{"key":"4111_CR9","doi-asserted-by":"crossref","unstructured":"Liu X, Li H, Shao J, Chen D, Wang X. Show, tell and discriminate: image captioning by self-retrieval with partially labeled data. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 338\u2013354.","DOI":"10.1007\/978-3-030-01267-0_21"},{"key":"4111_CR10","doi-asserted-by":"crossref","unstructured":"Gu J, Cai J, Wang G, Chen T. Stack-captioning: coarse-to-fine learning for image captioning. In: Proceedings of the AAAI conference on artificial intelligence, vol. 32; 2018.","DOI":"10.1609\/aaai.v32i1.12266"},{"key":"4111_CR11","doi-asserted-by":"crossref","unstructured":"Chen C, Mu S, Xiao W, Ye Z, Wu L, Ju Q. Improving image captioning with conditional generative adversarial nets. In: Proceedings of the AAAI conference on artificial intelligence, vol. 33; 2019. p. 8142\u20138150.","DOI":"10.1609\/aaai.v33i01.33018142"},{"key":"4111_CR12","doi-asserted-by":"crossref","unstructured":"Wang Z, Huang Z, Luo Y. Human consensus-oriented image captioning. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI); 2020. p. 659\u2013665.","DOI":"10.24963\/ijcai.2020\/92"},{"key":"4111_CR13","doi-asserted-by":"crossref","unstructured":"Hessel J, Holtzman A, Forbes M, Bras RL, Choi Y. CLIPScore: a reference-free evaluation metric for image captioning. arXiv preprint arXiv:2104.08718; 2021.","DOI":"10.18653\/v1\/2021.emnlp-main.595"},{"issue":"2","key":"4111_CR14","doi-asserted-by":"publisher","first-page":"890","DOI":"10.1109\/TCYB.2022.3156367","volume":"54","author":"Y Yang","year":"2022","unstructured":"Yang Y, Wei H, Zhu H, Yu D, Xiong H, Yang J. Exploiting cross-modal prediction and relation consistency for semisupervised image captioning. IEEE Trans Cybern. 2022;54(2):890\u2013902.","journal-title":"IEEE Trans Cybern"},{"key":"4111_CR15","first-page":"79124","volume":"36","author":"Z Yue","year":"2024","unstructured":"Yue Z, Hu A, Zhang L, Jin Q. Learning descriptive image captioning via semipermeable maximum likelihood estimation. Adv Neural Inf Process Syst. 2024;36:79124\u201341.","journal-title":"Adv Neural Inf Process Syst"},{"issue":"1","key":"4111_CR16","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1109\/TII.2021.3085669","volume":"18","author":"Y Li","year":"2022","unstructured":"Li Y, Wang H, Zhang Q, Yang X, Li J. Automatic detection and classification system of domestic waste via multi-model cascaded convolutional neural network. IEEE Trans Ind Inform. 2022;18(1):163\u201373.","journal-title":"IEEE Trans Ind Inform"},{"issue":"8","key":"4111_CR17","doi-asserted-by":"publisher","first-page":"2546","DOI":"10.1109\/TVCG.2019.2894627","volume":"26","author":"B Zhang","year":"2020","unstructured":"Zhang B, Sheng B, Li P, Lee TY. Depth of field rendering using multilayer-neighborhood optimization. IEEE Trans Vis Comput Graph. 
2020;26(8):2546\u201359.","journal-title":"IEEE Trans Vis Comput Graph"},{"key":"4111_CR18","doi-asserted-by":"crossref","unstructured":"Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL. Microsoft COCO: common objects in context. In: European conference on computer vision; 2014. p. 740\u2013755.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"4111_CR19","doi-asserted-by":"crossref","unstructured":"Papineni K, Roukos S, Ward T, Zhu WJ. BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics; 2002. p. 311\u2013318.","DOI":"10.3115\/1073083.1073135"},{"key":"4111_CR20","unstructured":"Pu Y, Gan Z, Henao R, Yuan X, Li C, Stevens A, Carin L. Variational autoencoder for deep learning of images, labels and captions. Adv Neural Inf Process Syst. 2016;29."},{"key":"4111_CR21","doi-asserted-by":"crossref","unstructured":"Ren Z, Wang X, Zhang N, Lv X, Li LJ. Deep reinforcement learning-based image captioning with embedding reward. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 290\u2013298.","DOI":"10.1109\/CVPR.2017.128"},{"key":"4111_CR22","doi-asserted-by":"crossref","unstructured":"Li L, Tang S, Deng L, Zhang Y, Tian Q. Image caption with global-local attention. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31; 2017.","DOI":"10.1609\/aaai.v31i1.11236"},{"key":"4111_CR23","unstructured":"Olimov F, Dubey S, Shrestha L, Tin TT, Jeon M. Image captioning using multiple transformers for self-attention mechanism. arXiv preprint arXiv:2103.05103; 2021."},{"key":"4111_CR24","unstructured":"Abedi A, Karshenas H, Adibi P. Multi-modal reward for visual relationships-based image captioning. arXiv preprint arXiv:2303.10766; 2023."},{"issue":"5","key":"4111_CR25","first-page":"5910","volume":"53","author":"Z Lian","year":"2023","unstructured":"Lian Z, Zhang Y, Li H, Wang R, Hu X. Cross modification attention-based deliberation model for image captioning. Appl Intell. 2023;53(5):5910\u201333.","journal-title":"Appl Intell"},{"key":"4111_CR26","doi-asserted-by":"crossref","unstructured":"Chen H, Ding G, Zhao S, Han J. Temporal-difference learning with sampling baseline for image captioning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32; 2018.","DOI":"10.1609\/aaai.v32i1.12263"},{"key":"4111_CR27","doi-asserted-by":"crossref","unstructured":"Wang W, Chen Z, Hu H. Hierarchical attention network for image captioning. In: Proceedings of the AAAI Conference on Artificial Intelligence; 2019. https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/4924","DOI":"10.1609\/aaai.v33i01.33018957"},{"key":"4111_CR28","doi-asserted-by":"publisher","first-page":"812","DOI":"10.1016\/j.ins.2022.12.018","volume":"623","author":"S Dubey","year":"2023","unstructured":"Dubey S, Olimov F, Rafique MA, Kim J, Jeon M. Label-attention transformer with geometrically coherent objects for image captioning. Inf Sci. 2023;623:812\u201331.","journal-title":"Inf Sci"},{"key":"4111_CR29","doi-asserted-by":"crossref","unstructured":"Huang L, Wang W, Chen J, Wei XY. Attention on attention for image captioning. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision; 2019. p. 
4634\u20134643.","DOI":"10.1109\/ICCV.2019.00473"},{"key":"4111_CR30","doi-asserted-by":"crossref","unstructured":"Luo Y, Ji J, Sun X, Cao L, Wu Y, Huang F, Lin CW, Ji R. Dual-level collaborative transformer for image captioning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35; 2021. p. 2286\u20132293.","DOI":"10.1609\/aaai.v35i3.16328"},{"issue":"2","key":"4111_CR31","first-page":"487","volume":"7","author":"R Mulyawan","year":"2023","unstructured":"Mulyawan R, Sunyoto A, Muhammad AHM. Pre-trained CNN architecture analysis for transformer-based Indonesian image caption generation model. Int J Inform Vis. 2023;7(2):487\u201393.","journal-title":"Int J Inform Vis"},{"key":"4111_CR32","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1016\/j.neucom.2022.01.081","volume":"482","author":"JH Tan","year":"2022","unstructured":"Tan JH, Tan YH, Chan CS, Chuah JH. ACORT: a compact object relation transformer for parameter efficient image captioning. Neurocomputing. 2022;482:60\u201372.","journal-title":"Neurocomputing"},{"key":"4111_CR33","first-page":"1","volume":"45","author":"R Cantini","year":"2025","unstructured":"Cantini R, Cosentino C, Marozzo F, Talia D, Trunfio P. Harnessing prompt-based large language models for disaster monitoring and automated reporting from social media feedback. Online Soc Netw Media. 2025;45:1\u201314.","journal-title":"Online Soc Netw Media"},{"key":"4111_CR34","doi-asserted-by":"publisher","unstructured":"Tan M, Le Q. EfficientNetV2: smaller models and faster training. arXiv; 2021. p. 1\u201311. https:\/\/doi.org\/10.48550\/arXiv.2104.00298","DOI":"10.48550\/arXiv.2104.00298"},{"key":"4111_CR35","doi-asserted-by":"crossref","unstructured":"Huang L, Wang W, Chen J, Wei XY. Attention on attention for image captioning. In: Proceedings of the IEEE\/CVF international conference on computer vision; 2019. p. 4634\u20134643. https:\/\/openaccess.thecvf.com\/content_ICCV_2019\/papers\/Huang_Attention_on_Attention_for_Image_Captioning_ICCV_2019_paper.pdf","DOI":"10.1109\/ICCV.2019.00473"},{"key":"4111_CR36","doi-asserted-by":"publisher","unstructured":"Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning transferable visual models from natural language supervision. arXiv; 2021. p. 1\u201348. https:\/\/doi.org\/10.48550\/arXiv.2103.00020","DOI":"10.48550\/arXiv.2103.00020"},{"issue":"1","key":"4111_CR37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10994-024-06727-4","volume":"114","author":"R Cantini","year":"2025","unstructured":"Cantini R, Cosentino C, Kilanioti I, Marozzo F, Talia D. Unmasking deception: a topic-oriented multimodal approach to uncover false information on social media. Mach Learn. 2025;114(1):1\u201322.","journal-title":"Mach Learn"},{"key":"4111_CR38","first-page":"1","volume":"23","author":"J Wu","year":"2020","unstructured":"Wu J, Chen T, Wu H, Yang Z, Luo G, Lin L. Fine-grained image captioning with global-local discriminative objective. IEEE Trans Multimed. 2020;23:1\u201316.","journal-title":"IEEE Trans Multimed"},{"key":"4111_CR39","doi-asserted-by":"publisher","first-page":"30615","DOI":"10.1007\/s11042-020-09539-5","volume":"79","author":"T do Carmo Nogueira","year":"2020","unstructured":"do Carmo Nogueira T, Vinhal CDN, da Cruz J\u00fanior G, Ullmann MRD. Reference-based model using multimodal gated recurrent units for image captioning. Multimed Tools Appl. 2020;79:30615\u201335.","journal-title":"Multimed Tools Appl"},
{"key":"4111_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.ijar.2020.12.016","volume":"131","author":"C Cheng","year":"2021","unstructured":"Cheng C, Li C, Han Y, Zhu Y. A semi-supervised deep learning image caption model based on Pseudo Label and N-gram. Int J Approx Reason. 2021;131:1\u201315.","journal-title":"Int J Approx Reason"},{"key":"4111_CR41","doi-asserted-by":"publisher","unstructured":"Adarsh NL, Arun PV, Aravindh NL. Enhancing image caption generation using reinforcement learning with human feedback. arXiv; 2024. p. 1\u20136. https:\/\/doi.org\/10.48550\/arXiv.2403.06735","DOI":"10.48550\/arXiv.2403.06735"},{"key":"4111_CR42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3389\/fnins.2023.1270850","volume":"17","author":"T Bai","year":"2023","unstructured":"Bai T, Zhou S, Pang Y, Luo J, Wang H, Du Y. An image caption model based on attention mechanism and deep reinforcement learning. Front Neurosci. 2023;17:1\u201314.","journal-title":"Front Neurosci"},{"key":"4111_CR43","doi-asserted-by":"publisher","unstructured":"Shi Z, Zhou X, Qiu X, Zhu X. Improving image captioning with better use of captions. arXiv; 2020. p. 1\u201311. https:\/\/doi.org\/10.48550\/arXiv.2006.11807","DOI":"10.48550\/arXiv.2006.11807"},{"issue":"1","key":"4111_CR44","first-page":"1","volume":"17","author":"H Lu","year":"2021","unstructured":"Lu H, Yang R, Deng Z, Zhang Y, Gao G, Lan R. Chinese image captioning via fuzzy attention-based DenseNet-BiLSTM. ACM Trans Multimed Comput Commun Appl. 2021;17(1):1\u201318.","journal-title":"ACM Trans Multimed Comput Commun Appl"},{"key":"4111_CR45","doi-asserted-by":"crossref","unstructured":"Cantini R, Cosentino C, Kilanioti I, Marozzo F, Talia D. Unmasking COVID-19 false information on Twitter: a topic-based approach with BERT. In: International conference on discovery science; 2023. p. 1\u201316. https:\/\/scalab.dimes.unical.it\/papers\/pdf\/Unmasking_COVID_19_False_Information_on_Twitter_FinalVersion.pdf","DOI":"10.1007\/978-3-031-45275-8_9"},{"key":"4111_CR46","doi-asserted-by":"publisher","unstructured":"Cantini R, Cosentino C, Marozzo F. Multi-dimensional classification on social media data for detailed reporting with large language models. In: International conference on artificial intelligence applications and innovations; 2024. p. 100\u2013114. https:\/\/doi.org\/10.1007\/978-3-031-63215-0_8","DOI":"10.1007\/978-3-031-63215-0_8"},{"issue":"2","key":"4111_CR47","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s11063-024-11527-x","volume":"56","author":"Q Sun","year":"2024","unstructured":"Sun Q, Zhang J, Fang Z, Gao Y. Self-enhanced attention for image captioning. Neural Process Lett. 2024;56(2):1\u201318.","journal-title":"Neural Process Lett"},{"key":"4111_CR48","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001424540181","author":"\u00d6 \u00c7ayl\u0131","year":"2024","unstructured":"\u00c7ayl\u0131 \u00d6, K\u0131l\u0131\u00e7 V, Onan A, Wang W. Multi-layer gated recurrent unit based recurrent neural network for image captioning. Int J Pattern Recognit Artif Intell. 2024. https:\/\/doi.org\/10.1142\/S0218001424540181.","journal-title":"Int J Pattern Recognit Artif Intell"},{"issue":"14","key":"4111_CR49","doi-asserted-by":"publisher","first-page":"4778","DOI":"10.1049\/ipr2.13287","volume":"18","author":"MM Rahman","year":"2024","unstructured":"Rahman MM, Uzzaman A, Sami SI, Khatun F, Bhuiyan MAA. 
A comprehensive construction of deep neural network-based encoder\u2013decoder framework for automatic image captioning systems. IET Image Process. 2024;18(14):4778\u201398.","journal-title":"IET Image Process"},{"key":"4111_CR50","doi-asserted-by":"crossref","unstructured":"Cooper E, Hsu L, Ramirez S. Adaptive semantic fusion for contextual image captioning. Preprints; 2025. p. 1\u201318. https:\/\/www.preprints.org\/manuscript\/202501.1491\/v1.","DOI":"10.20944\/preprints202501.1491.v1"},{"issue":"14","key":"4111_CR51","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/electronics12143187","volume":"12","author":"Y Lyu","year":"2023","unstructured":"Lyu Y, Liu Y, Zhao Q. Maintain a better balance between performance and cost for image captioning by a size-adjustable convolutional module. Electronics. 2023;12(14):1\u201320.","journal-title":"Electronics"},{"issue":"7","key":"4111_CR52","doi-asserted-by":"publisher","first-page":"1043","DOI":"10.1049\/cvi2.12305","volume":"18","author":"F Lv","year":"2024","unstructured":"Lv F, Wang R, Jing L, Dai P. HIST: hierarchical and sequential transformer for image captioning. IET Comput Vis. 2024;18(7):1043\u201356.","journal-title":"IET Comput Vis"},{"issue":"13","key":"4111_CR53","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/app13137916","volume":"13","author":"T Xie","year":"2023","unstructured":"Xie T, Ding W, Zhang J, Wan X, Wang J. Bi-LS-AttM: a bidirectional LSTM and attention mechanism model for improving image captioning. Appl Sci. 2023;13(13):1\u201317.","journal-title":"Appl Sci"},{"issue":"1","key":"4111_CR54","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-024-69664-1","volume":"14","author":"AA Osman","year":"2024","unstructured":"Osman AA, Shalaby MAW, Soliman MM, Elsayed KM. Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture. Sci Rep. 
2024;14(1):1\u201315.","journal-title":"Sci Rep"}],"container-title":["SN Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-025-04111-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42979-025-04111-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-025-04111-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T16:12:58Z","timestamp":1750781578000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42979-025-04111-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,24]]},"references-count":54,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["4111"],"URL":"https:\/\/doi.org\/10.1007\/s42979-025-04111-0","relation":{},"ISSN":["2661-8907"],"issn-type":[{"value":"2661-8907","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,24]]},"assertion":[{"value":"16 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 June 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests, financial or otherwise, that could be perceived to influence the work reported in this manuscript. No funds, grants, or other support were received during the preparation of this manuscript. The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interests"}},{"value":"This study did not involve human participants or animals. Ethical approval and informed consent were therefore not required.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical and Informed Consent for Data Used"}},{"value":"To meet the transparency requirements, we have made the source code and dataset publicly accessible. Researchers can access the preprocessing scripts, training procedures, and evaluation metrics via our GitHub repository and the COCO dataset portal. Detailed descriptions and usage instructions are provided to ensure that our methods can be replicated and built upon, promoting further advancements in IC.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Transparency and Reproducibility"}},{"value":"The MS COCO 2017 dataset provides a robust foundation for training and evaluating our IC model. Its diverse and extensive collection of annotated images ensures comprehensive coverage of various visual contexts and objects, which is essential for developing a model capable of generating high-quality captions across different scenarios. 
Our preprocessing pipeline not only enhances the quality of the training data but also sets a benchmark for future research in the field. The source code and detailed descriptions of the preprocessing steps are made publicly available in our GitHub () repository, and the dataset can be accessed through the COCO () website. These resources ensure transparency and reproducibility, facilitating further advancements in IC research.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Dataset Utilization and Benchmarking"}}],"article-number":"583"}}