{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,13]],"date-time":"2025-12-13T06:08:04Z","timestamp":1765606084481,"version":"3.48.0"},"reference-count":42,"publisher":"Wiley","issue":"12","license":[{"start":{"date-parts":[[2025,8,3]],"date-time":"2025-08-03T00:00:00Z","timestamp":1754179200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["advanced.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Advanced Intelligent Systems"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>Existing transformer\u2010based image captioning methods face two primary limitations: first, they struggle to adequately represent visual features from multiple regions during the encoding phase, and second, the decoder fails to effectively utilize future semantic information during the inference phase. To address these challenges, an attention\u2010enhanced image captioning model is proposed. During the encoding phase, multigranular visual features are integrated by combining cross\u2010attention and self\u2010attention mechanisms, fully utilizing both grid and regional features. Additionally, a novel dense global self\u2010attention module is introduced to enhance model performance with minimal computational cost by fully leveraging the contextual information and fine\u2010grained details of the image. This model is particularly well\u2010suited for biomimetic wearable devices, where real\u2010time visual assistance plays a crucial role in enhancing the user experience. In the decoding phase, a bidirectional decoding structure with an adaptive masking module is designed to dynamically adjust the focus on past and future semantic information, enabling the model to combine historical and future context effectively for generating more accurate and relevant descriptions. Experimental results on the MSCOCO dataset show that the model outperforms the baseline, achieving a 2.1 percentage point improvement in the CIDEr metric. Comprehensive hardware evaluations on the wearable platform demonstrate real\u2010time efficiency with minimal memory footprint, significantly outperforming state\u2010of\u2010the\u2010art models in edge deployment scenarios.<\/jats:p>","DOI":"10.1002\/aisy.202500104","type":"journal-article","created":{"date-parts":[[2025,8,4]],"date-time":"2025-08-04T03:49:20Z","timestamp":1754279360000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["End\u2010to\u2010End Attention\u2010Enhanced Transformer for Image Captioning in Biomimetic Wearable Devices"],"prefix":"10.1002","volume":"7","author":[{"given":"Yongyang","family":"Yin","sequence":"first","affiliation":[{"name":"School of Electronic Science and Engineering Nanjing University  Nanjing Jiangsu 210023 China"}]},{"given":"Hengyu","family":"Cao","sequence":"additional","affiliation":[{"name":"School of Information and Control Engineering China University of Mining and Technology  Xuzhou Jiangsu 221008 China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0107-4329","authenticated-orcid":false,"given":"Jun","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Electronic Science and Engineering Nanjing University  Nanjing Jiangsu 210023 China"}]}],"member":"311","published-online":{"date-parts":[[2025,8,3]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.62110\/sciencein.jist.2024.v12.811"},{"key":"e_1_2_10_3_1","first-page":"45466","volume":"2022","author":"Andrei Neculai Y. C.","year":"2022","journal-title":"Comput. Vis. Pattern Recognit."},{"key":"e_1_2_10_4_1","unstructured":"Z.Wang X.Li J.Yang Y.Liu S.Jiang inProc. of the IEEE\/CVF Int. Conf. on Computer Vision Paris FR 2\u20136 October2023 p.15625."},{"key":"e_1_2_10_5_1","first-page":"12996","volume":"45","author":"Xu Yang H. Z.","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_2_10_6_1","unstructured":"Z.Fang J.Wang X.Hu L.Liang Z.Gan L.Wang Z.Liu inProc. of the IEEE\/CVF Conf. on Computer Vision and Pattern Recognition New Orleans LA USA 18\u201324 June2022 p.18009."},{"key":"e_1_2_10_7_1","unstructured":"J. X.Yiyu Wang Y.Sun inAAAI Conf. on Artificial Intelligence Virtual 22 February\u20131 March2022 p.2585."},{"key":"e_1_2_10_8_1","first-page":"3104","volume":"27","author":"Sutskever I.","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_2_10_9_1","doi-asserted-by":"crossref","unstructured":"P.Anderson X.He C.Buehler D.Teney M.Johnson S.Gould L.Zhang inProc. of The IEEE Conf. on Computer Vision and Pattern Recognition Salt Lake City UT USA18\u201322 June2018 p.6077.","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_2_10_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3067449"},{"key":"e_1_2_10_11_1","volume":"28","author":"Ren S.","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"journal-title":"Arxiv Preprint Arxiv","year":"2025","author":"Huang Y.","key":"e_1_2_10_12_1"},{"key":"e_1_2_10_13_1","first-page":"1","volume":"8","author":"Xu Z.","year":"2024","journal-title":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."},{"key":"e_1_2_10_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3401450"},{"key":"e_1_2_10_15_1","first-page":"759","volume":"37","author":"Wei G.","year":"2023","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"e_1_2_10_16_1","first-page":"480","volume-title":"European Conf. On Computer Vision Cham","author":"Yang R.","year":"2022"},{"journal-title":"Arxiv Preprint Arxiv","year":"2024","author":"Merrill M. A.","key":"e_1_2_10_17_1"},{"key":"e_1_2_10_18_1","doi-asserted-by":"crossref","unstructured":"J.Zheng J.Zhang K.Yang K.Peng R.Stiefelhagen inIEEE Int. Conf. on Robotics and Automation (ICRA) Yokohama Japan 13\u201317 May2024 pp.2303\u20102309.","DOI":"10.1109\/ICRA57147.2024.10610333"},{"key":"e_1_2_10_19_1","unstructured":"N.Hyeon\u2010Woo K.Yu\u2010Ji B.Heo D.Han S. J.Oh T. H.Oh inProc. of the IEEE\/CVF Int. Conf. on Computer Vision Paris FR 2\u20136 October2023 pp.5807\u20135818."},{"key":"e_1_2_10_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2020.103234"},{"key":"e_1_2_10_21_1","first-page":"735","volume-title":"Chinese Conf. on Pattern Recognition and Computer Vision (PRCV)","author":"Yu Q.","year":"2022"},{"key":"e_1_2_10_22_1","unstructured":"S.Fawaz E.Mahmoud inBritish Machine Vision Conf. Cardiff UK 9\u201312 September2019 p.75."},{"journal-title":"Arxiv Preprint Arxiv","year":"2022","author":"Zhou Y.","key":"e_1_2_10_23_1"},{"key":"e_1_2_10_24_1","unstructured":"P.Zhang X.Li X.Hu J.Yang L.Zhang L.Wang Y.Choi J.Gao inProc. of the IEEE\/CVF Conf. on Computer Vision and Pattern Recognition Nashville TN USA 10\u201325 June2021 p.5579."},{"key":"e_1_2_10_25_1","doi-asserted-by":"crossref","unstructured":"K.He X.Zhang S.Ren J.Sun inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition Las Vegas NV USA 27\u201330 June2016 pp.770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"journal-title":"Arxiv","year":"2020","author":"Ruotian L.","key":"e_1_2_10_26_1"},{"key":"e_1_2_10_27_1","first-page":"740","volume":"8693","author":"Tsung\u2010Yi L.","year":"2015","journal-title":"Lect. Notes Comput. Sci."},{"key":"e_1_2_10_28_1","doi-asserted-by":"crossref","unstructured":"K.Papineni S.Roukos T.Ward W. J.Zhu inProc. of the 40th Annual Meeting of the Association for Computational Linguistics Philadelphia PA USA 6\u201312 July2002 p.311.","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_2_10_29_1","unstructured":"C. Y.Lin inText Summarization Branches Out Barcelona Spain 3\u20137 July2004 p.74."},{"key":"e_1_2_10_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3221290"},{"key":"e_1_2_10_31_1","unstructured":"S.Banerjee A.Lavie inProc. of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization Ann Arbor Michigan 5\u20139 June2005 p.65."},{"key":"e_1_2_10_32_1","doi-asserted-by":"crossref","unstructured":"R.Vedantam C.Lawrence Zitnick D.Parikh inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition Boston MA USA 7\u201312 June2015 p.4566.","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"e_1_2_10_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_31"},{"key":"e_1_2_10_34_1","first-page":"10685","author":"Xu Y.","year":"2019","journal-title":"Comput. Res. Repository"},{"key":"e_1_2_10_35_1","unstructured":"H.Lun W.Wenmin C.Jie W.Xiao\u2010Yong inProc. of the IEEE\/CVF Int. Conf. on Computer Vision Seoul Korea (South) 27 October\u20132 November2019 p.4633."},{"key":"e_1_2_10_36_1","unstructured":"Y.Pan T.Yao Y.Li T.Mei inComputer Vision and Pattern Recognition Seattle WA USA 13\u201019 June2020 p.10968."},{"key":"e_1_2_10_37_1","doi-asserted-by":"publisher","DOI":"10.3390\/app12094502"},{"key":"e_1_2_10_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.118474"},{"key":"e_1_2_10_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2023.109420"},{"key":"e_1_2_10_40_1","doi-asserted-by":"crossref","unstructured":"X.Li X.Yin C.Li P.Zhang X.Hu L.Zhang L.Wang H.Hu L.Dong F.Wei Y.Choi J.Gao in16th European Conf. Springer International Publishing Glasgow UK 23\u201028 August2020 p.121.","DOI":"10.1007\/978-3-030-58577-8_8"},{"key":"e_1_2_10_41_1","doi-asserted-by":"crossref","unstructured":"J.Li D. M.Vo A.Sugimoto H.Nakayama inProc. of the IEEE\/CVF Conf. on Computer Vision and Pattern Recognition Seattle WA USA 17\u201021 June2024 p.13733.","DOI":"10.1109\/CVPR52733.2024.01303"},{"key":"e_1_2_10_42_1","unstructured":"J.Li D.Li C.Xiong S.Hoi inInt. Conf. on Machine Learning Virtual 17\u201023 July2022 p.12888."},{"key":"e_1_2_10_43_1","unstructured":"Microsoft COCO: Common Objects in Context inEuropean Conf. on Computer Vision (ECCV) Amsterdam NLhttp:\/\/mscoco.org\/dataset\/#captions\u2010leaderboard(accessed: July 25 2016)."}],"container-title":["Advanced Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/aisy.202500104","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,13]],"date-time":"2025-12-13T06:04:56Z","timestamp":1765605896000},"score":1,"resource":{"primary":{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/10.1002\/aisy.202500104"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,3]]},"references-count":42,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1002\/aisy.202500104"],"URL":"https:\/\/doi.org\/10.1002\/aisy.202500104","archive":["Portico"],"relation":{},"ISSN":["2640-4567","2640-4567"],"issn-type":[{"type":"print","value":"2640-4567"},{"type":"electronic","value":"2640-4567"}],"subject":[],"published":{"date-parts":[[2025,8,3]]},"assertion":[{"value":"2025-01-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e202500104"}}