{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:16:06Z","timestamp":1750220166795,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,8,19]],"date-time":"2022-08-19T00:00:00Z","timestamp":1660867200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,19]]},"DOI":"10.1145\/3561613.3561617","type":"proceedings-article","created":{"date-parts":[[2022,11,9]],"date-time":"2022-11-09T18:18:15Z","timestamp":1668017895000},"page":"22-28","source":"Crossref","is-referenced-by-count":0,"title":["Hybrid-Spatial Transformer for Image Captioning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1788-3746","authenticated-orcid":false,"given":"Jincheng","family":"Zheng","sequence":"first","affiliation":[{"name":"University of Macau, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chi-Man","family":"Pun","sequence":"additional","affiliation":[{"name":"University of Macau, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11760-012-0340-2"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46454-1_24"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 5561\u20135570","author":"Aneja Jyoti","year":"2018","unstructured":"Jyoti Aneja , Aditya Deshpande , and Alexander\u00a0 G Schwing . 2018 . Convolutional image captioning . In Proceedings of the IEEE conference on computer vision and pattern recognition. 5561\u20135570 . Jyoti Aneja, Aditya Deshpande, and Alexander\u00a0G Schwing. 2018. Convolutional image captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5561\u20135570."},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and\/or summarization. 65\u201372","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie . 2005 . METEOR: An automatic metric for MT evaluation with improved correlation with human judgments . In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and\/or summarization. 65\u201372 . Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and\/or summarization. 65\u201372."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-016-4276-3"},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8307\u20138316","author":"Cornia Marcella","year":"2019","unstructured":"Marcella Cornia , Lorenzo Baraldi , and Rita Cucchiara . 2019 . Show, control and tell: A framework for generating controllable and grounded captions . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8307\u20138316 . Marcella Cornia, Lorenzo Baraldi, and Rita Cucchiara. 2019. Show, control and tell: A framework for generating controllable and grounded captions. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8307\u20138316."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01059"},{"key":"e_1_3_2_1_9_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805(2018).","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805(2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805(2018)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15561-1_2"},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4125\u20134134","author":"Feng Yang","year":"2019","unstructured":"Yang Feng , Lin Ma , Wei Liu , and Jiebo Luo . 2019 . Unsupervised image captioning . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4125\u20134134 . Yang Feng, Lin Ma, Wei Liu, and Jiebo Luo. 2019. Unsupervised image captioning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4125\u20134134."},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 6300\u20136308","author":"Gao Junlong","year":"2019","unstructured":"Junlong Gao , Shiqi Wang , Shanshe Wang , Siwei Ma , and Wen Gao . 2019 . Self-critical n-step training for image captioning . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 6300\u20136308 . Junlong Gao, Shiqi Wang, Shanshe Wang, Siwei Ma, and Wen Gao. 2019. Self-critical n-step training for image captioning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 6300\u20136308."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01034"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_16_1","volume-title":"Image captioning: Transforming objects into words. Advances in Neural Information Processing Systems 32","author":"Herdade Simao","year":"2019","unstructured":"Simao Herdade , Armin Kappeler , Kofi Boakye , and Joao Soares . 2019. Image captioning: Transforming objects into words. Advances in Neural Information Processing Systems 32 ( 2019 ). Simao Herdade, Armin Kappeler, Kofi Boakye, and Joao Soares. 2019. Image captioning: Transforming objects into words. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00378"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00473"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_20_1","volume-title":"International conference on machine learning. PMLR, 595\u2013603","author":"Kiros Ryan","year":"2014","unstructured":"Ryan Kiros , Ruslan Salakhutdinov , and Rich Zemel . 2014 . Multimodal neural language models . In International conference on machine learning. PMLR, 595\u2013603 . Ryan Kiros, Ruslan Salakhutdinov, and Rich Zemel. 2014. Multimodal neural language models. In International conference on machine learning. PMLR, 595\u2013603."},{"key":"e_1_3_2_1_21_1","volume-title":"Babytalk: Understanding and generating simple image descriptions","author":"Kulkarni Girish","year":"2013","unstructured":"Girish Kulkarni , Visruth Premraj , Vicente Ordonez , Sagnik Dhar , Siming Li , Yejin Choi , Alexander\u00a0 C Berg , and Tamara\u00a0 L Berg . 2013 . Babytalk: Understanding and generating simple image descriptions . IEEE transactions on pattern analysis and machine intelligence 35, 12(2013), 2891\u20132903. Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander\u00a0C Berg, and Tamara\u00a0L Berg. 2013. Babytalk: Understanding and generating simple image descriptions. IEEE transactions on pattern analysis and machine intelligence 35, 12(2013), 2891\u20132903."},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 8928\u20138937","author":"Li Guang","year":"2019","unstructured":"Guang Li , Linchao Zhu , Ping Liu , and Yi Yang . 2019 . Entangled transformer for image captioning . In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 8928\u20138937 . Guang Li, Linchao Zhu, Ping Liu, and Yi Yang. 2019. Entangled transformer for image captioning. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 8928\u20138937."},{"key":"e_1_3_2_1_23_1","volume-title":"Rouge: A package for automatic evaluation of summaries. In Text summarization branches out. 74\u201381.","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin . 2004 . Rouge: A package for automatic evaluation of summaries. In Text summarization branches out. 74\u201381. Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out. 74\u201381."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.345"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00754"},{"key":"e_1_3_2_1_27_1","volume-title":"Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311\u2013318","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni , Salim Roukos , Todd Ward , and Wei-Jing Zhu . 2002 . Bleu: a method for automatic evaluation of machine translation . In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311\u2013318 . Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311\u2013318."},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of the IEEE international conference on computer vision. 1242\u20131250","author":"Pedersoli Marco","year":"2017","unstructured":"Marco Pedersoli , Thomas Lucas , Cordelia Schmid , and Jakob Verbeek . 2017 . Areas of attention for image captioning . In Proceedings of the IEEE international conference on computer vision. 1242\u20131250 . Marco Pedersoli, Thomas Lucas, Cordelia Schmid, and Jakob Verbeek. 2017. Areas of attention for image captioning. In Proceedings of the IEEE international conference on computer vision. 1242\u20131250."},{"key":"e_1_3_2_1_29_1","unstructured":"John Platt. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. (1998).  John Platt. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. (1998)."},{"key":"e_1_3_2_1_30_1","volume-title":"Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren , Kaiming He , Ross Girshick , and Jian Sun . 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 ( 2015 ). Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.131"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2017.08.051"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2019.03.055"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2928491"},{"key":"e_1_3_2_1_35_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 12516\u201312526","author":"Shuster Kurt","year":"2019","unstructured":"Kurt Shuster , Samuel Humeau , Hexiang Hu , Antoine Bordes , and Jason Weston . 2019 . Engaging image captioning via personality . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 12516\u201312526 . Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, and Jason Weston. 2019. Engaging image captioning via personality. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 12516\u201312526."},{"key":"e_1_3_2_1_36_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556(2014).  Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556(2014)."},{"key":"e_1_3_2_1_37_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan\u00a0 N Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan\u00a0N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_2_1_40_1","first-page":"1","article-title":"A new global best guided artificial bee colony algorithm with application in robot path planning","volume":"88","author":"Xu F.","year":"2020","unstructured":"F. Xu , H. Li , C.-M. Pun , H. Hu , Y. Li , Y. Song , and H. Gao . 2020 . A new global best guided artificial bee colony algorithm with application in robot path planning . Applied Soft Computing 88 (2020), 1 \u2013 31 . F. Xu, H. Li, C.-M. Pun, H. Hu, Y. Li, Y. Song, and H. Gao. 2020. A new global best guided artificial bee colony algorithm with application in robot path planning. Applied Soft Computing 88 (2020), 1\u201331.","journal-title":"Applied Soft Computing"},{"key":"e_1_3_2_1_41_1","volume-title":"International conference on machine learning. PMLR","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu , Jimmy Ba , Ryan Kiros , Kyunghyun Cho , Aaron Courville , Ruslan Salakhudinov , Rich Zemel , and Yoshua Bengio . 2015 . Show, attend and tell: Neural image caption generation with visual attention . In International conference on machine learning. PMLR , 2048\u20132057. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning. PMLR, 2048\u20132057."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01094"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_42"}],"event":{"name":"ICCCV 2022: 2022 The 5th International Conference on Control and Computer Vision","acronym":"ICCCV 2022","location":"Xiamen China"},"container-title":["2022 The 5th International Conference on Control and Computer Vision"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3561613.3561617","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3561613.3561617","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:35Z","timestamp":1750186835000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3561613.3561617"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,19]]},"references-count":43,"alternative-id":["10.1145\/3561613.3561617","10.1145\/3561613"],"URL":"https:\/\/doi.org\/10.1145\/3561613.3561617","relation":{},"subject":[],"published":{"date-parts":[[2022,8,19]]}}}