{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:14:32Z","timestamp":1750220072477,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":27,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,1,6]],"date-time":"2023-01-06T00:00:00Z","timestamp":1672963200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSFC","award":["NO. 62172138, NO. 61932009"],"award-info":[{"award-number":["NO. 62172138, NO. 61932009"]}]},{"name":"The University Synergy Innovation Program of Anhui Province","award":["NO. GXXT-2021-007 & Grant No. GXXT-2020-014"],"award-info":[{"award-number":["NO. GXXT-2021-007 & Grant No. GXXT-2020-014"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,1,6]]},"DOI":"10.1145\/3582649.3582658","type":"proceedings-article","created":{"date-parts":[[2023,4,7]],"date-time":"2023-04-07T16:23:28Z","timestamp":1680884608000},"page":"175-181","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["CITE: Compact Interactive TransformEr for Multilingual Image Captioning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9013-746X","authenticated-orcid":false,"given":"Yueyuan","family":"Xu","sequence":"first","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1042-8361","authenticated-orcid":false,"given":"Zhenzhen","family":"Hu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4986-3611","authenticated-orcid":false,"given":"Yuanen","family":"Zhou","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3181-1220","authenticated-orcid":false,"given":"Shijie","family":"Hao","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5461-3986","authenticated-orcid":false,"given":"Richang","family":"Hong","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, China"}]}],"member":"320","published-online":{"date-parts":[[2023,4,7]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3444685.3446322"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01059"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475439"},{"key":"e_1_3_2_1_4_1","volume-title":"Multilingual image description with neural sequence models. arXiv preprint arXiv:1510.04709","author":"Elliott Desmond","year":"2015","unstructured":"Desmond Elliott, Stella Frank, and Eva Hasler. 2015. Multilingual image description with neural sequence models. arXiv preprint arXiv:1510.04709 (2015)."},
{"key":"e_1_3_2_1_5_1","volume-title":"Multi30k: Multilingual english-german image descriptions. arXiv preprint arXiv:1605.00459","author":"Elliott Desmond","year":"2016","unstructured":"Desmond Elliott, Stella Frank, Khalil Sima'an, and Lucia Specia. 2016. Multi30k: Multilingual english-german image descriptions. arXiv preprint arXiv:1605.00459 (2016)."},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV). 503\u2013519","author":"Gu Jiuxiang","year":"2018","unstructured":"Jiuxiang Gu, Shafiq Joty, Jianfei Cai, and Gang Wang. 2018. Unpaired image captioning by language pivoting. In Proceedings of the European Conference on Computer Vision (ECCV). 503\u2013519."},{"key":"e_1_3_2_1_7_1","volume-title":"Image captioning: Transforming objects into words. Advances in Neural Information Processing Systems 32","author":"Herdade Simao","year":"2019","unstructured":"Simao Herdade, Armin Kappeler, Kofi Boakye, and Joao Soares. 2019. Image captioning: Transforming objects into words. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00473"},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the second conference on machine translation. 458\u2013464","author":"Jaffe Alan","year":"2017","unstructured":"Alan Jaffe. 2017. Generating image descriptions using multilingual data. In Proceedings of the second conference on machine translation. 
458\u2013464."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_11_1","volume-title":"Proceedings of the 25th ACM international conference on Multimedia. 1549\u20131557","author":"Lan Weiyu","year":"2017","unstructured":"Weiyu Lan, Xirong Li, and Jianfeng Dong. 2017. Fluency-guided cross-lingual image captioning. In Proceedings of the 25th ACM international conference on Multimedia. 1549\u20131557."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the 2016 ACM on international conference on multimedia retrieval. 271\u2013275","author":"Li Xirong","year":"2016","unstructured":"Xirong Li, Weiyu Lan, Jianfeng Dong, and Hailong Liu. 2016. Adding chinese captions to images. In Proceedings of the 2016 ACM on international conference on multimedia retrieval. 271\u2013275."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2896494"},
{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"34","author":"Liu Yuchen","year":"2020","unstructured":"Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, and Chengqing Zong. 2020. Synchronous speech recognition and speech-to-text translation with interactive decoding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 8417\u20138424."},{"key":"e_1_3_2_1_15_1","volume-title":"A better variant of self-critical sequence training. arXiv preprint arXiv:2003.09971","author":"Luo Ruotian","year":"2020","unstructured":"Ruotian Luo. 2020. A better variant of self-critical sequence training. arXiv preprint arXiv:2003.09971 (2020)."},{"key":"e_1_3_2_1_16_1","volume-title":"Dual-level collaborative transformer for image captioning. arXiv preprint arXiv:2101.06462","author":"Luo Yunpeng","year":"2021","unstructured":"Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Yongjian Wu, Feiyue Huang, Chia-Wen Lin, and Rongrong Ji. 2021. Dual-level collaborative transformer for image captioning. arXiv preprint arXiv:2101.06462 (2021)."},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 10971\u201310980","author":"Pan Yingwei","year":"2020","unstructured":"Yingwei Pan, Ting Yao, Yehao Li, and Tao Mei. 2020. X-linear attention networks for image captioning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
10971\u201310980."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.131"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350996"},{"key":"e_1_3_2_1_20_1","volume-title":"Using artificial tokens to control languages for multilingual image caption generation. arXiv preprint arXiv:1706.06275","author":"Tsutsui Satoshi","year":"2017","unstructured":"Satoshi Tsutsui and David Crandall. 2017. Using artificial tokens to control languages for multilingual image caption generation. arXiv preprint arXiv:1706.06275 (2017)."},{"key":"e_1_3_2_1_21_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 3350\u20133355","author":"Wang Yining","year":"2019","unstructured":"Yining Wang, Jiajun Zhang, Long Zhou, Yuchen Liu, and Chengqing Zong. 2019. Synchronously generating two languages with interactive decoding. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 3350\u20133355."},
{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01094"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00553"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 4777\u20134786","author":"Zhou Yuanen","year":"2020","unstructured":"Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, and Hanwang Zhang. 2020. More grounded image captioning by distilling image-text matching model. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 4777\u20134786."},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 3139\u20133143","author":"Zhou Yuanen","year":"2021","unstructured":"Yuanen Zhou, Yong Zhang, Zhenzhen Hu, and Meng Wang. 2021. Semi-Autoregressive Transformer for Image Captioning. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 3139\u20133143."},{"key":"e_1_3_2_1_27_1","first-page":"223","volume-title":"no. 3","author":"Prasomsuk Sukchatri","year":"2017","unstructured":"Sukchatri Prasomsuk and Puthy Mol, \"Thai to Khmer Rule-Based Machine Translation Using Reordering Word to Phrase,\" International Journal of Computer Theory and Engineering vol. 9, no. 3, pp. 223-228, 2017."}],"event":{"name":"ICIGP 2023: 2023 The 6th International Conference on Image and Graphics Processing","acronym":"ICIGP 2023","location":"Chongqing China"},"container-title":["Proceedings of the 2023 6th International Conference on Image and Graphics Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582649.3582658","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3582649.3582658","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:14Z","timestamp":1750183754000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582649.3582658"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,6]]},"references-count":27,"alternative-id":["10.1145\/3582649.3582658","10.1145\/3582649"],"URL":"https:\/\/doi.org\/10.1145\/3582649.3582658","relation":{},"subject":[],"published":{"date-parts":[[2023,1,6]]},"assertion":[{"value":"2023-04-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}