{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T13:07:23Z","timestamp":1770815243256,"version":"3.50.1"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T00:00:00Z","timestamp":1644969600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61802121, 61772191, and 61672222"],"award-info":[{"award-number":["61802121, 61772191, and 61672222"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004761","name":"Natural Science Foundation of Hunan Province","doi-asserted-by":"crossref","award":["2019JJ50057"],"award-info":[{"award-number":["2019JJ50057"]}],"id":[{"id":"10.13039\/501100004761","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Project of China","award":["2018YFB0704000"],"award-info":[{"award-number":["2018YFB0704000"]}]},{"name":"Science and Technology Key Projects of Hunan Province","award":["2015TP1004 and 2019GK2082"],"award-info":[{"award-number":["2015TP1004 and 2019GK2082"]}]},{"name":"Special Funds for the Construction of Innovative Provinces in Hunan Province of China","award":["2020SK2066 and 2019NK2022"],"award-info":[{"award-number":["2020SK2066 and 2019NK2022"]}]},{"name":"Science and Technology Project of Changsha City","award":["KQ1902051"],"award-info":[{"award-number":["KQ1902051"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central 
Universities","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,5,31]]},"abstract":"<jats:p>The newly emerging language-based video moment retrieval task aims at retrieving a target video moment from an untrimmed video given a natural language as the query. It is more applicable in reality since it is able to accurately localize a specific video moment, as compared to traditional whole video retrieval. In this work, we propose a novel solution to thoroughly investigate the language-based video moment retrieval issue under the adversarial learning. The key of our solution is to formulate the language-based video moment retrieval task as an adversarial learning problem with two tightly connected components. Specifically, a reinforcement learning is employed as a generator to produce a set of possible video moments. Meanwhile, a multi-task learning is utilized as a discriminator, which integrates inter-modal and intra-modal in a unified framework by employing a sequential update strategy. Finally, the generator and the discriminator are mutually reinforced in the adversarial learning, which is able to jointly optimize the performance of both video moment ranking and video moment localization. Extensive experimental results on two challenging benchmarks, i.e., Charades-STA and TACoS datasets, have well demonstrated the effectiveness and rationality of our proposed solution. 
Meanwhile, on the larger and unbiased datasets, i.e., ActivityNet Captions and ActivityNet-CD, our proposed framework exhibits excellent robustness.<\/jats:p>","DOI":"10.1145\/3478025","type":"journal-article","created":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T17:56:32Z","timestamp":1645034192000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["Moment is Important: Language-Based Video Moment Retrieval via Adversarial Learning"],"prefix":"10.1145","volume":"18","author":[{"given":"Yawen","family":"Zeng","sequence":"first","affiliation":[{"name":"Hunan University, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Da","family":"Cao","sequence":"additional","affiliation":[{"name":"Hunan University, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaofei","family":"Lu","sequence":"additional","affiliation":[{"name":"Hunan University, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hanling","family":"Zhang","sequence":"additional","affiliation":[{"name":"Hunan University, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiao","family":"Xu","sequence":"additional","affiliation":[{"name":"CVTE Inc., Guangzhou, Guangdong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zheng","family":"Qin","sequence":"additional","affiliation":[{"name":"Hunan University, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,2,16]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.618"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3209998"},{"key":"e_1_3_2_4_2","article-title":"Social-enhanced attentive group recommendation","author":"Cao Da","year":"2019","unstructured":"Da Cao, Xiangnan He, Lianhai 
Miao, Guangyi Xiao, Hao Chen, and Jiao Xu. 2019. Social-enhanced attentive group recommendation. IEEE Transactions on Knowledge and Data Engineering (2019).","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351067"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413840"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413841"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1015"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018175"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240627"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018199"},{"key":"e_1_3_2_12_2","first-page":"1","article-title":"Look closer to ground better: Weakly-supervised temporal grounding of sentence in video","author":"Chen Zhenfang","year":"2020","unstructured":"Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, and Kwan-Yee K. Wong. 2020. Look closer to ground better: Weakly-supervised temporal grounding of sentence in video. 
arXiv preprint arXiv:2001.09308 (2020), 1\u201310.","journal-title":"arXiv preprint arXiv:2001.09308"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.5555\/3367243.3367349"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.563"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3127939"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220007"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018393"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3209981"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/3504035.3504427"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.493"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3323873.3325019"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313625"},{"key":"e_1_3_2_23_2","article-title":"Continuous control with deep reinforcement learning","author":"Lillicrap Timothy P.","year":"2015","unstructured":"Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015).","journal-title":"arXiv preprint arXiv:1509.02971"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6820"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2965987"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210003"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240549"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3176647"},{"key":"e_1_3_2_29_2","first-page":"1","article-title":"Tripping through time: Efficient localization of activities in videos","author":"Hahn Meera","year":"2019","unstructured":"Meera Hahn, Asim Kadav, James M. Rehg, and Hans Peter Graf. 2019. Tripping through time: Efficient localization of activities in videos. arXiv preprint arXiv:1904.09936 (2019), 1\u201313.","journal-title":"arXiv preprint arXiv:1904.09936"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.433"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01186"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.5555\/3045390.3045594"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2019.01.012"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3284750"},{"key":"e_1_3_2_35_2","first-page":"706","volume-title":"Proceedings of the IEEE Conference on Computer Vision","author":"Krishna Ranjay","year":"2017","unstructured":"Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, and Juan Carlos Niebles. 2017. Dense-captioning events in videos. In Proceedings of the IEEE Conference on Computer Vision. 
IEEE, 706\u2013715."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00207"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33718-5_11"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.327"},{"key":"e_1_3_2_39_2","article-title":"Continual learning in generative adversarial nets","author":"Seff Ari","year":"2017","unstructured":"Ari Seff, Alex Beatson, Daniel Suo, and Han Liu. 2017. Continual learning in generative adversarial nets. arXiv preprint arXiv:1705.08395 (2017).","journal-title":"arXiv preprint arXiv:1705.08395"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3337067"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3356316"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_31"},{"key":"e_1_3_2_43_2","first-page":"394","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Song Jingkuan","year":"2018","unstructured":"Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Alan Hanjalic, and Heng Tao Shen. 2018. Binary generative adversarial networks for image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI, 394\u2013401."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.5555\/1795114.1795167"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6428"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3412236"},{"key":"e_1_3_2_47_2","article-title":"On catastrophic forgetting and mode collapse in generative adversarial networks","author":"Thanh-Tung Hoang","year":"2018","unstructured":"Hoang Thanh-Tung, Truyen Tran, and Svetha Venkatesh. 2018. On catastrophic forgetting and mode collapse in generative adversarial networks. 
arXiv preprint arXiv:1807.04015 (2018).","journal-title":"arXiv preprint arXiv:1807.04015"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123326"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3115432"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413975"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080786"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00042"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380098"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01155"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350919"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413862"},{"key":"e_1_3_2_57_2","article-title":"Augmented adversarial training for cross-modal retrieval","author":"Wu Yiling","year":"2020","unstructured":"Yiling Wu, Shuhui Wang, Guoli Song, and Qingming Huang. 2020. Augmented adversarial training for cross-modal retrieval. IEEE Transactions on Multimedia (2020).","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_58_2","first-page":"2986","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Xiao Shaoning","year":"2021","unstructured":"Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, and Jun Xiao. 2021. Boundary proposal network for two-stage natural language video localization. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI, 2986\u20132994."},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019062"},{"key":"e_1_3_2_60_2","article-title":"Local correspondence network for weakly supervised temporal sentence grounding","author":"Yang Wenfei","unstructured":"Wenfei Yang, Tianzhu Zhang, Yongdong Zhang, and Feng Wu. [n.d.]. 
Local correspondence network for weakly supervised temporal sentence grounding. IEEE ([n. d.]).","journal-title":"IEEE"},{"key":"e_1_3_2_61_2","first-page":"1","article-title":"A closer look at temporal sentence grounding in videos: Datasets and metrics","author":"Yuan Yitian","year":"2021","unstructured":"Yitian Yuan, Xiaohan Lan, Long Chen, Wei Liu, Xin Wang, and Wenwu Zhu. 2021. A closer look at temporal sentence grounding in videos: Datasets and metrics. arXiv preprint arXiv:2101.09028v2 (2021), 1\u201310.","journal-title":"arXiv preprint arXiv:2101.09028v2"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01030"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00225"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00134"},{"key":"e_1_3_2_65_2","first-page":"1","article-title":"Natural language video localization: A revisit in span-based question answering framework","author":"Zhang Hao","year":"2021","unstructured":"Hao Zhang, Aixin Sun, Wei Jing, Liangli Zhen, Joey Tianyi Zhou, and Rick Siow Mong Goh. 2021. Natural language video localization: A revisit in span-based question answering framework. 
arXiv preprint arXiv:2102.13558 (2021), 1\u201310.","journal-title":"arXiv preprint arXiv:2102.13558"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2922128"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6984"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350879"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331235"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3231739"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313609"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01174"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478025","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3478025","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:32Z","timestamp":1750188692000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478025"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,16]]},"references-count":71,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,5,31]]}},"alternative-id":["10.1145\/3478025"],"URL":"https:\/\/doi.org\/10.1145\/3478025","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,16]]},"assertion":[{"value":"2021-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-02-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}