{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,15]],"date-time":"2025-11-15T10:29:00Z","timestamp":1763202540519},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8]]},"abstract":"<jats:p>This paper proposes an approach to Dense Video Captioning (DVC) without pairwise event-sentence annotation. First, we adopt the knowledge distilled from relevant and well solved tasks to generate high-quality event proposals. Then we incorporate contrastive loss and cycle-consistency loss typically applied to cross-modal retrieval tasks to build semantic matching between the proposals and sentences, which are eventually used to train the caption generation module. In addition, the parameters of matching module are initialized via pre-training based on annotated images to improve the matching performance. Extensive experiments on ActivityNet-Caption dataset reveal the significance of distillation-based event proposal generation and cross-modal retrieval-based semantic matching to weakly supervised DVC, and demonstrate the superiority of our method to existing state-of-the-art methods.<\/jats:p>","DOI":"10.24963\/ijcai.2021\/160","type":"proceedings-article","created":{"date-parts":[[2021,8,11]],"date-time":"2021-08-11T11:00:49Z","timestamp":1628679649000},"page":"1157-1164","source":"Crossref","is-referenced-by-count":5,"title":["Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching"],"prefix":"10.24963","author":[{"given":"Bofeng","family":"Wu","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China"},{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guocheng","family":"Niu","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xinyan","family":"Xiao","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jian","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang International Studies University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hua","family":"Wu","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"number":"30","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-2021","name":"Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}","start":{"date-parts":[[2021,8,19]]},"theme":"Artificial Intelligence","location":"Montreal, Canada","end":{"date-parts":[[2021,8,27]]}},"container-title":["Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2021,8,11]],"date-time":"2021-08-11T11:01:42Z","timestamp":1628679702000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2021\/160"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2021,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2021\/160","relation":{},"subject":[],"published":{"date-parts":[[2021,8]]}}}