{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T18:16:02Z","timestamp":1771265762688,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T00:00:00Z","timestamp":1644969600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100003453","name":"Natural Science Foundation of Guangdong Province","doi-asserted-by":"crossref","award":["2019A1515011181"],"award-info":[{"award-number":["2019A1515011181"]}],"id":[{"id":"10.13039\/501100003453","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Science and Technology Innovation Commission of Shenzhen","award":["JCYJ20190808162613130"],"award-info":[{"award-number":["JCYJ20190808162613130"]}]},{"name":"Shenzhen high-level talents program"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,5,31]]},"abstract":"<jats:p>\n            With the rapid growth of video data, video summarization is a promising approach to shorten a lengthy video into a compact version. Although supervised summarization approaches have achieved state-of-the-art performance, they require frame-level annotated labels. Such an annotation process is time-consuming and tedious. In this article, we propose a novel deep summarization framework named\n            <jats:italic>Deep Semantic and Attentive Network for Video Summarization<\/jats:italic>\n            (DSAVS) that can select the most semantically representative summary by minimizing the distance between video representation and text representation without any frame-level labels. 
Another challenge associated with video summarization tasks mainly originates from the difficulty of considering temporal information over a long time. Long Short-Term Memory (LSTM) performs well for temporal dependencies modeling but does not work well with long video clips. Therefore, we introduce a self-attention mechanism into our summarization framework to capture the long-range temporal dependencies among the frames. Extensive experiments on two popular benchmark datasets, i.e., SumMe and TVSum, show that our proposed framework outperforms other state-of-the-art unsupervised approaches and even most supervised methods.\n          <\/jats:p>","DOI":"10.1145\/3477538","type":"journal-article","created":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T17:56:32Z","timestamp":1645034192000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Deep Semantic and Attentive Network for Unsupervised Video Summarization"],"prefix":"10.1145","volume":"18","author":[{"given":"Sheng-Hua","family":"Zhong","sequence":"first","affiliation":[{"name":"College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China"}]},{"given":"Jingxu","family":"Lin","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China"}]},{"given":"Jianglin","family":"Lu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China"}]},{"given":"Ahmed","family":"Fares","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering, Shenzhen University, China and Department of Electrical Engineering, the Computer Systems Engineering Program, Faculty of Engineering at Shoubra, Benha University, Cairo, Egypt"}]},{"given":"Tongwei","family":"Ren","sequence":"additional","affiliation":[{"name":"State Key 
Laboratory for Novel Software Technology, Nanjing University, China"}]}],"member":"320","published-online":{"date-parts":[[2022,2,16]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-37731-1_40"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2011.2166951"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2010.08.004"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2012.10.002"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999849"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2012.2202676"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10584-0_33"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00685"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351056"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3372278.3390695"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00473"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.04.132"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2904996"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.348"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018537"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/33.3.239"},{"key":"e_1_3_2_19_2","article-title":"Adam: A method for stochastic optimization","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980.","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969442.2969607"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018658"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.318"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2014.08.002"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999959"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00778"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10599-4_35"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.334"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1873987"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00809"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_22"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_1"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.5555\/3504035.3504703"},{"key":"e_1_3_2_34_2","first-page":"5179","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Song Yale","year":"2015","unstructured":"Yale Song, Jordi Vallmitjana, Amanda Stent, and Alejandro Jaimes. 2015. TVSum: Summarizing web videos using titles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
5179\u20135187."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.5555\/3504035.3504062"},{"key":"e_1_3_2_39_2","article-title":"Videoset: Video summary evaluation through text","author":"Yeung Serena","year":"2014","unstructured":"Serena Yeung, Alireza Fathi, and Fei-Fei Li. 2014. Videoset: Video summary evaluation through text. arXiv preprint arXiv:1406.5824.","journal-title":"arXiv preprint arXiv:1406.5824"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019143"},{"key":"e_1_3_2_41_2","first-page":"4694","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Ng Joe Yue-Hei","year":"2015","unstructured":"Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. 2015. Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
4694\u20134702."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46478-7_47"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_42"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123328"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00773"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.12.040"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.5555\/3504035.3504964"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.11"},{"key":"e_1_3_2_49_2","volume-title":"CRC Standard Probability and Statistics Tables and Formulae","author":"Zwillinger Daniel","year":"1999","unstructured":"Daniel Zwillinger and Stephen Kokoska. 1999. CRC Standard Probability and Statistics Tables and Formulae. CRC Press."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477538","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477538","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:37Z","timestamp":1750183837000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477538"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,16]]},"references-count":48,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,5,31]]}},"alternative-id":["10.1145\/3477538"],"URL":"https:\/\/doi.org\/10.1145\/3477538","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,16]]},"assertion":[{
"value":"2020-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-02-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}