{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:16:07Z","timestamp":1750220167525,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,8,19]],"date-time":"2022-08-19T00:00:00Z","timestamp":1660867200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Fundamental Research Funds for the Central University, China","award":["FRF-BD-20-11A"],"award-info":[{"award-number":["FRF-BD-20-11A"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,19]]},"DOI":"10.1145\/3561613.3561631","type":"proceedings-article","created":{"date-parts":[[2022,11,9]],"date-time":"2022-11-09T18:18:15Z","timestamp":1668017895000},"page":"113-119","source":"Crossref","is-referenced-by-count":1,"title":["VSSum: A Virtual Surveillance Dataset for Video Summary"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5000-0602","authenticated-orcid":false,"given":"Yanfei","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer and Communication Engineering, University of Science and Technology Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yulai","family":"Xie","sequence":"additional","affiliation":[{"name":"Data Technology, Hitachi China Research Laboratory, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Data Technology, Hitachi China Research Laboratory, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yiruo","family":"Dai","sequence":"additional","affiliation":[{"name":"Data Technology, Hitachi China Research Laboratory, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fang","family":"Ren","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, University of Science and Technology Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11,9]]},"reference":[{"volume-title":"Retrieved","year":"2022","key":"e_1_3_2_1_1_1","unstructured":"Aglobex. Population System Full Pack . Retrieved May 22, 2022 from https:\/\/www.unrealengine.com\/marketplace\/zh-CN\/product\/population-system Aglobex. Population System Full Pack. Retrieved May 22, 2022 from https:\/\/www.unrealengine.com\/marketplace\/zh-CN\/product\/population-system"},{"key":"e_1_3_2_1_2_1","volume-title":"VSUMM: A Mechanism Designed to Produce Static Video Summaries and a Novel Evaluation Method. Pattern recognition letters 32, 1 (1","author":"Avila Sefd","year":"2011","unstructured":"Sefd Avila , Apb Lopes , A. D. Luz , and Ada Araujo . 2011 . VSUMM: A Mechanism Designed to Produce Static Video Summaries and a Novel Evaluation Method. Pattern recognition letters 32, 1 (1 January 2011), 56-68. https:\/\/doi.org\/10.1016\/j.patrec.2010.08.004 10.1016\/j.patrec.2010.08.004 Sefd Avila, Apb Lopes, A. D. Luz, and Ada Araujo. 2011. VSUMM: A Mechanism Designed to Produce Static Video Summaries and a Novel Evaluation Method. Pattern recognition letters 32, 1 (1 January 2011), 56-68. https:\/\/doi.org\/10.1016\/j.patrec.2010.08.004"},{"key":"e_1_3_2_1_3_1","volume-title":"Retrieved","author":"Carnegie Mellon University","year":"2022","unstructured":"Carnegie Mellon University . CMU Graphics Lab Motion Capture Database . Retrieved May 22, 2022 from http:\/\/mocap.cs.cmu.edu\/ Carnegie Mellon University. CMU Graphics Lab Motion Capture Database. Retrieved May 22, 2022 from http:\/\/mocap.cs.cmu.edu\/"},{"key":"e_1_3_2_1_4_1","first-page":"3","article-title":"Object Tracking across Non-Overlapping Views by Learning Inter-Camera Transfer Models","volume":"47","author":"Chen Xiaotang","year":"2014","unstructured":"Xiaotang Chen , Kaiqi Huang , and Tieniu Tan . 2014 . Object Tracking across Non-Overlapping Views by Learning Inter-Camera Transfer Models . Pattern Recognition 47 , 3 (March 2014), 1126-1137. http:\/\/doi.org\/10.1016\/j.patcog.2013.06.011 10.1016\/j.patcog.2013.06.011 Xiaotang Chen, Kaiqi Huang, and Tieniu Tan. 2014. Object Tracking across Non-Overlapping Views by Learning Inter-Camera Transfer Models. Pattern Recognition 47, 3 (March 2014), 1126-1137. http:\/\/doi.org\/10.1016\/j.patcog.2013.06.011","journal-title":"Pattern Recognition"},{"key":"e_1_3_2_1_5_1","volume-title":"Video Summarization with U-Shaped Transformer. Applied Intelligence (April","author":"Chen Yaosen","year":"2022","unstructured":"Yaosen Chen , Bing Guo , Yan Shen , Renshuang Zhou , Weichen Lu , Wei Wang , Xuming Wen , and Xinhua Suo . 2022. Video Summarization with U-Shaped Transformer. Applied Intelligence (April 2022 ), 1-17. https:\/\/doi.org\/10.1007\/s10489-022-03451-1 10.1007\/s10489-022-03451-1 Yaosen Chen, Bing Guo, Yan Shen, Renshuang Zhou, Weichen Lu, Wei Wang, Xuming Wen, and Xinhua Suo. 2022. Video Summarization with U-Shaped Transformer. Applied Intelligence (April 2022), 1-17. https:\/\/doi.org\/10.1007\/s10489-022-03451-1"},{"key":"e_1_3_2_1_6_1","first-page":"1","article-title":"Applying the Video Summarization Algorithm to Surveillance Systems","volume":"3","author":"Chung Yi-Nung","year":"2015","unstructured":"Yi-Nung Chung , Tun Chang Lu , Ming-Tsung Yeh , Yu-Xian Huang , and Chun-Yi Wu . 2015 . Applying the Video Summarization Algorithm to Surveillance Systems . Journal of Image and Graphics 3 , 1 (June 2015), 20-24. https:\/\/doi.org\/10.18178\/joig.3.1.20-24 10.18178\/joig.3.1.20-24 Yi-Nung Chung, Tun Chang Lu, Ming-Tsung Yeh, Yu-Xian Huang, and Chun-Yi Wu. 2015. Applying the Video Summarization Algorithm to Surveillance Systems. Journal of Image and Graphics 3, 1 (June 2015), 20-24. https:\/\/doi.org\/10.18178\/joig.3.1.20-24","journal-title":"Journal of Image and Graphics"},{"key":"e_1_3_2_1_7_1","first-page":"248","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A Large-Scale Hierarchical Image Database . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE , Miami, FL, USA , 248 - 255 . https:\/\/doi.org\/10.1109\/CVPR.2009.5206848 10.1109\/CVPR.2009.5206848 Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Miami, FL, USA, 248-255. https:\/\/doi.org\/10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_8_1","first-page":"1","volume-title":"Proceedings of the Conference on Robot Learning. PMLR","author":"Dosovitskiy Alexey","year":"2017","unstructured":"Alexey Dosovitskiy , German Ros , Felipe Codevilla , Antonio Lopez , and Vladlen Koltun . 2017 . CARLA: An Open Urban Driving Simulator . In Proceedings of the Conference on Robot Learning. PMLR , Mountain View, California , 1 - 16 . https:\/\/doi.org\/10.48550\/arXiv.1711.03938 10.48550\/arXiv.1711.03938 Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. 2017. CARLA: An Open Urban Driving Simulator. In Proceedings of the Conference on Robot Learning. PMLR, Mountain View, California, 1-16. https:\/\/doi.org\/10.48550\/arXiv.1711.03938"},{"key":"e_1_3_2_1_9_1","first-page":"39","volume-title":"Proceedings of the Asian Conference on Computer Vision. Springer","author":"Fajtl Jiri","year":"2018","unstructured":"Jiri Fajtl , Hajar Sadeghi Sokeh , Vasileios Argyriou , Dorothy Monekosso , and Paolo Remagnino . 2018 . Summarizing Videos with Attention . In Proceedings of the Asian Conference on Computer Vision. Springer , Cham, Perth, Australia , 39 - 54 . https:\/\/doi.org\/10.1007\/978-3-030-21074-8_4 10.1007\/978-3-030-21074-8_4 Jiri Fajtl, Hajar Sadeghi Sokeh, Vasileios Argyriou, Dorothy Monekosso, and Paolo Remagnino. 2018. Summarizing Videos with Attention. In Proceedings of the Asian Conference on Computer Vision. Springer, Cham, Perth, Australia, 39-54. https:\/\/doi.org\/10.1007\/978-3-030-21074-8_4"},{"key":"e_1_3_2_1_10_1","first-page":"383","volume-title":"Proceedings of the 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC). IEEE","author":"Feng Xuming","year":"2021","unstructured":"Xuming Feng , Yaping Zhu , and Cheng Yang . 2021 . Video Summarization Based on Fusing Features and Shot Segmentation . In Proceedings of the 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC). IEEE , Beijing, China , 383 - 387 . https:\/\/doi.org\/10.1109\/IC-NIDC54101.2021.9660579 10.1109\/IC-NIDC54101.2021.9660579 Xuming Feng, Yaping Zhu, and Cheng Yang. 2021. Video Summarization Based on Fusing Features and Shot Segmentation. In Proceedings of the 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC). IEEE, Beijing, China, 383-387. https:\/\/doi.org\/10.1109\/IC-NIDC54101.2021.9660579"},{"volume-title":"Proceedings of the European Conference on Computer Vision. Springer","author":"Gygli M.","key":"e_1_3_2_1_11_1","unstructured":"M. Gygli , H. Grabner , H. Riemenschneider , and L. V. Gool . 2014. Creating Summaries from User Videos . In Proceedings of the European Conference on Computer Vision. Springer , Cham, Zurich, Switzerland, 505\u2013520. https:\/\/doi.org\/10.1007\/978-3-319-10584-0_33 10.1007\/978-3-319-10584-0_33 M. Gygli, H. Grabner, H. Riemenschneider, and L. V. Gool. 2014. Creating Summaries from User Videos. In Proceedings of the European Conference on Computer Vision. Springer, Cham, Zurich, Switzerland, 505\u2013520. https:\/\/doi.org\/10.1007\/978-3-319-10584-0_33"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_13_1","first-page":"1","volume-title":"Proceedings of the 2021 17th International Conference on Machine Vision and Applications (MVA). IEEE","author":"Hsu Tzu-Chun","year":"2021","unstructured":"Tzu-Chun Hsu , Yi-Sheng Liao , and Chun-Rong Huang . 2021 . Video Summarization with Frame Index Vision Transformer . In Proceedings of the 2021 17th International Conference on Machine Vision and Applications (MVA). IEEE , Aichi, Japan , 1 - 5 . https:\/\/doi.org\/10.23919\/MVA51890.2021.9511350 10.23919\/MVA51890.2021.9511350 Tzu-Chun Hsu, Yi-Sheng Liao, and Chun-Rong Huang. 2021. Video Summarization with Frame Index Vision Transformer. In Proceedings of the 2021 17th International Conference on Machine Vision and Applications (MVA). IEEE, Aichi, Japan, 1-5. https:\/\/doi.org\/10.23919\/MVA51890.2021.9511350"},{"key":"e_1_3_2_1_14_1","article-title":"A Novel Key-Frames Selection Framework for Comprehensive Video Summarization","volume":"30","author":"Huang Cheng","year":"2019","unstructured":"Cheng Huang and Hongmei Wang . 2019 . A Novel Key-Frames Selection Framework for Comprehensive Video Summarization . IEEE Transactions on Circuits and Systems for Video Technology 30 , 2 (04 January 2019), 577-589. https:\/\/doi.org\/10.1109\/TCSVT.2019.2890899 10.1109\/TCSVT.2019.2890899 Cheng Huang and Hongmei Wang. 2019. A Novel Key-Frames Selection Framework for Comprehensive Video Summarization. IEEE Transactions on Circuits and Systems for Video Technology 30, 2 (04 January 2019), 577-589. https:\/\/doi.org\/10.1109\/TCSVT.2019.2890899","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_2_1_15_1","article-title":"User-Ranking Video Summarization with Multi-Stage Spatio\u2013Temporal Representation","volume":"28","author":"Huang Siyu","year":"2018","unstructured":"Siyu Huang , Xi Li , Zhongfei Zhang , Fei Wu , and Junwei Han . 2018 . User-Ranking Video Summarization with Multi-Stage Spatio\u2013Temporal Representation . IEEE Transactions on Image Processing 28 , 6 (21 December 2018), 2654-2664. https:\/\/doi.org\/10.1109\/TIP.2018.2889265 10.1109\/TIP.2018.2889265 Siyu Huang, Xi Li, Zhongfei Zhang, Fei Wu, and Junwei Han. 2018. User-Ranking Video Summarization with Multi-Stage Spatio\u2013Temporal Representation. IEEE Transactions on Image Processing 28, 6 (21 December 2018), 2654-2664. https:\/\/doi.org\/10.1109\/TIP.2018.2889265","journal-title":"IEEE Transactions on Image Processing"},{"key":"#cr-split#-e_1_3_2_1_16_1.1","doi-asserted-by":"crossref","unstructured":"H Hwang C. Jang G. Park J. Cho and I. J. Kim. 2021. ElderSim: A Synthetic Data Generation Platform for Human Action Recognition in Eldercare Applications. IEEE Access 4 (14 January 2021) 1-16. https:\/\/doi.org\/10.1109\/ACCESS.2021.3051842 10.1109\/ACCESS.2021.3051842","DOI":"10.1109\/ACCESS.2021.3051842"},{"key":"#cr-split#-e_1_3_2_1_16_1.2","doi-asserted-by":"crossref","unstructured":"H Hwang C. Jang G. Park J. Cho and I. J. Kim. 2021. ElderSim: A Synthetic Data Generation Platform for Human Action Recognition in Eldercare Applications. IEEE Access 4 (14 January 2021) 1-16. https:\/\/doi.org\/10.1109\/ACCESS.2021.3051842","DOI":"10.1109\/ACCESS.2021.3051842"},{"key":"e_1_3_2_1_17_1","volume-title":"Retrieved","author":"Pi Soft","year":"2022","unstructured":"i Pi Soft LLC. iPi Mocap Studio . Retrieved May 22, 2022 from https:\/\/docs.ipisoft.com\/iPi_Mocap_Studio iPi Soft LLC. iPi Mocap Studio. Retrieved May 22, 2022 from https:\/\/docs.ipisoft.com\/iPi_Mocap_Studio"},{"key":"e_1_3_2_1_18_1","first-page":"1","article-title":"CE Video Summarization Using Relational Motion Histogram Descriptor","volume":"3","author":"Ben Ismail Mohamed Maher","year":"2015","unstructured":"Mohamed Maher Ben Ismail and Ouiem Bchir . 2015 . CE Video Summarization Using Relational Motion Histogram Descriptor . Journal of Image and Graphics 3 , 1 (June 2015), 34-39. https:\/\/doi.org\/10.18178\/joig.3.1.34-39 10.18178\/joig.3.1.34-39 Mohamed Maher Ben Ismail and Ouiem Bchir. 2015. CE Video Summarization Using Relational Motion Histogram Descriptor. Journal of Image and Graphics 3, 1 (June 2015), 34-39. https:\/\/doi.org\/10.18178\/joig.3.1.34-39","journal-title":"Journal of Image and Graphics"},{"key":"e_1_3_2_1_19_1","article-title":"Video Summarization with Attention-Based Encoder\u2013Decoder Networks","author":"Ji Zhong","year":"2019","unstructured":"Zhong Ji , Kailin Xiong , Yanwei Pang , and Xuelong Li . 2019 . Video Summarization with Attention-Based Encoder\u2013Decoder Networks . IEEE Transactions on Circuits and Systems for Video Technology 30 , June 2020 (14 March 2019), 1709-1717. https:\/\/doi.org\/10.1109\/TCSVT.2019.2904996 10.1109\/TCSVT.2019.2904996 Zhong Ji, Kailin Xiong, Yanwei Pang, and Xuelong Li. 2019. Video Summarization with Attention-Based Encoder\u2013Decoder Networks. IEEE Transactions on Circuits and Systems for Video Technology 30, June 2020 (14 March 2019), 1709-1717. https:\/\/doi.org\/10.1109\/TCSVT.2019.2904996","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology 30"},{"key":"e_1_3_2_1_20_1","first-page":"6","article-title":"ImageNet Classification with Deep Convolutional Neural","volume":"60","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E Hinton . 2012 . ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 60 , 6 (June 2017), 84-90. https:\/\/doi.org\/10.1145\/3065386 10.1145\/3065386 Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 60, 6 (June 2017), 84-90. https:\/\/doi.org\/10.1145\/3065386","journal-title":"Networks. Commun. ACM"},{"key":"e_1_3_2_1_21_1","first-page":"16266","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE","author":"Li Tianjiao","year":"2021","unstructured":"Tianjiao Li , Jun Liu , Wei Zhang , Yun Ni , Wenqian Wang , and Zhiheng Li . 2021 . UAV-Human A Large Benchmark for Human Behavior Understanding With Unmanned Aerial Vehicles . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE , Nashville, TN, USA , 16266 - 16275 . https:\/\/doi.org\/10.1109\/CVPR46437.2021.01600 10.1109\/CVPR46437.2021.01600 Tianjiao Li, Jun Liu, Wei Zhang, Yun Ni, Wenqian Wang, and Zhiheng Li. 2021. UAV-Human A Large Benchmark for Human Behavior Understanding With Unmanned Aerial Vehicles. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Nashville, TN, USA, 16266-16275. https:\/\/doi.org\/10.1109\/CVPR46437.2021.01600"},{"key":"e_1_3_2_1_22_1","first-page":"2","article-title":"Keyframe-Based Video Summarization using Delaunay Clustering","volume":"6","author":"Mundur Padmavathi","year":"2006","unstructured":"Padmavathi Mundur , Yong Rao , and Yelena Yesha . 2006 . Keyframe-Based Video Summarization using Delaunay Clustering . International Journal on Digital Libraries 6 , 2 (April 2006), 219-232. https:\/\/doi.org\/10.1007\/s00799-005-0129-9 10.1007\/s00799-005-0129-9 Padmavathi Mundur, Yong Rao, and Yelena Yesha. 2006. Keyframe-Based Video Summarization using Delaunay Clustering. International Journal on Digital Libraries 6, 2 (April 2006), 219-232. https:\/\/doi.org\/10.1007\/s00799-005-0129-9","journal-title":"International Journal on Digital Libraries"},{"key":"e_1_3_2_1_23_1","first-page":"7596","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE","author":"Otani Mayu","year":"2019","unstructured":"Mayu Otani , Yuta Nakashima , Esa Rahtu , and Janne Heikkila . 2019 . Rethinking the Evaluation of Video Summaries . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE , Long Beach, CA, USA , 7596 - 7604 . https:\/\/doi.org\/10.1109\/CVPR.2019.00778 10.1109\/CVPR.2019.00778 Mayu Otani, Yuta Nakashima, Esa Rahtu, and Janne Heikkila. 2019. Rethinking the Evaluation of Video Summaries. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, USA, 7596-7604. https:\/\/doi.org\/10.1109\/CVPR.2019.00778"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1007\/978-3-642-40261-6_11","volume-title":"Proceedings of the International Conference on Computer Analysis of Images and Patterns. Springer","author":"Panagiotakis Costas","year":"2013","unstructured":"Costas Panagiotakis , Nelly Ovsepian , and Elena Michael . 2013 . Video Synopsis Based on a Sequential Distortion Minimization Method . In Proceedings of the International Conference on Computer Analysis of Images and Patterns. Springer , Berlin, Heidelberg, York, UK , 94 - 101 . https:\/\/doi.org\/10.1007\/978-3-642-40261-6_11 10.1007\/978-3-642-40261-6_11 Costas Panagiotakis, Nelly Ovsepian, and Elena Michael. 2013. Video Synopsis Based on a Sequential Distortion Minimization Method. In Proceedings of the International Conference on Computer Analysis of Images and Patterns. Springer, Berlin, Heidelberg, York, UK, 94-101. https:\/\/doi.org\/10.1007\/978-3-642-40261-6_11"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the European Conference on Computer Vision. Springer","author":"Potapov Danila","year":"2014","unstructured":"Danila Potapov , Matthijs Douze , Zaid Harchaoui , and Cordelia Schmid . 2014 . Category-Specific Video Summarization . In Proceedings of the European Conference on Computer Vision. Springer , Cham, Zurich, Switzerland, 540\u2013555. https:\/\/doi.org\/10.1007\/978-3-319-10599-4_35 10.1007\/978-3-319-10599-4_35 Danila Potapov, Matthijs Douze, Zaid Harchaoui, and Cordelia Schmid. 2014. Category-Specific Video Summarization. In Proceedings of the European Conference on Computer Vision. Springer, Cham, Zurich, Switzerland, 540\u2013555. https:\/\/doi.org\/10.1007\/978-3-319-10599-4_35"},{"key":"e_1_3_2_1_26_1","first-page":"5179","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE","author":"Song Y.","year":"2015","unstructured":"Y. Song , J. Vallmitjana , A. Stent , and A. Jaimes . 2015. TVSum: Summarizing Web Videos Using Titles . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE , Boston, MA, USA , 5179 - 5187 . https:\/\/doi.org\/10.1109\/CVPR. 2015 .7299154 10.1109\/CVPR.2015.7299154 Y. Song, J. Vallmitjana, A. Stent, and A. Jaimes. 2015. TVSum: Summarizing Web Videos Using Titles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, MA, USA, 5179-5187. https:\/\/doi.org\/10.1109\/CVPR.2015.7299154"},{"key":"e_1_3_2_1_27_1","first-page":"6479","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE","author":"Sultani Waqas","year":"2018","unstructured":"Waqas Sultani , Chen Chen , and Mubarak Shah . 2018 . Real-World Anomaly Detection in Surveillance Videos . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE , Salt Lake City, UT, USA , 6479 - 6488 . https:\/\/doi.org\/10.1109\/CVPR.2018.00678 10.1109\/CVPR.2018.00678 Waqas Sultani, Chen Chen, and Mubarak Shah. 2018. Real-World Anomaly Detection in Surveillance Videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, USA, 6479-6488. https:\/\/doi.org\/10.1109\/CVPR.2018.00678"},{"key":"e_1_3_2_1_28_1","first-page":"1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE","author":"Szegedy Christian","year":"2015","unstructured":"Christian Szegedy , Wei Liu , Yangqing Jia , Pierre Sermanet , Scott Reed , Dragomir Anguelov , Dumitru Erhan , Vincent Vanhoucke , and Andrew Rabinovich . 2015 . Going Deeper with Convolutions . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE , Boston, MA , 1 - 9 . https:\/\/doi.org\/10.1109\/CVPR.2015.7298594 10.1109\/CVPR.2015.7298594 Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, MA, 1-9. https:\/\/doi.org\/10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_1_29_1","first-page":"1","volume-title":"Proceedings of the 2021 IEEE Western New York Image and Signal Processing Workshop (WNYISPW). IEEE","author":"Yeh Ryan","year":"2021","unstructured":"Ryan Yeh and Alexander Loui . 2021 . Synthesizing and Manipulating Natural Videos Using Image-to-Image Translation . In Proceedings of the 2021 IEEE Western New York Image and Signal Processing Workshop (WNYISPW). IEEE , Rochester, NY, USA , 1 - 5 . https:\/\/doi.org\/10.1109\/WNYISPW53194.2021.9661282 10.1109\/WNYISPW53194.2021.9661282 Ryan Yeh and Alexander Loui. 2021. Synthesizing and Manipulating Natural Videos Using Image-to-Image Translation. In Proceedings of the 2021 IEEE Western New York Image and Signal Processing Workshop (WNYISPW). IEEE, Rochester, NY, USA, 1-5. https:\/\/doi.org\/10.1109\/WNYISPW53194.2021.9661282"},{"key":"e_1_3_2_1_30_1","first-page":"609","volume-title":"Proceedings of the European Conference on Computer Vision. Springer","author":"Zeng Kuo-Hao","year":"2016","unstructured":"Kuo-Hao Zeng , Tseng-Hung Chen , Juan Carlos Niebles , and Min Sun . 2016 . Title Generation for User Generated Videos . In Proceedings of the European Conference on Computer Vision. Springer , Cham, Amsterdam, The Netherlands , 609 - 625 . https:\/\/doi.org\/10.1007\/978-3-319-46475-6_38 10.1007\/978-3-319-46475-6_38 Kuo-Hao Zeng, Tseng-Hung Chen, Juan Carlos Niebles, and Min Sun. 2016. Title Generation for User Generated Videos. In Proceedings of the European Conference on Computer Vision. Springer, Cham, Amsterdam, The Netherlands, 609-625. https:\/\/doi.org\/10.1007\/978-3-319-46475-6_38"},{"key":"e_1_3_2_1_31_1","first-page":"833","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE","author":"Zhang Cong","year":"2015","unstructured":"Cong Zhang , Hongsheng Li , Xiaogang Wang , and Xiaokang Yang . 2015 . Cross-Scene Crowd Counting via Deep Convolutional Neural Ntworks . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE , Boston, MA , 833 - 841 . https:\/\/doi.org\/10.1109\/CVPR.2015.7298684 10.1109\/CVPR.2015.7298684 Cong Zhang, Hongsheng Li, Xiaogang Wang, and Xiaokang Yang. 2015. Cross-Scene Crowd Counting via Deep Convolutional Neural Ntworks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, MA, 833-841. https:\/\/doi.org\/10.1109\/CVPR.2015.7298684"},{"key":"e_1_3_2_1_32_1","first-page":"11501","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE","author":"Zhang Tianyu","year":"2021","unstructured":"Tianyu Zhang , Lingxi Xie , Longhui Wei , Zijie Zhuang , Yongfei Zhang , Bo Li , and Qi Tian . 2021 . UnrealPerson: An Adaptive Pipeline Towards Costless Person Re-Identification . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE , Nashville, TN, USA , 11501 - 11510 . https:\/\/doi.org\/10.1109\/CVPR46437.2021.01134 10.1109\/CVPR46437.2021.01134 Tianyu Zhang, Lingxi Xie, Longhui Wei, Zijie Zhuang, Yongfei Zhang, Bo Li, and Qi Tian. 2021. UnrealPerson: An Adaptive Pipeline Towards Costless Person Re-Identification. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Nashville, TN, USA, 11501-11510. https:\/\/doi.org\/10.1109\/CVPR46437.2021.01134"},{"key":"e_1_3_2_1_33_1","first-page":"589","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE","author":"Zhang Yingying","year":"2016","unstructured":"Yingying Zhang , Desen Zhou , Siqin Chen , Shenghua Gao , and Yi Ma . 2016 . Single-Image Crowd Counting via Multi-Column Convolutional Neural Network . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE , Las Vegas, NV, USA , 589 - 597 . https:\/\/doi.org\/10.1109\/CVPR.2016.70 10.1109\/CVPR.2016.70 Yingying Zhang, Desen Zhou, Siqin Chen, Shenghua Gao, and Yi Ma. 2016. Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, NV, USA, 589-597. https:\/\/doi.org\/10.1109\/CVPR.2016.70"},{"key":"e_1_3_2_1_34_1","first-page":"4","article-title":"TTH-RNN: Tensor-Train Hierarchical Recurrent Neural Network for Video Summarization","volume":"68","author":"Zhao Bin","year":"2020","unstructured":"Bin Zhao , Xuelong Li , and Xiaoqiang Lu . 2020 . TTH-RNN: Tensor-Train Hierarchical Recurrent Neural Network for Video Summarization . IEEE Transactions on Industrial Electronics 68 , 4 (April 2021), 3629-3637. https:\/\/doi.org\/10.1109\/TIE.2020.2979573 10.1109\/TIE.2020.2979573 Bin Zhao, Xuelong Li, and Xiaoqiang Lu. 2020. TTH-RNN: Tensor-Train Hierarchical Recurrent Neural Network for Video Summarization. IEEE Transactions on Industrial Electronics 68, 4 (April 2021), 3629-3637. https:\/\/doi.org\/10.1109\/TIE.2020.2979573","journal-title":"IEEE Transactions on Industrial Electronics"},{"key":"e_1_3_2_1_35_1","article-title":"DSNet: A Flexible Detect-to-Summarize Network for Video Summarization","author":"Zhu Wencheng","year":"2021","unstructured":"Wencheng Zhu , Jiwen Lu , Jiahao Li , and Jie Zhou . 2021 . DSNet: A Flexible Detect-to-Summarize Network for Video Summarization . IEEE Transactions on Image Processing 30 (01 December 2020), 948-962. https:\/\/doi.org\/10.1109\/TIP.2020.3039886 10.1109\/TIP.2020.3039886 Wencheng Zhu, Jiwen Lu, Jiahao Li, and Jie Zhou. 2021. DSNet: A Flexible Detect-to-Summarize Network for Video Summarization. IEEE Transactions on Image Processing 30 (01 December 2020), 948-962. https:\/\/doi.org\/10.1109\/TIP.2020.3039886","journal-title":"IEEE Transactions on Image Processing 30 (01"}],"event":{"name":"ICCCV 2022: 2022 The 5th International Conference on Control and Computer Vision","acronym":"ICCCV 2022","location":"Xiamen China"},"container-title":["2022 The 5th International Conference on Control and Computer Vision"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3561613.3561631","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3561613.3561631","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:35Z","timestamp":1750186835000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3561613.3561631"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,19]]},"references-count":36,"alternative-id":["10.1145\/3561613.3561631","10.1145\/3561613"],"URL":"https:\/\/doi.org\/10.1145\/3561613.3561631","relation":{},"subject":[],"published":{"date-parts":[[2022,8,19]]}}}