{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T14:05:54Z","timestamp":1774533954641,"version":"3.50.1"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2014,8,1]],"date-time":"2014-08-01T00:00:00Z","timestamp":1406851200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100007128","name":"Natural Science Foundation of Shaanxi Province","doi-asserted-by":"crossref","award":["2010JZ011"],"award-info":[{"award-number":["2010JZ011"]}],"id":[{"id":"10.13039\/501100007128","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100008242","name":"National ICT Australia","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100008242","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p>While most existing video summarization approaches aim to identify important frames of a video from either a global or local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing those frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based approach to make our approach more computationally scalable. Experimental results on two benchmark datasets demonstrate that our proposed approach clearly outperforms the state-of-the-art.<\/jats:p>","DOI":"10.1145\/2632267","type":"journal-article","created":{"date-parts":[[2014,8,29]],"date-time":"2014-08-29T13:03:31Z","timestamp":1409317411000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":37,"title":["A Top-Down Approach for Video Summarization"],"prefix":"10.1145","volume":"11","author":[{"given":"Genliang","family":"Guan","sequence":"first","affiliation":[{"name":"The University of Sydney, Australia"}]},{"given":"Zhiyong","family":"Wang","sequence":"additional","affiliation":[{"name":"The University of Sydney, Australia"}]},{"given":"Shaohui","family":"Mei","sequence":"additional","affiliation":[{"name":"Northwestern Polytechnical University, China"}]},{"given":"Max","family":"Ott","sequence":"additional","affiliation":[{"name":"National ICT Australia (NICTA), Australia"}]},{"given":"Mingyi","family":"He","sequence":"additional","affiliation":[{"name":"Northwestern Polytechnical University, China"}]},{"given":"David Dagan","family":"Feng","sequence":"additional","affiliation":[{"name":"The University of Sydney, Australia"}]}],"member":"320","published-online":{"date-parts":[[2014,9,4]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Achantay R.","unstructured":"R. Achantay , S. Hemamiz , F. Estraday , and S. Susstrunky . 2009. Frequency-tuned salient region detection . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. R. Achantay, S. Hemamiz, F. Estraday, and S. Susstrunky. 2009. Frequency-tuned salient region detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-009-0277-9"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505652"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/2964398.2964450"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291025"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Conference on Computer Vision Systems.","author":"Chatzichristofis S. A.","unstructured":"S. A. Chatzichristofis and Y. S. Boutalis . 2008. CEDD: Color and edge directivity descriptor: A compact descriptor for image indexing and retrieval . In Proceedings of the International Conference on Computer Vision Systems. S. A. Chatzichristofis and Y. S. Boutalis. 2008. CEDD: Color and edge directivity descriptor: A compact descriptor for image indexing and retrieval. In Proceedings of the International Conference on Computer Vision Systems."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2008.2009703"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2291967"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2011.2166951"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2010.08.004"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/290747.290773"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the IEEE International Conference on Image Processing.","author":"Evangelopoulos G.","unstructured":"G. Evangelopoulos , K. Rapantzikos , A. Potamianos , P. Maragos , A. Zlatintsi , and Y. Avrithis . 2008. Movie summarization based on audio-visual saliency detection . In Proceedings of the IEEE International Conference on Image Processing. G. Evangelopoulos, K. Rapantzikos, A. Potamianos, P. Maragos, A. Zlatintsi, and Y. Avrithis. 2008. Movie summarization based on audio-visual saliency detection. In Proceedings of the IEEE International Conference on Image Processing."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2267205"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1136800"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-009-0307-7"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Gong Y.","unstructured":"Y. Gong and X. Liu . 2000. Video summarization using singular value decomposition . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Y. Gong and X. Liu. 2000. Video summarization using singular value decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2012.2214871"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMEW.2012.105"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043612.2043613"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.4156\/jdcta.vol4.issue3.20"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2072298.2072068"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2005.854230"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/265563.265572"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIS.2009.124"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2014.2319778"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the ICME Workshop on Emerging Multimedia Systems and Applications (EMSA'13)","author":"Lu S.","unstructured":"S. Lu , Z. Wang , Y. Song , T. Mei , and D. D. Feng . 2013. A bag-of-importance model for video summarization . In Proceedings of the ICME Workshop on Emerging Multimedia Systems and Applications (EMSA'13) . S. Lu, Z. Wang, Y. Song, T. Mei, and D. D. Feng. 2013. A bag-of-importance model for video summarization. In Proceedings of the ICME Workshop on Emerging Multimedia Systems and Applications (EMSA'13)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.350"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2008.2009241"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11222-007-9033-z"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2005.854410"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'14)","author":"Mei S.","unstructured":"S. Mei , G. Guan , Z. Wang , M. He , X.-S. Hua , and D. D. Feng . 2014. l2,0 constrained sparse dictionary selection for video summarization . In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'14) . S. Mei, G. Guan, Z. Wang, M. He, X.-S. Hua, and D. D. Feng. 2014. l2,0 constrained sparse dictionary selection for video summarization. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'14)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487268.2487269"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2005.188"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2007.04.002"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the International Conference on Computer Vision Theory and Applications.","author":"Muja M.","unstructured":"M. Muja and D. G. Lowe . 2009. Fast approximate nearest neighbors with automatic algorithm configuration . In Proceedings of the International Conference on Computer Vision Theory and Applications. M. Muja and D. G. Lowe. 2009. Fast approximate nearest neighbors with automatic algorithm configuration. In Proceedings of the International Conference on Computer Vision Theory and Applications."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00799-005-0129-9"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2004.841694"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2009.2013517"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the 17th International Conference on Machine Learning.","author":"Pelleg D.","unstructured":"D. Pelleg and A. W. Moore . 2000. X-means: Extending k-means with efficient estimation of the number of clusters . In Proceedings of the 17th International Conference on Machine Learning. D. Pelleg and A. W. Moore. 2000. X-means: Extending k-means with efficient estimation of the number of clusters. In Proceedings of the 17th International Conference on Machine Learning."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1198302.1198305"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2012.2185041"},{"key":"e_1_2_1_43_1","unstructured":"YouTube Statistics. 2012. http:\/\/www.youtube.com\/yt\/press\/statistics.html.  YouTube Statistics. 2012. http:\/\/www.youtube.com\/yt\/press\/statistics.html."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1282280.1282341"},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the IEEE International Conference on Image Processing.","author":"Zhuang Y.","unstructured":"Y. Zhuang , Y. Rui , T. Huang , and S. Mehrotraw . 1998. Adaptive key frame extraction using unsupervised clustering . In Proceedings of the IEEE International Conference on Image Processing. Y. Zhuang, Y. Rui, T. Huang, and S. Mehrotraw. 1998. Adaptive key frame extraction using unsupervised clustering. In Proceedings of the IEEE International Conference on Image Processing."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2632267","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2632267","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:56:13Z","timestamp":1750229773000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2632267"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.1145\/2632267"],"URL":"https:\/\/doi.org\/10.1145\/2632267","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,8]]},"assertion":[{"value":"2013-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-09-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}