{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:17:53Z","timestamp":1763468273763,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2015,8,24]],"date-time":"2015-08-24T00:00:00Z","timestamp":1440374400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2015,8,24]]},"abstract":"<jats:p>\n            In this article, we present a novel approach to segment discriminative patches in human activity videos. First, we adopt the spatio-temporal interest points (STIPs) to represent significant motion patterns in the video sequence. Then, nonnegative sparse coding is exploited to generate a sparse representation of each STIP descriptor. We construct the feature vector for each video by applying a two-stage sum-pooling and\n            <jats:italic>l<\/jats:italic>\n            <jats:sub>2<\/jats:sub>\n            -normalization operation. After training a multi-class classifier through the error-correcting code SVM, the discriminative portion of each video is determined as the patch that has the highest confidence while also being correctly classified according to the video category. 
Experimental results show that the video patches extracted by our method are more separable, while preserving the perceptually relevant portion of each activity.\n          <\/jats:p>","DOI":"10.1145\/2750780","type":"journal-article","created":{"date-parts":[[2015,8,26]],"date-time":"2015-08-26T14:00:30Z","timestamp":1440597630000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Segmentation of Discriminative Patches in Human Activity Video"],"prefix":"10.1145","volume":"12","author":[{"given":"Bo","family":"Zhang","sequence":"first","affiliation":[{"name":"University of Trento, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicola","family":"Conci","sequence":"additional","affiliation":[{"name":"University of Trento, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Francesco G.B.","family":"De Natale","sequence":"additional","affiliation":[{"name":"University of Trento, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,8,24]]},"reference":[{"volume-title":"Proceedings of the ECCV Workshop on Multi-Camera and Multi-Modal Sensor Fusion Algorithms and Applications.","author":"Akman O.","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2013.2270402"},{"volume-title":"Proceedings of the 12th International Conference on Computer Vision. IEEE, 1365--1372","author":"Bourdev L. D.","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-012-0534-7"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2005.844447"},{"volume-title":"Mosift: Recognizing human actions in surveillance videos. Tech. Rep. CMU-CS-09-161","year":"2009","author":"Chen M. 
Y.","key":"e_1_2_1_6_1"},{"volume-title":"Proceedings of the 2nd Joint International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. IEEE, 65--72","author":"Doll\u00e1r P.","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.142"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2006.881969"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2006.881969"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2010.06.010"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2007.70711"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.253"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1525\/aa.1963.65.5.02a00020"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/NNSP.2002.1030067"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/1248547.1248551"},{"volume-title":"Proceedings of the British Machine Vision Conference. BMVA Press, 99","author":"Klaeser A.","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976603762552951"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-005-1838-7"},{"volume-title":"Proceedings of the International Conference on Computer Vision and Pattern Recognition. IEEE, 1--8.","author":"Laptev I.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","unstructured":"J. Mairal. 2012. Sparse modeling software. INRIA. http:\/\/spams-devel.gforge.inria.fr\/index.html."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553463"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1756008"},{"volume-title":"Proceedings of the European Conference on Computer Vision. 
Springer, 392--405","author":"Niebles J. C.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-007-0122-4"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.24"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.342"},{"volume-title":"Proceedings of the European Conference on Computer Vision. Springer, 577--590","author":"Raptis M.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"M. S. Ryoo and J. K. Aggarwal. 2010. UT-Interaction dataset ICPR contest on semantic description of human activities (SDHA). http:\/\/cvrc.ece.utexas.edu\/SDHA2010\/Human_Interaction.html. (2010).","DOI":"10.1007\/978-3-642-17711-8_28"},{"volume-title":"Proceedings of the International Conference on Neural Information Processing Systems.","author":"Sapiro G.","key":"e_1_2_1_30_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1018429.1020906"},{"volume-title":"Proceedings of the International Conference on Acoustics Speech and Signal Processing. IEEE","author":"Sprechmann P.","key":"e_1_2_1_32_1"},{"volume-title":"Proceedings of International Conference on Computer Vision and Pattern Recognition. IEEE, 3681--3688","author":"Tamrakar A.","key":"e_1_2_1_33_1"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"R. Tibshirani. 1996. Regression shrinkage and selection via the lasso. J. Royal Statistical Soci. 
Series B (Methodological) 267--288.","DOI":"10.1111\/j.2517-6161.1996.tb02080.x"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995407"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"volume-title":"Proceedings of the British Machine Vision Conference. BMVA Press","author":"Wang H.","key":"e_1_2_1_37_1"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.345"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-37431-9_44"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88688-4_48"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2014.2365699"},{"volume-title":"Proceedings of International Conference on Computer Vision and Pattern Recognition. IEEE, 1794--1801","author":"Yang J. C.","key":"e_1_2_1_42_1"},{"volume-title":"Proceedings of International Conference on Image Processing. IEEE","author":"Zhang B.","key":"e_1_2_1_43_1"},{"volume-title":"Proceedings of SPIE on Video Surveillance and Transportation Imaging Applications. 
SPIE","author":"Zhang B.","key":"e_1_2_1_44_1"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2750780","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2750780","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:00:42Z","timestamp":1750230042000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2750780"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,8,24]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,8,24]]}},"alternative-id":["10.1145\/2750780"],"URL":"https:\/\/doi.org\/10.1145\/2750780","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2015,8,24]]},"assertion":[{"value":"2014-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-08-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}