{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T16:19:43Z","timestamp":1776183583730,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":50,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,10,19]],"date-time":"2017-10-19T00:00:00Z","timestamp":1508371200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"one thousand talents plan","award":["11150087963001"],"award-info":[{"award-number":["11150087963001"]}]},{"name":"National Basic Research grant (973)","award":["2015CB352501"],"award-info":[{"award-number":["2015CB352501"]}]},{"name":"Joint NSFC-ISF Research Program","award":["61561146397"],"award-info":[{"award-number":["61561146397"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,10,19]]},"DOI":"10.1145\/3123266.3123341","type":"proceedings-article","created":{"date-parts":[[2017,10,20]],"date-time":"2017-10-20T13:04:26Z","timestamp":1508504666000},"page":"970-978","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":56,"title":["Towards Micro-video Understanding by Joint Sequential-Sparse Modeling"],"prefix":"10.1145","author":[{"given":"Meng","family":"Liu","sequence":"first","affiliation":[{"name":"Shandong University, Jinan, China"}]},{"given":"Liqiang","family":"Nie","sequence":"additional","affiliation":[{"name":"Shandong University, Jinan, China"}]},{"given":"Meng","family":"Wang","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Baoquan","family":"Chen","sequence":"additional","affiliation":[{"name":"Shandong University, Jinan, China"}]}],"member":"320","published-online":{"date-parts":[[2017,10,19]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806332"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25446-8_4"},{"key":"e_1_3_2_1_3_1","first-page":"24","article-title":"Multimodal task-driven dictionary learning for image classification","volume":"25","author":"Bahrampour Soheil","year":"2016","unstructured":"Soheil Bahrampour , Nasser M Nasrabadi , Asok Ray , and William Kenneth Jenkins . 2016 . Multimodal task-driven dictionary learning for image classification . IEEE TIP , Vol. 25 , 1 (2016), 24 -- 38 . Soheil Bahrampour, Nasser M Nasrabadi, Asok Ray, and William Kenneth Jenkins. 2016. Multimodal task-driven dictionary learning for image classification. IEEE TIP, Vol. 25, 1 (2016), 24--38.","journal-title":"IEEE TIP"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.279181"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964314"},{"key":"e_1_3_2_1_6_1","volume-title":"Efficient classification of multi-label and imbalanced data using min-max modular classifiers","author":"Chen Ken","unstructured":"Ken Chen , Bao-Liang Lu , and James T Kwok . 2006. Efficient classification of multi-label and imbalanced data using min-max modular classifiers . In IEEE IJCNN. 1770--1775. Ken Chen, Bao-Liang Lu, and James T Kwok. 2006. Efficient classification of multi-label and imbalanced data using min-max modular classifiers. In IEEE IJCNN. 1770--1775."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Dan Ciregan Ueli Meier and J\u00fcrgen Schmidhuber. 2012. Multi-column deep neural networks for image classification IEEE CVPR. 3642--3649. Dan Ciregan Ueli Meier and J\u00fcrgen Schmidhuber. 2012. Multi-column deep neural networks for image classification IEEE CVPR. 3642--3649.","DOI":"10.1109\/CVPR.2012.6248110"},{"key":"e_1_3_2_1_8_1","first-page":"208","article-title":"Discriminative dictionary learning with common label alignment for cross-modal retrieval","volume":"18","author":"Deng Cheng","year":"2016","unstructured":"Cheng Deng , Xu Tang , Junchi Yan , Wei Liu , and Xinbo Gao . 2016 . Discriminative dictionary learning with common label alignment for cross-modal retrieval . IEEE MM , Vol. 18 , 2 (2016), 208 -- 218 . Cheng Deng, Xu Tang, Junchi Yan, Wei Liu, and Xinbo Gao. 2016. Discriminative dictionary learning with common label alignment for cross-modal retrieval. IEEE MM, Vol. 18, 2 (2016), 208--218.","journal-title":"IEEE MM"},{"key":"e_1_3_2_1_9_1","volume-title":"Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell.","author":"Donahue Jeffrey","year":"2015","unstructured":"Jeffrey Donahue , Lisa Anne Hendricks , Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015 . Long-term recurrent convolutional networks for visual recognition and description IEEE CVPR. 2625--2634. Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description IEEE CVPR. 2625--2634."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2439281"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502224"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.963769"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244303768966139"},{"key":"e_1_3_2_1_14_1","volume-title":"Towards end-to-end speech recognition with recurrent neural networks ICML","author":"Graves Alex","unstructured":"Alex Graves and Navdeep Jaitly . 2014. Towards end-to-end speech recognition with recurrent neural networks ICML , Vol. Vol. 14 . 1764--1772. Alex Graves and Navdeep Jaitly. 2014. Towards end-to-end speech recognition with recurrent neural networks ICML, Vol. Vol. 14. 1764--1772."},{"key":"e_1_3_2_1_15_1","unstructured":"Alex Graves and J\u00fcrgen Schmidhuber. 2009. Offline handwriting recognition with multidimensional recurrent neural networks NIPS. 545--552. Alex Graves and J\u00fcrgen Schmidhuber. 2009. Offline handwriting recognition with multidimensional recurrent neural networks NIPS. 545--552."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm247"},{"key":"e_1_3_2_1_17_1","unstructured":"Sepp Hochreiter and Jiirgen Schmidhuber. 1997. LTSM can solve hard time lag problems. In NIPS. 473--479. Sepp Hochreiter and Jiirgen Schmidhuber. 1997. LTSM can solve hard time lag problems. In NIPS. 473--479."},{"key":"e_1_3_2_1_18_1","unstructured":"Viren Jain and Sebastian Seung. 2009. Natural image denoising with convolutional networks NIPS. 769--776. Viren Jain and Sebastian Seung. 2009. Natural image denoising with convolutional networks NIPS. 769--776."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2655024"},{"key":"e_1_3_2_1_22_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks NIPS. 1097--1105. Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks NIPS. 1097--1105."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1631272.1631400"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.3115\/112405.112471"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806314"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964320"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/500141.500173"},{"key":"e_1_3_2_1_29_1","first-page":"3","article-title":"Recurrent neural network based language model","volume":"2","author":"Mikolov Tomas","year":"2010","unstructured":"Tomas Mikolov , Martin Karafi\u00e1t , Lukas Burget , Jan Cernock\u1ef3 , and Sanjeev Khudanpur . 2010 . Recurrent neural network based language model . In Interspeech , Vol. Vol. 2. 3 -- 3 . Tomas Mikolov, Martin Karafi\u00e1t, Lukas Burget, Jan Cernock\u1ef3, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Interspeech, Vol. Vol. 2. 3--3.","journal-title":"Interspeech"},{"key":"e_1_3_2_1_30_1","unstructured":"Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines ICML. 807--814. Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines ICML. 807--814."},{"key":"e_1_3_2_1_31_1","volume-title":"The open world of micro-videos. arXiv preprint arXiv:1603.09439","author":"Nguyen Phuc Xuan","year":"2016","unstructured":"Phuc Xuan Nguyen , Gregory Rogez , Charless Fowlkes , and Deva Ramanan . 2016. The open world of micro-videos. arXiv preprint arXiv:1603.09439 ( 2016 ). Phuc Xuan Nguyen, Gregory Rogez, Charless Fowlkes, and Deva Ramanan. 2016. The open world of micro-videos. arXiv preprint arXiv:1603.09439 (2016)."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.257"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Joseph Redmon Santosh Divvala Ross Girshick and Ali Farhadi. 2016. You only look once: Unified real-time object detection IEEE CVPR. 779--788. Joseph Redmon Santosh Divvala Ross Girshick and Ali Farhadi. 2016. You only look once: Unified real-time object detection IEEE CVPR. 779--788.","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_2_1_34_1","volume-title":"listen and learn-a multimodal LSTM for speaker identification. arXiv preprint arXiv:1602.04364","author":"Ren Jimmy","year":"2016","unstructured":"Jimmy Ren , Yongtao Hu , Yu-Wing Tai , Chuan Wang , Li Xu , Wenxiu Sun , and Qiong Yan . 2016. Look , listen and learn-a multimodal LSTM for speaker identification. arXiv preprint arXiv:1602.04364 ( 2016 ). Jimmy Ren, Yongtao Hu, Yu-Wing Tai, Chuan Wang, Li Xu, Wenxiu Sun, and Qiong Yan. 2016. Look, listen and learn-a multimodal LSTM for speaker identification. arXiv preprint arXiv:1602.04364 (2016)."},{"key":"e_1_3_2_1_35_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks NIPS. 91--99. Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks NIPS. 91--99."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010011"},{"key":"e_1_3_2_1_37_1","volume-title":"Evolino: Hybrid neuroevolution optimal linear search for sequence learning IJCAI. 853--858.","author":"Schmidhuber J\u00fcrgen","year":"2005","unstructured":"J\u00fcrgen Schmidhuber , Daan Wierstra , and Faustino Gomez . 2005 . Evolino: Hybrid neuroevolution optimal linear search for sequence learning IJCAI. 853--858. J\u00fcrgen Schmidhuber, Daan Wierstra, and Faustino Gomez. 2005. Evolino: Hybrid neuroevolution optimal linear search for sequence learning IJCAI. 853--858."},{"key":"e_1_3_2_1_38_1","volume-title":"Facenet: A unified embedding for face recognition and clustering IEEE CVPR. 815--823.","author":"Schroff Florian","year":"2015","unstructured":"Florian Schroff , Dmitry Kalenichenko , and James Philbin . 2015 . Facenet: A unified embedding for face recognition and clustering IEEE CVPR. 815--823. Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. Facenet: A unified embedding for face recognition and clustering IEEE CVPR. 815--823."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Ali Sharif Razavian Hossein Azizpour Josephine Sullivan and Stefan Carlsson. 2014. CNN features off-the-shelf: an astounding baseline for recognition IEEE CVPR. 806--813. Ali Sharif Razavian Hossein Azizpour Josephine Sullivan and Stefan Carlsson. 2014. CNN features off-the-shelf: an astounding baseline for recognition IEEE CVPR. 806--813.","DOI":"10.1109\/CVPRW.2014.131"},{"key":"e_1_3_2_1_40_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos NIPS. 568--576. Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos NIPS. 568--576."},{"key":"e_1_3_2_1_41_1","unstructured":"Nitish Srivastava Elman Mansimov and Ruslan Salakhutdinov. 2015. Unsupervised learning of video representations using LSTMs ICML. 843--852. Nitish Srivastava Elman Mansimov and Ruslan Salakhutdinov. 2015. Unsupervised learning of video representations using LSTMs ICML. 843--852."},{"key":"e_1_3_2_1_42_1","unstructured":"Ilya Sutskever Oriol Vinyals and Quoc V Le. 2014. Sequence to sequence learning with neural networks NIPS. 3104--3112. Ilya Sutskever Oriol Vinyals and Quoc V Le. 2014. Sequence to sequence learning with neural networks NIPS. 3104--3112."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.220"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2009.10-08-881"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.460"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964299"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080771"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806222"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Zhongwen Xu Yi Yang and Alex G Hauptmann. 2015. A discriminative CNN video representation for event detection IEEE CVPR. 1798--1807. Zhongwen Xu Yi Yang and Alex G Hauptmann. 2015. A discriminative CNN video representation for event detection IEEE CVPR. 1798--1807.","DOI":"10.1109\/CVPR.2015.7298789"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964307"}],"event":{"name":"MM '17: ACM Multimedia Conference","location":"Mountain View California USA","acronym":"MM '17","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 25th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123341","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3123266.3123341","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T16:39:47Z","timestamp":1750955987000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123341"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,19]]},"references-count":50,"alternative-id":["10.1145\/3123266.3123341","10.1145\/3123266"],"URL":"https:\/\/doi.org\/10.1145\/3123266.3123341","relation":{},"subject":[],"published":{"date-parts":[[2017,10,19]]},"assertion":[{"value":"2017-10-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}