{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T10:53:58Z","timestamp":1762253638709,"version":"3.41.0"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T00:00:00Z","timestamp":1606694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Key Research Program of Frontier Sciences of CAS","award":["QYZDJSSWJSC039"],"award-info":[{"award-number":["QYZDJSSWJSC039"]}]},{"name":"Research Program of National Laboratory of Pattern Recognition","award":["Z-2018007"],"award-info":[{"award-number":["Z-2018007"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61720106006, 62072455, 61702511, 61751211, 61620106003, 61532009, U1836220, U1705262, and 61872424"],"award-info":[{"award-number":["61720106006, 62072455, 61702511, 61751211, 61620106003, 61532009, U1836220, U1705262, and 61872424"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Program of China","award":["2018AAA0100604"],"award-info":[{"award-number":["2018AAA0100604"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2020,11,30]]},"abstract":"<jats:p>Recognizing activities from egocentric multimodal data collected by wearable cameras and sensors, is gaining interest, as multimodal methods always benefit from the complementarity of different modalities. 
However, since high-dimensional videos contain rich high-level semantic information while low-dimensional sensor signals describe simple motion patterns of the wearer, the large modality gap between the videos and the sensor signals raises a challenge for fusing the raw data. Moreover, the lack of large-scale egocentric multimodal datasets due to the cost of data collection and annotation processes poses another challenge for employing complex deep learning models. To jointly deal with the above two challenges, we propose a knowledge-driven multimodal activity recognition framework that exploits external knowledge to fuse multimodal data and reduce the dependence on large-scale training samples. Specifically, we design a dual-GCLSTM (Graph Convolutional LSTM) and a multi-layer GCN (Graph Convolutional Network) to collectively model the relations among activities and intermediate objects. The dual-GCLSTM is designed to fuse temporal multimodal features with top-down relation-aware guidance. In addition, we apply a co-attention mechanism to adaptively attend to the features of different modalities at different timesteps. The multi-layer GCN aims to learn relation-aware classifiers of activity categories. 
Experimental results on three publicly available egocentric multimodal datasets show the effectiveness of the proposed model.<\/jats:p>","DOI":"10.1145\/3409332","type":"journal-article","created":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T17:49:26Z","timestamp":1608227366000},"page":"1-133","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":22,"title":["Knowledge-driven Egocentric Multimodal Activity Recognition"],"prefix":"10.1145","volume":"16","author":[{"given":"Yi","family":"Huang","sequence":"first","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences, China and Peng Cheng Laboratory, Shenzhen, China"}]},{"given":"Xiaoshan","family":"Yang","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences, China and Peng Cheng Laboratory, Shenzhen, China"}]},{"given":"Junyu","family":"Gao","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences, China and Peng Cheng Laboratory, Shenzhen, China"}]},{"given":"Jitao","family":"Sang","sequence":"additional","affiliation":[{"name":"School of Computer and Information Technology 8 Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, China and Peng Cheng Laboratory, Shenzhen, China"}]},{"given":"Changsheng","family":"Xu","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences, China and Peng Cheng 
Laboratory, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2020,12,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_7"},{"key":"e_1_2_1_2_1","first-page":"1","article-title":"Deep temporal multimodal fusion for medical procedure monitoring using wearable sensors","volume":"20","author":"Bernal Edgar A.","year":"2017","unstructured":"Edgar A. Bernal , Xitong Yang , Qun Li , Jayant Kumar , Sriganesh Madhvanath , Palghat Ramesh , and Raja Bala . 2017 . Deep temporal multimodal fusion for medical procedure monitoring using wearable sensors . IEEE Trans. Multimedia 20 , 1 (Jan. 2017), 107--118. DOI:https:\/\/doi.org\/10.1109\/TMM.2017.2726187 10.1109\/TMM.2017.2726187 Edgar A. Bernal, Xitong Yang, Qun Li, Jayant Kumar, Sriganesh Madhvanath, Palghat Ramesh, and Raja Bala. 2017. Deep temporal multimodal fusion for medical procedure monitoring using wearable sensors. IEEE Trans. Multimedia 20, 1 (Jan. 2017), 107--118. DOI:https:\/\/doi.org\/10.1109\/TMM.2017.2726187","journal-title":"IEEE Trans. Multimedia"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2015.2409731"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499621"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2016.XII.034"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_2_1_8_1","first-page":"3","article-title":"A survey of depth and inertial sensor fusion for human action recognition","volume":"76","author":"Chen Chen","year":"2017","unstructured":"Chen Chen , Roozbeh Jafari , and Nasser Kehtarnavaz . 2017 . A survey of depth and inertial sensor fusion for human action recognition . Multimedia Tools. Applic. 76 , 3 (Feb. 2017), 4405--4425. 
DOI:https:\/\/doi.org\/10.1007\/s11042-015-3177-1 10.1007\/s11042-015-3177-1 Chen Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. 2017. A survey of depth and inertial sensor fusion for human action recognition. Multimedia Tools. Applic. 76, 3 (Feb. 2017), 4405--4425. DOI:https:\/\/doi.org\/10.1007\/s11042-015-3177-1","journal-title":"Multimedia Tools. Applic."},{"key":"e_1_2_1_9_1","first-page":"3","article-title":"Probabilistic semantic retrieval for surveillance videos with activity graphs","volume":"21","author":"Chen Yuting","year":"2018","unstructured":"Yuting Chen , Joseph Wang , Yannan Bai , Gregory Casta\u00f1\u00f3n , and Venkatesh Saligrama . 2018 . Probabilistic semantic retrieval for surveillance videos with activity graphs . IEEE Trans. Multimedia 21 , 3 (Mar. 2018), 704--716. DOI:https:\/\/doi.org\/10.1109\/TMM.2018.2865860 10.1109\/TMM.2018.2865860 Yuting Chen, Joseph Wang, Yannan Bai, Gregory Casta\u00f1\u00f3n, and Venkatesh Saligrama. 2018. Probabilistic semantic retrieval for surveillance videos with activity graphs. IEEE Trans. Multimedia 21, 3 (Mar. 2018), 704--716. DOI:https:\/\/doi.org\/10.1109\/TMM.2018.2865860","journal-title":"IEEE Trans. Multimedia"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2016.2628346"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/230"},{"volume-title":"Proceedings of the 12th European Conference on Computer Vision (ECCV\u201912)","author":"Fathi Alireza","key":"e_1_2_1_14_1","unstructured":"Alireza Fathi , Yin Li , and James M. Rehg . 2012. Learning to recognize daily actions using gaze . In Proceedings of the 12th European Conference on Computer Vision (ECCV\u201912) . Springer, 314--327. 
DOI:https:\/\/doi.org\/10.1007\/978-3-642-33718-5_23 10.1007\/978-3-642-33718-5_23 Alireza Fathi, Yin Li, and James M. Rehg. 2012. Learning to recognize daily actions using gaze. In Proceedings of the 12th European Conference on Computer Vision (ECCV\u201912). Springer, 314--327. DOI:https:\/\/doi.org\/10.1007\/978-3-642-33718-5_23"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240566"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018303"},{"key":"e_1_2_1_17_1","volume-title":"Cooperation learning from multiple social networks: Consistent and complementary perspectives","author":"Guan Weili","year":"2019","unstructured":"Weili Guan , Xuemeng Song , Tian Gan , Junyu Lin , Xiaojun Chang , and Liqiang Nie . 2019. Cooperation learning from multiple social networks: Consistent and complementary perspectives . IEEE Trans. Cybern . ( 2019 ). DOI:https:\/\/doi.org\/10.1109\/TCYB.2019.2951207 10.1109\/TCYB.2019.2951207 Weili Guan, Xuemeng Song, Tian Gan, Junyu Lin, Xiaojun Chang, and Liqiang Nie. 2019. Cooperation learning from multiple social networks: Consistent and complementary perspectives. IEEE Trans. Cybern. (2019). 
DOI:https:\/\/doi.org\/10.1109\/TCYB.2019.2951207"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2016.7727224"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/SMC.2015.525"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2016.7552937"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3063532"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2885228"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3321505"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"volume-title":"Proceedings of the 3rd International Conference for Learning Representations (ICLR\u201915)","author":"Diederik","key":"e_1_2_1_27_1","unstructured":"Diederik P. Kingma and Jimmy Ba. 2013. Adam: A method for stochastic optimization . In Proceedings of the 3rd International Conference for Learning Representations (ICLR\u201915) . Retrieved from http:\/\/arxiv.org\/abs\/1412.6980. Diederik P. Kingma and Jimmy Ba. 2013. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference for Learning Representations (ICLR\u201915). Retrieved from http:\/\/arxiv.org\/abs\/1412.6980."},{"volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","author":"Thomas","key":"e_1_2_1_28_1","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks . In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917) . Retrieved from https:\/\/openreview.net\/forum?id=SJU4ayYgl. Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. 
In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917). Retrieved from https:\/\/openreview.net\/forum?id=SJU4ayYgl."},{"key":"e_1_2_1_29_1","first-page":"1","article-title":"Collective first-person vision for automatic gaze analysis in multiparty conversations","volume":"19","author":"Kumano Shiro","year":"2016","unstructured":"Shiro Kumano , Kazuhiro Otsuka , Ryo Ishii , and Junji Yamato . 2016 . Collective first-person vision for automatic gaze analysis in multiparty conversations . IEEE Trans. Multimedia 19 , 1 (Jan. 2016), 107--122. DOI:https:\/\/doi.org\/10.1109\/TMM.2016.2608002 10.1109\/TMM.2016.2608002 Shiro Kumano, Kazuhiro Otsuka, Ryo Ishii, and Junji Yamato. 2016. Collective first-person vision for automatic gaze analysis in multiparty conversations. IEEE Trans. Multimedia 19, 1 (Jan. 2016), 107--122. DOI:https:\/\/doi.org\/10.1109\/TMM.2016.2608002","journal-title":"IEEE Trans. Multimedia"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Li Yaguang","year":"2018","unstructured":"Yaguang Li , Rose Yu , Cyrus Shahabi , and Yan Liu . 2018 . Diffusion convolutional recurrent neural network: Data-driven traffic forecasting . In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918) . Retrieved from https:\/\/openreview.net\/forum?id=SJiHXGWAZ. Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918). 
Retrieved from https:\/\/openreview.net\/forum?id=SJiHXGWAZ."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123341"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1249\/MSS.0b013e31825e825a"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157129"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 1894--1903","author":"Ma Minghuang","year":"2016","unstructured":"Minghuang Ma , Haoqi Fan , and Kris M. Kitani . 2016. Going deeper into first-person activity recognition . In Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 1894--1903 . DOI:https:\/\/doi.org\/10.1109\/CVPR. 2016 .209 10.1109\/CVPR.2016.209 Minghuang Ma, Haoqi Fan, and Kris M. Kitani. 2016. Going deeper into first-person activity recognition. In Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 1894--1903. DOI:https:\/\/doi.org\/10.1109\/CVPR.2016.209"},{"key":"e_1_2_1_35_1","volume-title":"Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (Nov","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton . 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (Nov . 2008 ), 2579--2605. Retrieved from http:\/\/www.jmlr.org\/papers\/v9\/vandermaaten08a.html. Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (Nov. 2008), 2579--2605. Retrieved from http:\/\/www.jmlr.org\/papers\/v9\/vandermaaten08a.html."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.10"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)","author":"Mettes Pascal","year":"2017","unstructured":"Pascal Mettes and Cees G. M. Snoek . 2017. 
Spatial-aware object embeddings for zero-shot localization and classification of actions . In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917) . 4443--4452. DOI:https:\/\/doi.org\/10.1109\/ICCV. 2017 .476 10.1109\/ICCV.2017.476 Pascal Mettes and Cees G. M. Snoek. 2017. Spatial-aware object embeddings for zero-shot localization and classification of actions. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917). 4443--4452. DOI:https:\/\/doi.org\/10.1109\/ICCV.2017.476"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01443"},{"volume-title":"Proceedings of the 16th International Conference on Information Fusion (FUSION\u201913)","author":"Morerio Pietro","key":"e_1_2_1_39_1","unstructured":"Pietro Morerio , Lucio Marcenaro , and Carlo S. Regazzoni . 2013. Hand detection in first person vision . In Proceedings of the 16th International Conference on Information Fusion (FUSION\u201913) . IEEE, 1502--1507. Pietro Morerio, Lucio Marcenaro, and Carlo S. Regazzoni. 2013. Hand detection in first person vision. In Proceedings of the 16th International Conference on Information Fusion (FUSION\u201913). IEEE, 1502--1507."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.3390\/s17112556"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.721"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.3390\/s16010072"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.2200\/S00714ED1V01Y201603ICR048"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123313"},{"key":"e_1_2_1_45_1","unstructured":"Alan V. Oppenheim. 1999. Discrete-time Signal Processing. Pearson Education India.  Alan V. Oppenheim. 1999. Discrete-time Signal Processing. 
Pearson Education India."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248010"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00625"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2815785"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298752"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995720"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-04167-0_33"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8803460"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the 28th International Conference on Advances in Neural Information Processing Systems (NeurIPS\u201914)","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014 . Two-stream convolutional networks for action recognition in videos . In Proceedings of the 28th International Conference on Advances in Neural Information Processing Systems (NeurIPS\u201914) . 568--576. Retrieved from https:\/\/papers.nips.cc\/paper\/5353-two-stream-convolutional-networks-for-action-recognition-in-videos. Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Proceedings of the 28th International Conference on Advances in Neural Information Processing Systems (NeurIPS\u201914). 568--576. 
Retrieved from https:\/\/papers.nips.cc\/paper\/5353-two-stream-convolutional-networks-for-action-recognition-in-videos."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2844101"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2763322"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2016.54"},{"volume-title":"ConceptNet 5: A large semantic network for relational knowledge","author":"Speer Robert","key":"e_1_2_1_57_1","unstructured":"Robert Speer and Catherine Havasi . 2013. ConceptNet 5: A large semantic network for relational knowledge . In The People\u2019s Web Meets NLP. Springer , Berlin , 161--176. DOI:https:\/\/doi.org\/10.1007\/978-3-642-35085-6_6 10.1007\/978-3-642-35085-6_6 Robert Speer and Catherine Havasi. 2013. ConceptNet 5: A large semantic network for relational knowledge. In The People\u2019s Web Meets NLP. Springer, Berlin, 161--176. DOI:https:\/\/doi.org\/10.1007\/978-3-642-35085-6_6"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_2_1_59_1","volume-title":"Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199","author":"Szegedy Christian","year":"2013","unstructured":"Christian Szegedy , Wojciech Zaremba , Ilya Sutskever , Joan Bruna , Dumitru Erhan , Ian Goodfellow , and Rob Fergus . 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 ( 2013 ). Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. 
arXiv preprint arXiv:1312.6199 (2013)."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2812802"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-012-0594-8"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2018.02.010"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2617079"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00717"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2923608"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052577"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038917"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2602758"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8461249"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00983"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3409332","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3409332","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:33Z","timestamp":1750197693000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3409332"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,30]]},"references-count":70,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,11,30]]}},"alternative-id":["10.1145\/3409332"],"URL":"https:\/\/doi.org\/10.1145\/3409332","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2020,11,30]]},"assertion":[{"value":"2020-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-12-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}