{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T08:00:23Z","timestamp":1761897623200,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":46,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Kunshan Government Research (KGR) Funding in AY 2020\/2021."}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,18]]},"DOI":"10.1145\/3462244.3479908","type":"proceedings-article","created":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T15:01:58Z","timestamp":1634310118000},"page":"530-538","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Cross-modal Assisted Training for Abnormal Event Recognition in Elevators"],"prefix":"10.1145","author":[{"given":"Xinmeng","family":"Chen","sequence":"first","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuchen","family":"Gong","sequence":"additional","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ming","family":"Cheng","sequence":"additional","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qi","family":"Deng","sequence":"additional","affiliation":[{"name":"Technology Asia and Escalator, KONE Elevators Co., Ltd., 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ming","family":"Li","sequence":"additional","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,18]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Hamid Reza\u00a0Vaezi Joze, and Vishal\u00a0M Patel","author":"Abavisani Mahdi","year":"2019","unstructured":"Mahdi Abavisani , Hamid Reza\u00a0Vaezi Joze, and Vishal\u00a0M Patel . 2019 . Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition With Multimodal Training. In 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society , 1165\u20131174. Mahdi Abavisani, Hamid Reza\u00a0Vaezi Joze, and Vishal\u00a0M Patel. 2019. Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition With Multimodal Training. In 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 1165\u20131174."},{"key":"e_1_3_2_2_2_1","volume-title":"Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls. Expert systems with Applications 42, 21","author":"Arroyo Roberto","year":"2015","unstructured":"Roberto Arroyo , J\u00a0Javier Yebes , Luis\u00a0 M Bergasa , Iv\u00e1n\u00a0 G Daza , and Javier Almaz\u00e1n . 2015. Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls. Expert systems with Applications 42, 21 ( 2015 ), 7991\u20138005. Roberto Arroyo, J\u00a0Javier Yebes, Luis\u00a0M Bergasa, Iv\u00e1n\u00a0G Daza, and Javier Almaz\u00e1n. 2015. Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls. 
Expert systems with Applications 42, 21 (2015), 7991\u20138005."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/RADAR42522.2020.9114871"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.910878"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-006-0009-9"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018001"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2009.5413833"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/NAECON.2014.7045797"},{"key":"e_1_3_2_2_9_1","volume-title":"2014 international conference on computer vision theory and applications (VISAPP), Vol.\u00a02. IEEE, 478\u2013485","author":"Deniz Oscar","year":"2014","unstructured":"Oscar Deniz , Ismael Serrano , Gloria Bueno , and Tae-Kyun Kim . 2014 . Fast violence detection in video . In 2014 international conference on computer vision theory and applications (VISAPP), Vol.\u00a02. IEEE, 478\u2013485 . Oscar Deniz, Ismael Serrano, Gloria Bueno, and Tae-Kyun Kim. 2014. Fast violence detection in video. In 2014 international conference on computer vision theory and applications (VISAPP), Vol.\u00a02. IEEE, 478\u2013485."},{"key":"e_1_3_2_2_10_1","volume-title":"Violence detection using oriented violent flows. Image and vision computing 48","author":"Gao Yuan","year":"2016","unstructured":"Yuan Gao , Hong Liu , Xiaohu Sun , Can Wang , and Yi Liu . 2016. Violence detection using oriented violent flows. Image and vision computing 48 ( 2016 ), 37\u201341. Yuan Gao, Hong Liu, Xiaohu Sun, Can Wang, and Yi Liu. 2016. Violence detection using oriented violent flows. 
Image and vision computing 48 (2016), 37\u201341."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.265"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-12012-6_48"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00393"},{"key":"e_1_3_2_2_14_1","volume-title":"Projectors for Intel\u00ae RealSense\u2122 Depth Cameras D4xx. Intel Support","author":"Grunnet-Jepsen Anders","year":"2018","unstructured":"Anders Grunnet-Jepsen , John\u00a0 N Sweetser , Paul Winer , Akihiro Takagi , and John Woodfill . 2018. Projectors for Intel\u00ae RealSense\u2122 Depth Cameras D4xx. Intel Support , Intel Corporation : Santa Clara, CA, USA ( 2018 ). Anders Grunnet-Jepsen, John\u00a0N Sweetser, Paul Winer, Akihiro Takagi, and John Woodfill. 2018. Projectors for Intel\u00ae RealSense\u2122 Depth Cameras D4xx. Intel Support, Intel Corporation: Santa Clara, CA, USA (2018)."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00685"},{"key":"e_1_3_2_2_16_1","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531(2015).  Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531(2015)."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683898"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"crossref","unstructured":"Ahmad Jalal Yeon-Ho Kim Yong-Joong Kim Shaharyar Kamal and Daijin Kim. 2017. Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern recognition 61(2017) 295\u2013308.  Ahmad Jalal Yeon-Ho Kim Yong-Joong Kim Shaharyar Kamal and Daijin Kim. 2017. Robust human activity recognition from depth video using spatiotemporal multi-fused features. 
Pattern recognition 61(2017) 295\u2013308.","DOI":"10.1016\/j.patcog.2016.08.003"},{"key":"e_1_3_2_2_19_1","unstructured":"Chunhua Jia Wenhai Yi Yu Wu Hui Huang Lei Zhang and Leilei Wu. 2020. Abnormal activity capture from passenger flow of elevator based on unsupervised learning and fine-grained multi-label recognition. arXiv preprint arXiv:2006.15873(2020) arXiv\u20132006.  Chunhua Jia Wenhai Yi Yu Wu Hui Huang Lei Zhang and Leilei Wu. 2020. Abnormal activity capture from passenger flow of elevator based on unsupervised learning and fine-grained multi-label recognition. arXiv preprint arXiv:2006.15873(2020) arXiv\u20132006."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/WIFS.2014.7084329"},{"key":"e_1_3_2_2_22_1","unstructured":"Li Liu and Ling Shao. 2013. Learning discriminative representations from RGB-D video data. In Twenty-third international joint conference on artificial intelligence.  Li Liu and Ling Shao. 2013. Learning discriminative representations from RGB-D video data. In Twenty-third international joint conference on artificial intelligence."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2017.04.015"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2017.09.029"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5539872"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/AVSS.2014.6918692"},{"key":"e_1_3_2_2_27_1","volume-title":"Multilevel depth and image fusion for human activity detection","author":"Ni Bingbing","year":"2013","unstructured":"Bingbing Ni , Yong Pei , Pierre Moulin , and Shuicheng Yan . 2013. Multilevel depth and image fusion for human activity detection . IEEE transactions on cybernetics 43, 5 ( 2013 ), 1383\u20131394. Bingbing Ni, Yong Pei, Pierre Moulin, and Shuicheng Yan. 2013. 
Multilevel depth and image fusion for human activity detection. IEEE transactions on cybernetics 43, 5 (2013), 1383\u20131394."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23678-5_39"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356528"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2738401"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/DICTA.2014.7008100"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/WAINA.2014.18"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.14257\/ijsia.2014.8.5.04"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"e_1_3_2_2_35_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199(2014).  Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199(2014)."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1101149.1101236"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Jing Wang and Zhijie Xu. 2015. Crowd anomaly detection for automated video surveillance. (2015).  Jing Wang and Zhijie Xu. 2015. Crowd anomaly detection for automated video surveillance. (2015).","DOI":"10.1049\/ic.2015.0102"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2014.2315971"},{"key":"e_1_3_2_2_40_1","volume-title":"Eleview: An Active Elevator Monitoring Vision System. In MVA. 253\u2013256.","author":"Xiao Ping","year":"1996","unstructured":"Ping Xiao , Maylor\u00a0 KH Leung , and Kok\u00a0Cheong Wong . 1996 . Eleview: An Active Elevator Monitoring Vision System. In MVA. 253\u2013256. 
Ping Xiao, Maylor\u00a0KH Leung, and Kok\u00a0Cheong Wong. 1996. Eleview: An Active Elevator Monitoring Vision System. In MVA. 253\u2013256."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2014.06.011"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICAPR.2015.7050662"},{"key":"e_1_3_2_2_43_1","unstructured":"Matthew\u00a0D Zeiler. 2012. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701(2012).  Matthew\u00a0D Zeiler. 2012. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701(2012)."},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11633-021-1293-0"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-015-3122-3"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-3476-3_19"}],"event":{"name":"ICMI '21: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Montr\u00e9al QC Canada","acronym":"ICMI '21"},"container-title":["Proceedings of the 2021 International Conference on Multimodal 
Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462244.3479908","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3462244.3479908","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:54Z","timestamp":1750193334000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462244.3479908"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,18]]},"references-count":46,"alternative-id":["10.1145\/3462244.3479908","10.1145\/3462244"],"URL":"https:\/\/doi.org\/10.1145\/3462244.3479908","relation":{},"subject":[],"published":{"date-parts":[[2021,10,18]]},"assertion":[{"value":"2021-10-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}