{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:21:24Z","timestamp":1750220484883,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Kunshan Government Research (KGR) Funding in AY 2020\/2021"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,18]]},"DOI":"10.1145\/3461615.3491112","type":"proceedings-article","created":{"date-parts":[[2021,12,18]],"date-time":"2021-12-18T04:57:40Z","timestamp":1639803460000},"page":"112-120","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Multimodal Dynamic Neural Network for Call for Help Recognition in Elevators"],"prefix":"10.1145","author":[{"given":"Ran","family":"Ju","sequence":"first","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huangrui","family":"Chu","sequence":"additional","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yechen","family":"Wang","sequence":"additional","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qi","family":"Deng","sequence":"additional","affiliation":[{"name":"Technology Asia and Escalator, KONE Elevators Co., Ltd, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ming","family":"Cheng","sequence":"additional","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ming","family":"Li","sequence":"additional","affiliation":[{"name":"Data Science Research Center, Duke Kunshan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,12,17]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Multimodal fusion for multimedia analysis: a survey. Multimedia systems 16, 6","author":"Atrey K","year":"2010","unstructured":"Pradeep\u00a0 K Atrey , M\u00a0Anwar Hossain , Abdulmotaleb El\u00a0Saddik , and Mohan\u00a0 S Kankanhalli . 2010. Multimodal fusion for multimedia analysis: a survey. Multimedia systems 16, 6 ( 2010 ), 345\u2013379. Pradeep\u00a0K Atrey, M\u00a0Anwar Hossain, Abdulmotaleb El\u00a0Saddik, and Mohan\u00a0S Kankanhalli. 2010. Multimodal fusion for multimedia analysis: a survey. Multimedia systems 16, 6 (2010), 345\u2013379."},{"key":"e_1_3_2_1_2_1","volume-title":"Multimodal machine learning: A survey and taxonomy","author":"Baltru\u0161aitis Tadas","year":"2018","unstructured":"Tadas Baltru\u0161aitis , Chaitanya Ahuja , and Louis-Philippe Morency . 2018. Multimodal machine learning: A survey and taxonomy . IEEE transactions on pattern analysis and machine intelligence 41, 2( 2018 ), 423\u2013443. Tadas Baltru\u0161aitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE transactions on pattern analysis and machine intelligence 41, 2(2018), 423\u2013443."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854370"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2017.2697077"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2682899"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00630"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/WASPAA.2013.6701819"},{"key":"e_1_3_2_1_10_1","unstructured":"Alex Graves. 2016. Adaptive computation time for recurrent neural networks. arXiv preprint arXiv:1603.08983(2016).  Alex Graves. 2016. Adaptive computation time for recurrent neural networks. arXiv preprint arXiv:1603.08983(2016)."},{"key":"e_1_3_2_1_11_1","unstructured":"Yizeng Han Gao Huang Shiji Song Le Yang Honghui Wang and Yulin Wang. 2021. Dynamic neural networks: A survey. arXiv preprint arXiv:2102.04906(2021).  Yizeng Han Gao Huang Shiji Song Le Yang Honghui Wang and Yulin Wang. 2021. Dynamic neural networks: A survey. arXiv preprint arXiv:2102.04906(2021)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"volume-title":"CNN architectures for large-scale audio classification. In 2017 ieee international conference on acoustics, speech and signal processing (icassp)","author":"Hershey Shawn","key":"e_1_3_2_1_13_1","unstructured":"Shawn Hershey , Sourish Chaudhuri , Daniel\u00a0 PW Ellis , Jort\u00a0 F Gemmeke , Aren Jansen , R\u00a0Channing Moore , Manoj Plakal , Devin Platt , Rif\u00a0 A Saurous , Bryan Seybold , 2017. CNN architectures for large-scale audio classification. In 2017 ieee international conference on acoustics, speech and signal processing (icassp) . IEEE , 131\u2013135. Shawn Hershey, Sourish Chaudhuri, Daniel\u00a0PW Ellis, Jort\u00a0F Gemmeke, Aren Jansen, R\u00a0Channing Moore, Manoj Plakal, Devin Platt, Rif\u00a0A Saurous, Bryan Seybold, 2017. CNN architectures for large-scale audio classification. In 2017 ieee international conference on acoustics, speech and signal processing (icassp). IEEE, 131\u2013135."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1985.1168253"},{"key":"e_1_3_2_1_15_1","volume-title":"an introduction to voice assistants. Medical reference services quarterly 37, 1","author":"Hoy B","year":"2018","unstructured":"Matthew\u00a0 B Hoy . 2018. Alexa, Siri, Cortana, and more : an introduction to voice assistants. Medical reference services quarterly 37, 1 ( 2018 ), 81\u201388. Matthew\u00a0B Hoy. 2018. Alexa, Siri, Cortana, and more: an introduction to voice assistants. Medical reference services quarterly 37, 1 (2018), 81\u201388."},{"key":"e_1_3_2_1_16_1","volume-title":"Laurens van\u00a0der Maaten, and Kilian\u00a0Q Weinberger","author":"Huang Gao","year":"2017","unstructured":"Gao Huang , Danlu Chen , Tianhong Li , Felix Wu , Laurens van\u00a0der Maaten, and Kilian\u00a0Q Weinberger . 2017 . Multi-scale de nse networks for resource efficient image classification. arXiv preprint arXiv:1703.09844(2017). Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van\u00a0der Maaten, and Kilian\u00a0Q Weinberger. 2017. Multi-scale dense networks for resource efficient image classification. arXiv preprint arXiv:1703.09844(2017)."},{"key":"e_1_3_2_1_17_1","volume-title":"Multimedia classification and event detection using double fusion. Multimedia tools and applications 71, 1","author":"Bao Lei","year":"2014","unstructured":"Zhen-zhong Lan, Lei Bao , Shoou- I Yu , Wei Liu , and Alexander\u00a0 G Hauptmann . 2014. Multimedia classification and event detection using double fusion. Multimedia tools and applications 71, 1 ( 2014 ), 333\u2013347. Zhen-zhong Lan, Lei Bao, Shoou-I Yu, Wei Liu, and Alexander\u00a0G Hauptmann. 2014. Multimedia classification and event detection using double fusion. Multimedia tools and applications 71, 1 (2014), 333\u2013347."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2015.2389618"},{"key":"e_1_3_2_1_19_1","volume-title":"DCASE 2017 challenge setup: Tasks, datasets and baseline system. In DCASE 2017-Workshop on Detection and Classification of Acoustic Scenes and Events.","author":"Mesaros Annamaria","year":"2017","unstructured":"Annamaria Mesaros , Toni Heittola , Aleksandr Diment , Benjamin Elizalde , Ankit Shah , Emmanuel Vincent , Bhiksha Raj , and Tuomas Virtanen . 2017 . DCASE 2017 challenge setup: Tasks, datasets and baseline system. In DCASE 2017-Workshop on Detection and Classification of Acoustic Scenes and Events. Annamaria Mesaros, Toni Heittola, Aleksandr Diment, Benjamin Elizalde, Ankit Shah, Emmanuel Vincent, Bhiksha Raj, and Tuomas Virtanen. 2017. DCASE 2017 challenge setup: Tasks, datasets and baseline system. In DCASE 2017-Workshop on Detection and Classification of Acoustic Scenes and Events."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2993148.2993176"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-020-09904-8"},{"key":"e_1_3_2_1_22_1","volume-title":"IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society.","author":"Povey Daniel","year":"2011","unstructured":"Daniel Povey , Arnab Ghoshal , Gilles Boulianne , Lukas Burget , Ondrej Glembek , Nagendra Goel , Mirko Hannemann , Petr Motlicek , Yanmin Qian , Petr Schwarz , 2011 . The Kaldi speech recognition toolkit . In IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society. Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, 2011. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.590"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2738401"},{"key":"e_1_3_2_1_25_1","unstructured":"Sara Sabour Nicholas Frosst and Geoffrey\u00a0E Hinton. 2017. Dynamic routing between capsules. arXiv preprint arXiv:1710.09829(2017).  Sara Sabour Nicholas Frosst and Geoffrey\u00a0E Hinton. 2017. Dynamic routing between capsules. arXiv preprint arXiv:1710.09829(2017)."},{"key":"e_1_3_2_1_26_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199(2014).  Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199(2014)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1101149.1101236"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00675"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"William\u00a0H Warren and Robert\u00a0R Verbrugge. 1984. Auditory perception of breaking and bouncing events: a case study in ecological acoustics.Journal of Experimental Psychology: Human perception and performance 10 5(1984) 704.  William\u00a0H Warren and Robert\u00a0R Verbrugge. 1984. Auditory perception of breaking and bouncing events: a case study in ecological acoustics.Journal of Experimental Psychology: Human perception and performance 10 5(1984) 704.","DOI":"10.1037\/0096-1523.10.5.704"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2013.09.015"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58577-8_20"},{"key":"e_1_3_2_1_34_1","volume-title":"Condconv: Conditionally parameterized convolutions for efficient inference. arXiv preprint arXiv:1904.04971(2019).","author":"Yang Brandon","year":"2019","unstructured":"Brandon Yang , Gabriel Bender , Quoc\u00a0 V Le , and Jiquan Ngiam . 2019 . Condconv: Conditionally parameterized convolutions for efficient inference. arXiv preprint arXiv:1904.04971(2019). Brandon Yang, Gabriel Bender, Quoc\u00a0V Le, and Jiquan Ngiam. 2019. Condconv: Conditionally parameterized convolutions for efficient inference. arXiv preprint arXiv:1904.04971(2019)."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-019-08457-5"}],"event":{"name":"ICMI '21: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Montreal QC Canada","acronym":"ICMI '21"},"container-title":["Companion Publication of the 2021 International Conference on Multimodal Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461615.3491112","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3461615.3491112","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:04Z","timestamp":1750193344000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461615.3491112"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,18]]},"references-count":35,"alternative-id":["10.1145\/3461615.3491112","10.1145\/3461615"],"URL":"https:\/\/doi.org\/10.1145\/3461615.3491112","relation":{},"subject":[],"published":{"date-parts":[[2021,10,18]]},"assertion":[{"value":"2021-12-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}