{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:18:07Z","timestamp":1750220287433,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":20,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,3,4]],"date-time":"2022-03-04T00:00:00Z","timestamp":1646352000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Science and Technology Innovation Foundation of Shenzhen","award":["JCYJ20180504165826861"],"award-info":[{"award-number":["JCYJ20180504165826861"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,3,4]]},"DOI":"10.1145\/3529466.3529488","type":"proceedings-article","created":{"date-parts":[[2022,6,4]],"date-time":"2022-06-04T16:12:24Z","timestamp":1654359144000},"page":"216-220","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Speech Emotion Recognition Exploiting ASR-based and Phonological Knowledge Representations"],"prefix":"10.1145","author":[{"given":"Shuang","family":"Liang","sequence":"first","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}]},{"given":"Xiang","family":"Xie","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China and Shenzhen Research Institute, Beijing Institute of Technology, China"}]},{"given":"Qingran","family":"Zhan","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}]},{"given":"Hao","family":"Cheng","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, 
China"}]}],"member":"320","published-online":{"date-parts":[[2022,6,4]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"crossref","unstructured":"Ming Chen and Xudong Zhao. 2020. A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition. In INTERSPEECH. 374\u2013378.","DOI":"10.21437\/Interspeech.2020-3156"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jes\u00fas Villalba, Yishay Carmiel, and Najim Dehak. 2019. Deep neural networks for emotion recognition combining audio and transcripts. arXiv preprint arXiv:1911.00432 (2019).","DOI":"10.21437\/Interspeech.2018-2466"},{"key":"e_1_3_2_2_3_1","volume-title":"Attention-based models for speech recognition. Advances in neural information processing systems 28","author":"Chorowski K","year":"2015","unstructured":"Jan\u00a0K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. Advances in neural information processing systems 28 (2015)."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Han Feng, Sei Ueno, and Tatsuya Kawahara. 2020. End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model. In INTERSPEECH. 501\u2013505.","DOI":"10.21437\/Interspeech.2020-1180"},{"key":"e_1_3_2_2_5_1","volume-title":"Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100 (2020).","author":"Gulati Anmol","year":"2020","unstructured":"Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, 2020. Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100 (2020)."},{"key":"e_1_3_2_2_6_1","unstructured":"DN Krishna. [n.d.]. A Dual-Decoder Conformer for Multilingual Speech Recognition. ([n. d.])."},{"key":"e_1_3_2_2_7_1","unstructured":"Yuanchao Li, Tianyu Zhao, and Tatsuya Kawahara. 2019. Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning. In Interspeech. 2803\u20132807."},{"key":"e_1_3_2_2_8_1","unstructured":"Shuiyang Mao, PC Ching, C-C\u00a0Jay Kuo, and Tan Lee. 2020. Advancing multiple instance learning with attention modeling for categorical speech emotion recognition. 
arXiv preprint arXiv:2008.06667 (2020)."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_3_2_2_10_1","volume-title":"IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society.","author":"Povey Daniel","year":"2011","unstructured":"Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, 2011. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Lorenzo Tarantino, Philip\u00a0N Garner, Alexandros Lazaridis. 2019. Self-Attention for Speech Emotion Recognition. In Interspeech. 2578\u20132582.","DOI":"10.21437\/Interspeech.2019-2822"},{"key":"e_1_3_2_2_12_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan\u00a0N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_2_13_1","unstructured":"Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, and Xiangang Li. 2019. Learning alignment for multimodal emotion recognition from speech. arXiv preprint arXiv:1909.05645 (2019)."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Sung-Lin Yeh, Yun-Shao Lin, and Chi-Chun Lee. 2020. Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation. In INTERSPEECH. 536\u2013540.","DOI":"10.21437\/Interspeech.2020-2524"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683483"},{"key":"e_1_3_2_2_16_1","volume-title":"Articulatory Features Based TDNN Model for Spoken Language Recognition. In 2019 International Conference on Asian Language Processing (IALP). IEEE, 308\u2013312","author":"Yu Jiawei","year":"2019","unstructured":"Jiawei Yu, Minghao Guo, Yanlu Xie, and Jinsong Zhang. 2019. Articulatory Features Based TDNN Model for Spoken Language Recognition. In 2019 International Conference on Asian Language Processing (IALP). 
IEEE, 308\u2013312."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.3390\/electronics10182259"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.3390\/electronics10243172"},{"key":"e_1_3_2_2_19_1","volume-title":"Wenet: Production first and production ready end-to-end speech recognition toolkit. arXiv e-prints","author":"Zhang Binbin","year":"2021","unstructured":"Binbin Zhang, Di Wu, Chao Yang, Xiaoyu Chen, Zhendong Peng, Xiangming Wang, Zhuoyuan Yao, Xiong Wang, Fan Yu, Lei Xie, 2021. Wenet: Production first and production ready end-to-end speech recognition toolkit. arXiv e-prints (2021), arXiv\u20132102."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2011.6163982"}],"event":{"name":"ICIAI 2022: 2022 the 6th International Conference on Innovation in Artificial Intelligence","acronym":"ICIAI 2022","location":"Guangzhou, China"},"container-title":["2022 the 6th International Conference on Innovation in Artificial Intelligence 
(ICIAI)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529466.3529488","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3529466.3529488","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:25Z","timestamp":1750188685000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529466.3529488"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,4]]},"references-count":20,"alternative-id":["10.1145\/3529466.3529488","10.1145\/3529466"],"URL":"https:\/\/doi.org\/10.1145\/3529466.3529488","relation":{},"subject":[],"published":{"date-parts":[[2022,3,4]]},"assertion":[{"value":"2022-06-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}