{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:17:36Z","timestamp":1750220256733,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,12,21]],"date-time":"2022-12-21T00:00:00Z","timestamp":1671580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Guangdong Key Lab of AI and Multi-modal Data Processing, United International College (UIC), Zhuhai","award":["2020KSYS007"],"award-info":[{"award-number":["2020KSYS007"]}]},{"name":"Chinese National Research Fund","award":["61872239"],"award-info":[{"award-number":["61872239"]}]},{"name":"Beijing Normal University (Zhuhai) Guangdong"},{"name":"Zhuhai Science-Tech Innovation Bureau","award":["ZH22017001210119PWC, 28712217900001"],"award-info":[{"award-number":["ZH22017001210119PWC, 28712217900001"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2023,4,30]]},"abstract":"<jats:p>Spoken Language Understanding (SLU), a core component of the task-oriented dialogue system, expects a shorter inference facing the impatience of human users. Existing work increases inference speed by designing non-autoregressive models for single-turn SLU tasks but fails to apply to multi-turn SLU in confronting the dialogue history. The intuitive idea is to concatenate all historical utterances and utilize the non-autoregressive models directly. However, this approach seriously misses the salient historical information and suffers from the uncoordinated-slot problems. To overcome those shortcomings, we propose a novel model for multi-turn SLU named Salient History Attention with Layer-Refined Transformer (SHA-LRT), which comprises a SHA module, a Layer-Refined Mechanism (LRM), and a Slot Label Generation (SLG) task. SHA captures salient historical information for the current dialogue from both historical utterances and results via a well-designed history-attention mechanism. LRM predicts preliminary SLU results from Transformer\u2019s middle states and utilizes them to guide the final prediction, and SLG obtains the sequential dependency information for the non-autoregressive encoder. 
Experiments on public datasets indicate that our model significantly improves multi-turn SLU performance (17.5% on Overall) with accelerating (nearly 15 times) the inference process over the state-of-the-art baseline as well as effective on the single-turn SLU tasks.<\/jats:p>","DOI":"10.1145\/3545800","type":"journal-article","created":{"date-parts":[[2022,7,5]],"date-time":"2022-07-05T09:00:39Z","timestamp":1657011639000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Capture Salient Historical Information: A Fast and Accurate Non-autoregressive Model for Multi-turn Spoken Language Understanding"],"prefix":"10.1145","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1672-0358","authenticated-orcid":false,"given":"Lizhi","family":"Cheng","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Minhang District, Shanghai, PR China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0231-3196","authenticated-orcid":false,"given":"Weijia","family":"Jia","sequence":"additional","affiliation":[{"name":"BNU-UIC Institute of Artificial Intelligence and Future Networks, Beijing Normal University (Zhuhai), Guangdong Key Lab of AI and Multi-Modal Data Processing, BNU-HKBU United International College, Jintong Road, Tangjiawan, Zhuhai, Guangdong, PR China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8493-4449","authenticated-orcid":false,"given":"Wenmian","family":"Yang","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore City, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2022,12,21]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"Layer normalization","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).","journal-title":"arXiv preprint arXiv:1607.06450"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1541"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-5514"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017-518"},{"key":"e_1_3_1_6_2","article-title":"Learning end-to-end goal-oriented dialog","author":"Bordes Antoine","year":"2016","unstructured":"Antoine Bordes, Y.-Lan Boureau, and Jason Weston. 2016. Learning end-to-end goal-oriented dialog. arXiv preprint arXiv:1605.07683 (2016).","journal-title":"arXiv preprint arXiv:1605.07683"},{"key":"e_1_3_1_7_2","article-title":"BERT for joint intent classification and slot filling","author":"Chen Qian","year":"2019","unstructured":"Qian Chen, Zhu Zhuo, and Wen Wang. 2019. BERT for joint intent classification and slot filling. 
arXiv preprint arXiv:1902.10909 (2019).","journal-title":"arXiv preprint arXiv:1902.10909"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2016-312"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482229"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME51207.2021.9428384"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_1_12_2","article-title":"Snips voice platform: An embedded spoken language understanding system for private-by-design voice interfaces","author":"Coucke Alice","year":"2018","unstructured":"Alice Coucke, Alaa Saade, Adrien Ball, Th\u00e9odore Bluche, Alexandre Caulier, David Leroy, Cl\u00e9ment Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril et\u00a0al. 2018. Snips voice platform: An embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190 (2018).","journal-title":"arXiv preprint arXiv:1805.10190"},{"key":"e_1_3_1_13_2","first-page":"4171","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4171\u20134186."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-5506"},{"issue":"2","key":"e_1_3_1_15_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3429980","article-title":"Learning to respond with your favorite stickers: A framework of unifying multi-modality and user preference in multi-turn dialog","volume":"39","author":"Gao Shen","year":"2021","unstructured":"Shen Gao, Xiuying Chen, Li Liu, Dongyan Zhao, and Rui Yan. 2021. Learning to respond with your favorite stickers: A framework of unifying multi-modality and user preference in multi-turn dialog. ACM Trans. Inf. Syst. 39, 2 (2021), 1\u201332.","journal-title":"ACM Trans. Inf. Syst."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2118"},{"key":"e_1_3_1_17_2","first-page":"5467","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Haihong E.","year":"2019","unstructured":"E. Haihong, Peiqing Niu, Zhongfu Chen, and Meina Song. 2019. A novel bi-directional interrelated model for joint intent detection and slot filling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 5467\u20135471."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2016-402"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.3115\/116580.116613"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_22_2","article-title":"Adam: A method for stochastic optimization","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014).","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1223"},{"key":"e_1_3_1_24_2","article-title":"ODE transformer: An ordinary differential equation-inspired model for neural machine translation","author":"Li Bei","year":"2021","unstructured":"Bei Li, Quan Du, Tao Zhou, Shuhan Zhou, Xin Zeng, Tong Xiao, and Jingbo Zhu. 2021. ODE transformer: An ordinary differential equation-inspired model for neural machine translation. arXiv preprint arXiv:2104.02308 (2021).","journal-title":"arXiv preprint arXiv:2104.02308"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i15.17561"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3453183"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2016-1352"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.562"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1097"},{"key":"e_1_3_1_30_2","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"e_1_3_1_31_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Loshchilov Ilya","year":"2018","unstructured":"Ilya Loshchilov and Frank Hutter. 2018. Decoupled weight decay regularization. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3464377"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2383614"},{"key":"e_1_3_1_34_2","article-title":"Recurrent neural networks with external memory for language understanding","author":"Peng Baolin","year":"2015","unstructured":"Baolin Peng and Kaisheng Yao. 2015. Recurrent neural networks with external memory for language understanding. arXiv preprint arXiv:1506.00195 (2015).","journal-title":"arXiv preprint arXiv:1506.00195"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1079"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1214"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2021.3053400"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414110"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.15"},{"key":"e_1_3_1_40_2","first-page":"1807","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Qin Libo","year":"2020","unstructured":"Libo Qin, Xiao Xu, Wanxiang Che, and Ting Liu. 2020. AGIF: An adaptive graph-interactive framework for joint multiple intent detection and slot filling. In Findings of the Association for Computational Linguistics: EMNLP 2020. ACL, 1807\u20131816."},{"key":"e_1_3_1_41_2","article-title":"A study of non-autoregressive model for sequence generation","author":"Ren Yi","year":"2020","unstructured":"Yi Ren, Jinglin Liu, Xu Tan, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. 2020. A study of non-autoregressive model for sequence generation. 
arXiv preprint arXiv:2004.10454 (2020).","journal-title":"arXiv preprint arXiv:2004.10454"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2012.6289079"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2074"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1002\/9781119992691"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466796"},{"key":"e_1_3_1_46_2","first-page":"5998","volume-title":"Proceedings of the International Conference on Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447875"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2050"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.152"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3462207"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.5555\/3367471.3367609"},{"key":"e_1_3_1_52_2","article-title":"XLNet: Generalized autoregressive pretraining for language understanding","volume":"32","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 32 (2019).","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2014.7078572"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.743"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1519"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3317612"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-short.112"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3545800","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3545800","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:47Z","timestamp":1750188647000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3545800"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,21]]},"references-count":56,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,4,30]]}},"alternative-id":["10.1145\/3545800"],"URL":"https:\/\/doi.org\/10.1145\/3545800","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2022,12,21]]},"assertion":[{"value":"2022-01-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-23","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}