{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T14:37:38Z","timestamp":1777127858014,"version":"3.51.4"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2023,5,9]],"date-time":"2023-05-09T00:00:00Z","timestamp":1683590400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Research Grants Council of the Hong Kong Special Administrative Region, China","award":["UGC\/FDS16\/E09\/22"],"award-info":[{"award-number":["UGC\/FDS16\/E09\/22"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2023,5,31]]},"abstract":"<jats:p>User intent classification is a vital task for analyzing users\u2019 essential requirements from the users\u2019 input query in information retrieval systems, question answering systems, and dialogue systems. Pre-trained language model Bidirectional Encoder Representation from Transformers (BERT) has been widely applied to the user intent classification task. However, BERT is compute intensive and time-consuming during inference and usually causes latency in real-time applications. To improve the inference efficiency of BERT for the user intent classification task, this article proposes a new network named one-stage deep-supervised early-exiting BERT as one-stage deep-supervised early-exiting BERT (OdeBERT). In addition, a deep supervision strategy is developed to incorporate the network with internal classifiers by one-stage joint training to improve the learning process of classifiers by extracting discriminative category features. Experiments are conducted on publicly available datasets, including ECDT, SNIPS, and FDQuestion. 
The results show that OdeBERT can accelerate the original BERT by up to 12 times with the same performance, outperforming state-of-the-art baseline methods.<\/jats:p>","DOI":"10.1145\/3587464","type":"journal-article","created":{"date-parts":[[2023,3,13]],"date-time":"2023-03-13T12:28:29Z","timestamp":1678710509000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["OdeBERT: One-stage Deep-supervised Early-exiting BERT for Fast Inference in User Intent Classification"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1730-5014","authenticated-orcid":false,"given":"Yuanxia","family":"Liu","sequence":"first","affiliation":[{"name":"School of Computer Science, South China Normal University, Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9792-3949","authenticated-orcid":false,"given":"Tianyong","family":"Hao","sequence":"additional","affiliation":[{"name":"School of Computer Science, South China Normal University, Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4062-4945","authenticated-orcid":false,"given":"Hai","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science, South China Normal University, Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9646-0524","authenticated-orcid":false,"given":"Yuanyuan","family":"Mu","sequence":"additional","affiliation":[{"name":"School of Foreign Languages, Chaohu University, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5203-8663","authenticated-orcid":false,"given":"Heng","family":"Weng","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Dampness Syndrome of Chinese Medicine, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3976-0053","authenticated-orcid":false,"given":"Fu 
Lee","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Science and Technology, Hong Kong Metropolitan University, Kowloon, Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2023,5,9]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_1_3_2","unstructured":"Qian Chen Zhu Zhuo and Wen Wang. 2019. BERT for joint intent classification and slot filling. arXiv:1902.10909. Retrieved from https:\/\/arxiv.org\/abs\/1902.10909"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-7670-626"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/IALP48816.2019.9037668"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.5555\/3060832.3061023"},{"key":"e_1_3_1_8_2","first-page":"9782","volume-title":"Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS\u201920)","author":"Hou Lu","year":"2020","unstructured":"Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, and Qun Liu. 2020. DynaBERT: Dynamic BERT with adaptive width and depth. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS\u201920), 9782\u20139793."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454487"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.repl4nlp-1.18"},{"key":"e_1_3_1_11_2","unstructured":"J. S. McCarley. 2019. Pruning a bert-based question answering model. arXiv:1910.06360. 
Retrieved from https:\/\/arxiv.org\/pdf\/1910.06360v1"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1580"},{"key":"e_1_3_1_13_2","first-page":"14014","volume-title":"Proceedings of 33rd Conference on Neural Information Processing Systems (NeurIPS\u201919)","author":"Michel Paul","year":"2019","unstructured":"Paul Michel, Omer Levy, and Graham Neubig. 2019. Are sixteen heads really better than one? In Proceedings of 33rd Conference on Neural Information Processing Systems (NeurIPS\u201919), 14014\u201314024."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/EMC2-NIPS53020.2019.00016"},{"key":"e_1_3_1_15_2","unstructured":"Aishwarya Bhandare Vamsi Sripathi Deepthi Karkada Vivek Menon Sun Choi Kushal Datta and Vikram Saletore. 2019. Efficient 8-bit quantization of transformer neural machine language translation model. arXiv:1906.00532. Retrieved from https:\/\/arxiv.org\/abs\/1906.00532"},{"key":"e_1_3_1_16_2","first-page":"8815","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI\u201919)","author":"Shen Sheng","year":"2019","unstructured":"Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2019. Q-BERT: Hessian based ultra low precision quantization of BERT. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI\u201919). 8815\u20138821."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1441"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.372"},{"key":"e_1_3_1_19_2","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: Smaller faster cheaper and lighter. arXiv:1910.01108. Retrieved from https:\/\/arxiv.org\/abs\/1910.01108v4"},{"key":"e_1_3_1_20_2","unstructured":"Zhenzhong Lan Mingda Chen Sebastian Goodman Kevin Gimpel Piyush Sharma and Radu Soricut. 2019. 
ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942. Retrieved from https:\/\/arxiv.org\/abs\/1909.11942"},{"key":"e_1_3_1_21_2","unstructured":"Canwen Xu and Julian McAuley. 2022. A survey on dynamic neural networks for natural language processing. arXiv:2202.07101. Retrieved from https:\/\/arxiv.org\/pdf\/2202.07101"},{"key":"e_1_3_1_22_2","first-page":"2464","volume-title":"Proceedings of the 23rd International Conference on Pattern Recognition (ICPR\u201917)","author":"Teerapittayanon Surat","year":"2017","unstructured":"Surat Teerapittayanon, Bradley McDanel, and HsiangTsung Kung. 2017. BranchyNet: Fast inference via early exiting from deep neural networks. In Proceedings of the 23rd International Conference on Pattern Recognition (ICPR\u201917). 2464\u20132469."},{"key":"e_1_3_1_23_2","first-page":"3301","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML\u201919)","author":"Kaya Yigitcan","year":"2019","unstructured":"Yigitcan Kaya, Sanghyun Hong, and Tudor Dumitras. 2019. Shallow-deep networks: Understanding and mitigating network overthinking. In Proceedings of the 36th International Conference on Machine Learning (ICML\u201919). PMLR, 3301\u20133310."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00198"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00381"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.204"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.593"},{"key":"e_1_3_1_28_2","unstructured":"Shijie Geng Peng Gao Zuohui Fu and Yongfeng Zhang. Romebert: Robust training of multi-exit bert. arXiv: 2101.09755. 
Retrieved from https:\/\/arxiv.org\/abs\/2101.09755v1."},{"key":"e_1_3_1_29_2","first-page":"1","volume-title":"Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning","author":"Laskaridis Stefanos","unstructured":"Stefanos Laskaridis, Alexandros Kouris, and Nicholas D. Lane. Adaptive inference through early-exit networks: Design, challenges and directions. In Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning. Association for Computing Machinery, New York, NY, 1\u20136."},{"key":"e_1_3_1_30_2","first-page":"3859","volume-title":"Proceedings of Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing System","author":"Sabour Sara","year":"2017","unstructured":"Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. 2017. Dynamic routing between capsules. In Proceedings of Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing System. 
3859\u20133869."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00552"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3587464","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3587464","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:16Z","timestamp":1750182556000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3587464"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,9]]},"references-count":30,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,5,31]]}},"alternative-id":["10.1145\/3587464"],"URL":"https:\/\/doi.org\/10.1145\/3587464","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,9]]},"assertion":[{"value":"2022-05-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-07","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}