{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T16:52:19Z","timestamp":1768409539797,"version":"3.49.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T00:00:00Z","timestamp":1680739200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Science and Technology Innovation 2030 Major Project of China","award":["2021ZD0113302"],"award-info":[{"award-number":["2021ZD0113302"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62076081, 61772153, 61936010"],"award-info":[{"award-number":["62076081, 61772153, 61936010"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2023,4,30]]},"abstract":"<jats:p>\n            Using off-the-shelf resources from resource-rich languages to transfer knowledge to low-resource languages has received a lot of attention. However, the requirements for a model to achieve reliable performance, including the scale of annotated data required and an effective framework, lack clear guidance. To address the first question, we empirically investigate the cost-effectiveness of several methods for training intent classification and slot-filling models from scratch in Indonesian (ID) using English data. 
To address the second question, we propose a Bi-Confidence-Frequency Cross-Lingual transfer framework (BiCF), which consists of \u201cBiCF Mixing\u201d, \u201cLatent Space Refinement\u201d and \u201cJoint Decoder\u201d, to overcome the lack of low-resource language dialogue data. BiCF Mixing, based on a word-level alignment strategy, generates code-mixed data by utilizing importance-frequency and translating-confidence. Latent Space Refinement then trains a new dialogue understanding model using the code-mixed data and word embedding models. Joint Decoder, based on Bidirectional LSTM (BiLSTM) and Conditional Random Field (CRF), is used to obtain the intent classification and slot-filling results. We also release a large-scale fine-labeled Indonesian dialogue dataset (ID-WOZ) and ID-BERT for experiments. BiCF achieves 93.56% and 85.17% (F1 score) on intent classification and slot filling, respectively. 
Extensive experiments demonstrate that our framework performs reliably and cost-efficiently on different scales of manually annotated Indonesian data.\n          <\/jats:p>","DOI":"10.1145\/3575803","type":"journal-article","created":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T06:43:52Z","timestamp":1671173032000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Building Dialogue Understanding Models for Low-resource Language Indonesian from Scratch"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2270-3378","authenticated-orcid":false,"given":"Donglin","family":"Di","sequence":"first","affiliation":[{"name":"Advance.AI, Robinson Road, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3695-8646","authenticated-orcid":false,"given":"Xianyang","family":"Song","sequence":"additional","affiliation":[{"name":"Northeast Forestry University, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5553-178X","authenticated-orcid":false,"given":"Weinan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Nan Gang District, Harbin, Heilongjiang Province, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5214-2268","authenticated-orcid":false,"given":"Yue","family":"Zhang","sequence":"additional","affiliation":[{"name":"Westlake University, Hangzhou, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9722-5446","authenticated-orcid":false,"given":"Fanglin","family":"Wang","sequence":"additional","affiliation":[{"name":"Advance.AI, Robinson Road, 
Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,4,6]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Hui Liu Qingyu Yin and William Yang Wang. 2018. Towards explainable NLP: A generative explanation framework for text classification. In Annual Meeting of the Association for Computational Linguistics 2018 .","DOI":"10.18653\/v1\/P19-1560"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Yu Wu Wei Wu Chen Xing Ming Zhou and Zhoujun Li. 2016. Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots. In Annual Meeting of the Association for Computational Linguistics 2016 .","DOI":"10.18653\/v1\/P17-1046"},{"key":"e_1_3_2_4_2","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation","author":"Grave Edouard","year":"2018","unstructured":"Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation."},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Sebastian Schuster Sonal Gupta Rushin Shah and Mike Lewis. 2018. Cross-lingual transfer learning for multilingual task oriented dialog. In North American Chapter of the Association for Computational Linguistics 2018 .","DOI":"10.18653\/v1\/N19-1380"},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","unstructured":"Tal Schuster Ori Ram Regina Barzilay and Amir Globerson. 2019. Cross-lingual alignment of contextual word embeddings with applications to zero-shot dependency parsing. 
In North American Chapter of the Association for Computational Linguistics 2019 .","DOI":"10.18653\/v1\/N19-1162"},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","unstructured":"Pawe\u0142 Budzianowski Tsung-Hsien Wen Bo-Hsiang Tseng Inigo Casanueva Stefan Ultes Osman Ramadan and Milica Ga\u0161i\u0107. 2018. Multiwoz-a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. In Conference on Empirical Methods in Natural Language Processing 2018 .","DOI":"10.18653\/v1\/D18-1547"},{"key":"e_1_3_2_8_2","unstructured":"Ashish Vaswani Samy Bengio Eugene Brevdo Francois Chollet Aidan N. Gomez Stephan Gouws Llion Jones \u0141ukasz Kaiser Nal Kalchbrenner Niki Parmar Ryan Sepassi Noam M. Shazeer and Jakob Uszkoreit. 2018. Tensor2tensor for neural machine translation. In Conference of the Association for Machine Translation in the Americas 2018 ."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-32-9748-7_3"},{"key":"e_1_3_2_10_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv abs\/1810.04805 (2019)."},{"key":"e_1_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Telmo Pires Eva Schlinger and Dan Garrette. 2019. How multilingual is multilingual BERT? In Annual Meeting of the Association for Computational Linguistics 2019 .","DOI":"10.18653\/v1\/P19-1493"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1092"},{"key":"e_1_3_2_13_2","first-page":"191","volume-title":"Proceedings of the 20th Nordic Conference of Computational Linguistics.","author":"Tiedemann J\u00f6rg","year":"2015","unstructured":"J\u00f6rg Tiedemann. 2015. Improving the cross-lingual projection of syntactic dependencies. In Proceedings of the 20th Nordic Conference of Computational Linguistics. 
Link\u00f6ping University Electronic, 191\u2013199."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/3013558.3013565"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1119"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1040"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00109"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"Hongmin Wang Yue Zhang GuangYong Leonard Chan Jie Yang and Hai Leong Chieu. 2017. Universal dependencies parsing for colloquial singaporean english. ArXiv abs\/1705.06463 (2017).","DOI":"10.18653\/v1\/P17-1159"},{"key":"e_1_3_2_19_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_2_20_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_3_2_21_2","unstructured":"Zhilin Yang Zihang Dai Yiming Yang Jaime G. Carbonell Ruslan Salakhutdinov and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Neural Information Processing Systems 2019 ."},{"key":"e_1_3_2_22_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. 
ArXiv abs\/1907.11692 (2019)."},{"issue":"140","key":"e_1_3_2_23_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer.","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, 140 (2020), 1\u201367.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00343"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1042"},{"key":"e_1_3_2_26_2","first-page":"6294","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"McCann Bryan","year":"2017","unstructured":"Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. In Proceedings of the Advances in Neural Information Processing Systems. 6294\u20136305."},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"Zihan Liu Jamin Shin Yan Xu Genta Indra Winata Peng Xu Andrea Madotto and Pascale Fung. 2019. Zero-shot cross-lingual dialogue systems with transferable latent variables. 
In Conference on Empirical Methods in Natural Language Processing (2019).","DOI":"10.18653\/v1\/D19-1129"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6362"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-88483-3_15"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462883"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.33"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.5555\/867582"},{"key":"e_1_3_2_33_2","first-page":"133","volume-title":"Proceedings of the 1st Instructional Conference on Machine Learning","author":"Ramos Juan","year":"2003","unstructured":"Juan Ramos. 2003. Using TF-IDF to determine word relevance in document queries. In Proceedings of the 1st Instructional Conference on Machine Learning. Piscataway, NJ, 133\u2013142."},{"key":"e_1_3_2_34_2","unstructured":"Chris Dyer Victor Chahuneau and Noah A. Smith. 2013. A simple fast and effective reparameterization of ibm model 2. In North American Chapter of the Association for Computational Linguistics 2013 ."},{"key":"e_1_3_2_35_2","article-title":"Statistical machine translation: IBM models 1 and 2","author":"Collins Michael","year":"2011","unstructured":"Michael Collins. 2011. Statistical machine translation: IBM models 1 and 2. Columbia Columbia Univ (2011).","journal-title":"Columbia Columbia Univ"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2016.10.065"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1070"},{"key":"e_1_3_2_38_2","unstructured":"Timothy Dozat and Christopher D. Manning. 2016. Deep biaffine attention for neural dependency parsing. ArXiv abs\/1611.01734 (2016)."},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","unstructured":"Andry Chowanda and Alan Darmasaputra Chowanda. 2017. Recurrent neural network to deep learn conversation in indonesian. 
In International Conference on Computer Science and Computational Intelligence 2017 .","DOI":"10.1016\/j.procs.2017.10.078"},{"key":"e_1_3_2_40_2","first-page":"801","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation","author":"Koto Fajri","year":"2016","unstructured":"Fajri Koto. 2016. A publicly available indonesian corpora for automatic abstractive and extractive chat summarization. In Proceedings of the 10th International Conference on Language Resources and Evaluation. 801\u2013805."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.08.179"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/357417.357420"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1037\/h0028106"},{"key":"e_1_3_2_44_2","unstructured":"Chien-Sheng Wu Andrea Madotto Ehsan Hosseini-Asl Caiming Xiong Richard Socher and Pascale Fung. 2019. Transferable multi-domain state generator for task-oriented dialogue systems. In Annual Meeting of the Association for Computational Linguistics 2019 ."},{"key":"e_1_3_2_45_2","unstructured":"Armand Joulin Edouard Grave Piotr Bojanowski Matthijs Douze H\u00e9rve J\u00e9gou and Tomas Mikolov. 2016. FastText.zip: Compressing text classification models. ArXiv abs\/1612.03651 (2016)."},{"key":"e_1_3_2_46_2","first-page":"3","volume-title":"Proceedings of the IJCNLP 2017, Tutorial Abstracts","author":"Melo Gerard de","year":"2017","unstructured":"Gerard de Melo. 2017. Multilingual vector representations of words, sentences, and documents. In Proceedings of the IJCNLP 2017, Tutorial Abstracts. 3\u20135."},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","unstructured":"Nikita Moghe Mark Steedman and Alexandra Birch. 2021. Cross-lingual intermediate fine-tuning improves dialogue state tracking. 
In Conference on Empirical Methods in Natural Language Processing 2021 .","DOI":"10.18653\/v1\/2021.emnlp-main.87"},{"key":"e_1_3_2_48_2","unstructured":"Yen-Ting Lin and Yun-Nung Chen. 2021. An empirical study of cross-lingual transferability in generative dialogue state tracker. ArXiv abs\/2101.11360 (2021)."},{"key":"e_1_3_2_49_2","unstructured":"Chulaka Gunasekara Seokhwan Kim Luis Fernando D\u2019Haro Abhinav Rastogi Yun-Nung Chen Mihail Eric Behnam Hedayatnia Karthik Gopalakrishnan Yang Liu Chao-Wei Huang Dilek Hakkani-T\u00fcr Jinchao Li Qi Zhu Lingxiao Luo Lars Liden Kaili Huang Shahin Shayandeh Runze Liang Baolin Peng Zheng Zhang Swadheen Shukla Minlie Huang Jianfeng Gao Shikib Mehri Yulan Feng Carla Gordon Seyed Hossein Alavi David R. Traum Maxine Esk\u00e9nazi Ahmad Beirami Eunjoon Cho Paul A. Crook Ankita De Alborz Geramifard Satwik Kottur Seungwhan Moon Shivani Poddar and Rajen Subba. 2020. Overview of the ninth dialog system technology challenge: Dstc9. ArXiv abs\/2011.06486 (2020)."},{"key":"e_1_3_2_50_2","unstructured":"Yue Feng Yang Wang and Hang Li. 2020. A sequence-to-sequence approach to dialogue state tracking. In Annual Meeting of the Association for Computational Linguistics 2020 ."},{"key":"e_1_3_2_51_2","unstructured":"Jinyu Guo Kai Shuang Jijie Li and Zihan Wang. 2021. Dual slot selector via local reliability verification for dialogue state tracking. ArXiv abs\/2107.12578 (2021)."},{"key":"e_1_3_2_52_2","doi-asserted-by":"crossref","unstructured":"Jinyu Guo Kai Shuang Jijie Li Zihan Wang and Yixuan Liu. 2022. Beyond the granularity: Multi-perspective dialogue collaborative selection for dialogue state tracking. ArXiv abs\/2205.10059 (2022).","DOI":"10.18653\/v1\/2022.acl-long.165"},{"key":"e_1_3_2_53_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting on Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 
2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 311\u2013318."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575803","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3575803","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:21Z","timestamp":1750182681000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575803"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,6]]},"references-count":52,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,4,30]]}},"alternative-id":["10.1145\/3575803"],"URL":"https:\/\/doi.org\/10.1145\/3575803","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,6]]},"assertion":[{"value":"2022-02-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-27","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-04-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}