{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T18:26:36Z","timestamp":1769711196468,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,29]]},"DOI":"10.1145\/3778265.3778272","type":"proceedings-article","created":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T07:25:21Z","timestamp":1769671521000},"page":"42-50","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Vietnamese Dataset for Text Segmentation and Multiple Choices Reading Comprehension"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-9483-7472","authenticated-orcid":false,"given":"Toan Hai","family":"Nguyen","sequence":"first","affiliation":[{"name":"Institute for Artificial Intelligence, VNU University of Engineering and Technology, Hanoi, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4116-4067","authenticated-orcid":false,"given":"Duc Minh","family":"Do","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4911-3394","authenticated-orcid":false,"given":"Truong Xuan","family":"Quan","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5237-5036","authenticated-orcid":false,"given":"Ha Viet","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Institute for Artificial Intelligence, VNU University of Engineering and Technology, Hanoi, 
Vietnam"}]}],"member":"320","published-online":{"date-parts":[[2026,1,28]]},"reference":[{"key":"e_1_3_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Sebastian Arnold Rudolf Schneider Philippe Cudr\u00e9-Mauroux Felix A. Gers and Alexander L\u00f6ser. 2019. SECTOR: A neural model for coherent topic segmentation and classification. Transactions of the Association for Computational Linguistics 7:169\u2013184. https:\/\/aclanthology.org\/Q19-1011","DOI":"10.1162\/tacl_a_00261"},{"key":"e_1_3_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Doug Beeferman Adam Berger and John Lafferty. 1999. Statistical models for text segmentation. Machine Learning 34(1\u20133):177\u2013210. https:\/\/link.springer.com\/article\/10.1023\/A:1007506220214","DOI":"10.1023\/A:1007506220214"},{"key":"e_1_3_3_2_4_2","unstructured":"Thang Viet Bui Toan Thanh Tran and Phuong Le-Hong. 2020. Improving sequence tagging for Vietnamese text using transformer-based neural models. In Proceedings of the 34th Pacific Asia Conference on Language Information and Computation Hanoi Vietnam October 24\u201326 2020. Association for Computational Linguistics 13\u201320. https:\/\/aclanthology.org\/2020.paclic-1.2"},{"key":"e_1_3_3_2_5_2","unstructured":"Freddy Y. Y. Choi. 2000. Advances in domain independent linear text segmentation. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL) Seattle Washington April 29\u2013May 4 2000. Association for Computational Linguistics 26\u201333."},{"key":"e_1_3_3_2_6_2","unstructured":"Kevin Clark Minh-Thang Luong Quoc V. Le and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. In Proceedings of the 8th International Conference on Learning Representations (ICLR 2020) Online April 26\u201330 2020. 
https:\/\/openreview.net\/forum?id=r1xMH1BtvB"},{"key":"e_1_3_3_2_7_2","doi-asserted-by":"crossref","unstructured":"Alexis Conneau Kartikay Khandelwal Naman Goyal Vishrav Chaudhary Guillaume Wenzek Francisco Guzm\u00e1n Edouard Grave Myle Ott Luke Zettlemoyer and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics Online July 5\u201310 2020. Association for Computational Linguistics 8440\u20138451. https:\/\/aclanthology.org\/2020.acl-main.747","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"e_1_3_3_2_8_2","unstructured":"Alexis Conneau Upamanyu Khandelwal Naman Goyal Vishrav Chaudhary Guillaume Wenzek Francisco Guzm\u00e1n Edouard Grave Myle Ott Luke Zettlemoyer and Veselin Stoyanov. 2023. BELEBELE: A benchmark for multilingual machine reading comprehension. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.16884. https:\/\/arxiv.org\/abs\/2308.16884"},{"key":"e_1_3_3_2_9_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Volume 1 (Long and Short Papers) Minneapolis Minnesota June 2\u20137 2019. Association for Computational Linguistics 4171\u20134186. https:\/\/aclanthology.org\/N19-1423"},{"key":"e_1_3_3_2_10_2","unstructured":"Duc Do Minh and Vinh Nguyen Van and Thang Dam Cong. Using Large Language Models for education managements in Vietnamese with low resources arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2501.15022. https:\/\/arxiv.org\/abs\/2501.15022"},{"key":"e_1_3_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Mandar Joshi Eunsol Choi Daniel Weld and Luke Zettlemoyer. 2017. 
TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Vancouver Canada July 30\u2013August 4 2017. Association for Computational Linguistics 1601\u20131611. https:\/\/aclanthology.org\/P17-1147","DOI":"10.18653\/v1\/P17-1147"},{"key":"e_1_3_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Omri Koshorek Adir Cohen Noam Mor Michael Rotman and Jonathan Berant. 2018. Text segmentation as a supervised learning task. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Volume 2 (Short Papers) New Orleans Louisiana June 1\u20136 2018. Association for Computational Linguistics 469\u2013473. https:\/\/aclanthology.org\/N18-2075","DOI":"10.18653\/v1\/N18-2075"},{"key":"e_1_3_3_2_13_2","doi-asserted-by":"crossref","unstructured":"Guokun Lai Qizhe Xie Hanxiao Liu Yiming Yang and Eduard Hovy. 2017. RACE: Large-scale reading comprehension dataset from examinations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing Copenhagen Denmark September 7\u201311 2017. Association for Computational Linguistics 785\u2013794. https:\/\/aclanthology.org\/D17-1082","DOI":"10.18653\/v1\/D17-1082"},{"key":"e_1_3_3_2_14_2","doi-asserted-by":"publisher","unstructured":"Patrick Lewis Barlas Oguz Ruty Rinott Sebastian Riedel Holger Schwenk. 2020. MLQA: Evaluating cross-lingual extractive question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics pages 7315\u20137330. Association for Computational Linguistics Online. 10.18653\/v1\/2020.acl-main.653","DOI":"10.18653\/v1\/2020.acl-main.653"},{"key":"e_1_3_3_2_15_2","doi-asserted-by":"crossref","unstructured":"Yang Liu Chenguang Zhu and Michael Zeng. 2022. 
End-to-end segmentation-based news summarization. In Findings of the Association for Computational Linguistics: ACL 2022 Dublin Ireland May 22\u201327 2022. Association for Computational Linguistics 544\u2013554. https:\/\/aclanthology.org\/2022.findings-acl.44","DOI":"10.18653\/v1\/2022.findings-acl.46"},{"key":"e_1_3_3_2_16_2","doi-asserted-by":"publisher","unstructured":"Kelvin Lo Yuan Jin Weicong Tan Ming Liu Lan Du and Wray Buntine. 2021. Transformer over Pre-trained Transformer for Neural Text Segmentation with Enhanced Topic Coherence. arXiv:https:\/\/arXiv.org\/abs\/2110.07160 [cs]. 10.48550\/arXiv.2110.07160","DOI":"10.48550\/arXiv.2110.07160"},{"key":"e_1_3_3_2_17_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2018. Decoupled weight decay regularization. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018) Vancouver Canada April 30\u2013May 3 2018. https:\/\/openreview.net\/forum?id=Bkg6RiCqY7"},{"key":"e_1_3_3_2_18_2","doi-asserted-by":"publisher","unstructured":"Son T. Luu Mao Nguyen Bui Loi Duc Nguyen Khiem Vinh Tran Kiet Van Nguyen Ngan Luu-Thuy Nguyen. 2021. Conversational machine reading comprehension for Vietnamese healthcare texts. In Advances in Computational Collective Intelligence pages 546\u2013558. Springer Cham. 10.48550\/arXiv.2105.01542","DOI":"10.48550\/arXiv.2105.01542"},{"key":"e_1_3_3_2_19_2","doi-asserted-by":"publisher","unstructured":"Son T. Luu Kiet Tuan Hoang Tuan Q. Pham Kiet Van Nguyen and Ngan Luu-Thuy Nguyen. 2021. A multiple choices reading comprehension corpus for Vietnamese language education. Neural Computing and Applications Springer Nature. 10.1007\/s00521-021-06486-8","DOI":"10.1007\/s00521-021-06486-8"},{"key":"e_1_3_3_2_20_2","unstructured":"Cam-Tu Nguyen Trung-Kien Nguyen Xuan-Hieu Phan Le-Minh Nguyen and Quang-Thuy Ha. 2006. Vietnamese word segmentation with CRFs and SVMs: An investigation. 
Proceedings of the 20th Pacific Asia Conference on Language Information and Computation. https:\/\/aclanthology.org\/Y06-1028.pdf"},{"key":"e_1_3_3_2_21_2","doi-asserted-by":"crossref","unstructured":"Dat Quoc Nguyen and Anh Tuan Nguyen. 2020. PhoBERT: Pre-trained language models for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020 Online November 16\u201320 2020. Association for Computational Linguistics 1037\u20131042. https:\/\/aclanthology.org\/2020.findings-emnlp.92","DOI":"10.18653\/v1\/2020.findings-emnlp.92"},{"key":"e_1_3_3_2_22_2","unstructured":"Dat Quoc Nguyen Thanh Vu Dai Quoc Nguyen Mark Dras and Mark Johnson. 2017. From Word Segmentation to POS Tagging for Vietnamese. In Proceedings of the 15th Annual Workshop of the Australasian Language Technology Association (ALTA) Brisbane Australia December 2017. Association for Computational Linguistics 108\u2013113. https:\/\/aclanthology.org\/U17-1013"},{"key":"e_1_3_3_2_23_2","unstructured":"Hai Toan Nguyen Tien Dat Nguyen and Viet Ha Nguyen. 2024. Enhancing retrieval augmented generation with hierarchical text segmentation chunking. In Proceedings of the 26th International Conference on Information Integration and Web Intelligence Singapore December 5-8 2023. Springer 233\u2013246. https:\/\/link.springer.com\/chapter\/10.1007\/978-981-96-4288-5_17"},{"key":"e_1_3_3_2_24_2","doi-asserted-by":"crossref","unstructured":"Kiet Van Nguyen Duc-Vu Nguyen Anh Gia-Tuan Nguyen and Ngan Luu-Thuy Nguyen. 2020. A Vietnamese dataset for evaluating machine reading comprehension. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020) Online December 8\u201313 2020. International Committee on Computational Linguistics 2595\u20132605. https:\/\/aclanthology.org\/2020.coling-main.233","DOI":"10.18653\/v1\/2020.coling-main.233"},{"key":"e_1_3_3_2_25_2","doi-asserted-by":"publisher","unstructured":"Tuan-Cuong Nguyen and Van-Nhien Nguyen. 2021. 
NLPBK at VLSP-2020 shared task: Compose transformer pretrained models for reliable intelligence identification on social network. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2101.12672. 10.48550\/arXiv.2101.12672","DOI":"10.48550\/arXiv.2101.12672"},{"key":"e_1_3_3_2_26_2","unstructured":"Tuan Huu Nguyen Tuan Duc Nguyen and Van Hoang Nguyen. 2022. UIT-ViNewsQA: A Vietnamese dataset for news question answering. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM) Atlanta Georgia USA October 17\u201321 2022. Association for Computing Machinery 4254\u20134258. https:\/\/dl.acm.org\/doi\/10.1145\/3511808.3557637"},{"key":"e_1_3_3_2_27_2","doi-asserted-by":"crossref","unstructured":"Lev Pevzner and Marti A. Hearst. 2002. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics 28(1):19\u201336. https:\/\/aclanthology.org\/J02-1002","DOI":"10.1162\/089120102317341756"},{"key":"e_1_3_3_2_28_2","doi-asserted-by":"crossref","unstructured":"Violaine Prince and Alexandre Labadi\u00e9. 2007. Text segmentation based on document understanding for information retrieval. In Proceedings of the 12th International Conference on Applications of Natural Language to Information Systems (NLDB 2007) Paris France June 27\u201329 2007. Springer 295\u2013304. https:\/\/link.springer.com\/chapter\/10.1007\/978-3-540-73351-5_26","DOI":"10.1007\/978-3-540-73351-5_26"},{"key":"e_1_3_3_2_29_2","doi-asserted-by":"crossref","unstructured":"Pranav Rajpurkar Jian Zhang Konstantin Lopyrev and Percy Liang. 2016. SQuAD: 100 000+ Questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016) Austin Texas November 1\u20135 2016. Association for Computational Linguistics 2383\u20132392. 
https:\/\/aclanthology.org\/D16-1264","DOI":"10.18653\/v1\/D16-1264"},{"key":"e_1_3_3_2_30_2","doi-asserted-by":"crossref","unstructured":"Siva Reddy Danqi Chen and Christopher D. Manning. 2019. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics 7:249\u2013266. https:\/\/aclanthology.org\/Q19-1016","DOI":"10.1162\/tacl_a_00266"},{"key":"e_1_3_3_2_31_2","doi-asserted-by":"crossref","unstructured":"Anna Rogers Olga Kovaleva and Anna Rumshisky. 2020. A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics 8:842\u2013866. https:\/\/aclanthology.org\/2020.tacl-1.54","DOI":"10.1162\/tacl_a_00349"},{"key":"e_1_3_3_2_32_2","doi-asserted-by":"crossref","unstructured":"Gennady Shtekh Polina Kazakova Nikita Nikitinsky and Nikolay Skachkov. 2018. Exploring influence of topic segmentation on information retrieval quality. In Proceedings of the 5th International Conference on Internet Science (INSCI 2018) St. Petersburg Russia October 24\u201326 2018. Springer 131\u2013140. https:\/\/link.springer.com\/chapter\/10.1007\/978-3-030-01437-7_11","DOI":"10.1007\/978-3-030-01437-7_11"},{"key":"e_1_3_3_2_33_2","doi-asserted-by":"crossref","unstructured":"Cong Dao Tran Nhut Huy Pham Anh Tuan Nguyen Truong Son Hy and Tu Vu. 2023. ViDeBERTa: A powerful pre-trained language model for Vietnamese. In Findings of the Association for Computational Linguistics: EACL 2023 Dubrovnik Croatia May 2\u20136 2023. Association for Computational Linguistics 945\u2013955. https:\/\/aclanthology.org\/2023.findings-eacl.79","DOI":"10.18653\/v1\/2023.findings-eacl.79"},{"key":"e_1_3_3_2_34_2","unstructured":"Thanh Vu Dat Quoc Nguyen Dai Quoc Nguyen and Mark Dras and Mark Johnson. VnCoreNLP: A Vietnamese Natural Language Processing Toolkit. 
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1801.01331. https:\/\/arxiv.org\/abs\/1801.01331"},{"key":"e_1_3_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Wen Xiao and Giuseppe Carenini. 2019. Extractive summarization of long documents by combining global and local context. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) Hong Kong China November 3\u20137 2019. Association for Computational Linguistics 3011\u20133021. https:\/\/aclanthology.org\/D19-1298","DOI":"10.18653\/v1\/D19-1298"},{"key":"e_1_3_3_2_36_2","unstructured":"Hai Yu Chong Deng Qinglin Zhang Jiaqing Liu Qian Chen and Wen Wang. 2023. Improving long document topic segmentation models with enhanced coherence modeling. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.10525. 
https:\/\/arxiv.org\/abs\/2310.10525"}],"event":{"name":"BDSIC 2025: 2025 7th International Conference on Big-data Service and Intelligent Computation","location":"Bangkok Thailand","acronym":"BDSIC 2025"},"container-title":["Proceedings of the 2025 7th International Conference on Big-data Service and Intelligent Computation"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3778265.3778272","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T07:26:18Z","timestamp":1769671578000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3778265.3778272"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,29]]},"references-count":35,"alternative-id":["10.1145\/3778265.3778272","10.1145\/3778265"],"URL":"https:\/\/doi.org\/10.1145\/3778265.3778272","relation":{},"subject":[],"published":{"date-parts":[[2025,10,29]]},"assertion":[{"value":"2026-01-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}