{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T09:48:22Z","timestamp":1774000102132,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":19,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,12,12]]},"DOI":"10.1145\/3788149.3788243","type":"proceedings-article","created":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T06:35:19Z","timestamp":1773988519000},"page":"561-568","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Considerations on the Trustworthiness Evaluation of LLMs for Math Education: Reliability, Robustness, and Explainability"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-0402-0897","authenticated-orcid":false,"given":"Xuehuan","family":"Chen","sequence":"first","affiliation":[{"name":"Concord College of Sino-Canada (CCSC), Hefei, Anhui, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8593-3669","authenticated-orcid":false,"given":"Yufei","family":"Mei","sequence":"additional","affiliation":[{"name":"Concord College of Sino-Canada (CCSC), Hefei, Anhui, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-5946-1475","authenticated-orcid":false,"given":"Yuhan","family":"Li","sequence":"additional","affiliation":[{"name":"Concord College of Sino-Canada (CCSC), Hefei, Anhui, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2305-0770","authenticated-orcid":false,"given":"Yan","family":"Yang","sequence":"additional","affiliation":[{"name":"Concord College of Sino-Canada (CCSC), Hefei, Anhui, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-7451-8973","authenticated-orcid":false,"given":"Qien","family":"Li","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7480-2004","authenticated-orcid":false,"given":"Rui","family":"Mei","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China and iFLYTEK Security Laboratory, Hefei, Anhui, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,3,19]]},"reference":[{"key":"e_1_3_3_1_2_2","unstructured":"Yuntao Bai Saurav Kadavath Sandipan Kundu et\u00a0al. 2022. Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2212.08073 (2022)."},{"key":"e_1_3_3_1_3_2","volume-title":"ICLR 2024","author":"Bianchi Federico","year":"2024","unstructured":"Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul R\u00f6ttger, Dan Jurafsky, Tatsunori Hashimoto, and James Zou. 2024. Safety-tuned Llamas: Lessons from Improving the Safety of Large Language Models that Follow Instructions. In ICLR 2024."},{"key":"e_1_3_3_1_4_2","unstructured":"Stephen Casper Xander Davies Claudia Shi Thomas\u00a0Krendl Gilbert J\u00e9r\u00e9my Scheurer Javier Rando Rachel Freedman Tomasz Korbak David Lindner Pedro Freire et\u00a0al. 2023. Open problems and fundamental limitations of reinforcement learning from human feedback. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.15217 (2023)."},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3709026.3709114"},{"key":"e_1_3_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/Trustcom66490.2025.00131"},{"key":"e_1_3_3_1_7_2","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano Christopher Hesse and John Schulman. 2021. Training Verifiers to Solve Math Word Problems. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2110.14168 (2021)."},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Andy Extance. 2023. ChatGPT has entered the classroom: how LLMs could transform education. Nature 623 7987 (2023) 474\u2013477.","DOI":"10.1038\/d41586-023-03507-3"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-98414-3_23"},{"key":"e_1_3_3_1_10_2","volume-title":"NeurIPS Datasets and Benchmarks","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. 2021. Measuring Mathematical Problem Solving with the MATH Dataset. In NeurIPS Datasets and Benchmarks."},{"key":"e_1_3_3_1_11_2","unstructured":"Jiaming Ji Xinyu Chen Rui Pan Han Zhu Conghui Zhang Jiahao Li Donghai Hong Boyuan Chen Jiayi Zhou Kaile Wang et\u00a0al. 2025. Safe rlhf-v: Safe reinforcement learning from human feedback in multimodal large language models. arXiv e-prints (2025) arXiv\u20132503."},{"key":"e_1_3_3_1_12_2","unstructured":"Patrick Lewis Ethan Perez Aleksandra Piktus Fabio Petroni Vladimir Karpukhin Naman Goyal et\u00a0al. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2005.11401 (2020)."},{"key":"e_1_3_3_1_13_2","unstructured":"Ruosen Li Ziming Luo and Xinya Du. 2024. FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.06304 (2024)."},{"key":"e_1_3_3_1_14_2","unstructured":"Percy Liang Rishi Bommasani Tony Lee Dimitris Tsipras Dilara Soylu Michihiro Yasunaga Yian Zhang Deepak Narayanan Yuhuai Wu Ananya Kumar et\u00a0al. 2022. Holistic Evaluation of Language Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2211.09110 (2022)."},{"key":"e_1_3_3_1_15_2","unstructured":"Hunter Lightman Vineet Kosaraju Yura Burda Harri Edwards Bowen Baker Teddy Lee Jan Leike John Schulman Ilya Sutskever and Karl Cobbe. 2023. Let\u2019s Verify Step by Step. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.20050 (2023)."},{"key":"e_1_3_3_1_16_2","unstructured":"National Institute of Standards and Technology. 2023. Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. https:\/\/nvlpubs.nist.gov\/nistpubs\/ai\/nist.ai.100-1.pdf"},{"key":"e_1_3_3_1_17_2","volume-title":"NeurIPS","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Katarina Slama, Alex Ray, John Schulman, et\u00a0al. 2022. Training Language Models to Follow Instructions with Human Feedback. In NeurIPS."},{"key":"e_1_3_3_1_18_2","first-page":"1548","volume-title":"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track","author":"Song Juntong","year":"2024","unstructured":"Juntong Song, Xingguang Wang, Juno Zhu, Yuanhao Wu, Xuxin Cheng, Randy Zhong, and Cheng Niu. 2024. RAG-HAT: A hallucination-aware tuning pipeline for LLM in retrieval-augmented generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track. 1548\u20131558."},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i26.34987"},{"key":"e_1_3_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.acl-long.158"}],"event":{"name":"CSAI 2025: 2025 The 9th International Conference on Computer Science and Artificial Intelligence","location":"Beijing China","acronym":"CSAI 2025"},"container-title":["Proceedings of the 2025 9th International Conference on Computer Science and Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3788149.3788243","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T06:37:46Z","timestamp":1773988666000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3788149.3788243"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,12]]},"references-count":19,"alternative-id":["10.1145\/3788149.3788243","10.1145\/3788149"],"URL":"https:\/\/doi.org\/10.1145\/3788149.3788243","relation":{},"subject":[],"published":{"date-parts":[[2025,12,12]]},"assertion":[{"value":"2026-03-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}