{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,23]],"date-time":"2026-06-23T17:26:29Z","timestamp":1782235589408,"version":"3.54.5"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","license":[{"start":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:00:00Z","timestamp":1720742400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2024,7,12]]},"abstract":"<jats:p>\n                    With the expanding application of Large Language Models (LLMs) in various domains, it becomes imperative to comprehensively investigate their unforeseen behaviors and consequent outcomes. In this study, we introduce and systematically explore the phenomenon of \u201cglitch tokens\u201d, which are anomalous tokens produced by established tokenizers and could potentially compromise the models\u2019 quality of response. Specifically, we experiment on seven top popular LLMs utilizing three distinct tokenizers and involving a total of 182,517 tokens. We present categorizations of the identified glitch tokens and symptoms exhibited by LLMs when interacting with glitch tokens. Based on our observation that glitch tokens tend to cluster in the embedding space, we propose\n                    <jats:sc>GlitchHunter<\/jats:sc>\n                    , a novel iterative clustering-based technique, for efficient glitch token detection. The evaluation shows that our approach notably outperforms three baseline methods on eight open-source LLMs. To the best of our knowledge, we present the first comprehensive study on glitch tokens. 
Our new detection further provides valuable insights into mitigating tokenization-related errors in LLMs.\n                  <\/jats:p>","DOI":"10.1145\/3660799","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T10:22:09Z","timestamp":1720779729000},"page":"2075-2097","source":"Crossref","is-referenced-by-count":12,"title":["Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-8032-3841","authenticated-orcid":false,"given":"Yuxi","family":"Li","sequence":"first","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4978-127X","authenticated-orcid":false,"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0046-6674","authenticated-orcid":false,"given":"Gelei","family":"Deng","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2770-9189","authenticated-orcid":false,"given":"Ying","family":"Zhang","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9597-3587","authenticated-orcid":false,"given":"Wenjia","family":"Song","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2023-0247","authenticated-orcid":false,"given":"Ling","family":"Shi","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, 
Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3977-6573","authenticated-orcid":false,"given":"Kailong","family":"Wang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4382-0757","authenticated-orcid":false,"given":"Yuekang","family":"Li","sequence":"additional","affiliation":[{"name":"UNSW, Sydney, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7300-9215","authenticated-orcid":false,"given":"Yang","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1100-8633","authenticated-orcid":false,"given":"Haoyu","family":"Wang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,7,12]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"(Accessed on 09\/25\/2023). SolidGoldMagikarp (plus prompt generation). https:\/\/www.lesswrong.com\/posts\/aPeJE8bSo6rAFoLqg\/solidgoldmagikarp-plus-prompt-generation."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/2008\/10\/P10008"},{"key":"e_1_3_1_4_2","article-title":"Language Models are Few-Shot Learners","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. 
Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL]","journal-title":"arXiv:2005.14165 [cs.CL]"},{"key":"e_1_3_1_5_2","article-title":"Play Guessing Game with LLM: Indirect Jailbreak Attack with Implicit Clues","author":"Chang Zhiyuan","year":"2024","unstructured":"Zhiyuan Chang, Mingyang Li, Yi Liu, Junjie Wang, Qing Wang, and Yang Liu. 2024. Play Guessing Game with LLM: Indirect Jailbreak Attack with Implicit Clues. arXiv preprint arXiv:2402.09091 (2024).","journal-title":"arXiv preprint arXiv:2402.09091"},{"key":"e_1_3_1_6_2","article-title":"Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality","author":"Chiang Wei-Lin","year":"2023","unstructured":"Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. 2023. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality. See https:\/\/vicuna. lmsys. org (accessed 14 April 2023) (2023).","journal-title":"See https:\/\/vicuna. lmsys. org (accessed 14 April 2023)"},{"key":"e_1_3_1_7_2","article-title":"MASTERKEY: Automated jailbreaking of large language model chatbots","author":"Deng Gelei","year":"2024","unstructured":"Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, and Yang Liu. 2024. MASTERKEY: Automated jailbreaking of large language model chatbots. In NDSS.","journal-title":"NDSS"},{"key":"e_1_3_1_8_2","article-title":"Pentestgpt: An llm-empowered automatic penetration testing tool","author":"Deng Gelei","year":"2023","unstructured":"Gelei Deng, Yi Liu, V\u00edctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, and Stefan Rass. 2023. 
Pentestgpt: An llm-empowered automatic penetration testing tool. arXiv preprint arXiv:2308.06782 (2023).","journal-title":"arXiv preprint arXiv:2308.06782"},{"key":"e_1_3_1_9_2","article-title":"Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning","author":"Deng Gelei","year":"2024","unstructured":"Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, and Yang Liu. 2024. Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning. NDSS AISCC (2024).","journal-title":"NDSS AISCC"},{"key":"e_1_3_1_10_2","article-title":"Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt","author":"Deng Yinlin","year":"2023","unstructured":"Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Dylan Zhang, Shujing Yang, and Lingming Zhang. 2023. Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt. arXiv preprint arXiv:2304.02014 (2023).","journal-title":"arXiv preprint arXiv:2304.02014"},{"key":"e_1_3_1_11_2","doi-asserted-by":"crossref","unstructured":"Zhengxiao Du Yujie Qian Xiao Liu Ming Ding Jiezhong Qiu Zhilin Yang and Jie Tang. 2022. GLM: General Language Model Pretraining with Autoregressive Blank Infilling. ACL 320\u2013335.","DOI":"10.18653\/v1\/2022.acl-long.26"},{"key":"e_1_3_1_12_2","first-page":"226","article-title":"A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise","volume":"96","author":"Ester Martin","year":"1996","unstructured":"Martin Ester, Hans-Peter Kriegel, J\u00f6rg Sander, Xiaowei Xu, et al. 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA, Vol. 96. 
226\u2013231.","journal-title":"Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA"},{"key":"e_1_3_1_13_2","volume-title":"An Introduction to Qualitative Research","author":"Flick U.","year":"2009","unstructured":"U. Flick. 2009. An Introduction to Qualitative Research. SAGE Publications. https:\/\/books.google.com.sg\/books?id=sFv1oWX2DoEC"},{"key":"e_1_3_1_14_2","unstructured":"A Search for More ChatGPT \/ GPT-3.5 \/ GPT-4 \u201cUnspeakable\u201d Glitch Tokens. (Accessed on 09\/26\/2023). https:\/\/www.lesswrong.com\/posts\/kmWrwtGE9B9hpbgRT\/a-search-for-more-chatgpt-gpt-3-5-gpt-4-unspeakable-glitch"},{"issue":"1","key":"e_1_3_1_15_2","article-title":"How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment","volume":"9","author":"Gilson Aidan","year":"2023","unstructured":"Aidan Gilson, Conrad W Safranek, Thomas Huang, Vimig Socrates, Ling Chi, Richard Andrew Taylor, David Chartash, et al. 2023. How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Medical Education 9, 1 (2023), e45312.","journal-title":"JMIR Medical Education"},{"key":"e_1_3_1_16_2","unstructured":"GlitchHunter. (Accessed on 03\/05\/2024). https:\/\/sites.google.com\/view\/glitchhunter-fse2024."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409756"},{"key":"e_1_3_1_18_2","article-title":"The K-means algorithm","volume":"4","author":"Hartigan J","year":"1975","unstructured":"J Hartigan. 1975. The K-means algorithm. 
Clustering algorithms 4 (1975).","journal-title":"Clustering algorithms"},{"key":"e_1_3_1_19_2","article-title":"An empirical study on fine-tuning large language models of code for automated program repair","author":"Huang Kai","year":"2023","unstructured":"Kai Huang, Xiangxin Meng, Jian Zhang, Yang Liu, Wenjie Wang, Shuhao Li, and Yuqing Zhang. 2023. An empirical study on fine-tuning large language models of code for automated program repair. In 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1162-1174.","journal-title":"2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE)"},{"key":"e_1_3_1_20_2","unstructured":"Hierarchical Clustering in Machine Learning. (Accessed on 09\/27\/2023). https:\/\/www.geeksforgeeks.org\/ml-hierarchical-clustering-agglomerative-and-divisive-clustering\/."},{"key":"e_1_3_1_21_2","article-title":"Mistral 7B","author":"Jiang Albert Q.","year":"2023","unstructured":"Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, L\u00e9lio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timoth\u00e9e Lacroix, and William El Sayed. 2023. Mistral 7B. arXiv:2310.06825 [cs.CL]","journal-title":"arXiv:2310.06825 [cs.CL]"},{"key":"e_1_3_1_22_2","article-title":"UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction","author":"McInnes James Melville Leland","year":"2018","unstructured":"James Melville Leland McInnes, John Healy. 2018. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 [stats.ML]","journal-title":"arXiv:1802.03426 [stats.ML]"},{"key":"e_1_3_1_23_2","article-title":"Teaching Models to Express Their Uncertainty in Words","author":"Lin Stephanie","year":"2022","unstructured":"Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. 
Teaching Models to Express Their Uncertainty in Words. arXiv:2205.14334 [cs.CL]","journal-title":"arXiv:2205.14334 [cs.CL]"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Jiawei Liu Jinkun Lin Fabian Ruffy Cheng Tan Jinyang Li Aurojit Panda and Lingming Zhang. 2023. Nnsmith: Generating diverse and valid test cases for deep learning compilers. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems Volume 2. 530\u2013543.","DOI":"10.1145\/3575693.3575707"},{"key":"e_1_3_1_25_2","article-title":"Prompt Injection attack against LLM-integrated Applications","author":"Liu Yi","year":"2023","unstructured":"Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Yang Liu. 2023. Prompt Injection attack against LLM-integrated Applications. arXiv preprint arXiv:2306.05499 (2023).","journal-title":"arXiv preprint arXiv:2306.05499"},{"key":"e_1_3_1_26_2","article-title":"Jailbreaking chatgpt via prompt engineering: An empirical study","author":"Liu Yi","year":"2023","unstructured":"Yi Liu, Gelei Deng, Zhengzi Xu, Yuekang Li, Yaowen Zheng, Ying Zhang, Lida Zhao, Tianwei Zhang, and Yang Liu. 2023. Jailbreaking chatgpt via prompt engineering: An empirical study. arXiv preprint arXiv:2305.13860 (2023).","journal-title":"arXiv preprint arXiv:2305.13860"},{"key":"e_1_3_1_27_2","article-title":"NLTK: The Natural Language Toolkit","author":"Loper Edward","year":"2002","unstructured":"Edward Loper and Steven Bird. 2002. NLTK: The Natural Language Toolkit. arXiv:cs\/0205028 [cs.CL]","journal-title":"arXiv:cs\/0205028 [cs.CL]"},{"key":"e_1_3_1_28_2","unstructured":"ML | K means++ Algorithm. (Accessed on 09\/27\/2023). https:\/\/www.geeksforgeeks.org\/ml-k-means-algorithm\/."},{"key":"e_1_3_1_29_2","unstructured":"C. Model card Models and evaluations for claude models. (Accessed on 09\/25\/2023). 
https:\/\/www-files.anthropic.com\/production\/images\/Model-Card-Claude-2.pdf."},{"key":"e_1_3_1_30_2","article-title":"Text and Code Embeddings by Contrastive Pre-Training","author":"Neelakantan Arvind","year":"2022","unstructured":"Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, and Lilian Weng. 2022. Text and Code Embeddings by Contrastive Pre-Training. arXiv:2201.10005 [cs.CL]","journal-title":"arXiv:2201.10005 [cs.CL]"},{"key":"e_1_3_1_31_2","article-title":"BARD: A structured technique for group elicitation of Bayesian networks to support analytic reasoning","author":"Nicholson Ann E.","year":"2020","unstructured":"Ann E. Nicholson, Kevin B. Korb, Erik P. Nyberg, Michael Wybrow, Ingrid Zukerman, Steven Mascaro, Shreshth Thakur, Abraham Oshni Alvandi, Jeff Riley, Ross Pearson, Shane Morris, Matthieu Herrmann, A.K.M. Azad, Fergus Bolger, Ulrike Hahn, and David Lagnado. 2020. BARD: A structured technique for group elicitation of Bayesian networks to support analytic reasoning. arXiv:2003.01207 [cs.AI]","journal-title":"arXiv:2003.01207 [cs.AI]"},{"key":"e_1_3_1_32_2","first-page":"1","volume-title":"ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Ok Hyunjong","year":"2023","unstructured":"Hyunjong Ok and Seong-Bae Park. 2023. Post-Trained Language Model Adaptive to Extractive Summarization of Long Spoken Documents. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1\u20132."},{"key":"e_1_3_1_33_2","unstructured":"OpenAI. 2024. GPT-4 Technical Report. 
arXiv:2303.08774 [cs.CL]"},{"key":"e_1_3_1_34_2","unstructured":"The petertodd phenomenon. (Accessed on 09\/25\/2023). https:\/\/www.lesswrong.com\/posts\/jkY6QdCfAXHJk3kea\/the-petertodd-phenomenon."},{"key":"e_1_3_1_35_2","unstructured":"ShareGPT52K. (Accessed on 03\/06\/2024). https:\/\/huggingface.co\/datasets\/RyokoAI\/ShareGPT52K."},{"key":"e_1_3_1_36_2","article-title":"Release Strategies and the Social Impacts of Language Models","author":"Solaiman Irene","year":"2019","unstructured":"Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, and Jasmine Wang. 2019. Release Strategies and the Social Impacts of Language Models. arXiv:1908.09203 [cs.CL]","journal-title":"arXiv:1908.09203 [cs.CL]"},{"key":"e_1_3_1_37_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An Instruction-following LLaMA model. https:\/\/github.com\/tatsu-lab\/stanford_alpaca."},{"key":"e_1_3_1_38_2","unstructured":"SolidGoldMagikarp II: technical details and more recent findings. (Accessed on 09\/25\/2023). https:\/\/www.lesswrong.com\/posts\/Ya9LzwEbfaAMY8ABo\/solidgoldmagikarp-ii-technical-details-and-more-recent."},{"key":"e_1_3_1_39_2","first-page":"2583","article-title":"aeroBERT-NER: Named-Entity Recognition for Aerospace Requirements Engineering using BERT","author":"Ray Archana Tikayat","year":"2023","unstructured":"Archana Tikayat Ray, Olivia J Pinon-Fischer, Dimitri N Mavris, Ryan T White, and Bjorn F Cole. 2023. aeroBERT-NER: Named-Entity Recognition for Aerospace Requirements Engineering using BERT. In AIAA SCITECH 2023 Forum. 2583.","journal-title":"AIAA SCITECH 2023 Forum"},{"key":"e_1_3_1_40_2","unstructured":"SolidGoldMagikarp III: Glitch token archaeology \u2014 LessWrong. (Accessed on 09\/26\/2023). 
https:\/\/www.lesswrong.com\/posts\/8viQEp8KBg2QSW4Yc\/solidgoldmagikarp-iii-glitch-token-archaeology."},{"key":"e_1_3_1_41_2","article-title":"LLaMA: Open and Efficient Foundation Language Models","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971 [cs.CL]","journal-title":"arXiv:2302.13971 [cs.CL]"},{"key":"e_1_3_1_42_2","article-title":"From Louvain to Leiden: guaranteeing well-connected communities","author":"Jan van Nees","year":"2018","unstructured":"Nees Jan van Eck Vincent Traag, Ludo Waltman. 2018. From Louvain to Leiden: guaranteeing well-connected communities. arXiv:1810.08473 [cs.SI]","journal-title":"arXiv:1810.08473 [cs.SI]"},{"key":"e_1_3_1_43_2","article-title":"BiasAsker: Measuring the Bias in Conversational AI System","author":"Wan Yuxuan","year":"2023","unstructured":"Yuxuan Wan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, and Michael Lyu. 2023. BiasAsker: Measuring the Bias in Conversational AI System. arXiv:2305.12434 [cs.CL]","journal-title":"arXiv:2305.12434 [cs.CL]"},{"key":"e_1_3_1_44_2","article-title":"MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation","author":"Wang Guanyu","year":"2024","unstructured":"Guanyu Wang, Yuekang Li, Yi Liu, Gelei Deng, Tianlin Li, Guosheng Xu, Yang Liu, Haoyu Wang, and Kailong Wang. 2024. MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation. 
FORGE (2024).","journal-title":"FORGE"},{"key":"e_1_3_1_45_2","article-title":"Validating Multimedia Content Moderation Software via Semantic Fusion","author":"Wang Wenxuan","year":"2023","unstructured":"Wenxuan Wang, Jingyuan Huang, Chang Chen, Jiazhen Gu, Jianping Zhang, Weibin Wu, Pinjia He, and Michael Lyu. 2023. Validating Multimedia Content Moderation Software via Semantic Fusion. arXiv:2305.13623 [cs.SE]","journal-title":"arXiv:2305.13623 [cs.SE]"},{"key":"e_1_3_1_46_2","article-title":"An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software","author":"Wang Wenxuan","year":"2023","unstructured":"Wenxuan Wang, Jingyuan Huang, Jen tse Huang, Chang Chen, Jiazhen Gu, Pinjia He, and Michael R. Lyu. 2023. An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software. arXiv:2308.09810 [cs.SE]","journal-title":"arXiv:2308.09810 [cs.SE]"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00200"},{"key":"e_1_3_1_48_2","article-title":"MT TM: Metamorphic Testing for Textual Content Moderation Software","author":"Wang Wenxuan","year":"2023","unstructured":"Wenxuan Wang, Jen tse Huang, Weibin Wu, Jianping Zhang, Yizhan Huang, Shuqing Li, Pinjia He, and Michael Lyu. 2023. MT TM: Metamorphic Testing for Textual Content Moderation Software. arXiv:2302.05706 [cs.CL]","journal-title":"arXiv:2302.05706 [cs.CL]"},{"key":"e_1_3_1_49_2","article-title":"CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?","author":"Wei Tianwen","year":"2023","unstructured":"Tianwen Wei, Jian Luan, Wei Liu, Shuang Dong, and Bin Wang. 2023. CMATH: Can Your Language Model Pass Chinese Elementary School Math Test? 
arXiv preprint arXiv:2306.16636 (2023).","journal-title":"arXiv preprint arXiv:2306.16636"},{"key":"e_1_3_1_50_2","article-title":"LLM Jailbreak Attack versus Defense Techniques-A Comprehensive Study","author":"Xu Zihao","year":"2024","unstructured":"Zihao Xu, Yi Liu, Gelei Deng, Yuekang Li, and Stjepan Picek. 2024. LLM Jailbreak Attack versus Defense Techniques-A Comprehensive Study. arXiv preprint arXiv:2402.13457 (2024).","journal-title":"arXiv preprint arXiv:2402.13457"},{"key":"e_1_3_1_51_2","article-title":"Automated Testing and Improvement of Named Entity Recognition Systems","author":"Yu Boxi","year":"2023","unstructured":"Boxi Yu, Yiyan Hu, Qiuyang Mang, Wenhan Hu, and Pinjia He. 2023. Automated Testing and Improvement of Named Entity Recognition Systems. arXiv:2308.07937 [cs.CL]","journal-title":"arXiv:2308.07937 [cs.CL]"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.2991\/scict-14.2014.3"},{"key":"e_1_3_1_53_2","article-title":"Glm-130b: An open bilingual pre-trained model","author":"Zeng Aohan","year":"2022","unstructured":"Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, et al. 2022. Glm-130b: An open bilingual pre-trained model. arXiv preprint arXiv:2210.02414 (2022).","journal-title":"arXiv preprint arXiv:2210.02414"},{"key":"e_1_3_1_54_2","article-title":"E-NER: Evidential Deep Learning for Trustworthy Named Entity Recognition","author":"Zhang Zhen","year":"2023","unstructured":"Zhen Zhang, Mengting Hu, Shiwan Zhaofor, Minlie Huang, Haotian Wang, Lemao Liu, Zhirui Zhang, Zhe Liu, and Bingzhe Wu. 2023. E-NER: Evidential Deep Learning for Trustworthy Named Entity Recognition. arXiv:2305.17854 [cs.CL]","journal-title":"arXiv:2305.17854 [cs.CL]"},{"key":"e_1_3_1_55_2","article-title":"Fine-Tuning Language Models from Human Preferences","author":"Ziegler Daniel M.","year":"2020","unstructured":"Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. 
Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2020. Fine-Tuning Language Models from Human Preferences. arXiv:1909.08593 [cs.CL]","journal-title":"arXiv:1909.08593 [cs.CL]"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660799","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3660799","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T07:54:42Z","timestamp":1770191682000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660799"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,12]]},"references-count":54,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2024,7,12]]}},"alternative-id":["10.1145\/3660799"],"URL":"https:\/\/doi.org\/10.1145\/3660799","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,12]]}}}