{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T18:45:16Z","timestamp":1771267516789,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,2,22]]},"DOI":"10.1145\/3773966.3779370","type":"proceedings-article","created":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T17:50:01Z","timestamp":1771264201000},"page":"1190-1194","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["LookAhead Tuning: Safer Language Models via Partial Answer Previews"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-9273-7191","authenticated-orcid":false,"given":"Kangwei","family":"Liu","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China and Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4488-9871","authenticated-orcid":false,"given":"Mengru","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China and Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-8976-380X","authenticated-orcid":false,"given":"Yujie","family":"Luo","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China and Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4545-9432","authenticated-orcid":false,"given":"Lin","family":"Yuan","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China and Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2639-9462","authenticated-orcid":false,"given":"Mengshu","family":"Sun","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China and Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9700-5809","authenticated-orcid":false,"given":"Lei","family":"Liang","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China and Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2321-7259","authenticated-orcid":false,"given":"Zhiqiang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China and Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6033-6102","authenticated-orcid":false,"given":"Jun","family":"Zhou","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China and Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5645-1754","authenticated-orcid":false,"given":"Bryan","family":"Hooi","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4049-8478","authenticated-orcid":false,"given":"Shumin","family":"Deng","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, 
China"}]}],"member":"320","published-online":{"date-parts":[[2026,2,21]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Daniel Alexander Alber Zihao Yang Anton Alyakin Eunice Yang Sumedha Rai Aly A Valliani Jeff Zhang Gabriel R Rosenbaum Ashley K Amend-Thomas David B Kurland et al. 2025. Medical large language models are vulnerable to data-poisoning attacks. Nature Medicine (2025) 1-9."},{"key":"e_1_3_2_1_2_1","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano et al. 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-5409"},{"key":"e_1_3_2_1_4_1","volume-title":"What's in Your'' Safe'' Data?: Identifying Benign Data that Breaks Safety. arXiv preprint arXiv:2404.01099","author":"He Luxi","year":"2024","unstructured":"Luxi He, Mengzhou Xia, and Peter Henderson. 2024. What's in Your'' Safe'' Data?: Identifying Benign Data that Breaks Safety. arXiv preprint arXiv:2404.01099 (2024)."},{"key":"e_1_3_2_1_5_1","volume-title":"Vaccine: Perturbation-aware Alignment for Large Language Model. arXiv preprint arXiv:2402.01109","author":"Huang Tiansheng","year":"2024","unstructured":"Tiansheng Huang, Sihao Hu, and Ling Liu. 2024. Vaccine: Perturbation-aware Alignment for Large Language Model. arXiv preprint arXiv:2402.01109 (2024)."},{"key":"e_1_3_2_1_6_1","unstructured":"Ke Ji Jiahao Xu Tian Liang Qiuzhi Liu Zhiwei He Xingyu Chen Xiaoyuan Liu Zhijie Wang Junying Chen Benyou Wang et al. 2025. The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models. arXiv preprint arXiv:2503.02875 (2025)."},{"key":"e_1_3_2_1_7_1","volume-title":"Lora fine-tuning efficiently undoes safety training in llama 2-chat 70b. arXiv preprint arXiv:2310.20624","author":"Lermen Simon","year":"2023","unstructured":"Simon Lermen, Charlie Rogers-Smith, and Jeffrey Ladish. 2023. Lora fine-tuning efficiently undoes safety training in llama 2-chat 70b. arXiv preprint arXiv:2310.20624 (2023)."},{"key":"e_1_3_2_1_8_1","volume-title":"Revisiting Catastrophic Forgetting in Large Language Model Tuning. In Findings of the Association for Computational Linguistics: EMNLP 2024","author":"Li Hongyu","year":"2024","unstructured":"Hongyu Li, Liang Ding, Meng Fang, and Dacheng Tao. 2024. Revisiting Catastrophic Forgetting in Large Language Model Tuning. In Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 12-16, 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, 4297-4308. https:\/\/aclanthology.org\/2024.findings-emnlp.249"},{"key":"e_1_3_2_1_9_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=wxJ0eXwwda","author":"Lin Bill Yuchen","year":"2024","unstructured":"Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, and Yejin Choi. 2024. The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. In The Twelfth International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=wxJ0eXwwda"},{"key":"e_1_3_2_1_10_1","volume-title":"Chandra Bhagavatula, and Yejin Choi.","author":"Lin Bill Yuchen","year":"2023","unstructured":"Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Raghavi Chandu, Chandra Bhagavatula, and Yejin Choi. 2023. The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. ArXiv, Vol. abs\/2312.01552 (2023). https:\/\/api.semanticscholar.org\/CorpusID:265608902"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2308.08747"},{"key":"e_1_3_2_1_12_1","volume-title":"Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Cand\u00e8s, and Tatsunori Hashimoto.","author":"Muennighoff Niklas","year":"2025","unstructured":"Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Cand\u00e8s, and Tatsunori Hashimoto. 2025. s1: Simple test-time scaling. arXiv preprint arXiv:2501.19393 (2025)."},{"key":"e_1_3_2_1_13_1","volume-title":"Exploiting Novel GPT-4 APIs. arXiv preprint arXiv:2312.14302","author":"Pelrine Kellin","year":"2023","unstructured":"Kellin Pelrine, Mohammad Taufeeque, Michal Zajac, Euan McLean, and Adam Gleave. 2023. Exploiting Novel GPT-4 APIs. arXiv preprint arXiv:2312.14302 (2023)."},{"key":"e_1_3_2_1_14_1","volume-title":"Safety Alignment Should Be Made More Than Just a Few Tokens Deep. arXiv preprint arXiv:2406.05946","author":"Qi Xiangyu","year":"2024","unstructured":"Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, and Peter Henderson. 2024. Safety Alignment Should Be Made More Than Just a Few Tokens Deep. arXiv preprint arXiv:2406.05946 (2024)."},{"key":"e_1_3_2_1_15_1","volume-title":"Even When Users Do Not Intend To! arXiv preprint arXiv:2310.03693","author":"Qi Xiangyu","year":"2023","unstructured":"Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, and Peter Henderson. 2023. Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! arXiv preprint arXiv:2310.03693 (2023)."},{"key":"e_1_3_2_1_16_1","unstructured":"Hugo Touvron Louis Martin Kevin R. Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Daniel M. Bikel Lukas Blecher Cristian Cant\u00f3n Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony S. Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel M. Kloumann A. V. Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith R. Subramanian Xia Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zhengxu Yan Iliyan Zarov Yuchen Zhang Angela Fan Melissa Hall Melanie Kambadur Sharan Narang Aur\u00e9lien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. ArXiv Vol. abs\/2307.09288 (2023). https:\/\/api.semanticscholar.org\/CorpusID:259950998"},{"key":"e_1_3_2_1_17_1","volume-title":"Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications. 
arXiv preprint arXiv:2402.05162","author":"Wei Boyi","year":"2024","unstructured":"Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson. 2024. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications. arXiv preprint arXiv:2402.05162 (2024)."},{"key":"e_1_3_2_1_18_1","volume-title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html"},{"key":"e_1_3_2_1_19_1","volume-title":"Bill Yuchen Lin, and Radha Poovendran","author":"Xu Zhangchen","year":"2024","unstructured":"Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, and Radha Poovendran. 2024. SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding. arXiv preprint arXiv:2402.08983 (2024)."},{"key":"e_1_3_2_1_20_1","volume-title":"Xun Zhao, and Dahua Lin.","author":"Yang Xianjun","year":"2023","unstructured":"Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, and Dahua Lin. 2023a. Shadow alignment: The ease of subverting safely-aligned language models. arXiv preprint arXiv:2310.02949 (2023)."},{"key":"e_1_3_2_1_21_1","volume-title":"Xun Zhao, and Dahua Lin.","author":"Yang Xianjun","year":"2023","unstructured":"Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, and Dahua Lin. 2023b. Shadow alignment: The ease of subverting safely-aligned language models. arXiv preprint arXiv:2310.02949 (2023)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2024.ACL-LONG.58"},{"key":"e_1_3_2_1_23_1","first-page":"9236","article-title":"On the vulnerability of safety alignment in open-access llms","volume":"2024","author":"Yi Jingwei","year":"2024","unstructured":"Jingwei Yi, Rui Ye, Qisi Chen, Bin Zhu, Siheng Chen, Defu Lian, Guangzhong Sun, Xing Xie, and Fangzhao Wu. 2024. On the vulnerability of safety alignment in open-access llms. In Findings of the Association for Computational Linguistics ACL 2024. 9236-9260.","journal-title":"Findings of the Association for Computational Linguistics ACL"},{"key":"e_1_3_2_1_24_1","volume-title":"Removing rlhf protections in gpt-4 via fine-tuning. arXiv preprint arXiv:2311.05553","author":"Zhan Qiusi","year":"2023","unstructured":"Qiusi Zhan, Richard Fang, Rohan Bindu, Akul Gupta, Tatsunori Hashimoto, and Daniel Kang. 2023. Removing rlhf protections in gpt-4 via fine-tuning. arXiv preprint arXiv:2311.05553 (2023)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2024.FINDINGS-ACL.209"},{"key":"e_1_3_2_1_26_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=tmsqb6WpLz","author":"Zhang Xiao","year":"2024","unstructured":"Xiao Zhang and Ji Wu. 
2024. Dissecting learning and forgetting in language model finetuning. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=tmsqb6WpLz"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2303.18223"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2411.14405"}],"event":{"name":"WSDM '26: The Nineteenth ACM International Conference on Web Search and Data Mining","location":"Boise, ID, USA","sponsor":["SIGKDD ACM Special Interest Group on Knowledge Discovery in Data","SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval","SIGMOD ACM Special Interest Group on Management of Data"]},"container-title":["Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining"],"original-title":[],"deposited":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T17:57:17Z","timestamp":1771264637000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3773966.3779370"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,21]]},"references-count":28,"alternative-id":["10.1145\/3773966.3779370","10.1145\/3773966"],"URL":"https:\/\/doi.org\/10.1145\/3773966.3779370","relation":{},"subject":[],"published":{"date-parts":[[2026,2,21]]},"assertion":[{"value":"2026-02-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
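
The record above is a standard Crossref REST API "work" message: the useful bibliographic payload (title, DOI, authors, the 28-entry reference array, event and publication dates) sits under the "message" key, wrapped in a {"status", "message-type", "message-version", ...} envelope. As a minimal sketch of how such a record is retrieved and read, the Python snippet below fetches the same DOI from the public api.crossref.org endpoint; the requests library, network access, and the User-Agent contact string are assumptions for illustration (the mailto contact follows Crossref's "polite pool" convention and is not required).

import requests

# Fetch the same Crossref "work" record shown above via the public REST API.
# GET https://api.crossref.org/works/{doi} returns the
# {"status": ..., "message-type": "work", "message": {...}} envelope.
DOI = "10.1145/3773966.3779370"
resp = requests.get(
    f"https://api.crossref.org/works/{DOI}",
    headers={"User-Agent": "example-script/0.1 (mailto:you@example.org)"},  # illustrative contact
    timeout=30,
)
resp.raise_for_status()
work = resp.json()["message"]  # the bibliographic payload

print(work["title"][0])                # Crossref titles are arrays of strings
print(work["DOI"])                     # "10.1145/3773966.3779370"
print(len(work.get("reference", [])))  # 28 cited references in this record
for a in work.get("author", []):
    # Each author entry carries given/family names and, when deposited, an ORCID.
    print(a.get("given", ""), a.get("family", ""), "-", a.get("ORCID", "n/a"))

Note that fields such as "reference" and "ORCID" are optional in Crossref records generally, which is why the sketch reads them defensively with .get() rather than direct indexing.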