{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T15:56:14Z","timestamp":1780674974158,"version":"3.54.1"},"reference-count":62,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T00:00:00Z","timestamp":1746576000000},"content-version":"vor","delay-in-days":126,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,4,24]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives lead to better performance when they specify more control over the model during training. Based on these insights, we introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method which leads to more contrastive preference pairs, and Anchored Preference Optimization (APO), a controllable and more stable alignment objective. We align Llama-3-8B-Instruct using various comparable datasets and alignment objectives and measure MixEval-Hard scores, which correlate highly with human judgments. The CLAIR preferences lead to the strongest performance out of all datasets, and APO consistently outperforms less controllable objectives. Our best model, trained on 32K CLAIR preferences with APO, improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%. Our code and datasets are available.<\/jats:p>","DOI":"10.1162\/tacl_a_00748","type":"journal-article","created":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T20:02:31Z","timestamp":1746648151000},"page":"442-460","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":3,"title":["Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment"],"prefix":"10.1162","volume":"13","author":[{"given":"Karel","family":"D'Oosterlinck","sequence":"first","affiliation":[{"name":"Ghent University \u2013 imec, Belgium karel@contextual.ai"},{"name":"Contextual AI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Winnie","family":"Xu","sequence":"additional","affiliation":[{"name":"Contextual AI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chris","family":"Develder","sequence":"additional","affiliation":[{"name":"Ghent University \u2013 imec, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thomas","family":"Demeester","sequence":"additional","affiliation":[{"name":"Ghent University \u2013 imec, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Amanpreet","family":"Singh","sequence":"additional","affiliation":[{"name":"Contextual AI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Christopher","family":"Potts","sequence":"additional","affiliation":[{"name":"Stanford University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Douwe","family":"Kiela","sequence":"additional","affiliation":[{"name":"Stanford University, USA"},{"name":"Contextual AI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shikib","family":"Mehri","sequence":"additional","affiliation":[{"name":"Contextual AI, USA. 
shikib@contextual.ai"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","published-online":{"date-parts":[[2025,4,24]]},"reference":[{"key":"2025050716022813900_bib1","article-title":"GPT-4 technical report","author":"Achiam","year":"2023","journal-title":"arXiv preprint arXiv: 2303.08774"},{"key":"2025050716022813900_bib2","doi-asserted-by":"publisher","first-page":"8854","DOI":"10.18653\/v1\/2023.acl-long.493","article-title":"The CRINGE loss: Learning what language not to model","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Adolphs","year":"2023"},{"key":"2025050716022813900_bib3","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.592","article-title":"Direct preference optimization with an offset","author":"Amini","year":"2024","journal-title":"arXiv preprint arXiv:2402.10571"},{"key":"2025050716022813900_bib4","article-title":"Program synthesis with large language models","author":"Austin","year":"2021","journal-title":"arXiv preprint arXiv:2108.07732"},{"key":"2025050716022813900_bib5","first-page":"4447","article-title":"A general theoretical paradigm to understand learning from human preferences","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Azar","year":"2024"},{"key":"2025050716022813900_bib6","article-title":"Constitutional AI: Harmlessness from AI feedback","author":"Bai","year":"2022","journal-title":"arXiv preprint arXiv:2212.08073"},{"key":"2025050716022813900_bib7","doi-asserted-by":"publisher","first-page":"7432","DOI":"10.1609\/aaai.v34i05.6239","article-title":"PIQA: Reasoning about physical commonsense in natural language","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Bisk","year":"2020"},{"key":"2025050716022813900_bib8","first-page":"1877","article-title":"Language models are few-shot 
learners","volume":"33","author":"Brown","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025050716022813900_bib9","article-title":"Chatbot arena: An open platform for evaluating LLMs by human preference","volume-title":"Forty-first International Conference on Machine Learning","author":"Chiang","year":"2024"},{"key":"2025050716022813900_bib10","first-page":"4302","article-title":"Deep reinforcement learning from human preferences","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Christiano","year":"2017"},{"key":"2025050716022813900_bib11","first-page":"2924","article-title":"BoolQ: Exploring the surprising difficulty of natural yes\/no questions","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Clark","year":"2019"},{"key":"2025050716022813900_bib12","article-title":"Think you have solved question answering? 
Try ARC, the AI2 reasoning challenge","author":"Clark","year":"2018","journal-title":"arXiv preprint arXiv: 1803.05457"},{"key":"2025050716022813900_bib13","article-title":"Training verifiers to solve math word problems","author":"Cobbe","year":"2021","journal-title":"arXiv preprint arXiv:2110.14168"},{"key":"2025050716022813900_bib14","article-title":"ULTRAFEEDBACK: Boosting language models with scaled AI feedback","volume-title":"Forty-first International Conference on Machine Learning","author":"Cui","year":"2024"},{"key":"2025050716022813900_bib15","first-page":"2368","article-title":"DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Dua","year":"2019"},{"key":"2025050716022813900_bib16","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.56","article-title":"Negating negatives: Alignment without human positive samples via distributional dispreference optimization","author":"Duan","year":"2024","journal-title":"arXiv preprint arXiv: 2403.03419"},{"key":"2025050716022813900_bib17","article-title":"The llama 3 herd of models","author":"Dubey","year":"2024","journal-title":"arXiv preprint arXiv:2407.21783"},{"key":"2025050716022813900_bib18","article-title":"Length-controlled alpacaeval: A simple way to debias automatic evaluators","author":"Dubois","year":"2024","journal-title":"arXiv preprint arXiv: 2404.04475"},{"key":"2025050716022813900_bib19","article-title":"Helping or herding? 
Reward model ensembles mitigate but do not eliminate reward hacking","author":"Eisenstein","year":"2023","journal-title":"arXiv preprint arXiv:2312.09244"},{"key":"2025050716022813900_bib20","first-page":"5988","article-title":"Understanding dataset difficulty with V-usable information","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"Ethayarajh","year":"2022"},{"key":"2025050716022813900_bib21","article-title":"KTO: Model alignment as prospect theoretic optimization","author":"Ethayarajh","year":"2024","journal-title":"arXiv preprint arXiv: 2402.01306"},{"key":"2025050716022813900_bib22","article-title":"Towards analyzing and understanding the limitations of DPO: A theoretical perspective","author":"Feng","year":"2024","journal-title":"arXiv preprint arXiv:2404.04626"},{"key":"2025050716022813900_bib23","first-page":"10835","article-title":"Scaling laws for reward model overoptimization","volume-title":"International Conference on Machine Learning","author":"Gao","year":"2023"},{"key":"2025050716022813900_bib24","article-title":"Measuring massive multitask language understanding","volume-title":"International Conference on Learning Representations","author":"Hendrycks","year":"2020"},{"key":"2025050716022813900_bib25","article-title":"Measuring mathematical problem solving with the MATH dataset","volume-title":"Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)","author":"Hendrycks","year":"2021"},{"issue":"7","key":"2025050716022813900_bib26","doi-asserted-by":"publisher","first-page":"578","DOI":"10.1136\/jech.2004.029496","article-title":"Estimating causal effects from epidemiological data","volume":"60","author":"Hern\u00e1n","year":"2006","journal-title":"Journal of Epidemiology & Community Health"},{"key":"2025050716022813900_bib27","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.emnlp-main.626","article-title":"Reference-free monolithic preference 
optimization with odds ratio","author":"Hong","year":"2024","journal-title":"arXiv preprint arXiv:2403.07691"},{"key":"2025050716022813900_bib28","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.792","article-title":"LLM-Blender: Ensembling large language models with pairwise ranking and generative fusion","volume-title":"The 61st Annual Meeting of the Association for Computational Linguistics","author":"Jiang","year":"2023"},{"key":"2025050716022813900_bib29","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.18653\/v1\/P17-1147","article-title":"TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Joshi","year":"2017"},{"key":"2025050716022813900_bib30","article-title":"Binary classifier optimization for large language model alignment","author":"Jung","year":"2024","journal-title":"arXiv preprint arXiv:2404.04656"},{"key":"2025050716022813900_bib31","article-title":"sDPO: Don\u2019t use your data all at once","author":"Kim","year":"2024","journal-title":"arXiv preprint arXiv: 2403.19270"},{"key":"2025050716022813900_bib32","article-title":"Self-directed synthetic dialogues and revisions technical report","author":"Lambert","year":"2024","journal-title":"arXiv preprint arXiv:2407.18421"},{"key":"2025050716022813900_bib33","unstructured":"Xuechen\n              Li\n            , TianyiZhang, YannDubois, RohanTaori, IshaanGulrajani, CarlosGuestrin, PercyLiang, and Tatsunori B.Hashimoto. 2023. AlpacaEval: An automatic evaluator of instruction-following models. 
https:\/\/github.com\/tatsu-lab\/alpaca_eval"},{"key":"2025050716022813900_bib34","article-title":"SimPO: Simple preference optimization with a reference-free reward","author":"Meng","year":"2024","journal-title":"arXiv preprint arXiv:2405.14734"},{"key":"2025050716022813900_bib35","doi-asserted-by":"publisher","first-page":"2381","DOI":"10.18653\/v1\/D18-1260","article-title":"Can a suit of armor conduct electricity? A new dataset for open book question answering","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Mihaylov","year":"2018"},{"key":"2025050716022813900_bib36","article-title":"MixEval: Deriving wisdom of the crowd from LLM benchmark mixtures","author":"Ni","year":"2024","journal-title":"arXiv preprint arXiv:2406.06565"},{"key":"2025050716022813900_bib37","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025050716022813900_bib38","article-title":"Smaug: Fixing failure modes of preference optimisation with DPO-positive","author":"Pal","year":"2024","journal-title":"arXiv preprint arXiv:2402.13228"},{"key":"2025050716022813900_bib39","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.297","article-title":"Disentangling length from quality in direct preference optimization","author":"Park","year":"2024","journal-title":"arXiv preprint arXiv:2403.19159"},{"key":"2025050716022813900_bib40","article-title":"PAFT: A parallel training paradigm for effective LLM fine-tuning","author":"Pentyala","year":"2024","journal-title":"arXiv preprint arXiv:2406.17923"},{"key":"2025050716022813900_bib41","article-title":"Scaling laws for reward model overoptimization in direct alignment algorithms","author":"Rafailov","year":"2024","journal-title":"arXiv preprint 
arXiv:2406.02900"},{"key":"2025050716022813900_bib42","article-title":"Direct preference optimization: Your language model is secretly a reward model","volume":"36","author":"Rafailov","year":"2024","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025050716022813900_bib43","article-title":"GPQA: A graduate-level google-proof q&a benchmark","author":"Rein","year":"2023","journal-title":"arXiv preprint arXiv:2311.12022"},{"key":"2025050716022813900_bib44","article-title":"Offline regularised reinforcement learning for large language models alignment","author":"Richemond","year":"2024","journal-title":"arXiv preprint arXiv:2405 .19107"},{"key":"2025050716022813900_bib45","article-title":"Direct Nash optimization: Teaching language models to self-improve with general preferences","author":"Rosset","year":"2024","journal-title":"arXiv preprint arXiv: 2404.03715"},{"key":"2025050716022813900_bib46","article-title":"Verbosity bias in preference labeling by large language models","volume-title":"NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following","author":"Saito","year":"2023"},{"key":"2025050716022813900_bib47","doi-asserted-by":"crossref","first-page":"4463","DOI":"10.18653\/v1\/D19-1454","article-title":"Social IQa: Commonsense reasoning about social interactions","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Sap","year":"2019"},{"key":"2025050716022813900_bib48","article-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017","journal-title":"arXiv preprint arXiv:1707.06347"},{"key":"2025050716022813900_bib49","doi-asserted-by":"publisher","first-page":"13003","DOI":"10.18653\/v1\/2023.findings-acl.824","article-title":"Challenging BIG-bench tasks and whether chain-of-thought can solve them","volume-title":"Findings of the Association for 
Computational Linguistics: ACL 2023","author":"Suzgun","year":"2023"},{"key":"2025050716022813900_bib50","first-page":"4149","article-title":"CommonsenseQA: A question answering challenge targeting commonsense knowledge","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Talmor","year":"2019"},{"key":"2025050716022813900_bib51","unstructured":"Leandro\n              von Werra\n            , YounesBelkada, LewisTunstall, EdwardBeeching, TristanThrush, NathanLambert, and ShengyiHuang. 2020. TRL: Transformer reinforcement learning. https:\/\/github.com\/huggingface\/trl"},{"key":"2025050716022813900_bib52","article-title":"A comprehensive survey of LLM alignment techniques: RLHF, RLAIF, PPO, DPO and more","author":"Wang","year":"2024","journal-title":"arXiv preprint arXiv:2407.16216"},{"key":"2025050716022813900_bib53","unstructured":"Becca\n              Williams\n            \n          . 2023. Parallel process GPT. 
https:\/\/github.com\/tiny-rawr\/parallel_process_gpt"},{"key":"2025050716022813900_bib54","article-title":"\u03b2-DPO: Direct preference optimization with dynamic \u03b2","author":"Junkang","year":"2024","journal-title":"arXiv preprint arXiv:2407.08639"},{"key":"2025050716022813900_bib55","article-title":"Self-play preference optimization for language model alignment","author":"Yue","year":"2024","journal-title":"arXiv preprint arXiv:2405 .00675"},{"key":"2025050716022813900_bib56","article-title":"Contrastive preference optimization: Pushing the boundaries of LLM performance in machine translation","volume-title":"Forty-first International Conference on Machine Learning","author":"Haoran","year":"2024"},{"key":"2025050716022813900_bib57","article-title":"Self-rewarding language models","volume-title":"Forty-first International Conference on Machine Learning","author":"Yuan","year":"2024"},{"key":"2025050716022813900_bib58","doi-asserted-by":"publisher","first-page":"4791","DOI":"10.18653\/v1\/P19-1472","article-title":"HellaSwag: Can a machine really finish your sentence?","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zellers","year":"2019"},{"key":"2025050716022813900_bib59","article-title":"Negative preference optimization: From catastrophic collapse to effective unlearning","author":"Zhang","year":"2024","journal-title":"arXiv preprint arXiv:2404.05868"},{"key":"2025050716022813900_bib60","article-title":"Slic-hf: Sequence likelihood calibration with human feedback","author":"Zhao","year":"2023","journal-title":"arXiv preprint arXiv:2305.10425"},{"key":"2025050716022813900_bib61","doi-asserted-by":"publisher","first-page":"2299","DOI":"10.18653\/v1\/2024.findings-naacl.149","article-title":"AGIEval: A human-centric benchmark for evaluating foundation models","volume-title":"Findings of the Association for Computational Linguistics: NAACL 
2024","author":"Zhong","year":"2024"},{"key":"2025050716022813900_bib62","article-title":"Starling-7B: Improving LLM helpfulness & harmlessness with RLAIF","author":"Zhu","year":"2023"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00748\/2522342\/tacl_a_00748.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00748\/2522342\/tacl_a_00748.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T20:02:42Z","timestamp":1746648162000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00748\/130712\/Anchored-Preference-Optimization-and-Contrastive"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":62,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00748","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025]]},"published":{"date-parts":[[2025]]}}}