{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T13:40:03Z","timestamp":1767706803585,"version":"3.41.0"},"reference-count":53,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T00:00:00Z","timestamp":1751414400000},"content-version":"vor","delay-in-days":182,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,27]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Rationalization is a framework that aims to build self-explanatory NLP models by extracting a subset of human-intelligible pieces of their input texts. It involves a cooperative game where a selector selects the most human-intelligible parts of the input as the rationale, followed by a predictor that makes predictions based on these selected rationales. Existing literature uses the cross-entropy between the model\u2019s predictions and the ground-truth labels to measure the informativeness of the selected rationales, guiding the selector to choose better ones. In this study, we first theoretically analyze the objective of rationalization by decomposing it into two parts: the model-agnostic informativeness of the rationale candidates and the predictor\u2019s degree of fit. We then provide empirical evidence that, under this framework, the selector tends to sample from a limited small region, causing the predictor to overfit these localized areas. This results in a significant mismatch between the cross-entropy objective and the informativeness of the rationale candidates, leading to suboptimal solutions. To address this issue, we propose a simple yet effective method that introduces random vicinal perturbations to the selected rationale candidates. 
This approach broadens the predictor\u2019s assessment to a vicinity around the selected rationale candidate. Compared to recent competitive methods, our method significantly improves rationale quality (by up to 6.6%) across six widely used classification datasets.<\/jats:p>\n               <jats:p>The term \u201cvicinal\u201d is borrowed from vicinal risk minimization (Chapelle et al., 2000); \u201cvicinal\u201d means neighboring or adjacent.<\/jats:p>","DOI":"10.1162\/tacl_a_00758","type":"journal-article","created":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T20:07:08Z","timestamp":1751486828000},"page":"577-594","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":1,"title":["Exploring Practical Gaps in Using Cross Entropy to Implement Maximum Mutual Information Criterion for Rationalization"],"prefix":"10.1162","volume":"13","author":[{"given":"Wei","family":"Liu","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, HUST, China. idc_lw@hust.edu.cn"}]},{"given":"Zhiying","family":"Deng","sequence":"additional","affiliation":[{"name":"Faculty of Artificial Intelligence in Education, Central China Normal University, China. zhiyingdzy@gmail.com"}]},{"given":"Zhongyu","family":"Niu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, HUST, China. zy_niu@hust.edu.cn"}]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[{"name":"iWudao Tech, China. jwang@iwudao.tech"}]},{"given":"Haozhao","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, HUST, China. hz_wang@hust.edu.cn"}]},{"given":"Ruixuan","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, HUST, China. 
rxli@hust.edu.cn"}]}],"member":"281","published-online":{"date-parts":[[2025,6,27]]},"reference":[{"key":"2025070216070340500_bib1","doi-asserted-by":"publisher","first-page":"2963","DOI":"10.18653\/v1\/P19-1284","article-title":"Interpretable neural predictions with differentiable binary variables","volume-title":"Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28\u2013August 2, 2019, Volume 1: Long Papers","author":"Bastings","year":"2019"},{"key":"2025070216070340500_bib2","doi-asserted-by":"publisher","first-page":"2867","DOI":"10.18653\/v1\/2022.bigscience-1.5","article-title":"UNIREX: A unified learning framework for language model rationale extraction","volume-title":"International Conference on Machine Learning, ICML 2022, 17\u201323 July 2022, Baltimore, Maryland, USA","author":"Chan","year":"2022"},{"key":"2025070216070340500_bib3","first-page":"10055","article-title":"A game theoretic approach to class-wise selective rationalization","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8\u201314, 2019, Vancouver, BC, Canada","author":"Chang","year":"2019"},{"key":"2025070216070340500_bib4","first-page":"1448","article-title":"Invariant rationalization","volume-title":"Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event","author":"Chang","year":"2020"},{"key":"2025070216070340500_bib5","first-page":"416","article-title":"Vicinal risk minimization","volume-title":"Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, Denver, CO, USA","author":"Chapelle","year":"2000"},{"key":"2025070216070340500_bib6","doi-asserted-by":"publisher","first-page":"3792","DOI":"10.18653\/v1\/2022.naacl-main.278","article-title":"Can rationalization improve 
robustness?","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10\u201315, 2022","author":"Chen","year":"2022"},{"key":"2025070216070340500_bib7","first-page":"6424","article-title":"Learning to maximize mutual information for dynamic feature selection","volume-title":"International Conference on Machine Learning","author":"Covert","year":"2023"},{"key":"2025070216070340500_bib8","doi-asserted-by":"publisher","first-page":"4443","DOI":"10.18653\/v1\/2020.acl-main.408","article-title":"ERASER: A benchmark to evaluate rationalized NLP models","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"DeYoung","year":"2020"},{"key":"2025070216070340500_bib9","first-page":"11423","article-title":"Efficient and accurate estimation of lipschitz constants for deep neural networks","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8\u201314, 2019, Vancouver, BC, Canada","author":"Fazlyab","year":"2019"},{"issue":"11","key":"2025070216070340500_bib10","doi-asserted-by":"publisher","first-page":"e745\u2013e750","DOI":"10.1016\/S2589-7500(21)00208-9","article-title":"The false hope of current approaches to explainable artificial intelligence in health care","volume":"3","author":"Ghassemi","year":"2021","journal-title":"The Lancet Digital Health"},{"key":"2025070216070340500_bib11","doi-asserted-by":"publisher","first-page":"7275","DOI":"10.18653\/v1\/2022.acl-long.502","article-title":"Transkimmer: Transformer learns to layer-wise skim","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22\u201327, 
2022","author":"Guan","year":"2022"},{"key":"2025070216070340500_bib12","first-page":"3945","article-title":"Joint learning of label and environment causal independence for graph out-of-distribution generalization","volume":"36","author":"Gui","year":"2023","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025070216070340500_bib13","doi-asserted-by":"publisher","first-page":"13090","DOI":"10.1609\/aaai.v35i14.17547","article-title":"Distribution matching for rationalization","volume-title":"Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2\u20139, 2021","author":"Huang","year":"2021"},{"key":"2025070216070340500_bib14","doi-asserted-by":"publisher","first-page":"4198","DOI":"10.18653\/v1\/2020.acl-main.386","article-title":"Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness?","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5\u201310, 2020","author":"Jacovi","year":"2020"},{"key":"2025070216070340500_bib15","doi-asserted-by":"publisher","first-page":"4459","DOI":"10.18653\/v1\/2020.acl-main.409","article-title":"Learning to faithfully rationalize by construction","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5\u201310, 2020","author":"Jain","year":"2020"},{"issue":"12","key":"2025070216070340500_bib16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3571730","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji","year":"2023","journal-title":"ACM Computing Surveys"},{"key":"2025070216070340500_bib17","article-title":"Causal reasoning and large language models: 
Opening a new frontier for causality","author":"K\u0131c\u0131man","year":"2023","journal-title":"arXiv preprint arXiv:2305.00050"},{"key":"2025070216070340500_bib18","article-title":"Adam: A method for stochastic optimization","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings","author":"Kingma","year":"2015"},{"key":"2025070216070340500_bib19","doi-asserted-by":"publisher","first-page":"107","DOI":"10.18653\/v1\/D16-1011","article-title":"Rationalizing neural predictions","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1\u20134, 2016","author":"Lei","year":"2016"},{"key":"2025070216070340500_bib20","article-title":"Evaluating chatgpt\u2019s information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness","author":"Li","year":"2023","journal-title":"arXiv preprint arXiv:2304.11633"},{"issue":"3","key":"2025070216070340500_bib21","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1145\/3236386.3241340","article-title":"The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery","volume":"16","author":"Lipton","year":"2018","journal-title":"Queue"},{"key":"2025070216070340500_bib22","article-title":"Breaking free from mmi: A new frontier in rationalization by probing input utilization","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Liu","year":"2025"},{"key":"2025070216070340500_bib23","article-title":"Is the MMI criterion necessary for interpretability? 
Degenerating non-causal features to plain noise for self-rationalization","volume-title":"Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10\u201315, 2024","author":"Liu","year":"2024"},{"key":"2025070216070340500_bib24","doi-asserted-by":"publisher","first-page":"12771","DOI":"10.18653\/v1\/2023.acl-long.715","article-title":"MGR: multi-generator based rationalization","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9\u201314, 2023","author":"Liu","year":"2023"},{"key":"2025070216070340500_bib25","article-title":"Fr: Folded rationalization with a unified encoder","volume-title":"Advances in Neural Information Processing Systems","author":"Liu","year":"2022"},{"key":"2025070216070340500_bib26","article-title":"Attacking for inspection and instruction: Debiasing self-explaining text classification","author":"Liu","year":"2024"},{"key":"2025070216070340500_bib27","article-title":"D-separation for causal self-explanation","volume-title":"Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10\u201316, 2023","author":"Liu","year":"2023"},{"key":"2025070216070340500_bib28","doi-asserted-by":"publisher","first-page":"1535","DOI":"10.1145\/3580305.3599299","article-title":"Decoupled rationalization with asymmetric learning rates: A flexible lipschitz restraint","volume-title":"Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6\u201310, 2023","author":"Liu","year":"2023"},{"key":"2025070216070340500_bib29","article-title":"Parameterized explainer for graph neural network","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on 
Neural Information Processing Systems 2020, NeurIPS 2020, December 6\u201312, 2020, virtual","author":"Luo","year":"2020"},{"key":"2025070216070340500_bib30","doi-asserted-by":"publisher","first-page":"1020","DOI":"10.1109\/ICDM.2012.110","article-title":"Learning attitudes and attributes from multi-aspect reviews","volume-title":"12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, December 10\u201313, 2012","author":"McAuley","year":"2012"},{"key":"2025070216070340500_bib31","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1162\/tacl_a_00465","article-title":"Evaluating explanations: How much do explanations from the teacher aid students?","volume":"10","author":"Pruthi","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025070216070340500_bib32","article-title":"Is chatgpt a general-purpose natural language processing task solver?","author":"Qin","year":"2023","journal-title":"arXiv preprint arXiv:2302.06476"},{"key":"2025070216070340500_bib33","article-title":"Where we have arrived in proving the emergence of sparse interaction primitives in DNNs","volume-title":"The Twelfth International Conference on Learning Representations","author":"Ren","year":"2024"},{"key":"2025070216070340500_bib34","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1145\/2939672.2939778","article-title":"\u201cWhy should I trust you?\u201d: Explaining the predictions of any classifier","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13\u201317, 2016","author":"Ribeiro","year":"2016"},{"issue":"5","key":"2025070216070340500_bib35","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","article-title":"Stop explaining black box machine learning models for high stakes decisions and use interpretable models 
instead","volume":"1","author":"Rudin","year":"2019","journal-title":"Nature Machine Intelligence"},{"key":"2025070216070340500_bib36","unstructured":"Benjamin B. Seiler. 2023. Applications of Cooperative Game Theory to Interpretable Machine Learning. Ph.D. thesis. Stanford University."},{"key":"2025070216070340500_bib37","doi-asserted-by":"publisher","first-page":"13771","DOI":"10.1609\/aaai.v35i15.17623","article-title":"Learning from the best: Rationalizing predictions by adversarial information calibration","volume-title":"Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2\u20139, 2021","author":"Sha","year":"2021"},{"key":"2025070216070340500_bib38","doi-asserted-by":"publisher","first-page":"103828","DOI":"10.1016\/j.artint.2022.103828","article-title":"Rationalizing predictions by adversarial information calibration","volume":"315","author":"Sha","year":"2023","journal-title":"Artificial Intelligence"},{"key":"2025070216070340500_bib39","doi-asserted-by":"publisher","first-page":"12647","DOI":"10.18653\/v1\/2023.acl-long.707","article-title":"Unsupervised selective rationalization with noise injection","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9\u201314, 2023","author":"Storek","year":"2023"},{"key":"2025070216070340500_bib40","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2401.05561","article-title":"Trustllm: Trustworthiness in large language models","author":"Sun","year":"2024","journal-title":"CoRR"},{"key":"2025070216070340500_bib41","article-title":"Intriguing properties of neural networks","volume-title":"2nd International Conference on Learning Representations, ICLR 2014, 
Banff, AB, Canada, April 14\u201316, 2014, Conference Track Proceedings","author":"Szegedy","year":"2014"},{"key":"2025070216070340500_bib42","first-page":"3839","article-title":"Lipschitz regularity of deep neural networks: Analysis and efficient estimation","volume-title":"Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3\u20138, 2018, Montr\u00e9al, Canada","author":"Virmaux","year":"2018"},{"key":"2025070216070340500_bib43","doi-asserted-by":"publisher","first-page":"783","DOI":"10.1145\/1835804.1835903","article-title":"Latent aspect rating analysis on review text data: A rating regression approach","volume-title":"Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25\u201328, 2010","author":"Wang","year":"2010"},{"key":"2025070216070340500_bib44","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume-title":"NeurIPS","author":"Wei","year":"2022"},{"key":"2025070216070340500_bib45","article-title":"Evaluating the robustness of neural networks: An extreme value theory approach","volume-title":"6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30\u2013May 3, 2018, Conference Track Proceedings","author":"Weng","year":"2018"},{"key":"2025070216070340500_bib46","article-title":"Discovering invariant rationales for graph neural networks","volume-title":"The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\u201329, 2022","author":"Wu","year":"2022"},{"key":"2025070216070340500_bib47","article-title":"LESS: selecting influential data for targeted instruction tuning","volume-title":"International Conference on Machine Learning, ICML 2024, 21\u201327 July 2024, Vienna, Austria","author":"Xia","year":"2024"},{"key":"2025070216070340500_bib48","article-title":"A 
comprehensive capability analysis of gpt-3 and gpt-3.5 series models","author":"Ye","year":"2023","journal-title":"arXiv preprint arXiv:2303.10420"},{"key":"2025070216070340500_bib49","doi-asserted-by":"publisher","first-page":"4092","DOI":"10.18653\/v1\/D19-1420","article-title":"Rethinking cooperative rationalization: Introspective extraction and complement control","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3\u20137, 2019","author":"Yu","year":"2019"},{"key":"2025070216070340500_bib50","first-page":"12822","article-title":"Understanding interlocking dynamics of cooperative rationalization","volume-title":"Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6\u201314, 2021, virtual","author":"Yu","year":"2021"},{"issue":"4","key":"2025070216070340500_bib51","doi-asserted-by":"publisher","first-page":"2019","DOI":"10.1109\/TPAMI.2020.3028783","article-title":"Interpreting image classifiers by generating discrete masks","volume":"44","author":"Yuan","year":"2022","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2025070216070340500_bib52","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.700","article-title":"Interventional rationalization","author":"Yue","year":"2023"},{"key":"2025070216070340500_bib53","first-page":"41715","article-title":"Towards trustworthy explanation: On causal rationalization","volume-title":"International Conference on Machine Learning, ICML 2023, 23\u201329 July 2023, Honolulu, Hawaii, USA","author":"Zhang","year":"2023"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00758\/2534936\/tacl_a_00758.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00758\/2534936\/tacl_a_00758.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T20:07:19Z","timestamp":1751486839000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00758\/131564\/Exploring-Practical-Gaps-in-Using-Cross-Entropy-to"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":53,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00758","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025]]},"published":{"date-parts":[[2025]]}}}