{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T20:45:47Z","timestamp":1783111547727,"version":"3.54.6"},"reference-count":52,"publisher":"MIT Press - Journals","license":[{"start":{"date-parts":[[2021,12,23]],"date-time":"2021-12-23T00:00:00Z","timestamp":1640217600000},"content-version":"vor","delay-in-days":356,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>\u26a0 This paper contains prompts and model outputs that are offensive in nature.<\/jats:p>\n               <jats:p>When trained on large, unfiltered crawls from the Internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: They often generate racist, sexist, violent, or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we first demonstrate a surprising finding: Pretrained language models recognize, to a considerable degree, their undesirable biases and the toxicity of the content they produce. We refer to this capability as self-diagnosis. Based on this finding, we then propose a decoding algorithm that, given only a textual description of the undesired behavior, reduces the probability of a language model producing problematic text. We refer to this approach as self-debiasing. Self-debiasing does not rely on manually curated word lists, nor does it require any training data or changes to the model\u2019s parameters. 
While we by no means eliminate the issue of language models generating biased text, we believe our approach to be an important step in this direction.<\/jats:p>","DOI":"10.1162\/tacl_a_00434","type":"journal-article","created":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T05:57:18Z","timestamp":1640325438000},"page":"1408-1424","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":119,"title":["Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP"],"prefix":"10.1162","volume":"9","author":[{"given":"Timo","family":"Schick","sequence":"first","affiliation":[{"name":"Center for Information and Language Processing (CIS), LMU Munich, Germany. schickt@cis.lmu.de"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sahana","family":"Udupa","sequence":"additional","affiliation":[{"name":"Institute of Social and Cultural Anthropology, LMU Munich, Germany. sahana.udupa@lmu.de"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hinrich","family":"Sch\u00fctze","sequence":"additional","affiliation":[{"name":"Center for Information and Language Processing (CIS), LMU Munich, Germany. 
inquiries@cislmu.org"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","published-online":{"date-parts":[[2021,12,17]]},"reference":[{"key":"2021122316152401600_bib1","doi-asserted-by":"publisher","DOI":"10.1145\/3461702.3462624","article-title":"Persistent anti-Muslim bias in large language models","author":"Abid","year":"2021","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib2","doi-asserted-by":"publisher","first-page":"33","DOI":"10.18653\/v1\/W19-3805","article-title":"Evaluating the underlying gender bias in contextualized word embeddings","volume-title":"Proceedings of the First Workshop on Gender Bias in Natural Language Processing","author":"Basta","year":"2019"},{"key":"2021122316152401600_bib3","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922","article-title":"On the dangers of stochastic parrots: Can language models be too big","volume-title":"Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency; Association for Computing Machinery","author":"Bender","year":"2021"},{"key":"2021122316152401600_bib4","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching word vectors with subword information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021122316152401600_bib5","first-page":"4349","article-title":"Man is to computer programmer as woman is to homemaker? 
Debiasing word embeddings","volume-title":"Advances in Neural Information Processing Systems 29","author":"Bolukbasi","year":"2016"},{"key":"2021122316152401600_bib6","first-page":"7","article-title":"Identifying and reducing gender bias in word-level language models","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop","author":"Bordia","year":"2019"},{"key":"2021122316152401600_bib7","first-page":"1877","article-title":"Language models are few-shot learners","volume-title":"Advances in Neural Information Processing Systems","author":"Brown","year":"2020"},{"issue":"6334","key":"2021122316152401600_bib8","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1126\/science.aal4230","article-title":"Semantics derived automatically from language corpora contain human-like biases","volume":"356","author":"Caliskan","year":"2017","journal-title":"Science"},{"key":"2021122316152401600_bib9","article-title":"Plug and play language models: A simple approach to controlled text generation","volume-title":"International Conference on Learning Representations","author":"Dathathri","year":"2020"},{"issue":"05","key":"2021122316152401600_bib10","doi-asserted-by":"publisher","first-page":"7659","DOI":"10.1609\/aaai.v34i05.6267","article-title":"On measuring and mitigating biased inferences of word embeddings","volume":"34","author":"Dev","year":"2020","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"2021122316152401600_bib11","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2021122316152401600_bib12","article-title":"Switch 
transformers: Scaling to trillion parameter models with simple and efficient sparsity","author":"Fedus","year":"2021","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib13","doi-asserted-by":"publisher","first-page":"3356","DOI":"10.18653\/v1\/2020.findings-emnlp.301","article-title":"RealToxicityPrompts: Evaluating neural toxic degeneration in language models","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Gehman","year":"2020"},{"key":"2021122316152401600_bib14","article-title":"OpenWebText corpus","author":"Gokaslan","year":"2019"},{"key":"2021122316152401600_bib15","first-page":"609","article-title":"Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Gonen","year":"2019"},{"key":"2021122316152401600_bib16","doi-asserted-by":"publisher","first-page":"8342","DOI":"10.18653\/v1\/2020.acl-main.740","article-title":"Don\u2019t stop pretraining: Adapt language models to domains and tasks","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Gururangan","year":"2020"},{"key":"2021122316152401600_bib17","article-title":"CTRLsum: Towards generic controllable text summarization","author":"He","year":"2020","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib18","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1162\/tacl_a_00324","article-title":"How can we know what language models know?","volume":"8","author":"Jiang","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021122316152401600_bib19","first-page":"1256","article-title":"Debiasing pre-trained contextualised 
embeddings","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Kaneko","year":"2021"},{"key":"2021122316152401600_bib20","first-page":"212","article-title":"Dictionary-based debiasing of pre-trained word embeddings","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Kaneko","year":"2021"},{"key":"2021122316152401600_bib21","doi-asserted-by":"publisher","first-page":"7811","DOI":"10.18653\/v1\/2020.acl-main.698","article-title":"Negated and misprimed probes for pretrained language models: Birds can talk, but cannot fly","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Kassner","year":"2020"},{"key":"2021122316152401600_bib22","article-title":"CTRL: A conditional transformer language model for controllable generation","author":"Keskar","year":"2019","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib23","first-page":"107","article-title":"Neural interactive translation prediction","volume-title":"Proceedings of the Association for Machine Translation in the Americas","author":"Knowles","year":"2016"},{"key":"2021122316152401600_bib24","article-title":"GeDi: Generative discriminator guided sequence generation","author":"Krause","year":"2020","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib25","doi-asserted-by":"publisher","first-page":"5082","DOI":"10.18653\/v1\/2020.coling-main.446","article-title":"Monolingual and multilingual reduction of gender bias in contextualized representations","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020","author":"Liang","year":"2020"},{"key":"2021122316152401600_bib26","article-title":"RoBERTa: A robustly optimized 
BERT pretraining approach","author":"Liu","year":"2019","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib27","article-title":"Pointer sentinel mixture models","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings","author":"Merity","year":"2017"},{"key":"2021122316152401600_bib28","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov","year":"2013","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib29","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.416","article-title":"StereoSet: Measuring stereotypical bias in pretrained language models","author":"Nadeem","year":"2020","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib30","doi-asserted-by":"publisher","first-page":"1953","DOI":"10.18653\/v1\/2020.emnlp-main.154","article-title":"CrowS-pairs: A challenge dataset for measuring social biases in masked language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Nangia","year":"2020"},{"key":"2021122316152401600_bib31","doi-asserted-by":"publisher","first-page":"4296","DOI":"10.18653\/v1\/2020.acl-main.396","article-title":"Toxicity detection: Does context really matter?","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Pavlopoulos","year":"2020"},{"key":"2021122316152401600_bib32","doi-asserted-by":"publisher","first-page":"2227","DOI":"10.18653\/v1\/N18-1202","article-title":"Deep contextualized word representations","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long 
Papers)","author":"Peters","year":"2018"},{"key":"2021122316152401600_bib33","article-title":"Zero-shot text classification with generative language models","author":"Puri","year":"2019","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib34","article-title":"Improving language understanding by generative pre-training","author":"Radford","year":"2018"},{"key":"2021122316152401600_bib35","unstructured":"Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Technical report."},{"issue":"140","key":"2021122316152401600_bib36","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2021122316152401600_bib37","doi-asserted-by":"publisher","first-page":"7237","DOI":"10.18653\/v1\/2020.acl-main.647","article-title":"Null it out: Guarding protected attributes by iterative nullspace projection","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Ravfogel","year":"2020"},{"key":"2021122316152401600_bib38","doi-asserted-by":"publisher","first-page":"8","DOI":"10.18653\/v1\/N18-2002","article-title":"Gender bias in coreference resolution","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Rudinger","year":"2018"},{"key":"2021122316152401600_bib39","doi-asserted-by":"publisher","first-page":"2699","DOI":"10.18653\/v1\/2020.acl-main.240","article-title":"Masked language model scoring","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational 
Linguistics","author":"Salazar","year":"2020"},{"key":"2021122316152401600_bib40","article-title":"Few-shot text generation with pattern-exploiting training","author":"Schick","year":"2020","journal-title":"Computing Research Repository"},{"key":"2021122316152401600_bib41","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2021.eacl-main.20","article-title":"Exploiting cloze questions for few shot text classification and natural language inference","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics","author":"Schick","year":"2021"},{"key":"2021122316152401600_bib42","doi-asserted-by":"publisher","first-page":"2339","DOI":"10.18653\/v1\/2021.naacl-main.185","article-title":"It\u2019s not just size that matters: Small language models are also few-shot learners","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Schick","year":"2021"},{"key":"2021122316152401600_bib43","doi-asserted-by":"publisher","first-page":"3407","DOI":"10.18653\/v1\/D19-1339","article-title":"The woman worked as a babysitter: On biases in language generation","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Sheng","year":"2019"},{"key":"2021122316152401600_bib44","doi-asserted-by":"publisher","first-page":"3645","DOI":"10.18653\/v1\/P19-1355","article-title":"Energy and policy considerations for deep learning in NLP","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Strubell","year":"2019"},{"key":"2021122316152401600_bib45","article-title":"Artificial intelligence and the cultural problem of online extreme speech","author":"Udupa","year":"2020","journal-title":"Items, Social Science Research 
Council"},{"key":"2021122316152401600_bib46","article-title":"AI, extreme speech and the challenges of online content moderation","author":"Udupa","year":"2021"},{"key":"2021122316152401600_bib47","doi-asserted-by":"crossref","first-page":"30","DOI":"10.18653\/v1\/W19-2304","article-title":"BERT has a mouth, and it must speak: BERT as a Markov random field language model","volume-title":"Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation","author":"Wang","year":"2019"},{"key":"2021122316152401600_bib48","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"},{"key":"2021122316152401600_bib49","doi-asserted-by":"publisher","first-page":"66","DOI":"10.18653\/v1\/P16-1007","article-title":"Models and inference for prefix-constrained machine translation","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Wuebker","year":"2016"},{"key":"2021122316152401600_bib50","doi-asserted-by":"publisher","first-page":"2941","DOI":"10.18653\/v1\/D17-1323","article-title":"Men also like shopping: Reducing gender bias amplification using corpus-level constraints","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Zhao","year":"2017"},{"key":"2021122316152401600_bib51","doi-asserted-by":"publisher","first-page":"4847","DOI":"10.18653\/v1\/D18-1521","article-title":"Learning gender-neutral word embeddings","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language 
Processing","author":"Zhao","year":"2018"},{"key":"2021122316152401600_bib52","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1109\/ICCV.2015.11","article-title":"Aligning books and movies: Towards story-like visual explanations by watching movies and reading books","author":"Zhu","year":"2015","journal-title":"2015 IEEE International Conference on Computer Vision (ICCV)"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00434\/1979270\/tacl_a_00434.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00434\/1979270\/tacl_a_00434.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T05:57:43Z","timestamp":1640325463000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00434\/108865\/Self-Diagnosis-and-Self-Debiasing-A-Proposal-for"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":52,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00434","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}