{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T11:49:53Z","timestamp":1780400993219,"version":"3.54.1"},"reference-count":278,"publisher":"MIT Press","issue":"3","license":[{"start":{"date-parts":[[2024,6,11]],"date-time":"2024-06-11T00:00:00Z","timestamp":1718064000000},"content-version":"vor","delay-in-days":162,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this article, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely, metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly available datasets for improved access. 
Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.<\/jats:p>","DOI":"10.1162\/coli_a_00524","type":"journal-article","created":{"date-parts":[[2024,6,11]],"date-time":"2024-06-11T18:07:06Z","timestamp":1718129226000},"page":"1097-1179","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":484,"title":["Bias and Fairness in Large Language Models: A Survey"],"prefix":"10.1162","volume":"50","author":[{"given":"Isabel O.","family":"Gallegos","sequence":"first","affiliation":[{"name":"Department of Computer Science, Stanford University. iogalle@stanford.edu"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ryan A.","family":"Rossi","sequence":"additional","affiliation":[{"name":"Adobe Research. ryrossi@adobe.com"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Joe","family":"Barrow","sequence":"additional","affiliation":[{"name":"Pattern Data. joe.barrow@patterndataworks.com"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Md Mehrab","family":"Tanjim","sequence":"additional","affiliation":[{"name":"Adobe Research. tanjim@adobe.com"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sungchul","family":"Kim","sequence":"additional","affiliation":[{"name":"Adobe Research. sukim@adobe.com"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Franck","family":"Dernoncourt","sequence":"additional","affiliation":[{"name":"Adobe Research. 
dernonco@adobe.com"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tong","family":"Yu","sequence":"additional","affiliation":[{"name":"Adobe Research. tyu@adobe.com"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ruiyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Adobe Research. ruizhang@adobe.com"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nesreen K.","family":"Ahmed","sequence":"additional","affiliation":[{"name":"Intel Labs. nesreen.k.ahmed@intel.com"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","published-online":{"date-parts":[[2024,9,1]]},"reference":[{"key":"2024092014251390100_bib1","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1145\/3461702.3462624","article-title":"Persistent anti-Muslim bias in large language models","volume-title":"Proceedings of the 2021 AAAI\/ACM Conference on AI, Ethics, and Society","author":"Abid","year":"2021"},{"key":"2024092014251390100_bib2","doi-asserted-by":"publisher","first-page":"266","DOI":"10.18653\/v1\/2022.gebnlp-1.27","article-title":"Why knowledge distillation amplifies gender bias and how to mitigate from the perspective of DistilBERT","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","author":"Ahn","year":"2022"},{"key":"2024092014251390100_bib3","doi-asserted-by":"publisher","first-page":"533","DOI":"10.18653\/v1\/2021.emnlp-main.42","article-title":"Mitigating language-dependent ethnic bias in BERT","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Ahn","year":"2021"},{"key":"2024092014251390100_bib4","doi-asserted-by":"publisher","first-page":"76","DOI":"10.18653\/v1\/2022.gebnlp-1.9","article-title":"Challenges in measuring bias via open-ended language generation","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing 
(GeBNLP)","author":"Aky\u00fcrek","year":"2022"},{"key":"2024092014251390100_bib5","doi-asserted-by":"publisher","first-page":"4486","DOI":"10.18653\/v1\/2023.acl-long.246","article-title":"Exploiting biased models to de-bias text: A gender-fair rewriting model","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Amrhein","year":"2023"},{"key":"2024092014251390100_bib6","doi-asserted-by":"publisher","first-page":"1105","DOI":"10.18653\/v1\/2022.findings-acl.88","article-title":"Entropy-based attention regularization frees unintended bias mitigation from lists","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Attanasio","year":"2022"},{"key":"2024092014251390100_bib7","article-title":"Constitutional AI: Harmlessness from AI feedback","author":"Bai","year":"2022","journal-title":"arXiv preprint arXiv:2212.08073"},{"key":"2024092014251390100_bib8","doi-asserted-by":"publisher","first-page":"1941","DOI":"10.18653\/v1\/2021.acl-long.151","article-title":"RedditBias: A real-world resource for bias evaluation and debiasing of conversational language models","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Barikeri","year":"2021"},{"key":"2024092014251390100_bib9","volume-title":"Fairness and Machine Learning: Limitations and Opportunities","author":"Barocas","year":"2019"},{"key":"2024092014251390100_bib10","first-page":"1","article-title":"Unmasking contextual stereotypes: Measuring and mitigating BERT\u2019s gender bias","volume-title":"Proceedings of the Second Workshop on Gender Bias in Natural Language 
Processing","author":"Bartl","year":"2020"},{"key":"2024092014251390100_bib11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.4000\/books.aaccademia.3085","article-title":"Hurtlex: A multilingual lexicon of words to hurt","volume-title":"CEUR Workshop Proceedings","author":"Bassignana","year":"2018"},{"issue":"4","key":"2024092014251390100_bib12","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1215\/00031283-75-4-362","article-title":"Racial identification by speech","volume":"75","author":"Baugh","year":"2000","journal-title":"American Speech"},{"key":"2024092014251390100_bib13","article-title":"A typology of ethical risks in language technology with an eye towards where transparent documentation can help","author":"Bender","year":"2019"},{"key":"2024092014251390100_bib14","doi-asserted-by":"publisher","first-page":"587","DOI":"10.1162\/tacl_a_00041","article-title":"Data statements for natural language processing: Toward mitigating system bias and enabling better science","volume":"6","author":"Bender","year":"2018","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024092014251390100_bib15","doi-asserted-by":"publisher","first-page":"610","DOI":"10.1145\/3442188.3445922","article-title":"On the dangers of stochastic parrots: Can language models be too big?","volume-title":"Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency","author":"Bender","year":"2021"},{"key":"2024092014251390100_bib16","volume-title":"Race After Technology: Abolitionist Tools for the New Jim Code","author":"Benjamin","year":"2020"},{"key":"2024092014251390100_bib17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.12840\/issn.2255-4165.017","article-title":"How stereotypes are shared through language: A review and introduction of the social categories and stereotypes communication (SCSC) framework","volume":"7","author":"Beukeboom","year":"2019","journal-title":"Review of Communication 
Research"},{"key":"2024092014251390100_bib18","first-page":"727","article-title":"Re-contextualizing fairness in NLP: The case of India","volume-title":"Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Bhatt","year":"2022"},{"issue":"2","key":"2024092014251390100_bib19","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2021.100205","article-title":"Algorithmic injustice: A relational ethics approach","volume":"2","author":"Birhane","year":"2021","journal-title":"Patterns"},{"key":"2024092014251390100_bib20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3551624.3555290","article-title":"Power to the people? Opportunities and challenges for participatory AI","author":"Birhane","year":"2022","journal-title":"Equity and Access in Algorithms, Mechanisms, and Optimization"},{"key":"2024092014251390100_bib21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3617694.3623259","article-title":"Toward operationalizing pipeline-aware ML fairness: A research agenda for developing practical guidelines and tools","volume-title":"Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization","author":"Black","year":"2023"},{"key":"2024092014251390100_bib22","unstructured":"Blodgett, Su Lin. 2021. Sociolinguistically Driven Approaches for Just Natural Language Processing. Ph.D. thesis. 
University of Massachusetts Amherst."},{"key":"2024092014251390100_bib23","doi-asserted-by":"publisher","first-page":"5454","DOI":"10.18653\/v1\/2020.acl-main.485","article-title":"Language (technology) is power: A critical survey of \u201cbias\u201d in NLP","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Blodgett","year":"2020"},{"key":"2024092014251390100_bib24","doi-asserted-by":"publisher","first-page":"1004","DOI":"10.18653\/v1\/2021.acl-long.81","article-title":"Stereotyping Norwegian salmon: An inventory of pitfalls in fairness benchmark datasets","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Blodgett","year":"2021"},{"key":"2024092014251390100_bib25","article-title":"Racial disparity in natural language processing: A case study of social media African-American English","author":"Blodgett","year":"2017","journal-title":"arXiv preprint arXiv:1707.00061"},{"key":"2024092014251390100_bib26","first-page":"4356","article-title":"Man is to computer programmer as woman is to homemaker? Debiasing word embeddings","volume":"29","author":"Bolukbasi","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib27","article-title":"On the opportunities and risks of foundation models","author":"Bommasani","year":"2021","journal-title":"arXiv preprint arXiv:2108.07258"},{"key":"2024092014251390100_bib28","doi-asserted-by":"publisher","first-page":"212","DOI":"10.18653\/v1\/2022.gebnlp-1.22","article-title":"Looking for a handsome carpenter! 
Debiasing GPT-3 job advertisements","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","author":"Borchers","year":"2022"},{"key":"2024092014251390100_bib29","doi-asserted-by":"publisher","first-page":"7","DOI":"10.18653\/v1\/N19-3002","article-title":"Identifying and reducing gender bias in word-level language models","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop","author":"Bordia","year":"2019"},{"key":"2024092014251390100_bib30","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib31","doi-asserted-by":"publisher","DOI":"10.48558\/9SEV-4D26","article-title":"Disrupting the gospel of tech solutionism to build tech justice","volume-title":"Stanford Social Innovation Review","author":"Byrum","year":"2022"},{"key":"2024092014251390100_bib32","doi-asserted-by":"publisher","first-page":"370","DOI":"10.1145\/3593013.3594004","article-title":"On the independence of association bias and empirical fairness in language models","volume-title":"Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency","author":"Cabello","year":"2023"},{"issue":"6334","key":"2024092014251390100_bib33","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1126\/science.aal4230","article-title":"Semantics derived automatically from language corpora contain human-like biases","volume":"356","author":"Caliskan","year":"2017","journal-title":"Science"},{"key":"2024092014251390100_bib34","doi-asserted-by":"publisher","first-page":"561","DOI":"10.18653\/v1\/2022.acl-short.62","article-title":"On the intrinsic and extrinsic fairness evaluation metrics for contextualized language representations","volume-title":"Proceedings of the 60th Annual 
Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Cao","year":"2022"},{"key":"2024092014251390100_bib35","doi-asserted-by":"publisher","first-page":"1276","DOI":"10.18653\/v1\/2022.naacl-main.92","article-title":"Theory-grounded measurement of U.S. social stereotypes in English language models","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Cao","year":"2022"},{"key":"2024092014251390100_bib36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/S17-2001","article-title":"SemEval-2017 Task 1: Semantic textual similarity multilingual and crosslingual focused evaluation","volume-title":"Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)","author":"Cer","year":"2017"},{"key":"2024092014251390100_bib37","article-title":"A survey on evaluation of large language models","author":"Chang","year":"2023","journal-title":"arXiv preprint arXiv:2307.03109"},{"key":"2024092014251390100_bib38","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.84","article-title":"Marked personas: Using natural language prompts to measure stereotypes in language models","author":"Cheng","year":"2023","journal-title":"arXiv preprint arXiv:2305.18189"},{"key":"2024092014251390100_bib39","article-title":"FairFil: Contrastive neural debiasing method for pretrained text encoders","volume-title":"International Conference on Learning Representations","author":"Cheng","year":"2021"},{"issue":"2","key":"2024092014251390100_bib40","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1089\/big.2016.0047","article-title":"Fair prediction with disparate impact: A study of bias in recidivism prediction instruments","volume":"5","author":"Chouldechova","year":"2017","journal-title":"Big Data"},{"key":"2024092014251390100_bib41","article-title":"PaLM: Scaling language modeling 
with pathways","author":"Chowdhery","year":"2022","journal-title":"arXiv preprint arXiv:2204.02311"},{"key":"2024092014251390100_bib42","article-title":"Scaling instruction-finetuned language models","author":"Chung","year":"2022","journal-title":"arXiv preprint arXiv:2210.11416"},{"key":"2024092014251390100_bib43","doi-asserted-by":"publisher","first-page":"575","DOI":"10.18653\/v1\/2023.acl-long.34","article-title":"Increasing diversity while maintaining accuracy: Text data generation with large language models and human interventions","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Chung","year":"2023"},{"key":"2024092014251390100_bib44","doi-asserted-by":"publisher","first-page":"6539","DOI":"10.18653\/v1\/2021.acl-long.511","article-title":"A novel estimator of mutual information for learning to disentangle textual representations","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Colombo","year":"2021"},{"key":"2024092014251390100_bib45","doi-asserted-by":"publisher","first-page":"8440","DOI":"10.18653\/v1\/2020.acl-main.747","article-title":"Unsupervised cross-lingual representation learning at scale","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Conneau","year":"2020"},{"key":"2024092014251390100_bib46","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1146\/annurev-linguistics-011718-011659","article-title":"Language and discrimination: Generating meaning, perceiving identities, and discriminating outcomes","volume":"6","author":"Craft","year":"2020","journal-title":"Annual Review of Linguistics"},{"key":"2024092014251390100_bib47","article-title":"The trouble with 
bias","author":"Crawford","year":"2017"},{"key":"2024092014251390100_bib48","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3313831.3376488","article-title":"Detecting gender stereotypes: Lexicon vs. supervised learning methods","volume-title":"Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems","author":"Cryan","year":"2020"},{"key":"2024092014251390100_bib49","doi-asserted-by":"publisher","first-page":"1249","DOI":"10.1162\/tacl_a_00425","article-title":"Quantifying social biases in NLP: A generalization and empirical comparison of extrinsic fairness metrics","volume":"9","author":"Czarnowska","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024092014251390100_bib50","article-title":"Plug and play language models: A simple approach to controlled text generation","author":"Dathathri","year":"2019","journal-title":"arXiv preprint arXiv:1912.02164"},{"key":"2024092014251390100_bib51","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1162\/tacl_a_00449","article-title":"Dealing with disagreements: Looking beyond the majority vote in subjective annotations","volume":"10","author":"Davani","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024092014251390100_bib52","doi-asserted-by":"publisher","first-page":"638","DOI":"10.1007\/978-3-031-26390-3_37","article-title":"FairDistillation: Mitigating stereotyping in language models","volume-title":"Joint European Conference on Machine Learning and Knowledge Discovery in Databases","author":"Delobelle","year":"2022"},{"key":"2024092014251390100_bib53","doi-asserted-by":"publisher","first-page":"1693","DOI":"10.18653\/v1\/2022.naacl-main.122","article-title":"Measuring fairness with biased rulers: A comparative study on bias metrics for pre-trained language models","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for 
Computational Linguistics: Human Language Technologies","author":"Delobelle","year":"2022"},{"key":"2024092014251390100_bib54","article-title":"Whose ground truth? Accounting for individual and collective identities underlying dataset annotation","author":"Denton","year":"2021","journal-title":"arXiv preprint arXiv:2112.04554"},{"key":"2024092014251390100_bib55","article-title":"Bringing the people back in: Contesting benchmark machine learning datasets","author":"Denton","year":"2020","journal-title":"arXiv preprint arXiv:2007.07399"},{"key":"2024092014251390100_bib56","doi-asserted-by":"publisher","first-page":"7659","DOI":"10.1609\/aaai.v34i05.6267","article-title":"On measuring and mitigating biased inferences of word embeddings","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Dev","year":"2020"},{"key":"2024092014251390100_bib57","doi-asserted-by":"publisher","first-page":"5034","DOI":"10.18653\/v1\/2021.emnlp-main.411","article-title":"OSCaR: Orthogonal subspace correction and rectification of biases in word embeddings","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Dev","year":"2021"},{"key":"2024092014251390100_bib58","doi-asserted-by":"publisher","first-page":"2083","DOI":"10.1145\/3531146.3534627","article-title":"Theories of \u201cgender\u201d in NLP bias research","volume-title":"Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency","author":"Devinney","year":"2022"},{"key":"2024092014251390100_bib59","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short 
Papers)","author":"Devlin","year":"2019"},{"key":"2024092014251390100_bib60","doi-asserted-by":"publisher","first-page":"862","DOI":"10.1145\/3442188.3445924","article-title":"BOLD: Dataset and metrics for measuring biases in open-ended language generation","volume-title":"Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency","author":"Dhamala","year":"2021"},{"key":"2024092014251390100_bib61","article-title":"Queer people are people first: Deconstructing sexual identity stereotypes in large language models","author":"Dhingra","year":"2023","journal-title":"arXiv preprint arXiv:2307.00101"},{"key":"2024092014251390100_bib62","doi-asserted-by":"publisher","first-page":"8173","DOI":"10.18653\/v1\/2020.emnlp-main.656","article-title":"Queens are powerful too: Mitigating gender bias in dialogue generation","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Dinan","year":"2020"},{"key":"2024092014251390100_bib63","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1145\/3278721.3278729","article-title":"Measuring and mitigating unintended bias in text classification","volume-title":"Proceedings of the 2018 AAAI\/ACM Conference on AI, Ethics, and Society","author":"Dixon","year":"2018"},{"key":"2024092014251390100_bib64","doi-asserted-by":"publisher","first-page":"1286","DOI":"10.18653\/v1\/2021.emnlp-main.98","article-title":"Documenting large webtext corpora: A case study on the colossal clean crawled corpus","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Dodge","year":"2021"},{"key":"2024092014251390100_bib65","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s41019-023-00211-0","article-title":"Improving gender-related fairness in sentence encoders: A semantics-based approach","author":"Dolci","year":"2023","journal-title":"Data Science and 
Engineering"},{"key":"2024092014251390100_bib66","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1145\/2090236.2090255","article-title":"Fairness through awareness","volume-title":"Proceedings of the 3rd Innovations in Theoretical Computer Science Conference","author":"Dwork","year":"2012"},{"key":"2024092014251390100_bib67","doi-asserted-by":"publisher","first-page":"1249","DOI":"10.18653\/v1\/2023.acl-short.108","article-title":"Improving gender fairness of pre-trained language models without catastrophic forgetting","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Fatemi","year":"2023"},{"key":"2024092014251390100_bib68","doi-asserted-by":"publisher","first-page":"9126","DOI":"10.18653\/v1\/2023.acl-long.507","article-title":"WinoQueer: A community-in-the-loop benchmark for anti-LGBTQ+ bias in large language models","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Felkner","year":"2023"},{"key":"2024092014251390100_bib69","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.4627814","article-title":"Should ChatGPT be biased? 
Challenges and risks of bias in large language models","author":"Ferrara","year":"2023","journal-title":"arXiv preprint arXiv:2304.03738"},{"key":"2024092014251390100_bib70","doi-asserted-by":"publisher","first-page":"6715","DOI":"10.18653\/v1\/2023.emnlp-main.415","article-title":"When the majority is wrong: Modeling annotator disagreement for subjective tasks","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing","author":"Fleisig","year":"2023"},{"key":"2024092014251390100_bib71","doi-asserted-by":"publisher","first-page":"6231","DOI":"10.18653\/v1\/2023.acl-long.343","article-title":"FairPrism: Evaluating fairness-related harms in text generation","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Fleisig","year":"2023"},{"key":"2024092014251390100_bib72","doi-asserted-by":"publisher","first-page":"653","DOI":"10.18653\/v1\/2020.emnlp-main.48","article-title":"Social chemistry 101: Learning to reason about social and moral norms","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Forbes","year":"2020"},{"issue":"4","key":"2024092014251390100_bib73","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1145\/3433949","article-title":"The (im)possibility of fairness: Different value systems require different mechanisms for fair decision making","volume":"64","author":"Friedler","year":"2021","journal-title":"Communications of the ACM"},{"key":"2024092014251390100_bib74","doi-asserted-by":"publisher","first-page":"9582","DOI":"10.18653\/v1\/2022.emnlp-main.651","article-title":"Debiasing pretrained text encoders by paying attention to paying attention","volume-title":"2022 Conference on Empirical Methods in Natural Language 
Processing","author":"Gaci","year":"2022"},{"issue":"16","key":"2024092014251390100_bib75","doi-asserted-by":"publisher","first-page":"E3635\u2013E3644","DOI":"10.1073\/pnas.1720347115","article-title":"Word embeddings quantify 100 years of gender and ethnic stereotypes","volume":"115","author":"Garg","year":"2018","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"2024092014251390100_bib76","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1145\/3306618.3317950","article-title":"Counterfactual fairness in text classification through robustness","volume-title":"Proceedings of the 2019 AAAI\/ACM Conference on AI, Ethics, and Society","author":"Garg","year":"2019"},{"key":"2024092014251390100_bib77","doi-asserted-by":"publisher","first-page":"4534","DOI":"10.18653\/v1\/2021.findings-acl.397","article-title":"He is very intelligent, she is very beautiful? On mitigating social biases in language modelling and generation","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Garimella","year":"2021"},{"key":"2024092014251390100_bib78","first-page":"311","article-title":"Demographic-aware language model fine-tuning as a bias mitigation technique","volume-title":"Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing","author":"Garimella","year":"2022"},{"issue":"12","key":"2024092014251390100_bib79","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1145\/3458723","article-title":"Datasheets for datasets","volume":"64","author":"Gebru","year":"2021","journal-title":"Communications of the ACM"},{"key":"2024092014251390100_bib80","doi-asserted-by":"publisher","first-page":"3356","DOI":"10.18653\/v1\/2020.findings-emnlp.301","article-title":"RealToxicityPrompts: Evaluating neural toxic degeneration in language models","volume-title":"Findings of the 
Association for Computational Linguistics: EMNLP 2020","author":"Gehman","year":"2020"},{"key":"2024092014251390100_bib81","doi-asserted-by":"publisher","first-page":"96","DOI":"10.18653\/v1\/2021.gem-1.10","article-title":"The GEM benchmark: Natural language generation, its evaluation and metrics","volume-title":"Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)","author":"Gehrmann","year":"2021"},{"key":"2024092014251390100_bib82","doi-asserted-by":"publisher","first-page":"5448","DOI":"10.18653\/v1\/2023.findings-acl.336","article-title":"Gender-tuning: Empowering fine-tuning for debiasing pre-trained language models","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Ghanbarzadeh","year":"2023"},{"key":"2024092014251390100_bib83","doi-asserted-by":"publisher","first-page":"59","DOI":"10.18653\/v1\/2022.ltedi-1.8","article-title":"Debiasing pre-trained language models via efficient fine-tuning","volume-title":"Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion","author":"Gira","year":"2022"},{"key":"2024092014251390100_bib84","article-title":"NLP systems that can\u2019t tell use from mention censor counterspeech, but teaching the distinction helps","author":"Gligoric","year":"2024","journal-title":"arXiv preprint arXiv:2404.01651"},{"key":"2024092014251390100_bib85","doi-asserted-by":"publisher","first-page":"1926","DOI":"10.18653\/v1\/2021.acl-long.150","article-title":"Intrinsic bias metrics do not correlate with application bias","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Goldfarb-Tarrant","year":"2021"},{"key":"2024092014251390100_bib86","doi-asserted-by":"publisher","first-page":"60","DOI":"10.18653\/v1\/N19-1061","article-title":"Lipstick on a pig: 
Debiasing methods cover up systematic gender biases in word embeddings but do not remove them","volume-title":"Proceedings of the 2019 Workshop on Widening NLP","author":"Gonen","year":"2019"},{"key":"2024092014251390100_bib87","first-page":"1","article-title":"\u201cGood\u201d isn\u2019t good enough","volume-title":"Proceedings of the AI for Social Good Workshop at NeurIPS","author":"Green","year":"2019"},{"issue":"6","key":"2024092014251390100_bib88","doi-asserted-by":"publisher","first-page":"1464","DOI":"10.1037\/0022-3514.74.6.1464","article-title":"Measuring individual differences in implicit cognition: The implicit association test","volume":"74","author":"Greenwald","year":"1998","journal-title":"Journal of Personality and Social Psychology"},{"issue":"2","key":"2024092014251390100_bib89","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1145\/2422509.2422511","article-title":"Moral responsibility for computing artifacts: \u201cThe rules\u201d and issues of trust","volume":"42","author":"Grodzinsky","year":"2012","journal-title":"SIGCAS Computers & Society"},{"key":"2024092014251390100_bib90","doi-asserted-by":"publisher","first-page":"4884","DOI":"10.18653\/v1\/2021.acl-long.378","article-title":"Parameter-efficient transfer learning with diff pruning","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Guo","year":"2021"},{"key":"2024092014251390100_bib91","doi-asserted-by":"publisher","first-page":"122","DOI":"10.1145\/3461702.3462536","article-title":"Detecting emergent intersectional biases: Contextualized word embeddings contain a distribution of human-like biases","volume-title":"Proceedings of the 2021 AAAI\/ACM Conference on AI, Ethics, and 
Society","author":"Guo","year":"2021"},{"key":"2024092014251390100_bib92","doi-asserted-by":"publisher","first-page":"1012","DOI":"10.18653\/v1\/2022.acl-long.72","article-title":"Auto-debias: Debiasing masked language models with automated biased prompts","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Guo","year":"2022"},{"key":"2024092014251390100_bib93","doi-asserted-by":"publisher","first-page":"658","DOI":"10.18653\/v1\/2022.findings-acl.55","article-title":"Mitigating gender bias in distilled language models via counterfactual role reversal","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Gupta","year":"2022"},{"key":"2024092014251390100_bib94","article-title":"Survey on sociodemographic bias in natural language processing","author":"Gupta","year":"2023","journal-title":"arXiv preprint arXiv:2306.08158"},{"key":"2024092014251390100_bib95","doi-asserted-by":"publisher","first-page":"5267","DOI":"10.18653\/v1\/D19-1530","article-title":"It\u2019s all in the name: Mitigating gender bias with name-based counterfactual data substitution","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Hall Maudslay","year":"2019"},{"key":"2024092014251390100_bib96","doi-asserted-by":"publisher","first-page":"228","DOI":"10.18653\/v1\/2023.acl-short.21","article-title":"Detoxifying text with MaRCo: Controllable revision with experts and anti-experts","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Hallinan","year":"2023"},{"key":"2024092014251390100_bib97","doi-asserted-by":"publisher","first-page":"471","DOI":"10.18653\/v1\/2021.findings-acl.41","article-title":"Decoupling adversarial training for fair 
NLP","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Han","year":"2021"},{"key":"2024092014251390100_bib98","doi-asserted-by":"publisher","first-page":"2760","DOI":"10.18653\/v1\/2021.eacl-main.239","article-title":"Diverse adversaries for mitigating bias in training","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Han","year":"2021"},{"key":"2024092014251390100_bib99","doi-asserted-by":"publisher","first-page":"11335","DOI":"10.18653\/v1\/2022.emnlp-main.779","article-title":"Balancing out bias: Achieving fairness through balanced training","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Han","year":"2022"},{"key":"2024092014251390100_bib100","article-title":"Towards equal opportunity fairness through adversarial learning","author":"Han","year":"2022","journal-title":"arXiv preprint arXiv:2203.06317"},{"key":"2024092014251390100_bib101","doi-asserted-by":"publisher","first-page":"297","DOI":"10.18653\/v1\/2023.eacl-main.23","article-title":"Fair enough: Standardizing evaluation and model selection for fairness research in NLP","volume-title":"Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics","author":"Han","year":"2023"},{"key":"2024092014251390100_bib102","doi-asserted-by":"publisher","first-page":"501","DOI":"10.1145\/3351095.3372826","article-title":"Towards a critical race methodology in algorithmic fairness","volume-title":"Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency","author":"Hanna","year":"2020"},{"key":"2024092014251390100_bib103","first-page":"3323","article-title":"Equality of opportunity in supervised learning","volume":"29","author":"Hardt","year":"2016","journal-title":"Advances in Neural Information Processing 
Systems"},{"key":"2024092014251390100_bib104","article-title":"Pruning for protection: Increasing jailbreak resistance in aligned LLMs without fine-tuning","author":"Hasan","year":"2024","journal-title":"arXiv preprint arXiv:2401.10862"},{"key":"2024092014251390100_bib105","doi-asserted-by":"publisher","first-page":"6192","DOI":"10.18653\/v1\/2023.findings-acl.386","article-title":"Modular and on-demand bias mitigation with attribute-removal subnetworks","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Hauzenberger","year":"2023"},{"key":"2024092014251390100_bib106","doi-asserted-by":"publisher","first-page":"9681","DOI":"10.18653\/v1\/2022.emnlp-main.657","article-title":"MABEL: Attenuating gender bias using textual entailment data","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"He","year":"2022"},{"key":"2024092014251390100_bib107","doi-asserted-by":"publisher","first-page":"4173","DOI":"10.18653\/v1\/2021.findings-emnlp.352","article-title":"Detect and perturb: Neutral rewriting of biased and sensitive text via gradient-based decoding","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2021","author":"He","year":"2021"},{"key":"2024092014251390100_bib108","doi-asserted-by":"publisher","first-page":"5854","DOI":"10.18653\/v1\/2022.findings-emnlp.431","article-title":"Controlling bias exposure for fair interpretable predictions","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2022","author":"He","year":"2022"},{"key":"2024092014251390100_bib109","first-page":"1939","article-title":"Multicalibration: Calibration for the (computationally-identifiable) masses","volume-title":"International Conference on Machine Learning","author":"H\u00e9bert-Johnson","year":"2018"},{"key":"2024092014251390100_bib110","first-page":"2790","article-title":"Parameter-efficient transfer learning for 
NLP","volume-title":"International Conference on Machine Learning","author":"Houlsby","year":"2019"},{"key":"2024092014251390100_bib111","doi-asserted-by":"publisher","first-page":"65","DOI":"10.18653\/v1\/2020.findings-emnlp.7","article-title":"Reducing sentiment bias in language models via counterfactual evaluation","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Huang","year":"2020"},{"key":"2024092014251390100_bib112","article-title":"TrustGPT: A benchmark for trustworthy and responsible large language models","author":"Huang","year":"2023","journal-title":"arXiv preprint arXiv:2306.11507"},{"key":"2024092014251390100_bib113","doi-asserted-by":"publisher","first-page":"5491","DOI":"10.18653\/v1\/2020.acl-main.487","article-title":"Social biases in NLP models as barriers for persons with disabilities","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Hutchinson","year":"2020"},{"key":"2024092014251390100_bib114","doi-asserted-by":"publisher","first-page":"5961","DOI":"10.18653\/v1\/2023.findings-acl.369","article-title":"Shielded representations: Protecting sensitive attributes through iterative gradient-based projection","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Iskander","year":"2023"},{"key":"2024092014251390100_bib115","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1145\/3442188.3445901","article-title":"Measurement and fairness","volume-title":"Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency","author":"Jacobs","year":"2021"},{"key":"2024092014251390100_bib116","doi-asserted-by":"publisher","first-page":"93","DOI":"10.18653\/v1\/2021.gebnlp-1.11","article-title":"Generating gender augmented data for NLP","volume-title":"Proceedings of the 3rd Workshop on Gender Bias in Natural Language 
Processing","author":"Jain","year":"2021"},{"key":"2024092014251390100_bib117","doi-asserted-by":"publisher","first-page":"255","DOI":"10.18653\/v1\/2022.gebnlp-1.26","article-title":"What changed? Investigating debiasing methods using causal mediation analysis","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","author":"Jeoung","year":"2022"},{"key":"2024092014251390100_bib118","doi-asserted-by":"publisher","first-page":"2206","DOI":"10.1145\/3531146.3534637","article-title":"Data governance in the age of large-scale data-driven language technology","volume-title":"Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency","author":"Jernite","year":"2022"},{"key":"2024092014251390100_bib119","doi-asserted-by":"publisher","first-page":"2936","DOI":"10.18653\/v1\/2020.acl-main.264","article-title":"Mitigating gender bias amplification in distribution by posterior regularization","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Jia","year":"2020"},{"key":"2024092014251390100_bib120","doi-asserted-by":"publisher","first-page":"3770","DOI":"10.18653\/v1\/2021.naacl-main.296","article-title":"On transferability of bias mitigation effects in language model fine-tuning","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Jin","year":"2021"},{"key":"2024092014251390100_bib121","doi-asserted-by":"publisher","first-page":"67","DOI":"10.18653\/v1\/2022.gebnlp-1.6","article-title":"Gender biases and where to find them: Exploring gender bias in pre-trained transformer-based language models using movement pruning","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing 
(GeBNLP)","author":"Joniak","year":"2022"},{"issue":"7815","key":"2024092014251390100_bib122","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1038\/d41586-020-02003-2","article-title":"Don\u2019t ask if artificial intelligence is good or fair, ask how it shifts power","volume":"583","author":"Kalluri","year":"2020","journal-title":"Nature"},{"issue":"1","key":"2024092014251390100_bib123","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10115-011-0463-8","article-title":"Data preprocessing techniques for classification without discrimination","volume":"33","author":"Kamiran","year":"2012","journal-title":"Knowledge and Information Systems"},{"key":"2024092014251390100_bib124","doi-asserted-by":"publisher","first-page":"1256","DOI":"10.18653\/v1\/2021.eacl-main.107","article-title":"Debiasing pre-trained contextualised embeddings","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Kaneko","year":"2021"},{"key":"2024092014251390100_bib125","doi-asserted-by":"publisher","first-page":"11954","DOI":"10.1609\/aaai.v36i11.21453","article-title":"Unmasking the mask\u2013evaluating social biases in masked language models","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Kaneko","year":"2022"},{"key":"2024092014251390100_bib126","first-page":"1299","article-title":"Debiasing isn\u2019t enough! 
\u2013 On the effectiveness of debiasing MLMs and their social biases in downstream tasks","volume-title":"Proceedings of the 29th International Conference on Computational Linguistics","author":"Kaneko","year":"2022"},{"key":"2024092014251390100_bib127","first-page":"2564","article-title":"Preventing fairness gerrymandering: Auditing and learning for subgroup fairness","volume-title":"International Conference on Machine Learning","author":"Kearns","year":"2018"},{"key":"2024092014251390100_bib128","article-title":"Learn what not to learn: Towards generative safety in chatbots","author":"Khalatbari","year":"2023","journal-title":"arXiv preprint arXiv:2304.11220"},{"key":"2024092014251390100_bib129","doi-asserted-by":"publisher","first-page":"4110","DOI":"10.18653\/v1\/2021.naacl-main.324","article-title":"Dynabench: Rethinking benchmarking in NLP","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Kiela","year":"2021"},{"key":"2024092014251390100_bib130","doi-asserted-by":"publisher","first-page":"4005","DOI":"10.18653\/v1\/2022.emnlp-main.267","article-title":"ProsocialDialog: A prosocial backbone for conversational agents","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Kim","year":"2022"},{"key":"2024092014251390100_bib131","doi-asserted-by":"publisher","first-page":"4598","DOI":"10.18653\/v1\/2023.findings-acl.281","article-title":"Critic-guided decoding for controlled text generation","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Kim","year":"2023"},{"key":"2024092014251390100_bib132","doi-asserted-by":"publisher","first-page":"43","DOI":"10.18653\/v1\/S18-2005","article-title":"Examining gender and race bias in two hundred sentiment analysis systems","volume-title":"Proceedings of the Seventh Joint Conference on Lexical and 
Computational Semantics","author":"Kiritchenko","year":"2018"},{"issue":"13","key":"2024092014251390100_bib133","doi-asserted-by":"publisher","first-page":"3521","DOI":"10.1073\/pnas.1611835114","article-title":"Overcoming catastrophic forgetting in neural networks","volume":"114","author":"Kirkpatrick","year":"2017","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"2024092014251390100_bib134","first-page":"22199","article-title":"Large language models are zero-shot reasoners","volume":"35","author":"Kojima","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib135","doi-asserted-by":"publisher","first-page":"4929","DOI":"10.18653\/v1\/2021.findings-emnlp.424","article-title":"GeDi: Generative discriminator guided sequence generation","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2021","author":"Krause","year":"2021"},{"key":"2024092014251390100_bib136","doi-asserted-by":"publisher","first-page":"444","DOI":"10.1145\/3576840.3578295","article-title":"Grep-BiasIR: A dataset for investigating gender representation bias in information retrieval results","volume-title":"Proceedings of the 2023 Conference on Human Information Interaction and Retrieval","author":"Krieg","year":"2023"},{"key":"2024092014251390100_bib137","doi-asserted-by":"publisher","first-page":"2738","DOI":"10.18653\/v1\/2023.eacl-main.201","article-title":"Parameter-efficient modularised bias mitigation via AdapterFusion","volume-title":"Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics","author":"Kumar","year":"2023"},{"key":"2024092014251390100_bib138","doi-asserted-by":"publisher","first-page":"3299","DOI":"10.18653\/v1\/2023.eacl-main.241","article-title":"Language generation models can cause harm: So what can we do about it? 
An actionable survey","volume-title":"Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics","author":"Kumar","year":"2023"},{"key":"2024092014251390100_bib139","doi-asserted-by":"publisher","first-page":"166","DOI":"10.18653\/v1\/W19-3823","article-title":"Measuring bias in contextualized word representations","volume-title":"Proceedings of the First Workshop on Gender Bias in Natural Language Processing","author":"Kurita","year":"2019"},{"key":"2024092014251390100_bib140","doi-asserted-by":"publisher","first-page":"4782","DOI":"10.18653\/v1\/2021.findings-emnlp.411","article-title":"Sustainable modular debiasing of language models","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2021","author":"Lauscher","year":"2021"},{"key":"2024092014251390100_bib141","doi-asserted-by":"publisher","first-page":"695","DOI":"10.1145\/3461702.3462598","article-title":"Ethical data curation for AI: An approach based on feminist epistemology and critical theories of race","volume-title":"Proceedings of the 2021 AAAI\/ACM Conference on AI, Ethics, and Society","author":"Leavy","year":"2021"},{"key":"2024092014251390100_bib142","doi-asserted-by":"publisher","first-page":"3045","DOI":"10.18653\/v1\/2021.emnlp-main.243","article-title":"The power of scale for parameter-efficient prompt tuning","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Lester","year":"2021"},{"key":"2024092014251390100_bib143","first-page":"552","article-title":"The Winograd schema challenge","volume-title":"Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning","author":"Levesque","year":"2012"},{"key":"2024092014251390100_bib144","doi-asserted-by":"publisher","first-page":"2470","DOI":"10.18653\/v1\/2021.findings-emnlp.211","article-title":"Collecting a large-scale gender bias dataset for coreference resolution and 
machine translation","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2021","author":"Levy","year":"2021"},{"key":"2024092014251390100_bib145","doi-asserted-by":"publisher","first-page":"7871","DOI":"10.18653\/v1\/2020.acl-main.703","article-title":"BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Lewis","year":"2020"},{"key":"2024092014251390100_bib146","doi-asserted-by":"publisher","first-page":"3475","DOI":"10.18653\/v1\/2020.findings-emnlp.311","article-title":"UNQOVERing stereotyping biases via underspecified questions","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Li","year":"2020"},{"key":"2024092014251390100_bib147","doi-asserted-by":"publisher","first-page":"4582","DOI":"10.18653\/v1\/2021.acl-long.353","article-title":"Prefix-tuning: Optimizing continuous prompts for generation","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Li","year":"2021"},{"key":"2024092014251390100_bib148","doi-asserted-by":"publisher","first-page":"14254","DOI":"10.18653\/v1\/2023.acl-long.797","article-title":"Prompt tuning pushes farther, contrastive learning pulls closer: A two-stage approach to mitigate social biases","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Li","year":"2023"},{"key":"2024092014251390100_bib149","article-title":"Fairness of ChatGPT","author":"Li","year":"2023","journal-title":"arXiv preprint 
arXiv:2305.18569"},{"key":"2024092014251390100_bib150","doi-asserted-by":"publisher","first-page":"5502","DOI":"10.18653\/v1\/2020.acl-main.488","article-title":"Towards debiasing sentence representations","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Liang","year":"2020"},{"key":"2024092014251390100_bib151","first-page":"6565","article-title":"Towards understanding and mitigating social biases in language models","volume-title":"International Conference on Machine Learning","author":"Liang","year":"2021"},{"key":"2024092014251390100_bib152","article-title":"Holistic evaluation of language models","author":"Liang","year":"2022","journal-title":"arXiv preprint arXiv:2211.09110"},{"key":"2024092014251390100_bib153","doi-asserted-by":"publisher","first-page":"17","DOI":"10.18653\/v1\/2022.gebnlp-1.3","article-title":"Don\u2019t forget about pronouns: Removing gender bias in language models without losing factual gender information","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","author":"Limisiewicz","year":"2022"},{"key":"2024092014251390100_bib154","doi-asserted-by":"publisher","first-page":"6691","DOI":"10.18653\/v1\/2021.acl-long.522","article-title":"DExperts: Decoding-time controlled text generation with experts and anti-experts","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Liu","year":"2021"},{"key":"2024092014251390100_bib155","doi-asserted-by":"publisher","first-page":"4403","DOI":"10.18653\/v1\/2020.coling-main.390","article-title":"Does gender matter? 
Towards fairness in dialogue systems","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Liu","year":"2020"},{"issue":"9","key":"2024092014251390100_bib156","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3560815","article-title":"Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing","volume":"55","author":"Liu","year":"2023","journal-title":"ACM Computing Surveys"},{"key":"2024092014251390100_bib157","doi-asserted-by":"publisher","first-page":"14857","DOI":"10.1609\/aaai.v35i17.17744","article-title":"Mitigating political bias in language models through reinforced calibration","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Liu","year":"2021"},{"key":"2024092014251390100_bib158","article-title":"GPT understands, too","author":"Liu","year":"2021","journal-title":"arXiv preprint arXiv:2103.10385"},{"key":"2024092014251390100_bib159","doi-asserted-by":"publisher","first-page":"186","DOI":"10.18653\/v1\/2023.acl-short.18","article-title":"BOLT: Fast energy-based controlled text generation with tunable biases","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Liu","year":"2023"},{"key":"2024092014251390100_bib160","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"2024092014251390100_bib161","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1075\/impact.39.06lou","volume-title":"Implicit attitudes and the perception of sociolinguistic variation","author":"Loudermilk","year":"2015"},{"key":"2024092014251390100_bib162","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1007\/978-3-030-62077-6_14","article-title":"Gender bias in neural natural language 
processing","author":"Lu","year":"2020","journal-title":"Logic, Language, and Security: Essays Dedicated to Andre Scedrov on the Occasion of His 65th Birthday"},{"key":"2024092014251390100_bib163","first-page":"27591","article-title":"Quark: Controllable text generation with reinforced unlearning","volume":"35","author":"Lu","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib164","doi-asserted-by":"publisher","first-page":"4288","DOI":"10.18653\/v1\/2021.naacl-main.339","article-title":"NeuroLogic decoding: (Un)supervised neural text generation with predicate logic constraints","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Lu","year":"2021"},{"key":"2024092014251390100_bib165","first-page":"4768","article-title":"A unified approach to interpreting model predictions","volume":"30","author":"Lundberg","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib166","doi-asserted-by":"publisher","first-page":"7426","DOI":"10.18653\/v1\/2020.emnlp-main.602","article-title":"PowerTransformer: Unsupervised controllable revision for biased language correction","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Ma","year":"2020"},{"key":"2024092014251390100_bib167","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1016\/S0065-2601(08)60272-5","article-title":"Linguistic intergroup bias: Stereotype perpetuation through language","volume":"31","author":"Maass","year":"1999","journal-title":"Advances in Experimental Social Psychology"},{"key":"2024092014251390100_bib168","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.589","article-title":"InterFair: Debiasing with natural language feedback for fair interpretable 
predictions","author":"Majumder","year":"2022","journal-title":"arXiv preprint arXiv:2210.07440"},{"key":"2024092014251390100_bib169","doi-asserted-by":"publisher","first-page":"1041","DOI":"10.18653\/v1\/2022.naacl-main.76","article-title":"Socially aware bias measurements for Hindi language representations","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Malik","year":"2022"},{"key":"2024092014251390100_bib170","doi-asserted-by":"publisher","first-page":"615","DOI":"10.18653\/v1\/N19-1062","article-title":"Black is to criminal as Caucasian is to police: Detecting and removing multiclass bias in word embeddings","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Manzini","year":"2019"},{"key":"2024092014251390100_bib171","article-title":"Understanding stereotypes in language models: Towards robust measurement and zero-shot debiasing","author":"Mattern","year":"2022","journal-title":"arXiv preprint arXiv:2212.10678"},{"key":"2024092014251390100_bib172","doi-asserted-by":"publisher","first-page":"622","DOI":"10.18653\/v1\/N19-1063","article-title":"On measuring social biases in sentence encoders","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"May","year":"2019"},{"key":"2024092014251390100_bib173","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.796","article-title":"Using in-context learning to improve dialogue safety","author":"Meade","year":"2023","journal-title":"arXiv preprint 
arXiv:2302.00871"},{"key":"2024092014251390100_bib174","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.132","article-title":"An empirical survey of the effectiveness of debiasing techniques for pre-trained language models","author":"Meade","year":"2021","journal-title":"arXiv preprint arXiv:2110.08527"},{"key":"2024092014251390100_bib175","doi-asserted-by":"publisher","first-page":"168","DOI":"10.18653\/v1\/2022.gebnlp-1.18","article-title":"A taxonomy of bias-causing ambiguities in machine translation","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","author":"M\u011bchura","year":"2022"},{"issue":"6","key":"2024092014251390100_bib176","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3457607","article-title":"A survey on bias and fairness in machine learning","volume":"54","author":"Mehrabi","year":"2021","journal-title":"ACM Computing Surveys"},{"key":"2024092014251390100_bib177","doi-asserted-by":"publisher","first-page":"1699","DOI":"10.1145\/3593013.3594109","article-title":"Bias against 93 stigmatized groups in masked language models and downstream sentiment classification tasks","volume-title":"Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency","author":"Mei","year":"2023"},{"key":"2024092014251390100_bib178","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3605943","article-title":"Recent advances in natural language processing via large pre-trained language models: A survey","volume":"56","author":"Min","year":"2023","journal-title":"ACM Computing Surveys"},{"key":"2024092014251390100_bib179","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1145\/3287560.3287596","article-title":"Model cards for model reporting","volume-title":"Proceedings of the Conference on Fairness, Accountability, and 
Transparency","author":"Mitchell","year":"2019"},{"issue":"8","key":"2024092014251390100_bib180","doi-asserted-by":"publisher","first-page":"e0237861","DOI":"10.1371\/journal.pone.0237861","article-title":"Hate speech detection and racial bias mitigation in social media based on BERT model","volume":"15","author":"Mozafari","year":"2020","journal-title":"PloS ONE"},{"key":"2024092014251390100_bib181","doi-asserted-by":"publisher","first-page":"5356","DOI":"10.18653\/v1\/2021.acl-long.416","article-title":"StereoSet: Measuring stereotypical bias in pretrained language models","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Nadeem","year":"2021"},{"key":"2024092014251390100_bib182","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.154","article-title":"CrowS-Pairs: A challenge dataset for measuring social biases in masked language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing","author":"Nangia","year":"2020"},{"key":"2024092014251390100_bib183","doi-asserted-by":"publisher","first-page":"116","DOI":"10.18653\/v1\/2023.eacl-main.9","article-title":"Nationality bias in text generation","volume-title":"Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics","author":"Narayanan Venkit","year":"2023"},{"key":"2024092014251390100_bib184","article-title":"Mitigating harm in language models with conditional-likelihood filtration","author":"Ngo","year":"2021","journal-title":"arXiv preprint arXiv:2108.07790"},{"key":"2024092014251390100_bib185","doi-asserted-by":"publisher","first-page":"2398","DOI":"10.18653\/v1\/2021.naacl-main.191","article-title":"HONEST: Measuring hurtful sentence completion in language models","volume-title":"Proceedings of the 2021 Conference of the North 
American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Nozza","year":"2021"},{"key":"2024092014251390100_bib186","doi-asserted-by":"publisher","first-page":"1295","DOI":"10.1145\/3534678.3539232","article-title":"Learning fair representation via distributional contrastive disentanglement","volume-title":"Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","author":"Oh","year":"2022"},{"key":"2024092014251390100_bib187","doi-asserted-by":"publisher","first-page":"4123","DOI":"10.18653\/v1\/2023.acl-long.227","article-title":"Social-group-agnostic bias mitigation via the stereotype content model","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Omrani","year":"2023"},{"key":"2024092014251390100_bib188","unstructured":"OpenAI. 2023. GPT-4 technical report."},{"key":"2024092014251390100_bib189","doi-asserted-by":"publisher","first-page":"151","DOI":"10.18653\/v1\/2022.gebnlp-1.17","article-title":"Choose your lenses: Flaws in gender bias evaluation","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","author":"Orgad","year":"2022"},{"key":"2024092014251390100_bib190","doi-asserted-by":"publisher","first-page":"8801","DOI":"10.18653\/v1\/2023.acl-long.490","article-title":"BLIND: Bias removal with no demographics","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Orgad","year":"2023"},{"key":"2024092014251390100_bib191","doi-asserted-by":"publisher","first-page":"2602","DOI":"10.18653\/v1\/2022.naacl-main.188","article-title":"How gender debiasing affects internal model representations, and why it matters","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language 
Technologies","author":"Orgad","year":"2022"},{"key":"2024092014251390100_bib192","doi-asserted-by":"publisher","first-page":"4262","DOI":"10.18653\/v1\/2021.acl-long.329","article-title":"Probing toxic content in large pre-trained language models","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Ousidhoum","year":"2021"},{"key":"2024092014251390100_bib193","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib194","doi-asserted-by":"publisher","first-page":"5073","DOI":"10.18653\/v1\/2022.findings-emnlp.372","article-title":"Don\u2019t just clean it, proxy clean it: Mitigating bias by proxy in pre-trained models","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2022","author":"Panda","year":"2022"},{"key":"2024092014251390100_bib195","doi-asserted-by":"publisher","first-page":"273","DOI":"10.18653\/v1\/2022.gebnlp-1.28","article-title":"Incorporating subjectivity into gendered ambiguous pronoun (GAP) resolution using style transfer","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","author":"Pant","year":"2022"},{"key":"2024092014251390100_bib196","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1145\/3539597.3570473","article-title":"Never too late to learn: Regularizing gender bias in coreference resolution","volume-title":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","author":"Park","year":"2023"},{"key":"2024092014251390100_bib197","doi-asserted-by":"publisher","first-page":"2086","DOI":"10.18653\/v1\/2022.findings-acl.165","article-title":"BBQ: A hand-built bias 
benchmark for question answering","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Parrish","year":"2022"},{"key":"2024092014251390100_bib198","doi-asserted-by":"publisher","first-page":"374","DOI":"10.18653\/v1\/2020.inlg-1.43","article-title":"Reducing non-normative text generation from language models","volume-title":"Proceedings of the 13th International Conference on Natural Language Generation","author":"Peng","year":"2020"},{"key":"2024092014251390100_bib199","doi-asserted-by":"publisher","first-page":"487","DOI":"10.18653\/v1\/2021.eacl-main.39","article-title":"AdapterFusion: Non-destructive task composition for transfer learning","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Pfeiffer","year":"2021"},{"key":"2024092014251390100_bib200","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.472","article-title":"On the challenges of using black-box APIs for toxicity evaluation in research","author":"Pozzobon","year":"2023","journal-title":"arXiv preprint arXiv:2304.12397"},{"key":"2024092014251390100_bib201","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1007\/978-3-031-30047-9_29","article-title":"The other side of compression: Measuring bias in pruned transformers","volume-title":"International Symposium on Intelligent Data Analysis","author":"Proskurina","year":"2023"},{"key":"2024092014251390100_bib202","doi-asserted-by":"publisher","first-page":"480","DOI":"10.1609\/aaai.v34i01.5385","article-title":"Automatically neutralizing subjective bias in text","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Pryzant","year":"2020"},{"key":"2024092014251390100_bib203","doi-asserted-by":"publisher","first-page":"9496","DOI":"10.18653\/v1\/2022.emnlp-main.646","article-title":"Perturbation augmentation for fairer NLP","volume-title":"Proceedings of the 
2022 Conference on Empirical Methods in Natural Language Processing","author":"Qian","year":"2022"},{"key":"2024092014251390100_bib204","doi-asserted-by":"publisher","first-page":"223","DOI":"10.18653\/v1\/P19-2031","article-title":"Reducing gender bias in word-level language models with a gender-equalizing loss function","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop","author":"Qian","year":"2019"},{"key":"2024092014251390100_bib205","article-title":"Improving language understanding by generative pre-training","author":"Radford","year":"2018"},{"issue":"8","key":"2024092014251390100_bib206","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"issue":"1","key":"2024092014251390100_bib207","first-page":"5485","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2024092014251390100_bib208","first-page":"1","article-title":"AI and the everything in the whole wide world benchmark","volume-title":"Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks","author":"Raji","year":"2021"},{"key":"2024092014251390100_bib209","doi-asserted-by":"publisher","first-page":"2383","DOI":"10.18653\/v1\/D16-1264","article-title":"SQuAD: 100,000+ questions for machine comprehension of text","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing","author":"Rajpurkar","year":"2016"},{"key":"2024092014251390100_bib210","doi-asserted-by":"publisher","first-page":"15762","DOI":"10.18653\/v1\/2023.acl-long.878","article-title":"A comparative study on the impact of model compression techniques on fairness in language models","volume-title":"Proceedings of the 61st 
Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ramesh","year":"2023"},{"key":"2024092014251390100_bib211","article-title":"A trip towards fairness: Bias and de-biasing in large language models","author":"Ranaldi","year":"2023","journal-title":"arXiv preprint arXiv:2305.13862"},{"key":"2024092014251390100_bib212","doi-asserted-by":"publisher","first-page":"7237","DOI":"10.18653\/v1\/2020.acl-main.647","article-title":"Null it out: Guarding protected attributes by iterative nullspace projection","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Ravfogel","year":"2020"},{"key":"2024092014251390100_bib213","doi-asserted-by":"publisher","first-page":"306","DOI":"10.1145\/3404835.3462949","article-title":"Societal biases in retrieved contents: Measurement framework and adversarial mitigation of BERT rankers","volume-title":"Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Rekabsaz","year":"2021"},{"key":"2024092014251390100_bib214","doi-asserted-by":"publisher","first-page":"2065","DOI":"10.1145\/3397271.3401280","article-title":"Do neural ranking models intensify gender bias?","volume-title":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Rekabsaz","year":"2020"},{"key":"2024092014251390100_bib215","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1145\/2939672.2939778","article-title":"\u201dWhy should I trust you?\u201d Explaining the predictions of any classifier","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Ribeiro","year":"2016"},{"key":"2024092014251390100_bib216","doi-asserted-by":"publisher","first-page":"8","DOI":"10.18653\/v1\/N18-2002","article-title":"Gender bias in coreference 
resolution","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Rudinger","year":"2018"},{"key":"2024092014251390100_bib217","doi-asserted-by":"publisher","first-page":"2699","DOI":"10.18653\/v1\/2020.acl-main.240","article-title":"Masked language model scoring","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Salazar","year":"2020"},{"key":"2024092014251390100_bib218","first-page":"20378","article-title":"Movement pruning: Adaptive sparsity by fine-tuning","volume":"33","author":"Sanh","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib219","doi-asserted-by":"publisher","first-page":"1668","DOI":"10.18653\/v1\/P19-1163","article-title":"The risk of racial bias in hate speech detection","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Sap","year":"2019"},{"key":"2024092014251390100_bib220","first-page":"35894","article-title":"Fair infinitesimal jackknife: Mitigating the influence of biased training data points without refitting","volume":"35","author":"Sattigeri","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib221","doi-asserted-by":"publisher","first-page":"3814","DOI":"10.18653\/v1\/2022.findings-acl.301","article-title":"First the worst: Finding better gender translations during beam search","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Saunders","year":"2022"},{"key":"2024092014251390100_bib222","first-page":"2798","article-title":"Intra-processing methods for debiasing neural networks","volume":"33","author":"Savani","year":"2020","journal-title":"Advances in Neural Information Processing 
Systems"},{"key":"2024092014251390100_bib223","doi-asserted-by":"publisher","first-page":"1408","DOI":"10.1162\/tacl_a_00434","article-title":"Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP","volume":"9","author":"Schick","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"issue":"3","key":"2024092014251390100_bib224","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1038\/s42256-022-00458-8","article-title":"Large pre-trained language models contain human-like biases of what is right and wrong to do","volume":"4","author":"Schramowski","year":"2022","journal-title":"Nature Machine Intelligence"},{"key":"2024092014251390100_bib225","doi-asserted-by":"publisher","first-page":"1373","DOI":"10.18653\/v1\/2023.acl-short.118","article-title":"The tail wagging the dog: Dataset construction biases of social bias benchmarks","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Selvam","year":"2023"},{"key":"2024092014251390100_bib226","doi-asserted-by":"publisher","first-page":"5248","DOI":"10.18653\/v1\/2020.acl-main.468","article-title":"Predictive biases in natural language processing models: A conceptual framework and overview","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Shah","year":"2020"},{"key":"2024092014251390100_bib227","first-page":"81","article-title":"Does representational fairness imply empirical fairness?","volume-title":"Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022","author":"Shen","year":"2022"},{"key":"2024092014251390100_bib228","doi-asserted-by":"publisher","first-page":"3407","DOI":"10.18653\/v1\/D19-1339","article-title":"The woman worked as a babysitter: On biases in language generation","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural 
Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Sheng","year":"2019"},{"key":"2024092014251390100_bib229","doi-asserted-by":"publisher","first-page":"3239","DOI":"10.18653\/v1\/2020.findings-emnlp.291","article-title":"Towards controllable biases in language generation","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Sheng","year":"2020"},{"key":"2024092014251390100_bib230","doi-asserted-by":"publisher","first-page":"750","DOI":"10.18653\/v1\/2021.naacl-main.60","article-title":"\u201cNice try, kiddo\u201d: Investigating ad hominems in dialogue responses","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Sheng","year":"2021"},{"key":"2024092014251390100_bib231","doi-asserted-by":"publisher","first-page":"4275","DOI":"10.18653\/v1\/2021.acl-long.330","article-title":"Societal biases in language generation: Progress and challenges","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Sheng","year":"2021"},{"key":"2024092014251390100_bib232","article-title":"BlenderBot 3: A deployed conversational agent that continually learns to responsibly engage","author":"Shuster","year":"2022","journal-title":"arXiv preprint arXiv:2208.03188"},{"key":"2024092014251390100_bib233","doi-asserted-by":"publisher","first-page":"2898","DOI":"10.18653\/v1\/2023.acl-long.163","article-title":"Learning to generate equitable text in dialogue from biased training data","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long 
Papers)","author":"Sicilia","year":"2023"},{"key":"2024092014251390100_bib234","doi-asserted-by":"publisher","first-page":"2383","DOI":"10.18653\/v1\/2021.naacl-main.189","article-title":"Towards a comprehensive understanding and accurate evaluation of societal biases in pre-trained transformers","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Silva","year":"2021"},{"key":"2024092014251390100_bib235","doi-asserted-by":"publisher","first-page":"9180","DOI":"10.18653\/v1\/2022.emnlp-main.625","article-title":"\u201cI\u2019m sorry to hear that\u201d: Finding new biases in language models with a holistic descriptor dataset","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Smith","year":"2022"},{"key":"2024092014251390100_bib236","first-page":"5861","article-title":"Process for adapting language models to society (PALMS) with values-targeted datasets","volume":"34","author":"Solaiman","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"2024092014251390100_bib237","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"2024092014251390100_bib238","doi-asserted-by":"publisher","first-page":"3524","DOI":"10.18653\/v1\/2022.acl-long.247","article-title":"Upstream mitigation is not all you need: Testing the bias transfer hypothesis in pre-trained language models","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Steed","year":"2022"},{"key":"2024092014251390100_bib239","doi-asserted-by":"publisher","first-page":"2213","DOI":"10.18653\/v1\/2023.acl-long.123","article-title":"MoralDial: A 
framework to train and evaluate moral dialogue systems via moral discussions","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Sun","year":"2023"},{"key":"2024092014251390100_bib240","article-title":"A simple and effective pruning approach for large language models","author":"Sun","year":"2023","journal-title":"arXiv preprint arXiv:2306.11695"},{"key":"2024092014251390100_bib241","article-title":"They, them, theirs: Rewriting with gender-neutral English","author":"Sun","year":"2021","journal-title":"arXiv preprint arXiv:2102.06788"},{"key":"2024092014251390100_bib242","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3465416.3483305","article-title":"A framework for understanding sources of harm throughout the machine learning life cycle","author":"Suresh","year":"2021","journal-title":"Equity and Access in Algorithms, Mechanisms, and Optimization"},{"key":"2024092014251390100_bib243","first-page":"13230","article-title":"Assessing social and intersectional biases in contextualized word representations","volume":"33","author":"Tan","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib244","doi-asserted-by":"publisher","first-page":"340","DOI":"10.18653\/v1\/2023.acl-short.30","article-title":"Language models get a gender makeover: Mitigating gender bias with few-shot data interventions","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Thakur","year":"2023"},{"key":"2024092014251390100_bib245","doi-asserted-by":"publisher","first-page":"163","DOI":"10.18653\/v1\/2022.naacl-srw.21","article-title":"Text style transfer for bias mitigation using masked language modeling","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language 
Technologies: Student Research Workshop","author":"Tokpo","year":"2022"},{"key":"2024092014251390100_bib246","doi-asserted-by":"publisher","first-page":"6462","DOI":"10.18653\/v1\/2022.acl-long.447","article-title":"SaFeRDialogues: Taking feedback gracefully after conversational safety failures","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ung","year":"2022"},{"key":"2024092014251390100_bib247","doi-asserted-by":"publisher","first-page":"7597","DOI":"10.18653\/v1\/2020.emnlp-main.613","article-title":"Towards debiasing NLU models from unknown biases","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Utama","year":"2020"},{"key":"2024092014251390100_bib248","doi-asserted-by":"publisher","first-page":"8940","DOI":"10.18653\/v1\/2021.emnlp-main.704","article-title":"NeuTral Rewriter: A rule-based and neural approach to automatic rewriting into gender neutral alternatives","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Vanmassenhove","year":"2021"},{"key":"2024092014251390100_bib249","doi-asserted-by":"publisher","first-page":"225","DOI":"10.18653\/v1\/2022.gebnlp-1.23","article-title":"HeteroCorpus: A corpus for heteronormative language detection","volume-title":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","author":"V\u00e1squez","year":"2022"},{"key":"2024092014251390100_bib250","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3194770.3194776","article-title":"Fairness definitions explained","volume-title":"Proceedings of the International Workshop on Software Fairness","author":"Verma","year":"2018"},{"issue":"3","key":"2024092014251390100_bib251","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1080\/13645579.2018.1531228","article-title":"Indigenous data, indigenous 
methodologies and indigenous data sovereignty","volume":"22","author":"Walter","year":"2019","journal-title":"International Journal of Social Research Methodology"},{"key":"2024092014251390100_bib252","doi-asserted-by":"publisher","first-page":"30","DOI":"10.18653\/v1\/W19-2304","article-title":"BERT has a mouth, and it must speak: BERT as a Markov random field language model","volume-title":"Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation","author":"Wang","year":"2019"},{"key":"2024092014251390100_bib253","doi-asserted-by":"publisher","first-page":"3740","DOI":"10.18653\/v1\/2021.naacl-main.293","article-title":"Dynamically disentangling social bias from task-oriented representations with adversarial attack","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Wang","year":"2021"},{"key":"2024092014251390100_bib254","first-page":"4473","article-title":"Toward fairness in text generation via mutual information minimization based on importance sampling","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Wang","year":"2023"},{"key":"2024092014251390100_bib255","article-title":"Pay attention to your tone: Introducing a new dataset for polite language rewrite","author":"Wang","year":"2022","journal-title":"arXiv preprint arXiv:2212.10190"},{"key":"2024092014251390100_bib256","doi-asserted-by":"publisher","first-page":"605","DOI":"10.1162\/tacl_a_00240","article-title":"Mind the GAP: A balanced corpus of gendered ambiguous pronouns","volume":"6","author":"Webster","year":"2018","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024092014251390100_bib257","article-title":"Measuring and reducing gendered correlations in pre-trained models","author":"Webster","year":"2020","journal-title":"arXiv preprint 
arXiv:2010.06032"},{"key":"2024092014251390100_bib258","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024092014251390100_bib259","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1145\/3531146.3533088","article-title":"Taxonomy of risks posed by language models","volume-title":"Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency","author":"Weidinger","year":"2022"},{"key":"2024092014251390100_bib260","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/ICASSP49357.2023.10095658","article-title":"Compensatory debiasing for gender imbalances in language models","volume-title":"ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Woo","year":"2023"},{"key":"2024092014251390100_bib261","doi-asserted-by":"publisher","first-page":"2390","DOI":"10.18653\/v1\/2021.naacl-main.190","article-title":"Detoxifying language models risks marginalizing minority voices","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Xu","year":"2021"},{"key":"2024092014251390100_bib262","article-title":"Recipes for safety in open-domain chatbots","author":"Xu","year":"2020","journal-title":"arXiv preprint arXiv:2010.07079"},{"key":"2024092014251390100_bib263","doi-asserted-by":"publisher","first-page":"10780","DOI":"10.1609\/aaai.v37i9.26279","article-title":"ADEPT: A DEbiasing PrompT Framework","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Yang","year":"2023"},{"key":"2024092014251390100_bib264","article-title":"Unified detoxifying and debiasing in language generation via inference-time adaptive 
optimization","author":"Yang","year":"2022","journal-title":"arXiv preprint arXiv:2210.04492"},{"key":"2024092014251390100_bib265","doi-asserted-by":"publisher","first-page":"6032","DOI":"10.18653\/v1\/2023.findings-acl.375","article-title":"Unlearning bias in language models by partitioning gradients","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Yu","year":"2023"},{"key":"2024092014251390100_bib266","doi-asserted-by":"publisher","first-page":"1755","DOI":"10.1145\/3539618.3591938","article-title":"Mixup-based unified framework to overcome gender bias resurgence","volume-title":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Yu","year":"2023"},{"key":"2024092014251390100_bib267","article-title":"Should we attend more or less? Modulating attention for fairness","author":"Zayed","year":"2023","journal-title":"arXiv preprint arXiv:2305.13088"},{"key":"2024092014251390100_bib268","doi-asserted-by":"publisher","first-page":"14593","DOI":"10.1609\/aaai.v37i12.26706","article-title":"Deep learning on a healthy data diet: Finding important examples for fairness","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Zayed","year":"2023"},{"key":"2024092014251390100_bib269","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1145\/3278721.3278779","article-title":"Mitigating unwanted biases with adversarial learning","volume-title":"Proceedings of the 2018 AAAI\/ACM Conference on AI, Ethics, and Society","author":"Zhang","year":"2018"},{"key":"2024092014251390100_bib270","article-title":"mixup: Beyond empirical risk minimization","volume-title":"International Conference on Learning Representations","author":"Zhang","year":"2018"},{"key":"2024092014251390100_bib271","doi-asserted-by":"publisher","first-page":"629","DOI":"10.18653\/v1\/N19-1064","article-title":"Gender bias in contextualized word 
embeddings","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Zhao","year":"2019"},{"key":"2024092014251390100_bib272","doi-asserted-by":"publisher","first-page":"2979","DOI":"10.18653\/v1\/D17-1323","article-title":"Men also like shopping: Reducing gender bias amplification using corpus-level constraints","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Zhao","year":"2017"},{"key":"2024092014251390100_bib273","doi-asserted-by":"publisher","first-page":"15","DOI":"10.18653\/v1\/N18-2003","article-title":"Gender bias in coreference resolution: Evaluation and debiasing methods","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Zhao","year":"2018"},{"key":"2024092014251390100_bib274","first-page":"12697","article-title":"Calibrate before use: Improving few-shot performance of language models","volume-title":"International Conference on Machine Learning","author":"Zhao","year":"2021"},{"key":"2024092014251390100_bib275","doi-asserted-by":"publisher","first-page":"1022","DOI":"10.18653\/v1\/2023.findings-acl.65","article-title":"Click: Controllable text generation with sequence likelihood contrastive learning","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Zheng","year":"2023"},{"key":"2024092014251390100_bib276","doi-asserted-by":"publisher","first-page":"4227","DOI":"10.18653\/v1\/2023.acl-long.232","article-title":"Causal-debias: Unifying debiasing in pretrained language models and fine-tuning via causal invariant learning","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long 
Papers)","author":"Zhou","year":"2023"},{"key":"2024092014251390100_bib277","doi-asserted-by":"publisher","first-page":"3701","DOI":"10.18653\/v1\/2022.acl-long.258","article-title":"VALUE: Understanding dialect disparity in NLU","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ziems","year":"2022"},{"key":"2024092014251390100_bib278","doi-asserted-by":"publisher","first-page":"1651","DOI":"10.18653\/v1\/P19-1161","article-title":"Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zmigrod","year":"2019"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/50\/3\/1097\/2471010\/coli_a_00524.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/50\/3\/1097\/2471010\/coli_a_00524.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T14:25:47Z","timestamp":1726842347000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/50\/3\/1097\/121961\/Bias-and-Fairness-in-Large-Language-Models-A"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":278,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,9,1]]},"published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1162\/coli_a_00524","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}