{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T00:25:29Z","timestamp":1702599929450},"reference-count":40,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T00:00:00Z","timestamp":1685404800000},"content-version":"vor","delay-in-days":4,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We present the Assignment-Maximization Spectral Attribute removaL (AMSAL) algorithm, which erases information from neural representations when the information to be erased is implicit rather than directly being aligned to each input example. Our algorithm works by alternating between two steps. In one, it finds an assignment of the input representations to the information to be erased, and in the other, it creates projections of both the input representations and the information to be erased into a joint latent space. We test our algorithm on an extensive array of datasets, including a Twitter dataset with multiple guarded attributes, the BiasBios dataset, and the BiasBench benchmark. The latter benchmark includes four datasets with various types of protected attributes. Our results demonstrate that bias can often be removed in our setup. 
We also discuss the limitations of our approach when there is a strong entanglement between the main task and the information to be erased.<\/jats:p>","DOI":"10.1162\/tacl_a_00558","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:27:17Z","timestamp":1685467637000},"page":"488-510","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":0,"title":["Erasure of Unaligned Attributes from Neural Representations"],"prefix":"10.1162","volume":"11","author":[{"given":"Shun","family":"Shao","sequence":"first","affiliation":[{"name":"Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK. s.shao-11@inf.ed.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yftah","family":"Ziser","sequence":"additional","affiliation":[{"name":"Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK. yftah.ziser@inf.ed.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shay B.","family":"Cohen","sequence":"additional","affiliation":[{"name":"Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK. 
scohen@inf.ed.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2023,5,26]]},"reference":[{"key":"2023053017265106500_bib1","doi-asserted-by":"publisher","first-page":"1119","DOI":"10.18653\/v1\/D16-1120","article-title":"Demographic dialectal variation in social media: A case study of African-American English","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing","author":"Lin Blodgett","year":"2016"},{"key":"2023053017265106500_bib2","first-page":"4349","article-title":"Man is to computer programmer as woman is to homemaker? Debiasing word embeddings","volume-title":"Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5\u201310, 2016, Barcelona, Spain","author":"Bolukbasi","year":"2016"},{"key":"2023053017265106500_bib3","first-page":"2927","article-title":"Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics","author":"Cachola","year":"2018"},{"issue":"6334","key":"2023053017265106500_bib4","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1126\/science.aal4230","article-title":"Semantics derived automatically from language corpora contain human-like biases","volume":"356","author":"Caliskan","year":"2017","journal-title":"Science"},{"key":"2023053017265106500_bib5","article-title":"Conditional supervised contrastive learning for fair text classification","author":"Chi","year":"2022","journal-title":"ArXiv preprint"},{"key":"2023053017265106500_bib6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/D18-1001","article-title":"Privacy-preserving neural representations of text","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language 
Processing","author":"Coavoux","year":"2018"},{"key":"2023053017265106500_bib7","doi-asserted-by":"publisher","first-page":"2614","DOI":"10.18653\/v1\/2022.acl-long.187","article-title":"Learning disentangled textual representations via statistical measures of similarity","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Colombo","year":"2022"},{"key":"2023053017265106500_bib8","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1145\/3287560.3287572","article-title":"Bias in bios: A case study of semantic representation bias in a high-stakes setting","volume-title":"Proceedings of the Conference on Fairness, Accountability, and Transparency","author":"De-Arteaga","year":"2019"},{"key":"2023053017265106500_bib9","doi-asserted-by":"publisher","first-page":"5034","DOI":"10.18653\/v1\/2021.emnlp-main.411","article-title":"OSCaR: Orthogonal subspace correction and rectification of biases in word embeddings","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Dev","year":"2021"},{"key":"2023053017265106500_bib10","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2023053017265106500_bib11","first-page":"1381","article-title":"Understanding gender bias in knowledge base embeddings","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Yupei","year":"2022"},{"key":"2023053017265106500_bib12","article-title":"Censoring representations with an adversary","volume-title":"4th International Conference on Learning Representations, ICLR 2016, 
San Juan, Puerto Rico, May 2\u20134, 2016, Conference Track Proceedings","author":"Edwards","year":"2016"},{"key":"2023053017265106500_bib13","doi-asserted-by":"publisher","first-page":"11","DOI":"10.18653\/v1\/D18-1002","article-title":"Adversarial removal of demographic attributes from text data","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Elazar","year":"2018"},{"issue":"2","key":"2023053017265106500_bib14","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1162\/coli_a_00404","article-title":"CausaLM: Causal model explanation through counterfactual language models","volume":"47","author":"Feder","year":"2021","journal-title":"Computational Linguistics"},{"key":"2023053017265106500_bib15","doi-asserted-by":"publisher","first-page":"1615","DOI":"10.18653\/v1\/D17-1169","article-title":"Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Felbo","year":"2017"},{"key":"2023053017265106500_bib16","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.595","article-title":"Measuring social bias in knowledge graph embeddings","author":"Fisher","year":"2019","journal-title":"ArXiv preprint"},{"issue":"1","key":"2023053017265106500_bib17","first-page":"2096","article-title":"Domain-adversarial training of neural networks","volume":"17","author":"Ganin","year":"2016","journal-title":"The Journal of Machine Learning Research"},{"key":"2023053017265106500_bib18","first-page":"609","article-title":"Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short 
Papers)","author":"Gonen","year":"2019"},{"key":"2023053017265106500_bib19","article-title":"Balancing out bias: Achieving fairness through training reweighting","author":"Han","year":"2021","journal-title":"ArXiv preprint"},{"key":"2023053017265106500_bib20","doi-asserted-by":"crossref","first-page":"471","DOI":"10.18653\/v1\/2021.findings-acl.41","article-title":"Decoupling adversarial training for fair NLP","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Han","year":"2021"},{"key":"2023053017265106500_bib21","article-title":"fairlib: A unified framework for assessing and improving classification fairness","author":"Han","year":"2022","journal-title":"ArXiv preprint"},{"key":"2023053017265106500_bib22","doi-asserted-by":"publisher","first-page":"717","DOI":"10.18653\/v1\/2022.naacl-main.52","article-title":"Easy adaptation to mitigate gender bias in multilingual text classification","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Huang","year":"2022"},{"key":"2023053017265106500_bib23","article-title":"Fasttext.zip: Compressing text classification models","author":"Joulin","year":"2016","journal-title":"ArXiv preprint"},{"key":"2023053017265106500_bib24","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The Hungarian method for the assignment problem","volume":"2","author":"Kuhn","year":"1955","journal-title":"Naval Research Logistics Quarterly"},{"key":"2023053017265106500_bib25","doi-asserted-by":"crossref","first-page":"25","DOI":"10.18653\/v1\/P18-2005","article-title":"Towards robust and privacy-preserving text representations","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short 
Papers)","author":"Li","year":"2018"},{"key":"2023053017265106500_bib26","volume-title":"Information theory, Inference and Learning Algorithms","author":"MacKay","year":"2003"},{"key":"2023053017265106500_bib27","first-page":"622","article-title":"On measuring social biases in sentence encoders","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"May","year":"2019"},{"key":"2023053017265106500_bib28","doi-asserted-by":"publisher","first-page":"1878","DOI":"10.18653\/v1\/2022.acl-long.132","article-title":"An empirical survey of the effectiveness of debiasing techniques for pre-trained language models","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Meade","year":"2022"},{"key":"2023053017265106500_bib29","doi-asserted-by":"publisher","first-page":"5356","DOI":"10.18653\/v1\/2021.acl-long.416","article-title":"StereoSet: Measuring stereotypical bias in pretrained language models","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Nadeem","year":"2021"},{"key":"2023053017265106500_bib30","doi-asserted-by":"publisher","first-page":"1953","DOI":"10.18653\/v1\/2020.emnlp-main.154","article-title":"CrowS-pairs: A challenge dataset for measuring social biases in masked language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Nangia","year":"2020"},{"key":"2023053017265106500_bib31","article-title":"On minimum-cost assignments in unbalanced bipartite graphs","author":"Ramshaw","year":"2012","journal-title":"HP Labs, Palo Alto, CA, USA, Technical Report 
HPL-2012-40R1"},{"key":"2023053017265106500_bib32","doi-asserted-by":"publisher","first-page":"7237","DOI":"10.18653\/v1\/2020.acl-main.647","article-title":"Null it out: Guarding protected attributes by iterative nullspace projection","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Ravfogel","year":"2020"},{"key":"2023053017265106500_bib33","first-page":"18400","article-title":"Linear adversarial concept erasure","volume-title":"International Conference on Machine Learning","author":"Ravfogel","year":"2022"},{"key":"2023053017265106500_bib34","doi-asserted-by":"publisher","first-page":"1807","DOI":"10.18653\/v1\/2022.acl-long.127","article-title":"Under the morphosyntactic lens: A multifaceted evaluation of gender bias in speech translation","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Savoldi","year":"2022"},{"key":"2023053017265106500_bib35","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.eacl-main.118","article-title":"Gold doesn\u2019t always glitter: Spectral removal of linear and nonlinear guarded attribute information","volume-title":"Proceedings of the 17th Annual Meeting of the European chapter of the Association for Computational Linguistics (EACL)","author":"Shao","year":"2023"},{"key":"2023053017265106500_bib36","article-title":"Contrastive learning for fair representations","author":"Shen","year":"2021","journal-title":"ArXiv preprint"},{"key":"2023053017265106500_bib37","unstructured":"Gilbert W. Stewart . 1990. 
Perturbation theory for the singular value decomposition, Technical Report UMIACS-90-120 \/ CS-TR 2539, University of Maryland, College Park."},{"key":"2023053017265106500_bib38","article-title":"GLUE: A multi-task benchmark and analysis platform for natural language understanding","volume-title":"7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6\u20139, 2019","author":"Wang","year":"2019"},{"key":"2023053017265106500_bib39","doi-asserted-by":"publisher","first-page":"3740","DOI":"10.18653\/v1\/2021.naacl-main.293","article-title":"Dynamically disentangling social bias from task-oriented representations with adversarial attack","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Wang","year":"2021"},{"key":"2023053017265106500_bib40","doi-asserted-by":"publisher","first-page":"4847","DOI":"10.18653\/v1\/D18-1521","article-title":"Learning gender-neutral word embeddings","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Zhao","year":"2018"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00558\/2110602\/tacl_a_00558.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00558\/2110602\/tacl_a_00558.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,14]],"date-time":"2023-12-14T12:26:50Z","timestamp":1702556810000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00558\/116162\/Erasure-of-Unaligned-Attributes-from-Neural"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":40,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00558","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}