{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,25]],"date-time":"2025-03-25T14:34:22Z","timestamp":1742913262187,"version":"3.40.3"},"publisher-location":"Cham","reference-count":26,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031264375"},{"type":"electronic","value":"9783031264382"}],"license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T00:00:00Z","timestamp":1677110400000},"content-version":"vor","delay-in-days":53,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Predictions from machine learning models can reflect biases in the data on which they are trained. Gender bias has been identified in natural language processing systems such as those used for recruitment. Approaches to mitigate gender bias in training data typically need to be able to isolate the effect of gender on the output. While it is possible to isolate and identify gender for some types of training data, e.g. CVs in recruitment, for most textual corpora there is no obvious gender label. This paper proposes a general approach to measure bias in textual training data for NLP prediction systems by providing a gender label identified from the textual content of the training data. 
The approach is compared with the identity term template approach currently in use, also known as Gender Bias Evaluation Datasets (GBETs), which involves the design of synthetic test datasets that isolate gender and are used to probe for gender bias in a dataset. We show that our Identity Term Sampling (ITS) approach is capable of identifying gender bias at least as well as identity term templates and can be used on training data that has no obvious gender label.<\/jats:p>","DOI":"10.1007\/978-3-031-26438-2_18","type":"book-chapter","created":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T06:32:56Z","timestamp":1677047576000},"page":"226-238","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Identity Term Sampling for\u00a0Measuring Gender Bias in\u00a0Training Data"],"prefix":"10.1007","author":[{"given":"Nasim","family":"Sobhani","sequence":"first","affiliation":[]},{"given":"Sarah Jane","family":"Delany","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,23]]},"reference":[{"key":"18_CR1","doi-asserted-by":"crossref","unstructured":"Basta, C., et al.: Evaluating the underlying gender bias in contextualized word embeddings. In: Proceedings of the 1st Workshop on Gender Bias in NLP. ACL (2019)","DOI":"10.18653\/v1\/W19-3805"},{"key":"18_CR2","unstructured":"Bolukbasi, T., et al.: Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In: Advances in NeurIPS (2016)"},{"issue":"6334","key":"18_CR3","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1126\/science.aal4230","volume":"356","author":"A Caliskan","year":"2017","unstructured":"Caliskan, A., Bryson, J.J., Narayanan, A.: Semantics derived automatically from language corpora contain human-like biases. Science 356(6334), 183\u2013186 (2017)","journal-title":"Science"},{"key":"18_CR4","doi-asserted-by":"crossref","unstructured":"Davidson, T., et al.: Racial bias in hate speech and abusive language detection datasets. In: Proceedings of the 3rd Workshop on Abusive Language Online. ACL (2019)","DOI":"10.18653\/v1\/W19-3504"},{"key":"18_CR5","doi-asserted-by":"crossref","unstructured":"De-Arteaga, M., et al.: Bias in bios: a case study of semantic representation bias in a high-stakes setting. In: Proceedings of FAT* (2019)","DOI":"10.1145\/3287560.3287572"},{"key":"18_CR6","doi-asserted-by":"crossref","unstructured":"Dixon, L., et al.: Measuring and mitigating unintended bias in text classification. In: Proceedings of the 2018 AAAI\/ACM Conference on AIES, AIES 2018. ACM (2018)","DOI":"10.1145\/3278721.3278729"},{"key":"18_CR7","doi-asserted-by":"crossref","unstructured":"Emami, A., et al.: The KnowRef coreference corpus: removing gender and number cues for difficult pronominal anaphora resolution. In: Proceedings of ACL (2019)","DOI":"10.18653\/v1\/P19-1386"},{"key":"18_CR8","doi-asserted-by":"crossref","unstructured":"Founta, A.M., et al.: Large scale crowdsourcing and characterization of twitter abusive behavior. In: Twelfth International AAAI Conference on Web and Social Media (2018)","DOI":"10.1609\/icwsm.v12i1.14991"},{"key":"18_CR9","first-page":"3315","volume":"29","author":"M Hardt","year":"2016","unstructured":"Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. Adv. Neural. Inf. Process. Syst. 29, 3315\u20133323 (2016)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"18_CR10","doi-asserted-by":"crossref","unstructured":"Kiritchenko, S., Mohammad, S.: Examining gender and race bias in 200 sentiment analysis systems. 
In: Proceedings of Conference on Lexical & Computational Semantics (2018)","DOI":"10.18653\/v1\/S18-2005"},{"key":"18_CR11","volume-title":"Logic, Language, and Security: Essays Dedicated to Andre Scedrov on the Occasion of His 65th Birthday","author":"K Lu","year":"2020","unstructured":"Lu, K., et al.: Logic, Language, and Security: Essays Dedicated to Andre Scedrov on the Occasion of His 65th Birthday. Springer, Heidelberg (2020)"},{"key":"18_CR12","doi-asserted-by":"crossref","unstructured":"Nadeem, M., Bethke, A., Reddy, S.: StereoSet: measuring stereotypical bias in pretrained language models. In: Proceedings of ACL and the 11th IJCNLP. ACL (2021)","DOI":"10.18653\/v1\/2021.acl-long.416"},{"key":"18_CR13","doi-asserted-by":"crossref","unstructured":"Nangia, N., Vania, C., Bhalerao, R., Bowman, S.R.: CrowS-pairs: a challenge dataset for measuring social biases in masked language models. In: EMNLP (2020)","DOI":"10.18653\/v1\/2020.emnlp-main.154"},{"key":"18_CR14","doi-asserted-by":"crossref","unstructured":"Park, J.H., Shin, J., Fung, P.: Reducing gender bias in abusive language detection. In: Proceedings of EMNLP. ACL (2018)","DOI":"10.18653\/v1\/D18-1302"},{"key":"18_CR15","doi-asserted-by":"crossref","unstructured":"Peters, M.E., et al.: Deep contextualized word representations. In: Proceedings of the NAACL. ACL (2018)","DOI":"10.18653\/v1\/N18-1202"},{"key":"18_CR16","doi-asserted-by":"crossref","unstructured":"Prost, F., Thain, N., Bolukbasi, T.: Debiasing embeddings for reduced gender bias in text classification. In: Proceedings of the 1st Workshop on Gender Bias in NLP (2019)","DOI":"10.18653\/v1\/W19-3810"},{"key":"18_CR17","doi-asserted-by":"crossref","unstructured":"Rudinger, R., et al.: Social bias in elicited natural language inferences. In: Proceedings of the First ACL Workshop on Ethics in NLP. ACL (2017)","DOI":"10.18653\/v1\/W17-1609"},{"key":"18_CR18","first-page":"845","volume":"9","author":"B Savoldi","year":"2021","unstructured":"Savoldi, B., Gaido, M., Bentivogli, L., Negri, M., Turchi, M.: Gender bias in machine translation. Trans. ACL 9, 845\u2013874 (2021)","journal-title":"Trans. ACL"},{"key":"18_CR19","doi-asserted-by":"crossref","unstructured":"Smith, E.M., et al.: \u201cI\u2019m sorry to hear that\u201d: finding bias in language models with a holistic descriptor dataset. arXiv preprint arXiv:2205.09209 (2022)","DOI":"10.18653\/v1\/2022.emnlp-main.625"},{"key":"18_CR20","unstructured":"Stanczak, K., Augenstein, I.: A survey on gender bias in natural language processing. arXiv preprint arXiv:2112.14168 (2021)"},{"key":"18_CR21","unstructured":"Sun, T., et al.: Mitigating gender bias in natural language processing: literature review. In: Proceedings of the ACL. ACL (2019)"},{"key":"18_CR22","doi-asserted-by":"crossref","unstructured":"Waseem, Z., Hovy, D.: Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In: Proceedings NAACL Student Workshop (2016)","DOI":"10.18653\/v1\/N16-2013"},{"key":"18_CR23","first-page":"605","volume":"6","author":"K Webster","year":"2018","unstructured":"Webster, K., Recasens, M., Axelrod, V., Baldridge, J.: Mind the GAP: a balanced corpus of gendered ambiguous pronouns. Trans. ACL 6, 605\u2013617 (2018)","journal-title":"Trans. ACL"},{"key":"18_CR24","doi-asserted-by":"crossref","unstructured":"Zhao, J., Wang, T., Yatskar, M., Cotterell, R., Ordonez, V., Chang, K.W.: Gender bias in contextualized word embeddings. In: Proceedings of the NAACL. ACL (2019)","DOI":"10.18653\/v1\/N19-1064"},{"key":"18_CR25","doi-asserted-by":"crossref","unstructured":"Zhao, J., Wang, T., Yatskar, M., Ordonez, V., Chang, K.W.: Gender bias in coreference resolution: evaluation and debiasing methods. 
In: Proceedings of the NAACL (2018)","DOI":"10.18653\/v1\/N18-2003"},{"key":"18_CR26","doi-asserted-by":"crossref","unstructured":"Zhao, J., et al.: Men also like shopping: reducing gender bias amplification using corpus-level constraints. In: Proceedings of the EMNLP (2017)","DOI":"10.18653\/v1\/D17-1323"}],"container-title":["Communications in Computer and Information Science","Artificial Intelligence and Cognitive Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-26438-2_18","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,7]],"date-time":"2023-12-07T10:56:35Z","timestamp":1701946595000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-26438-2_18"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"ISBN":["9783031264375","9783031264382"],"references-count":26,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-26438-2_18","relation":{},"ISSN":["1865-0929","1865-0937"],"issn-type":[{"type":"print","value":"1865-0929"},{"type":"electronic","value":"1865-0937"}],"subject":[],"published":{"date-parts":[[2023]]},"assertion":[{"value":"23 February 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"AICS","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Irish Conference on Artificial Intelligence and Cognitive Science","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Munster","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Ireland","order":4,"name":"conference_country","label":"Conference 
Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2022","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"8 December 2022","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"9 December 2022","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"30","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"aics2022","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/aics2022.mtu.ie\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"EasyChair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"102","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"41","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short 
Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"40% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}