{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,20]],"date-time":"2026-06-20T21:40:44Z","timestamp":1781991644105,"version":"3.54.5"},"reference-count":124,"publisher":"Association for Computing Machinery (ACM)","issue":"13s","license":[{"start":{"date-parts":[[2023,7,13]],"date-time":"2023-07-13T00:00:00Z","timestamp":1689206400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["2107290, 1741022, 1934565, 2106176"],"award-info":[{"award-number":["2107290, 1741022, 1934565, 2106176"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>Data-driven algorithms are only as good as the data they work with, while datasets, especially social data, often fail to represent minorities adequately. Representation Bias in data can happen due to various reasons, ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods. Given that \u201cbias in, bias out,\u201d one cannot expect AI-based solutions to have equitable outcomes for societal applications, without addressing issues such as representation bias. While there has been extensive study of fairness in machine learning models, including several review papers, bias in the data has been less studied. This article reviews the literature on identifying and resolving representation bias as a feature of a dataset, independent of how consumed later. The scope of this survey is bounded to structured (tabular) and unstructured (e.g., image, text, graph) data. 
It presents taxonomies to categorize the studied techniques based on multiple design dimensions and provides a side-by-side comparison of their properties.<\/jats:p>\n          <jats:p>There is still a long way to fully address representation bias issues in data. The authors hope that this survey motivates researchers to approach these challenges in the future by observing existing work within their respective domains.<\/jats:p>","DOI":"10.1145\/3588433","type":"journal-article","created":{"date-parts":[[2023,3,17]],"date-time":"2023-03-17T12:07:05Z","timestamp":1679054825000},"page":"1-39","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":108,"title":["Representation Bias in Data: A Survey on Identification and Resolution Techniques"],"prefix":"10.1145","volume":"55","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7016-3807","authenticated-orcid":false,"given":"Nima","family":"Shahbazi","sequence":"first","affiliation":[{"name":"University of Illinois Chicago, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6609-5706","authenticated-orcid":false,"given":"Yin","family":"Lin","sequence":"additional","affiliation":[{"name":"University of Michigan, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5251-6186","authenticated-orcid":false,"given":"Abolfazl","family":"Asudeh","sequence":"additional","affiliation":[{"name":"University of Illinois Chicago, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0724-5214","authenticated-orcid":false,"given":"H. V.","family":"Jagadish","sequence":"additional","affiliation":[{"name":"University of Michigan, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,7,13]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"2019. 
Health United States Spotlight: Race and Ethnic Disparities in Heart Disease . Health United States spotlight CDC Stacks Public Health Publications. https:\/\/stacks.cdc.gov\/view\/cdc\/77732."},{"key":"e_1_3_2_3_2","unstructured":"2019. Asian-American and Pacific Islander Heritage in the United States. https:\/\/www.census.gov\/newsroom\/facts-for-features\/2019\/asian-american-pacific-islander.html. Accessed 26-03-2023."},{"key":"e_1_3_2_4_2","article-title":"Active sampling for min-max fairness","author":"Abernethy Jacob","year":"2020","unstructured":"Jacob Abernethy, Pranjal Awasthi, Matth\u00e4us Kleindessner, Jamie Morgenstern, Chris Russell, and Jie Zhang. 2020. Active sampling for min-max fairness. arXiv preprint arXiv:2006.06879 (2020).","journal-title":"arXiv preprint arXiv:2006.06879"},{"key":"e_1_3_2_5_2","first-page":"arXiv\u20132006","article-title":"Adaptive sampling to reduce disparate performance","author":"Abernethy Jacob","year":"2020","unstructured":"Jacob Abernethy, Pranjal Awasthi, Matth\u00e4us Kleindessner, Jamie Morgenstern, and Jie Zhang. 2020. Adaptive sampling to reduce disparate performance. arXiv e-prints (2020), arXiv\u20132006.","journal-title":"arXiv e-prints"},{"key":"e_1_3_2_6_2","volume-title":"Proceedings of the EDBT\/ICDT Workshops","author":"Accinelli Chiara","year":"2021","unstructured":"Chiara Accinelli, Barbara Catania, Giovanna Guerrini, and Simone Minisi. 2021. The impact of rewriting on coverage constraint satisfaction. In Proceedings of the EDBT\/ICDT Workshops."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3546913"},{"key":"e_1_3_2_8_2","volume-title":"Proceedings of the EDBT\/ICDT Workshops","author":"Accinelli Chiara","year":"2020","unstructured":"Chiara Accinelli, Simone Minisi, and Barbara Catania. 2020. Coverage-based rewriting for data preparation. 
In Proceedings of the EDBT\/ICDT Workshops."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00056"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457315"},{"key":"e_1_3_2_11_2","volume-title":"Proceedings of the EDBT\/ICDT Workshops","author":"Azzalini Fabio","year":"2021","unstructured":"Fabio Azzalini, Chiara Criscuolo, and Letizia Tanca. 2021. FAIR-DB: Functional dependencies to discover data bias. In Proceedings of the EDBT\/ICDT Workshops."},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Fabio Azzalini Chiara Criscuolo and Letizia Tanca. 2022. E-FAIR-DB: functional dependencies to discover data bias and enhance data equity. ACM Journal of Data and Information Quality 14 4 (2022) 1\u201326.","DOI":"10.1145\/3552433"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","unstructured":"Fabio Azzalini Chiara Criscuolo and Letizia Tanca. 2022. FAIR-DB: A system to discover unfairness in datasets. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) . IEEE 3494\u20133497.","DOI":"10.1109\/ICDE53745.2022.9866857"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313504"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00671-8"},{"key":"e_1_3_2_16_2","unstructured":"Solon Barocas Moritz Hardt and Arvind Narayanan. 2019. Fairness and machine learning: Limitations and opportunities. Retrieved from fairmlbook.org."},{"key":"e_1_3_2_17_2","first-page":"671","article-title":"Big data\u2019s disparate impact","volume":"104","author":"Barocas Solon","year":"2016","unstructured":"Solon Barocas and Andrew D. Selbst. 2016. Big data\u2019s disparate impact. Calif. L. Rev. 104 (2016), 671.","journal-title":"Calif. L. 
Rev."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-14-106"},{"key":"e_1_3_2_19_2","article-title":"Language (technology) is power: A critical survey of \u201cbias\u201d in NLP","author":"Blodgett Su Lin","year":"2020","unstructured":"Su Lin Blodgett, Solon Barocas, Hal Daum\u00e9 III, and Hanna Wallach. 2020. Language (technology) is power: A critical survey of \u201cbias\u201d in NLP. arXiv preprint arXiv:2005.14050 (2020).","journal-title":"arXiv preprint arXiv:2005.14050"},{"key":"e_1_3_2_20_2","article-title":"Man is to computer programmer as woman is to homemaker? Debiasing word embeddings","volume":"29","author":"Bolukbasi Tolga","year":"2016","unstructured":"Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 29 (2016).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_21_2","first-page":"715","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Bose Avishek","year":"2019","unstructured":"Avishek Bose and William Hamilton. 2019. Compositional fairness constraints for graph embeddings. In Proceedings of the International Conference on Machine Learning. PMLR, 715\u2013724."},{"key":"e_1_3_2_22_2","first-page":"77","volume-title":"Proceedings of the Conference on Fairness, Accountability and Transparency","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the Conference on Fairness, Accountability and Transparency. PMLR, 77\u201391."},{"key":"e_1_3_2_23_2","first-page":"1220","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Buyl Maarten","year":"2020","unstructured":"Maarten Buyl and Tijl De Bie. 2020. 
Debayes: A Bayesian method for debiasing network embeddings. In Proceedings of the International Conference on Machine Learning. PMLR, 1220\u20131229."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/VAST47406.2019.8986948"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2472010"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00146-022-01472-5"},{"key":"e_1_3_2_27_2","first-page":"1349","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Celis L. Elisa","year":"2020","unstructured":"L. Elisa Celis, Vijay Keswani, and Nisheeth Vishnoi. 2020. Data preprocessing to mitigate bias: A maximum entropy based approach. In Proceedings of the International Conference on Machine Learning. PMLR, 1349\u20131359."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_3_2_29_2","article-title":"Why is my classifier discriminatory?","author":"Chen Irene","year":"2018","unstructured":"Irene Chen, Fredrik D. Johansson, and David Sontag. 2018. Why is my classifier discriminatory? arXiv preprint arXiv:1805.12002 (2018).","journal-title":"arXiv preprint arXiv:1805.12002"},{"key":"e_1_3_2_30_2","article-title":"Toward understanding bias correlations for mitigation in NLP","author":"Cheng Lu","year":"2022","unstructured":"Lu Cheng, Suyu Ge, and Huan Liu. 2022. Toward understanding bias correlations for mitigation in NLP. arXiv preprint arXiv:2205.12391 (2022).","journal-title":"arXiv preprint arXiv:2205.12391"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1121\/1.401664"},{"key":"e_1_3_2_32_2","article-title":"A survey on fairness for machine learning on graphs","author":"Choudhary Manvi","year":"2022","unstructured":"Manvi Choudhary, Charlotte Laclau, and Christine Largeron. 2022. A survey on fairness for machine learning on graphs. 
arXiv preprint arXiv:2205.05396 (2022).","journal-title":"arXiv preprint arXiv:2205.05396"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00139"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-019-0217-0"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3287560.3287572"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173986"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3278721.3278729"},{"key":"e_1_3_2_39_2","article-title":"Fairness in graph mining: A survey","author":"Dong Yushun","year":"2022","unstructured":"Yushun Dong, Jing Ma, Chen Chen, and Jundong Li. 2022. Fairness in graph mining: A survey. arXiv preprint arXiv:2204.09888 (2022).","journal-title":"arXiv preprint arXiv:2204.09888"},{"key":"e_1_3_2_40_2","article-title":"Auditing ImageNet: Towards a model-driven framework for annotating demographic attributes of large-scale image datasets","author":"Dulhanty Chris","year":"2019","unstructured":"Chris Dulhanty and Alexander Wong. 2019. Auditing ImageNet: Towards a model-driven framework for annotating demographic attributes of large-scale image datasets. arXiv preprint arXiv:1905.01347 (2019).","journal-title":"arXiv preprint arXiv:1905.01347"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-3975(02)00738-7"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0140-6736(96)91485-3"},{"key":"e_1_3_2_43_2","unstructured":"Hasan Erokyar. 2014. Age and gender recognition for speech applications based on support vector machines. USF Tampa Graduate Theses and Dissertations . 
https:\/\/digitalcommons.usf.edu\/etd\/5356."},{"key":"e_1_3_2_44_2","article-title":"A survey on bias in visual datasets","author":"Fabbrizzi Simone","year":"2021","unstructured":"Simone Fabbrizzi, Symeon Papadopoulos, Eirini Ntoutsi, and Ioannis Kompatsiaris. 2021. A survey on bias in visual datasets. arXiv preprint arXiv:2107.07919 (2021).","journal-title":"arXiv preprint arXiv:2107.07919"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00180"},{"key":"e_1_3_2_46_2","unstructured":"Sara Feijo. 2020. Here\u2019s what happened when Boston tried to assign students good schools close to home. 2020. https:\/\/news.northeastern.edu\/2018\/07\/16\/heres-what-happened-when-boston-tried-to-assign-students-good-schools-close-to-home\/. Accessed 26-03-2023."},{"key":"e_1_3_2_47_2","article-title":"Quantifying bias in automatic speech recognition","author":"Feng Siyuan","year":"2021","unstructured":"Siyuan Feng, Olya Kudina, Bence Mark Halpern, and Odette Scharenborg. 2021. Quantifying bias in automatic speech recognition. arXiv preprint arXiv:2103.15122 (2021).","journal-title":"arXiv preprint arXiv:2103.15122"},{"issue":"1","key":"e_1_3_2_48_2","first-page":"1","article-title":"Ethical dimensions for data quality","volume":"12","author":"Firmani Donatella","year":"2019","unstructured":"Donatella Firmani, Letizia Tanca, and Riccardo Torlone. 2019. Ethical dimensions for data quality. J. Data Inf. Qual. 12, 1 (2019), 1\u20135.","journal-title":"J. Data Inf. Qual."},{"key":"e_1_3_2_49_2","article-title":"Handling bias in toxic speech detection: A survey","author":"Garg Tanmay","year":"2022","unstructured":"Tanmay Garg, Sarah Masud, Tharun Suresh, and Tanmoy Chakraborty. 2022. Handling bias in toxic speech detection: A survey. 
arXiv preprint arXiv:2202.00126 (2022).","journal-title":"arXiv preprint arXiv:2202.00126"},{"key":"e_1_3_2_50_2","article-title":"Datasheets for datasets","author":"Gebru Timnit","year":"2018","unstructured":"Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daum\u00e9 III, and Kate Crawford. 2018. Datasheets for datasets. arXiv preprint arXiv:1803.09010 (2018).","journal-title":"arXiv preprint arXiv:1803.09010"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-021-01448-w"},{"key":"e_1_3_2_52_2","article-title":"Model patching: Closing the subgroup performance gap with data augmentation","author":"Goel Karan","year":"2020","unstructured":"Karan Goel, Albert Gu, Yixuan Li, and Christopher R\u00e9. 2020. Model patching: Closing the subgroup performance gap with data augmentation. arXiv preprint arXiv:2008.06775 (2020).","journal-title":"arXiv preprint arXiv:2008.06775"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939754"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.5153\/sro.55"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1007\/11538059_91"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1038\/427312a"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1002\/9781119125563.evpsych241"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/10515.10522"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380063"},{"key":"e_1_3_2_60_2","article-title":"Dealing with bias via data augmentation in supervised learning scenarios","volume":"24","author":"Iosifidis Vasileios","year":"2018","unstructured":"Vasileios Iosifidis and Eirini Ntoutsi. 2018. Dealing with bias via data augmentation in supervised learning scenarios. Jo Bates Paul D. Clough Robert J\u00e4schke 24 (2018).","journal-title":"Jo Bates Paul D. 
Clough Robert J\u00e4schke"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/2611567"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00394"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3384689"},{"key":"e_1_3_2_64_2","article-title":"Conditional network embeddings","author":"Kang Bo","year":"2018","unstructured":"Bo Kang, Jefrey Lijffijt, and Tijl De Bie. 2018. Conditional network embeddings. arXiv preprint arXiv:1805.07544 (2018).","journal-title":"arXiv preprint arXiv:1805.07544"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482030"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00159"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i11.21454"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33718-5_12"},{"key":"e_1_3_2_70_2","article-title":"Inherent trade-offs in the fair determination of risk scores","author":"Kleinberg Jon","year":"2016","unstructured":"Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. 2016. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807 (2016).","journal-title":"arXiv preprint arXiv:1609.05807"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1915768117"},{"key":"e_1_3_2_72_2","first-page":"1774","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Laclau Charlotte","year":"2021","unstructured":"Charlotte Laclau, Ievgen Redko, Manvi Choudhary, and Christine Largeron. 2021. All of the fairness for edge prediction with optimal transport. In Proceedings of the International Conference on Artificial Intelligence and Statistics. PMLR, 1774\u20131782."},{"key":"e_1_3_2_73_2","unstructured":"Jennifer Langston. 
2015. Who\u2019s a CEO? Google image results can shift gender biases. Retrieved from https:\/\/www.washington.edu\/news\/2015\/04\/09\/whos-a-ceo-google-image-results-can-shift-gender-biases\/."},{"key":"e_1_3_2_74_2","unstructured":"Alyssa Whitlock Lees and Ananth Balashankar. 2019. Fairness sample complexity and the case for human intervention. Where is the Human? Bridging the Gap Between AI and HCI CHI Workshop 2019 . https:\/\/michae.lv\/ai-hci-workshop\/#call-for-participation."},{"key":"e_1_3_2_75_2","article-title":"Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data","volume":"1704","author":"Lemley Joseph","year":"2017","unstructured":"Joseph Lemley, Filip Jagodzinski, and Razvan Andonie. 2017. Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data. CoRR abs\/1704.00683 (2017).","journal-title":"CoRR"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00980"},{"key":"e_1_3_2_77_2","unstructured":"M. Lichman. 2013. Adult Income Dataset UCI Machine Learning Repository. Retrieved from https:\/\/archive.ics.uci.edu\/ml\/datasets\/adult."},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407821"},{"key":"e_1_3_2_79_2","first-page":"930","volume-title":"Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI)","author":"Liu Bing","year":"1997","unstructured":"Bing Liu, Liang-Ping Ku, and Wynne Hsu. 1997. Discovering interesting holes in data. In Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI). 
Morgan Kaufmann Publishers Inc., 930\u2013935."},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0095268"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9747501"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.nuse-1.5"},{"key":"e_1_3_2_83_2","article-title":"Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings","author":"Manzini Thomas","year":"2019","unstructured":"Thomas Manzini, Yao Chong Lim, Yulia Tsvetkov, and Alan W. Black. 2019. Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. arXiv preprint arXiv:1904.04047 (2019).","journal-title":"arXiv preprint arXiv:1904.04047"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3457607"},{"key":"e_1_3_2_85_2","article-title":"Diversity in faces","author":"Merler Michele","year":"2019","unstructured":"Michele Merler, Nalini Ratha, Rogerio S. Feris, and John R. Smith. 2019. Diversity in faces. arXiv preprint arXiv:1901.10436 (2019).","journal-title":"arXiv preprint arXiv:1901.10436"},{"key":"e_1_3_2_86_2","first-page":"1961","article-title":"Patterns count-based labels for datasets","author":"Moskovitch Y.","year":"2021","unstructured":"Y. Moskovitch and H. Jagadish. 2021. Patterns count-based labels for datasets. In Proceedings of the IEEE 37th International Conference on Data Engineering (ICDE). 1961\u20131966.","journal-title":"Proceedings of the IEEE 37th International Conference on Data Engineering (ICDE)"},{"issue":"12","key":"e_1_3_2_87_2","first-page":"2829","article-title":"COUNTATA: Dataset labeling using pattern counts","volume":"13","author":"Moskovitch Yuval","year":"2020","unstructured":"Yuval Moskovitch and H. V. Jagadish. 2020. COUNTATA: Dataset labeling using pattern counts. Int. J. Very Large Data Bases 13, 12 (2020), 2829\u20132832.","journal-title":"Int. J. 
Very Large Data Bases"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.1145\/3530800.3534528"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476299"},{"key":"e_1_3_2_90_2","article-title":"Responsible data integration: Next-generation challenges","author":"Nargesian Fatemeh","year":"2022","unstructured":"Fatemeh Nargesian, Abolfazl Asudeh, and H. V. Jagadish. 2022. Responsible data integration: Next-generation challenges. Proceedings of the SIGMOD Conference.","journal-title":"Proceedings of the SIGMOD Conference"},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.67.026126"},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1525\/9780520339897-01"},{"key":"e_1_3_2_93_2","article-title":"Achieving representative data via convex hull feasibility sampling algorithms","author":"Niss Laura","year":"2022","unstructured":"Laura Niss, Yuekai Sun, and Ambuj Tewari. 2022. Achieving representative data via convex hull feasibility sampling algorithms. arXiv preprint arXiv:2204.06664 (2022).","journal-title":"arXiv preprint arXiv:2204.06664"},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1356"},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.3389\/fdata.2019.00013"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1145\/3351095.3372843"},{"key":"e_1_3_2_97_2","article-title":"Reducing gender bias in abusive language detection","author":"Park Ji Ho","year":"2018","unstructured":"Ji Ho Park, Jamin Shin, and Pascale Fung. 2018. Reducing gender bias in abusive language detection. 
arXiv preprint arXiv:1808.07231 (2018).","journal-title":"arXiv preprint arXiv:1808.07231"},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457284"},{"key":"e_1_3_2_99_2","doi-asserted-by":"publisher","DOI":"10.1145\/3494672"},{"key":"e_1_3_2_100_2","article-title":"Interpretable data-based explanations for fairness debugging","author":"Pradhan Romila","year":"2021","unstructured":"Romila Pradhan, Jiongli Zhu, Boris Glavic, and Babak Salimi. 2021. Interpretable data-based explanations for fairness debugging. arXiv preprint arXiv:2112.09745 (2021).","journal-title":"arXiv preprint arXiv:2112.09745"},{"key":"e_1_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/456"},{"issue":"2","key":"e_1_3_2_102_2","first-page":"51","article-title":"Gender recognition using speech processing techniques in LABVIEW","volume":"1","author":"Rakesh Kumar","year":"2011","unstructured":"Kumar Rakesh, Subhangi Dutta, and Kumara Shama. 2011. Gender recognition using speech processing techniques in LABVIEW. Int. J. Adv. Eng. Technol. 1, 2 (2011), 51.","journal-title":"Int. J. Adv. Eng. Technol."},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457323"},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00401"},{"key":"e_1_3_2_105_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-86365-4_35"},{"key":"e_1_3_2_106_2","unstructured":"Ben Schmidt. 2015. Rejecting the gender binary: A vector-space operation. Bens Bookworm Blog (2015). 
http:\/\/bookworm.benschmidt.org\/posts\/2015-10-30-rejecting-the-gender-binary.html."},{"key":"e_1_3_2_107_2","doi-asserted-by":"publisher","DOI":"10.1515\/9781400881970-018"},{"key":"e_1_3_2_108_2","doi-asserted-by":"publisher","DOI":"10.1145\/3375627.3375865"},{"key":"e_1_3_2_109_2","first-page":"24535","article-title":"Adaptive sampling for minimax fair classification","volume":"34","author":"Shekhar Shubhanshu","year":"2021","unstructured":"Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, and Tara Javidi. 2021. Adaptive sampling for minimax fair classification. Adv. Neural Inf. Process. Syst. 34 (2021), 24535\u201324544.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00111"},{"key":"e_1_3_2_111_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAI.2021.3133818"},{"key":"e_1_3_2_112_2","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415570"},{"key":"e_1_3_2_113_2","volume-title":"Applied Sampling","author":"Sudman Seymour","year":"1976","unstructured":"Seymour Sudman. 1976. Applied Sampling. Technical Report. Academic Press, New York."},{"key":"e_1_3_2_114_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3357853"},{"key":"e_1_3_2_115_2","article-title":"Mitigating gender bias in natural language processing: Literature review","author":"Sun Tony","year":"2019","unstructured":"Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. 2019. Mitigating gender bias in natural language processing: Literature review. 
arXiv preprint arXiv:1906.08976 (2019).","journal-title":"arXiv preprint arXiv:1906.08976"},{"key":"e_1_3_2_116_2","doi-asserted-by":"publisher","DOI":"10.1145\/3465416.3483305"},{"key":"e_1_3_2_117_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452792"},{"key":"e_1_3_2_118_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995347"},{"issue":"11","key":"e_1_3_2_119_2","article-title":"Visualizing data using t-SNE.","volume":"9","author":"Maaten Laurens Van der","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 11 (2008).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_120_2","article-title":"Getting gender right in neural machine translation","author":"Vanmassenhove Eva","year":"2019","unstructured":"Eva Vanmassenhove, Christian Hardmeier, and Andy Way. 2019. Getting gender right in neural machine translation. arXiv preprint arXiv:1909.05088 (2019).","journal-title":"arXiv preprint arXiv:1909.05088"},{"key":"e_1_3_2_121_2","article-title":"Identification of bias against people with disabilities in sentiment analysis and toxicity detection models","author":"Venkit Pranav Narayanan","year":"2021","unstructured":"Pranav Narayanan Venkit and Shomir Wilson. 2021. Identification of bias against people with disabilities in sentiment analysis and toxicity detection models. arXiv preprint arXiv:2111.13259 (2021).","journal-title":"arXiv preprint arXiv:2111.13259"},{"key":"e_1_3_2_122_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58580-8_43"},{"key":"e_1_3_2_123_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00017"},{"key":"e_1_3_2_124_2","article-title":"Gender bias in coreference resolution: Evaluation and debiasing methods","author":"Zhao Jieyu","year":"2018","unstructured":"Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. 
arXiv preprint arXiv:1804.06876 (2018).","journal-title":"arXiv preprint arXiv:1804.06876"},{"key":"e_1_3_2_125_2","article-title":"Learning gender-neutral word embeddings","author":"Zhao Jieyu","year":"2018","unstructured":"Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. 2018. Learning gender-neutral word embeddings. arXiv preprint arXiv:1809.01496 (2018).","journal-title":"arXiv preprint arXiv:1809.01496"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588433","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588433","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:12Z","timestamp":1750178832000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588433"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,13]]},"references-count":124,"journal-issue":{"issue":"13s","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3588433"],"URL":"https:\/\/doi.org\/10.1145\/3588433","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,13]]},"assertion":[{"value":"2022-03-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-14","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}