{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,27]],"date-time":"2025-07-27T07:51:31Z","timestamp":1753602691782,"version":"3.28.2"},"reference-count":90,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:p>Potential harms from the under-representation of minorities in data, particularly in multi-modal settings, are a well-recognized concern. While there has been extensive effort in detecting such under-representation, resolution has remained a challenge.<\/jats:p><jats:p>With recent generative AI advancements, large language and foundation models have emerged as versatile tools across various domains. In this paper, we propose Chameleon, a system that efficiently utilizes these tools to augment a dataset with minimal addition of synthetically generated tuples to enhance the coverage of the under-represented groups. Our system applies quality and outlier-detection tests to ensure the quality and semantic integrity of the generated tuples. In order to minimize the rejection chance of the generated tuples, we propose multiple strategies to provide a guide for the foundation model. 
Our experimental results, in addition to confirming the efficiency of our proposed algorithms, illustrate our approach's effectiveness, as the model's unfairness in a downstream task significantly dropped after data repair using Chameleon.<\/jats:p>","DOI":"10.14778\/3681954.3682014","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T16:23:36Z","timestamp":1725035016000},"page":"3470-3483","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities"],"prefix":"10.14778","volume":"17","author":[{"given":"Mahdi","family":"Erfanian","sequence":"first","affiliation":[{"name":"University of Illinois Chicago"}]},{"given":"H. V.","family":"Jagadish","sequence":"additional","affiliation":[{"name":"University of Michigan"}]},{"given":"Abolfazl","family":"Asudeh","sequence":"additional","affiliation":[{"name":"University of Illinois Chicago"}]}],"member":"320","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"EDBT Workshops.","author":"Accinelli Chiara","year":"2021","unstructured":"Chiara Accinelli, Barbara Catania, Giovanna Guerrini, and Simone Minisi. 2021. The impact of rewriting on coverage constraint satisfaction. In EDBT Workshops."},{"key":"e_1_2_1_2_1","volume-title":"Coverage-based Rewriting for Data Preparation. In EDBT Workshops.","author":"Accinelli Chiara","year":"2020","unstructured":"Chiara Accinelli, Simone Minisi, and Barbara Catania. 2020. Coverage-based Rewriting for Data Preparation. In EDBT Workshops."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Rakesh Agrawal Sreenivas Gollapudi Alan Halverson and Samuel Ieong. 2009. Diversifying search results. In WSDM. 
ACM 5--14.","DOI":"10.1145\/1498759.1498766"},{"key":"e_1_2_1_4_1","first-page":"97","article-title":"Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes","volume":"17","author":"Arora Simran","year":"2023","unstructured":"Simran Arora, Brandon Yang, Sabri Eyuboglu, Avanika Narayan, Andrew Hojel, Immanuel Trummer, and Christopher R\u00e9. 2023. Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes. PVLDB 17, 2 (2023), 97--105.","journal-title":"PVLDB"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00056"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Abolfazl Asudeh Nima Shahbazi Zhongjun Jin and H. V. Jagadish. 2021. Identifying Insufficient Data Coverage for Ordinal Continuous-Valued Attributes. In SIGMOD. ACM.","DOI":"10.1145\/3448016.3457315"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 30th Italian Symposium on Advanced Database Systems.","author":"Azzalini Fabio","year":"2021","unstructured":"Fabio Azzalini, Chiara Criscuolo, and Letizia Tanca. 2021. Functional Dependencies to Mitigate Data Bias. In Proceedings of the 30th Italian Symposium on Advanced Database Systems."},{"key":"e_1_2_1_8_1","volume-title":"Lumiere: A Space-Time Diffusion Model for Video Generation. arXiv preprint arXiv:2401.12945","author":"Bar-Tal Omer","year":"2024","unstructured":"Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Tomer Michaeli, et al. 2024. Lumiere: A Space-Time Diffusion Model for Video Generation. arXiv preprint arXiv:2401.12945 (2024)."},{"key":"e_1_2_1_9_1","unstructured":"Solon Barocas Moritz Hardt and Arvind Narayanan. 2019. Fairness and machine learning: Limitations and opportunities. 
fairmlbook.org."},{"key":"e_1_2_1_10_1","first-page":"671","article-title":"Big data's disparate impact","volume":"104","author":"Barocas Solon","year":"2016","unstructured":"Solon Barocas and Andrew D Selbst. 2016. Big data's disparate impact. Calif. L. Rev. 104 (2016), 671.","journal-title":"Calif. L. Rev."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-017-9312-z"},{"volume-title":"The enigma of diversity: The language of race and the limits of racial justice","author":"Berrey Ellen","key":"e_1_2_1_12_1","unstructured":"Ellen Berrey. 2015. The enigma of diversity: The language of race and the limits of racial justice. University of Chicago Press."},{"key":"e_1_2_1_13_1","volume-title":"SMOTE for high-dimensional class-imbalanced data. BMC bioinformatics 14","author":"Blagus Rok","year":"2013","unstructured":"Rok Blagus and Lara Lusa. 2013. SMOTE for high-dimensional class-imbalanced data. BMC bioinformatics 14 (2013), 1--16."},{"key":"e_1_2_1_14_1","unstructured":"Rishi Bommasani Drew A Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CEC48606.2020.9185782"},{"key":"e_1_2_1_16_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901."},{"key":"e_1_2_1_17_1","volume-title":"Yuanzhi Li, Scott Lundberg, et al.","author":"Bubeck S\u00e9bastien","year":"2023","unstructured":"S\u00e9bastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. 
Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 (2023)."},{"key":"e_1_2_1_18_1","first-page":"1","article-title":"Privlava: synthesizing relational data with foreign keys under differential privacy","volume":"1","author":"Cai Kuntai","year":"2023","unstructured":"Kuntai Cai, Xiaokui Xiao, and Graham Cormode. 2023. Privlava: synthesizing relational data with foreign keys under differential privacy. Proceedings of the ACM on Management of Data 1, 2 (2023), 1--25.","journal-title":"Proceedings of the ACM on Management of Data"},{"key":"e_1_2_1_19_1","unstructured":"L Elisa Celis Vijay Keswani and Nisheeth Vishnoi. 2020. Data preprocessing to mitigate bias: A maximum entropy based approach. In ICML. PMLR 1349--1359."},{"key":"e_1_2_1_20_1","volume-title":"Data distribution tailoring revisited: cost-efficient integration of representative data. The VLDB Journal","author":"Chang Jiwon","year":"2024","unstructured":"Jiwon Chang, Bohan Cui, Fatemeh Nargesian, Abolfazl Asudeh, and HV Jagadish. 2024. Data distribution tailoring revisited: cost-efficient integration of representative data. The VLDB Journal (2024), 1--24."},{"key":"e_1_2_1_21_1","volume-title":"How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings. arXiv preprint arXiv:2305.11853","author":"Chang Shuaichen","year":"2023","unstructured":"Shuaichen Chang and Eric Fosler-Lussier. 2023. How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings. 
arXiv preprint arXiv:2305.11853 (2023)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_2_1_24_1","volume-title":"Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al.","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3632093.3632101"},{"key":"e_1_2_1_26_1","unstructured":"Alessio Corrado. 2019. Animals-10 Dataset. https:\/\/www.kaggle.com\/datasets\/alessiocorrado99\/animals10 Accessed: 2024-05-16."},{"key":"e_1_2_1_27_1","volume-title":"The hidden biases in big data. Harvard business review 1, 4","author":"Crawford Kate","year":"2013","unstructured":"Kate Crawford. 2013. The hidden biases in big data. Harvard business review 1, 4 (2013)."},{"key":"e_1_2_1_28_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1080\/21670811.2014.976411"},{"key":"e_1_2_1_30_1","unstructured":"Wilfrid J Dixon and Frank J Massey Jr. 1951. Introduction to statistical analysis. (1951)."},{"key":"e_1_2_1_31_1","first-page":"7","article-title":"Why diversity programs fail and what works better","volume":"94","author":"Dobbin Frank","year":"2016","unstructured":"Frank Dobbin and Alexandra Kalev. 2016. 
Why diversity programs fail and what works better. Harvard Business Review 94, 7--8 (2016), 52--60.","journal-title":"Harvard Business Review"},{"key":"e_1_2_1_32_1","volume-title":"Diversity in big data: A review. Big data 5, 2","author":"Drosou Marina","year":"2017","unstructured":"Marina Drosou, HV Jagadish, Evaggelia Pitoura, and Julia Stoyanovich. 2017. Diversity in big data: A review. Big data 5, 2 (2017), 73--84."},{"key":"e_1_2_1_33_1","volume-title":"AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs. arXiv preprint arXiv:2403.00198","author":"Ebrahimi Sana","year":"2024","unstructured":"Sana Ebrahimi, Kaiwen Chen, Abolfazl Asudeh, Gautam Das, and Nick Koudas. 2024. AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs. arXiv preprint arXiv:2403.00198 (2024)."},{"key":"e_1_2_1_34_1","volume-title":"REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models. arXiv preprint arXiv:2404.11782","author":"Ebrahimi Sana","year":"2024","unstructured":"Sana Ebrahimi, Nima Shahbazi, and Abolfazl Asudeh. 2024. REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models. arXiv preprint arXiv:2404.11782 (2024)."},{"key":"e_1_2_1_35_1","volume-title":"Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities. arXiv preprint arXiv:2402.01071","author":"Erfanian Mahdi","year":"2024","unstructured":"Mahdi Erfanian, HV Jagadish, and Abolfazl Asudeh. 2024. Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities. arXiv preprint arXiv:2402.01071 (2024)."},{"key":"e_1_2_1_36_1","volume-title":"arXiv preprint arXiv:1706.02633","author":"Esteban Crist\u00f3bal","year":"2017","unstructured":"Crist\u00f3bal Esteban, Stephanie L Hyland, and Gunnar R\u00e4tsch. 2017. Real-valued (medical) time series generation with recurrent conditional gans. 
arXiv preprint arXiv:1706.02633 (2017)."},{"key":"e_1_2_1_37_1","volume-title":"Relational data synthesis using generative adversarial networks: A design space exploration. arXiv preprint arXiv:2008.12763","author":"Fan Ju","year":"2020","unstructured":"Ju Fan, Tongyu Liu, Guoliang Li, Junyou Chen, Yuwei Shen, and Xiaoyong Du. 2020. Relational data synthesis using generative adversarial networks: A design space exploration. arXiv preprint arXiv:2008.12763 (2020)."},{"key":"e_1_2_1_38_1","unstructured":"Nikolaos Fanourakis Christos Kontousias Vasilis Efthymiou Vassilis Christophides and Dimitris Plexousakis. 2023. FairER demo: Fairness-Aware and Explainable Entity Resolution. (2023)."},{"key":"e_1_2_1_39_1","volume-title":"Data augmentation using synthetic data for time series classification with deep residual networks. arXiv preprint arXiv:1808.02455","author":"Fawaz Hassan Ismail","year":"2018","unstructured":"Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and Pierre-Alain Muller. 2018. Data augmentation using synthetic data for time series classification with deep residual networks. arXiv preprint arXiv:1808.02455 (2018)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Yunhe Feng and Chirag Shah. 2022. Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search. (2022).","DOI":"10.1609\/aaai.v36i11.21445"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1137\/1032082"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/230538.230561"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-021-01448-w"},{"key":"e_1_2_1_44_1","volume-title":"Model patching: Closing the subgroup performance gap with data augmentation. arXiv preprint arXiv:2008.06775","author":"Goel Karan","year":"2020","unstructured":"Karan Goel, Albert Gu, Yixuan Li, and Christopher R\u00e9. 2020. 
Model patching: Closing the subgroup performance gap with data augmentation. arXiv preprint arXiv:2008.06775 (2020)."},{"volume-title":"Monte carlo methods","author":"Hammersley John","key":"e_1_2_1_45_1","unstructured":"John Hammersley. 2013. Monte carlo methods. Springer Science & Business Media."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/11538059_91"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_2_1_48_1","volume-title":"Dealing with bias via data augmentation in supervised learning scenarios. Jo Bates Paul D. Clough Robert J\u00e4schke 24","author":"Iosifidis Vasileios","year":"2018","unstructured":"Vasileios Iosifidis and Eirini Ntoutsi. 2018. Dealing with bias via data augmentation in supervised learning scenarios. Jo Bates Paul D. Clough Robert J\u00e4schke 24 (2018)."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2611567"},{"key":"e_1_2_1_50_1","volume-title":"Demonstration of ThalamusDB: Answering Complex SQL Queries with Natural Language Predicates on Multi-Modal Data. In Companion of the 2023 International Conference on Management of Data. 179--182","author":"Jo Saehan","year":"2023","unstructured":"Saehan Jo and Immanuel Trummer. 2023. Demonstration of ThalamusDB: Answering Complex SQL Queries with Natural Language Predicates on Multi-Modal Data. In Companion of the 2023 International Conference on Management of Data. 179--182."},{"key":"e_1_2_1_51_1","volume-title":"Data preprocessing techniques for classification without discrimination. Knowledge and information systems 33, 1","author":"Kamiran Faisal","year":"2012","unstructured":"Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and information systems 33, 1 (2012), 1--33."},{"key":"e_1_2_1_52_1","unstructured":"Jon Kleinberg. 2019. Fairness Rankings and Behavioral Biases. 
FAT*."},{"key":"e_1_2_1_53_1","volume-title":"Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis. arXiv preprint arXiv:2304.12573","author":"Lazier Simone","year":"2023","unstructured":"Simone Lazier, Saravanan Thirumuruganathan, and Hadis Anahideh. 2023. Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis. arXiv preprint arXiv:2304.12573 (2023)."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772758"},{"key":"e_1_2_1_55_1","unstructured":"Yanying Li Haipei Sun and Wendy Hui Wang. 2020. Towards fair truth discovery from biased crowdsourced answers. In SIGKDD. 599--607."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407821"},{"key":"e_1_2_1_57_1","volume-title":"Fairness and missing values. arXiv preprint arXiv:1905.12728","author":"Mart\u00ednez-Plumed Fernando","year":"2019","unstructured":"Fernando Mart\u00ednez-Plumed, C\u00e8sar Ferri, David Nieves, and Jos\u00e9 Hern\u00e1ndez-Orallo. 2019. Fairness and missing values. arXiv preprint arXiv:1905.12728 (2019)."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3457607"},{"key":"e_1_2_1_59_1","volume-title":"Sebastiano Barbieri, Giuseppe Jurman, and Venet Osmani.","author":"Micheletti Nicolo","year":"2023","unstructured":"Nicolo Micheletti, Raffaele Marchesi, Nicholas I-Hsien Kuo, Sebastiano Barbieri, Giuseppe Jurman, and Venet Osmani. 2023. Generative AI Mitigates Representation Bias Using Synthetic Health Data. medRxiv (2023), 2023--09."},{"key":"e_1_2_1_60_1","unstructured":"Melika Mousavi Nima Shahbazi and Abolfazl Asudeh. 2024. Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach. In EDBT. 47--60."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476299"},{"key":"e_1_2_1_62_1","doi-asserted-by":"crossref","unstructured":"Fatemeh Nargesian Abolfazl Asudeh and H. V. Jagadish. 2022. 
Responsible Data Integration: Next-generation Challenges. SIGMOD (2022).","DOI":"10.1145\/3514221.3522567"},{"key":"e_1_2_1_63_1","unstructured":"Nelgiriyewithana. 2023. Emotions Dataset. https:\/\/www.kaggle.com\/datasets\/nelgiriyewithana\/emotions Accessed: 2024-05-16."},{"key":"e_1_2_1_64_1","volume-title":"Contributions to the theory of testing statistical hypotheses. Statistical Research Memoirs","author":"Neyman Jerzy","year":"1936","unstructured":"Jerzy Neyman and Egon Sharpe Pearson. 1936. Contributions to the theory of testing statistical hypotheses. Statistical Research Memoirs (1936)."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.3389\/fdata.2019.00013"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.6028\/NIST.IR.6264"},{"key":"e_1_2_1_67_1","volume-title":"Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 1, 2","author":"Ramesh Aditya","year":"2022","unstructured":"Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 1, 2 (2022), 3."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3422648.3422657"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319901"},{"key":"e_1_2_1_70_1","volume-title":"Support vector method for novelty detection. Advances in neural information processing systems 12","author":"Sch\u00f6lkopf Bernhard","year":"1999","unstructured":"Bernhard Sch\u00f6lkopf, Robert C Williamson, Alex Smola, John Shawe-Taylor, and John Platt. 1999. Support vector method for novelty detection. Advances in neural information processing systems 12 (1999)."},{"key":"e_1_2_1_71_1","volume-title":"Reliability evaluation of individual predictions: a data-centric approach. The VLDB Journal","author":"Shahbazi Nima","year":"2024","unstructured":"Nima Shahbazi and Abolfazl Asudeh. 2024. 
Reliability evaluation of individual predictions: a data-centric approach. The VLDB Journal (2024), 1--28."},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611525"},{"key":"e_1_2_1_73_1","first-page":"3","article-title":"Coverage-based Data-centric Approaches for Responsible and Trustworthy AI","volume":"47","author":"Shahbazi Nima","year":"2024","unstructured":"Nima Shahbazi, Mahdi Erfanian, and Abolfazl Asudeh. 2024. Coverage-based Data-centric Approaches for Responsible and Trustworthy AI. IEEE Data Eng. Bull. 47, 1 (2024), 3--17.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_74_1","volume-title":"Representation Bias in Data: A Survey on Identification and Resolution Techniques. Comput. Surveys","author":"Shahbazi Nima","year":"2023","unstructured":"Nima Shahbazi, Yin Lin, Abolfazl Asudeh, and HV Jagadish. 2023. Representation Bias in Data: A Survey on Identification and Resolution Techniques. Comput. Surveys (2023)."},{"key":"e_1_2_1_75_1","volume-title":"Djallel Bouneffouf, Vinod Muthusamy, and Kush R Varshney.","author":"Sharma Shubham","year":"2020","unstructured":"Shubham Sharma, Yunfeng Zhang, Jes\u00fas M R\u00edos Aliaga, Djallel Bouneffouf, Vinod Muthusamy, and Kush R Varshney. 2020. Data augmentation for discrimination prevention and bias disambiguation. In AIES. 358--364."},{"volume-title":"Fairness-Aware Range Queries for Selecting Unbiased Data","author":"Shetiya Suraj","key":"e_1_2_1_76_1","unstructured":"Suraj Shetiya, Ian P. Swift, Abolfazl Asudeh, and Gautam Das. 2022. Fairness-Aware Range Queries for Selecting Unbiased Data. In ICDE. IEEE."},{"key":"e_1_2_1_77_1","unstructured":"Mallory Simon. 2009. HP looking into claim webcams can't see black people. CNN."},{"key":"e_1_2_1_78_1","volume-title":"Measurement of diversity. Nature 163, 4148","author":"Simpson Edward H","year":"1949","unstructured":"Edward H Simpson. 1949. Measurement of diversity. 
Nature 163, 4148 (1949)."},{"key":"e_1_2_1_79_1","unstructured":"James Surowiecki. 2005. The wisdom of crowds. Anchor."},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3329486.3329493"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995347"},{"key":"e_1_2_1_82_1","volume-title":"Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)."},{"key":"e_1_2_1_83_1","unstructured":"Tess Townsend. 2017. Most engineers are white and so are the faces they use to train software. Recode."},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.14778\/3625054.3625066"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611630"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1057\/s41599-022-01144-1"},{"key":"e_1_2_1_87_1","volume-title":"Large language models are versatile decomposers: Decompose evidence and questions for table-based reasoning. arXiv preprint arXiv:2301.13808","author":"Ye Yunhu","year":"2023","unstructured":"Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, and Yongbin Li. 2023. Large language models are versatile decomposers: Decompose evidence and questions for table-based reasoning. arXiv preprint arXiv:2301.13808 (2023)."},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00017"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.463"},{"key":"e_1_2_1_90_1","doi-asserted-by":"crossref","unstructured":"Ce Zhou Qian Li Chen Li Jun Yu Yixin Liu Guangjing Wang Kai Zhang Cheng Ji Qiben Yan Lifang He et al. 2023. 
A comprehensive survey on pretrained foundation models: A history from bert to chatgpt. arXiv preprint arXiv:2302.09419 (2023).","DOI":"10.1007\/s13042-024-02443-6"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3681954.3682014","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,27]],"date-time":"2024-11-27T15:16:21Z","timestamp":1732720581000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3681954.3682014"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":90,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10.14778\/3681954.3682014"],"URL":"https:\/\/doi.org\/10.14778\/3681954.3682014","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2024-08-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}