{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T20:23:56Z","timestamp":1776111836343,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":110,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,10,29]],"date-time":"2024-10-29T00:00:00Z","timestamp":1730160000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006374","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-2205171, CCF-2045402, GRFP"],"award-info":[{"award-number":["CNS-2205171, CCF-2045402, GRFP"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,29]]},"DOI":"10.1145\/3689904.3694702","type":"proceedings-article","created":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T12:19:49Z","timestamp":1729685989000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Who's in and who's out? 
A case study of multimodal CLIP-filtering in DataComp"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-4275-653X","authenticated-orcid":false,"given":"Rachel","family":"Hong","sequence":"first","affiliation":[{"name":"University of Washington, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1362-554X","authenticated-orcid":false,"given":"William","family":"Agnew","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4899-226X","authenticated-orcid":false,"given":"Tadayoshi","family":"Kohno","sequence":"additional","affiliation":[{"name":"University of Washington, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3753-8405","authenticated-orcid":false,"given":"Jamie","family":"Morgenstern","sequence":"additional","affiliation":[{"name":"University of Washington, United States of America"}]}],"member":"320","published-online":{"date-parts":[[2024,10,29]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"Sandhini Agarwal Gretchen Krueger Jack Clark Alec Radford Jong\u00a0Wook Kim and Miles Brundage. 2021. Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications. http:\/\/arxiv.org\/abs\/2108.02818 arXiv:2108.02818 [cs]."},{"key":"e_1_3_2_2_2_1","unstructured":"[2] Stability AI. 2024. https:\/\/stability.ai\/stable-image"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600211.3604720"},{"key":"e_1_3_2_2_4_1","unstructured":"Amazon. 2024. Amazon Rekognition. https:\/\/docs.aws.amazon.com\/rekognition\/latest\/dg\/what-is.html"},{"key":"e_1_3_2_2_5_1","first-page":"55320","article-title":"Ethical Considerations for Responsible Data Curation","volume":"36","author":"Andrews Jerone","year":"2024","unstructured":"Jerone Andrews, Dora Zhao, William Thong, Apostolos Modas, Orestis Papakyriakopoulos, and Alice Xiang. 2024. 
Ethical Considerations for Responsible Data Curation. Advances in Neural Information Processing Systems 36 (2024), 55320\u201355360.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_6_1","unstructured":"Internet Archive. 2022. Wayback CDX Server API documentation. https:\/\/archive.org\/developers\/wayback-cdx-server.html"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1177\/08861099211001460"},{"key":"e_1_3_2_2_8_1","volume-title":"Exploring 12 million of the 2.3 billion images used to train stable diffusion\u2019s image generator. Retrieved July 6","author":"Baio Andy","year":"2022","unstructured":"Andy Baio. 2022. Exploring 12 million of the 2.3 billion images used to train stable diffusion\u2019s image generator. Retrieved July 6 (2022), 2023. https:\/\/waxy.org\/2022\/08\/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator\/"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3593013.3594095"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1168987.1169018"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600211.3604722"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3531146.3533083"},{"key":"e_1_3_2_2_14_1","first-page":"21268","article-title":"Into the LAION\u2019s Den: Investigating Hate in Multimodal Datasets","volume":"36","author":"Birhane Abeba","year":"2023","unstructured":"Abeba Birhane, Vinay Prabhu, Sang Han, Vishnu\u00a0Naresh Boddeti, and Alexandra\u00a0Sasha Luccioni. 2023. Into the LAION\u2019s Den: Investigating Hate in Multimodal Datasets. 
Advances in Neural Information Processing Systems 36 (2023), 21268\u201321284.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_15_1","unstructured":"Abeba Birhane Vinay\u00a0Uday Prabhu and Emmanuel Kahembwe. 2021. Multimodal datasets: misogyny pornography and malignant stereotypes."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468536"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"crossref","unstructured":"Su\u00a0Lin Blodgett Solon Barocas Hal Daum\u00e9\u00a0III and Hanna Wallach. 2020. Language (technology) is power: A critical survey of\" bias\" in NLP.","DOI":"10.18653\/v1\/2020.acl-main.485"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3462244.3479897"},{"key":"e_1_3_2_2_19_1","volume-title":"Fake Photos","author":"Bourel Matthieu","unstructured":"Matthieu Bourel. 2024. Fake Photos, Real Harm: AOC and the Fight Against AI Porn. https:\/\/www.rollingstone.com\/culture\/culture-features\/aoc-deepfake-ai-porn-personal-experience-defiance-act-1234998491\/"},{"key":"e_1_3_2_2_20_1","volume-title":"Is exposure to online content depicting risky behavior related to viewers","author":"Branley Dawn\u00a0Beverley","year":"2017","unstructured":"Dawn\u00a0Beverley Branley and Judith Covey. 2017. Is exposure to online content depicting risky behavior related to viewers\u2019 own risky behavior offline? Computers in Human Behavior 75 (2017), 283\u2013287."},{"key":"e_1_3_2_2_21_1","volume-title":"Language models are few-shot learners. Advances in neural information processing systems 33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared\u00a0D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, 2020. Language models are few-shot learners. 
Advances in neural information processing systems 33 (2020), 1877\u20131901."},{"key":"e_1_3_2_2_22_1","volume-title":"Conference on Fairness, Accountability, and Transparency. PMLR","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability, and Transparency. PMLR, New York, NY, USA, 77\u201391."},{"key":"e_1_3_2_2_23_1","volume-title":"A critical sense","author":"Butler Judith","unstructured":"Judith Butler. 2013. Gender as performance. In A critical sense. Routledge, New York, NY, USA, 109\u2013125."},{"key":"e_1_3_2_2_24_1","volume-title":"Web content accessibility guidelines (WCAG) 2.0","author":"Caldwell Ben","year":"2008","unstructured":"Ben Caldwell, Michael Cooper, Loretta\u00a0Guarino Reid, Gregg Vanderheiden, Wendy Chisholm, John Slatin, and Jason White. 2008. Web content accessibility guidelines (WCAG) 2.0. WWW Consortium (W3C) 290 (2008), 1\u201334."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514094.3534162"},{"key":"e_1_3_2_2_26_1","volume-title":"Why is my classifier discriminatory?Advances in neural information processing systems 31","author":"Chen Irene","year":"2018","unstructured":"Irene Chen, Fredrik\u00a0D Johansson, and David Sontag. 2018. Why is my classifier discriminatory?Advances in neural information processing systems 31 (2018)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00276"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","unstructured":"B. Clemm\u00a0von Hohenberg E. Menchen-Trevino A. Casas and M. Wojcieszak. 2021. A list of over 5000 US news domains and their social media accounts. https:\/\/doi.org\/10.5281\/zenodo.7651047","DOI":"10.5281\/zenodo.7651047"},{"key":"e_1_3_2_2_29_1","unstructured":"Cloudflare. 2024. Cloudflare API v4 documentation: Get multiple domain details. 
https:\/\/developers.cloudflare.com\/api\/operations\/domain-intelligence-get-multiple-domain-details"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3531146.3533143"},{"key":"e_1_3_2_2_31_1","unstructured":"[31] Common Crawl. 2024. https:\/\/commoncrawl.org\/"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19836-6_6"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1177\/0162243915579283"},{"key":"e_1_3_2_2_34_1","unstructured":"DataComp. 2024. DataComp Tracks. https:\/\/www.datacomp.ai\/#tracks"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"crossref","unstructured":"Meera Desai Abigail Jacobs and Dallas Card. 2023. An Archival Perspective on Pretraining Data.","DOI":"10.1016\/j.patter.2024.100966"},{"key":"e_1_3_2_2_36_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding.","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3278721.3278729"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"crossref","unstructured":"Jesse Dodge Maarten Sap Ana Marasovi\u0107 William Agnew Gabriel Ilharco Dirk Groeneveld Margaret Mitchell and Matt Gardner. 2021. Documenting large webtext corpora: A case study on the colossal clean crawled corpus.","DOI":"10.18653\/v1\/2021.emnlp-main.98"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2090236.2090255"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10742-009-0047-1"},{"key":"e_1_3_2_2_41_1","unstructured":"Alex Fang Albin\u00a0Madappally Jose Amit Jain Ludwig Schmidt Alexander Toshev and Vaishaal Shankar. 2023. 
Data filtering networks."},{"key":"e_1_3_2_2_42_1","first-page":"27092","article-title":"DataComp: In search of the next generation of multimodal datasets","volume":"36","author":"Gadre Samir\u00a0Yitzhak","year":"2024","unstructured":"Samir\u00a0Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, 2024. DataComp: In search of the next generation of multimodal datasets. Advances in Neural Information Processing Systems 36 (2024), 27092\u201327112.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"crossref","unstructured":"Sachin Goyal Pratyush Maini Zachary\u00a0C. Lipton Aditi Raghunathan and J.\u00a0Zico Kolter. 2024. Scaling Laws for Data Filtering \u2013 Data Curation cannot be Compute Agnostic. arxiv:2404.07177\u00a0[cs.LG]","DOI":"10.1109\/CVPR52733.2024.02142"},{"key":"e_1_3_2_2_44_1","unstructured":"Michael\u00a0M Grynbaum and Ryan Mac. 2023. The Times Sues OpenAI and Microsoft. 1\u00a0pages."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE55515.2023.00303"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174092"},{"key":"e_1_3_2_2_47_1","unstructured":"Ritwik Gupta. 2024. LAION and the Challenges of Preventing AI-Generated CSAM. https:\/\/www.techpolicy.press\/laion-and-the-challenges-of-preventing-ai-generated-csam\/"},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"crossref","unstructured":"Suchin Gururangan Dallas Card Sarah\u00a0K Dreier Emily\u00a0K Gade Leroy\u00a0Z Wang Zeyu Wang Luke Zettlemoyer and Noah\u00a0A Smith. 2022. Whose language counts as high quality? measuring language ideologies in text data selection.","DOI":"10.18653\/v1\/2022.emnlp-main.165"},{"key":"e_1_3_2_2_49_1","unstructured":"Alex Hanna and Tina\u00a0M Park. 2020. 
Against scale: Provocations and resistances to scale thinking."},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","unstructured":"Peter Henderson Xuechen Li Dan Jurafsky Tatsunori Hashimoto Mark\u00a0A. Lemley and Percy Liang. 2023. Foundation Models and Fair Use. https:\/\/doi.org\/10.48550\/arXiv.2303.15715 arXiv:2303.15715 [cs].","DOI":"10.48550\/arXiv.2303.15715"},{"key":"e_1_3_2_2_51_1","volume-title":"Lisa\u00a0Anne Hendricks","author":"Hoffmann Jordan","year":"2022","unstructured":"Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de\u00a0Las Casas, Lisa\u00a0Anne Hendricks, Johannes Welbl, Aidan Clark, 2022. Training compute-optimal large language models."},{"key":"e_1_3_2_2_52_1","unstructured":"IP2Location. 2024. IP2Location Lite IP-Country IPv6 Database. https:\/\/lite.ip2location.com\/ip2location-lite"},{"key":"e_1_3_2_2_53_1","unstructured":"IWF. 2023. How AI is being abused to create child sexual abuse imagery. https:\/\/www.iwf.org.uk\/media\/q4zll2ya\/iwf-ai-csam-report_public-oct23v1.pdf"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3351095.3372829"},{"key":"e_1_3_2_2_55_1","volume-title":"Location accuracy of commercial IP address geolocation databases. Information technology and control 46, 3","author":"Komosny Dan","year":"2017","unstructured":"Dan Komosny, Miroslav Voznak, and Saeed\u00a0Ur Rehman. 2017. Location accuracy of commercial IP address geolocation databases. Information technology and control 46, 3 (2017), 333\u2013344."},{"key":"e_1_3_2_2_56_1","volume-title":"Counterfactual fairness. Advances in neural information processing systems 30","author":"Kusner J","year":"2017","unstructured":"Matt\u00a0J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. 
Advances in neural information processing systems 30 (2017), 11\u00a0pages."},{"key":"e_1_3_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1080\/14791420.2016.1273534"},{"key":"e_1_3_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.fsidi.2020.301022"},{"key":"e_1_3_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.4523551"},{"key":"e_1_3_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404868.3406664"},{"key":"e_1_3_2_2_61_1","volume-title":"The social construction of race. Harvard Civil Rights-Civil Liberties Law Review","author":"F\u00a0Haney Lopez Ian","unstructured":"Ian F\u00a0Haney Lopez. 1995. The social construction of race. Harvard Civil Rights-Civil Liberties Law Review, Cambridge, MA, USA."},{"key":"e_1_3_2_2_62_1","first-page":"56338","article-title":"Stable bias: Evaluating societal representations in diffusion models","volume":"36","author":"Luccioni Alexandra\u00a0Sasha","year":"2024","unstructured":"Alexandra\u00a0Sasha Luccioni, Christopher Akiki, Margaret Mitchell, and Yacine Jernite. 2024. Stable bias: Evaluating societal representations in diffusion models. Advances in Neural Information Processing Systems 36 (2024), 56338\u201356351.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_63_1","doi-asserted-by":"crossref","unstructured":"Li Lucy Suchin Gururangan Luca Soldaini Emma Strubell David Bamman Lauren Klein and Jesse Dodge. 2024. AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters.","DOI":"10.18653\/v1\/2024.acl-long.400"},{"key":"e_1_3_2_2_64_1","volume-title":"The Chicago face database: A free stimulus set of faces and norming data. Behavior research methods 47","author":"Ma S","year":"2015","unstructured":"Debbie\u00a0S Ma, Joshua Correll, and Bernd Wittenbrink. 2015. The Chicago face database: A free stimulus set of faces and norming data. 
Behavior research methods 47 (2015), 1122\u20131135."},{"key":"e_1_3_2_2_65_1","unstructured":"Susan\u00a0R Madsen. 2021. Why Calling Women\u2019Girls\u2019 Is A Bigger Deal Than You May Think."},{"key":"e_1_3_2_2_66_1","unstructured":"[66] Midjourney. 2024. https:\/\/www.midjourney.com\/home"},{"key":"e_1_3_2_2_67_1","volume-title":"Clipcap: Clip prefix for image captioning.","author":"Mokady Ron","year":"2021","unstructured":"Ron Mokady, Amir Hertz, and Amit\u00a0H Bermano. 2021. Clipcap: Clip prefix for image captioning."},{"key":"e_1_3_2_2_68_1","unstructured":"Andreas Mueller. 2023. word_cloud. https:\/\/github.com\/amueller\/word_cloud"},{"key":"e_1_3_2_2_69_1","first-page":"1","article-title":"The Art of Cybersecurity: Defense in Depth Strategy for Robust Protection","volume":"1","author":"Mughal Arif\u00a0Ali","year":"2018","unstructured":"Arif\u00a0Ali Mughal. 2018. The Art of Cybersecurity: Defense in Depth Strategy for Robust Protection. International Journal of Intelligent Automation and Computing 1, 1 (2018), 1\u201320.","journal-title":"International Journal of Intelligent Automation and Computing"},{"key":"e_1_3_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3517644"},{"key":"e_1_3_2_2_71_1","first-page":"21455","article-title":"Quality not quantity: On the interaction between dataset design and robustness of clip","volume":"35","author":"Nguyen Thao","year":"2022","unstructured":"Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, and Ludwig Schmidt. 2022. Quality not quantity: On the interaction between dataset design and robustness of clip. 
Advances in Neural Information Processing Systems 35 (2022), 21455\u201321469.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_72_1","volume-title":"Hongzhi Yin, and Quoc Viet\u00a0Hung Nguyen.","author":"Nguyen Thanh\u00a0Tam","year":"2022","unstructured":"Thanh\u00a0Tam Nguyen, Thanh\u00a0Trung Huynh, Phi\u00a0Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, and Quoc Viet\u00a0Hung Nguyen. 2022. A survey of machine unlearning."},{"key":"e_1_3_2_2_73_1","volume-title":"Making technology masculine: men, women and modern machines in America","author":"Oldenziel Ruth","year":"1870","unstructured":"Ruth Oldenziel. 1999. Making technology masculine: men, women and modern machines in America, 1870-1945. Amsterdam University Press, Amsterdam, Netherlands."},{"key":"e_1_3_2_2_74_1","volume-title":"Model Card: CLIP. https:\/\/github.com\/openai\/CLIP\/blob\/main\/model-card.md","author":"AI.","year":"2022","unstructured":"OpenAI. 2022. Model Card: CLIP. https:\/\/github.com\/openai\/CLIP\/blob\/main\/model-card.md"},{"key":"e_1_3_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/1971162.1971171"},{"key":"e_1_3_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.520"},{"key":"e_1_3_2_2_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3576915.3616679"},{"key":"e_1_3_2_2_78_1","volume-title":"International Conference on Machine Learning. PMLR, Online, 8748\u20138763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, Online, 8748\u20138763."},{"key":"e_1_3_2_2_79_1","volume-title":"Language models are unsupervised multitask learners. 
OpenAI blog 1, 8","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9."},{"key":"e_1_3_2_2_80_1","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_3_2_2_81_1","unstructured":"World\u00a0Population Review. 2024. Western Countries 2024. https:\/\/worldpopulationreview.com\/country-rankings\/western-countries"},{"key":"e_1_3_2_2_82_1","unstructured":"Reece Rogers. 2024. Here\u2019s How Generative AI Depicts Queer People."},{"key":"e_1_3_2_2_83_1","volume-title":"International Conference on Machine Learning. PMLR, Online, 9040\u20139051","author":"Rolf Esther","year":"2021","unstructured":"Esther Rolf, Theodora\u00a0T Worledge, Benjamin Recht, and Michael Jordan. 2021. Representation matters: Assessing the importance of subgroup allocations in training data. In International Conference on Machine Learning. PMLR, Online, 9040\u20139051."},{"key":"e_1_3_2_2_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_2_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3517745.3561418"},{"key":"e_1_3_2_2_86_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1163"},{"key":"e_1_3_2_2_87_1","unstructured":"Mia Sato and Emillia David. 2024. I\u2019m still trying to generate an AI Asian man and white woman."},{"key":"e_1_3_2_2_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3476058"},{"key":"e_1_3_2_2_89_1","first-page":"25278","article-title":"Laion-5b: An open large-scale dataset for training next generation image-text models","volume":"35","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, 2022. 
Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35 (2022), 25278\u201325294.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_90_1","unstructured":"Christoph Schuhmann Richard Vencu Romain Beaumont Robert Kaczmarczyk Clayton Mullis Aarush Katta Theo Coombes Jenia Jitsev and Aran Komatsuzaki. 2021. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs."},{"key":"e_1_3_2_2_91_1","doi-asserted-by":"publisher","DOI":"10.1177\/2378023120967171"},{"key":"e_1_3_2_2_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600211.3604673"},{"key":"e_1_3_2_2_93_1","unstructured":"Shutterstock. 2024. Can I use images on my website?https:\/\/support.shutterstock.com\/s\/article\/Can-I-use-Images-on-my-website?language=en_US"},{"key":"e_1_3_2_2_94_1","unstructured":"Nakatani Shuyo. 2014. langdetect. https:\/\/github.com\/Mimino666\/langdetect"},{"key":"e_1_3_2_2_95_1","unstructured":"Natasha Singer. 2024. Teen Girls Confront an Epidemic of Deepfake Nudes in Schools."},{"key":"e_1_3_2_2_96_1","volume-title":"When reality monitoring fails: The role of imagination in stereotype maintenance.Journal of Personality and Social Psychology 52, 4","author":"Slusher P","year":"1987","unstructured":"Morgan\u00a0P Slusher and Craig\u00a0A Anderson. 1987. When reality monitoring fails: The role of imagination in stereotype maintenance.Journal of Personality and Social Psychology 52, 4 (1987), 653."},{"key":"e_1_3_2_2_97_1","unstructured":"Teachers\u00a0Pay Teachers. 2022. How do I obtain a copyright in my work? Should I register my copyright?https:\/\/help.teacherspayteachers.com\/hc\/en-us\/articles\/360042535652-How-do-I-obtain-a-copyright-in-my-work-Should-I-register-my-copyright"},{"key":"e_1_3_2_2_98_1","unstructured":"David Thiel. 2023. 
Identifying and Eliminating CSAM in Generative ML Training Data and Models."},{"key":"e_1_3_2_2_99_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58621-8_45"},{"key":"e_1_3_2_2_100_1","first-page":"161","article-title":"Unpacking hetero-patriarchy: tracing the conflation of sex, gender & (and) sexual orientation to its origins","volume":"8","author":"Valdes Francisco","year":"1996","unstructured":"Francisco Valdes. 1996. Unpacking hetero-patriarchy: tracing the conflation of sex, gender & (and) sexual orientation to its origins. Yale JL & Human. 8 (1996), 161.","journal-title":"Yale JL & Human."},{"key":"e_1_3_2_2_101_1","unstructured":"Pranshu Verma and Drew Harwell. 2023. Exploitive illegal photos of children found in the data that trains some AI. https:\/\/www.washingtonpost.com\/technology\/2023\/12\/20\/ai-child-pornography-abuse-photos-laion\/"},{"key":"e_1_3_2_2_102_1","doi-asserted-by":"publisher","DOI":"10.1145\/503376.503460"},{"key":"e_1_3_2_2_103_1","unstructured":"Jess Weatherbed. 2024. Trolls have flooded X with graphic Taylor Swift AI fakes."},{"key":"e_1_3_2_2_104_1","unstructured":"WebAIM. 2024. The WebAIM Million: An annual accessibility analysis of the top 1 000 000 home pages. https:\/\/webaim.org\/projects\/million\/#alttext"},{"key":"e_1_3_2_2_105_1","unstructured":"Guillaume Wenzek Marie-Anne Lachaux Alexis Conneau Vishrav Chaudhary Francisco Guzm\u00e1n Armand Joulin and Edouard Grave. 2019. CCNet: Extracting high quality monolingual datasets from web crawl data."},{"key":"e_1_3_2_2_106_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514094.3534136"},{"key":"e_1_3_2_2_107_1","doi-asserted-by":"publisher","DOI":"10.1145\/3593013.3594072"},{"key":"e_1_3_2_2_108_1","unstructured":"Hu Xu Saining Xie Xiaoqing\u00a0Ellen Tan Po-Yao Huang Russell Howes Vasu Sharma Shang-Wen Li Gargi Ghosh Luke Zettlemoyer and Christoph Feichtenhofer. 2023. 
Demystifying clip data."},{"key":"e_1_3_2_2_109_1","unstructured":"Ke Yang Biao Huang Julia Stoyanovich and Sebastian Schelter. 2020. Fairness-Aware Instrumentation of Preprocessing Pipelines for Machine Learning."},{"key":"e_1_3_2_2_110_1","volume-title":"Breadwinner status and gender ideologies of men and women regarding family roles. Sociological perspectives 43, 1","author":"Zuo Jiping","year":"2000","unstructured":"Jiping Zuo and Shengming Tang. 2000. Breadwinner status and gender ideologies of men and women regarding family roles. Sociological perspectives 43, 1 (2000), 29\u201343."}],"event":{"name":"EAAMO '24: Equity and Access in Algorithms, Mechanisms, and Optimization","location":"San Luis Potosi Mexico","acronym":"EAAMO '24","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence","SIGecom Special Interest Group on Economics and Computation"]},"container-title":["Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689904.3694702","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3689904.3694702","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T01:47:15Z","timestamp":1755913635000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689904.3694702"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,29]]},"references-count":110,"alternative-id":["10.1145\/3689904.3694702","10.1145\/3689904"],"URL":"https:\/\/doi.org\/10.1145\/3689904.3694702","relation":{},"subject":[],"published":{"date-parts":[[2024,10,29]]},"assertion":[{"value":"2024-10-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}