{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,28]],"date-time":"2026-07-28T12:16:47Z","timestamp":1785241007858,"version":"3.55.0"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"CSCW2","license":[{"start":{"date-parts":[[2022,11,7]],"date-time":"2022-11-07T00:00:00Z","timestamp":1667779200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2022,11,7]]},"abstract":"<jats:p>Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparency around the processes by which they are created, researchers and practitioners have begun to advocate for increased data documentation and have proposed several data documentation frameworks. However, there is little research on whether these data documentation frameworks meet the needs of ML practitioners, who both create and consume datasets. To address this gap, we set out to understand ML practitioners' data documentation perceptions, needs, challenges, and desiderata, with the ultimate goal of deriving design requirements that can inform future data documentation frameworks. We conducted a series of semi-structured interviews with 14 ML practitioners at a single large, international technology company. We had them answer a list of questions taken from datasheets for datasets (Gebru et al., 2018). Our findings show that current approaches to data documentation are largely ad hoc and myopic in nature. 
Participants expressed needs for data documentation frameworks to be adaptable to their contexts, integrated into their existing tools and workflows, and automated wherever possible. Despite the fact that data documentation frameworks are often motivated from the perspective of responsible AI, participants did not make the connection between the questions that they were asked to answer and their responsible AI implications. In addition, participants often had difficulties prioritizing the needs of dataset consumers and providing information that someone unfamiliar with their datasets might need to know. Based on these findings, we derive seven design requirements for future data documentation frameworks such as more actionable guidance on how the characteristics of datasets might result in harms and how these harms might be mitigated, more explicit prompts for reflection, automated adaptation to different contexts, and integration into ML practitioners' existing tools and workflows.<\/jats:p>","DOI":"10.1145\/3555760","type":"journal-article","created":{"date-parts":[[2022,11,11]],"date-time":"2022-11-11T22:58:54Z","timestamp":1668207534000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":56,"title":["Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata"],"prefix":"10.1145","volume":"6","author":[{"given":"Amy K.","family":"Heger","sequence":"first","affiliation":[{"name":"Microsoft, St. 
Louis, MO, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Liz B.","family":"Marquis","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mihaela","family":"Vorvoreanu","sequence":"additional","affiliation":[{"name":"Microsoft, Redmond, WA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hanna","family":"Wallach","sequence":"additional","affiliation":[{"name":"Microsoft, New York City, NY, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jennifer","family":"Wortman Vaughan","sequence":"additional","affiliation":[{"name":"Microsoft, New York, NY, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,11,11]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-SEIP.2019.00042"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2019.2942288"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v36i1.2564"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3351095.3375691"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3461702.3462610"},{"key":"e_1_2_2_6_1","volume-title":"Proceedings of the ACM International Conference on Human Factors in Computing Systems (CHI). 2271--2274","author":"Baumer Eric P. S.","unstructured":"Eric P. S. Baumer and M. Six Silberman . 2011. When the Implication Is Not to Design (Technology) . In Proceedings of the ACM International Conference on Human Factors in Computing Systems (CHI). 2271--2274 . Eric P. S. Baumer and M. Six Silberman. 2011. When the Implication Is Not to Design (Technology). In Proceedings of the ACM International Conference on Human Factors in Computing Systems (CHI). 
2271--2274."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00041"},{"key":"e_1_2_2_8_1","volume-title":"Overcoming Failures of Imagination in AI Infused System Development and Deployment. In NeurIPS Workshop on Navigating the Broader Impacts of AI Research.","author":"Boyarskaya Margarita","year":"2020","unstructured":"Margarita Boyarskaya , Alexandra Olteanu , and Kate Crawford . 2020 . Overcoming Failures of Imagination in AI Infused System Development and Deployment. In NeurIPS Workshop on Navigating the Broader Impacts of AI Research. Margarita Boyarskaya, Alexandra Olteanu, and Kate Crawford. 2020. Overcoming Failures of Imagination in AI Infused System Development and Deployment. In NeurIPS Workshop on Navigating the Broader Impacts of AI Research."},{"key":"e_1_2_2_9_1","volume-title":"Ethical Sensitivity in Machine Learning Development. In Conference Companion Publication of the 2020 on Computer Supported Cooperative Work and Social Computing. 87--92","author":"Boyd Karen","year":"2020","unstructured":"Karen Boyd . 2020 . Ethical Sensitivity in Machine Learning Development. In Conference Companion Publication of the 2020 on Computer Supported Cooperative Work and Social Computing. 87--92 . Karen Boyd. 2020. Ethical Sensitivity in Machine Learning Development. In Conference Companion Publication of the 2020 on Computer Supported Cooperative Work and Social Computing. 87--92."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3479582"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1191\/1478088706qp063oa"},{"key":"e_1_2_2_12_1","volume-title":"Proceedings of the Conference on Fairness, Accountability and Transparency (FAT*). 77--91","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru . 2018 . Gender shades: Intersectional accuracy disparities in commercial gender classification . 
In Proceedings of the Conference on Fairness, Accountability and Transparency (FAT*). 77--91 . Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the Conference on Fairness, Accountability and Transparency (FAT*). 77--91."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.418"},{"key":"e_1_2_2_14_1","volume-title":"NeurIPS Workshop on Dataset Curation and Security.","author":"Chmielinski Kasia S.","year":"2020","unstructured":"Kasia S. Chmielinski , Sarah Newman , Matt Taylor , Josh Joseph , Kemi Thomas , Jessica Yurkofsky , and Yue Chelsea Qiu . 2020 . The Dataset Nutrition Label (2nd Gen): Leveraging Context to mitigate Harms in Artificial Intelligence . In NeurIPS Workshop on Dataset Curation and Security. Kasia S. Chmielinski, Sarah Newman, Matt Taylor, Josh Joseph, Kemi Thomas, Jessica Yurkofsky, and Yue Chelsea Qiu. 2020. The Dataset Nutrition Label (2nd Gen): Leveraging Context to mitigate Harms in Artificial Intelligence. In NeurIPS Workshop on Dataset Curation and Security."},{"key":"e_1_2_2_15_1","volume-title":"Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data","author":"Chouldechova Alexandra","year":"2017","unstructured":"Alexandra Chouldechova . 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data , Vol. 5 , 2 ( 2017 ), 153--163. Alexandra Chouldechova. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, Vol. 5, 2 (2017), 153--163."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3311957.3359509"},{"key":"e_1_2_2_17_1","volume-title":"CVPR Workshop on Computer Vision for Global Challenges. 52--59","author":"de Vries Terrance","unstructured":"Terrance de Vries , Ishan Misra , Changhan Wang , and Laurens van der Maaten. 2019. 
Does Object Recognition Work for Everyone? . In CVPR Workshop on Computer Vision for Global Challenges. 52--59 . Terrance de Vries, Ishan Misra, Changhan Wang, and Laurens van der Maaten. 2019. Does Object Recognition Work for Everyone?. In CVPR Workshop on Computer Vision for Global Challenges. 52--59."},{"key":"e_1_2_2_18_1","article-title":"Innovating Like an Optimist, Preparing Like a Pessimist: Ethical Speculation and the Legal Imagination","volume":"19","author":"Fiesler Casey","year":"2021","unstructured":"Casey Fiesler . 2021 . Innovating Like an Optimist, Preparing Like a Pessimist: Ethical Speculation and the Legal Imagination . Colorado Technology Law Journal , Vol. 19 , 1 (2021). Casey Fiesler. 2021. Innovating Like an Optimist, Preparing Like a Pessimist: Ethical Speculation and the Legal Imagination. Colorado Technology Law Journal, Vol. 19, 1 (2021).","journal-title":"Colorado Technology Law Journal"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458723"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-00257-z"},{"key":"e_1_2_2_21_1","unstructured":"Zolt\u00e1n G\u00f3cza. 2015. Myth #21: People can tell you what they want. https:\/\/uxmyths.com\/post\/746610684\/myth-21-people-can-tell-you-what-they-want\/  Zolt\u00e1n G\u00f3cza. 2015. Myth #21: People can tell you what they want. https:\/\/uxmyths.com\/post\/746610684\/myth-21-people-can-tell-you-what-they-want\/"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.54648\/COLA2018095"},{"key":"e_1_2_2_23_1","volume-title":"Wisconsin. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing. 206--210","author":"Romael Haque MD","year":"2019","unstructured":"MD Romael Haque , Katherine Weathington , and Shion Guha . 2019 . Exploring the Impact of (Not) Changing Default Settings in Algorithmic Crime Mapping-A Case Study of Milwaukee , Wisconsin. 
In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing. 206--210 . MD Romael Haque, Katherine Weathington, and Shion Guha. 2019. Exploring the Impact of (Not) Changing Default Settings in Algorithmic Crime Mapping-A Case Study of Milwaukee, Wisconsin. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing. 206--210."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376177"},{"key":"e_1_2_2_25_1","volume-title":"The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv:1805.03677","author":"Holland Sarah","year":"2018","unstructured":"Sarah Holland , Ahmed Hosny , Sarah Newman , Joshua Joseph , and Kasia Chmielinski . 2018. The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv:1805.03677 ( 2018 ). Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv:1805.03677 (2018)."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300830"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3134688"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300637"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445918"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-019-0088-2"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3311957.3359512"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3375627.3375835"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10606-007-9044-5"},{"key":"e_1_2_2_34_1","unstructured":"Min Kyung Lee and Kate Rich. 2021. 
Who Is Included in Human Perceptions of AI?: Trust and Perceived Fairness around Healthcare AI and Cultural Mistrust. (2021) 1--14.  Min Kyung Lee and Kate Rich. 2021. Who Is Included in Human Perceptions of AI?: Trust and Perceived Fairness around Healthcare AI and Cultural Mistrust. (2021) 1--14."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376445"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3361118"},{"key":"e_1_2_2_37_1","volume-title":"Qualitative research design: An interactive approach","author":"Maxwell Joseph A","unstructured":"Joseph A Maxwell . 2012. Qualitative research design: An interactive approach . Vol. 41 . Sage publications. Joseph A Maxwell. 2012. Qualitative research design: An interactive approach. Vol. 41. Sage publications."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3415186"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445880"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3287560.3287596"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-019-0114-4"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300356"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432955"},{"key":"e_1_2_2_44_1","unstructured":"Jakob Nielsen. 2001. First Rule of Usability? Don't Listen to Users. https:\/\/www.nngroup.com\/articles\/first-rule-of-usability-dont-listen-to-users\/  Jakob Nielsen. 2001. First Rule of Usability? Don't Listen to Users. 
https:\/\/www.nngroup.com\/articles\/first-rule-of-usability-dont-listen-to-users\/"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274405"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1866029.1866038"},{"key":"e_1_2_2_48_1","volume-title":"NeurIPS Workshop on Machine Learning Retrospectives, Surveys, and Meta-analyses.","author":"Paullada Amandalynne","year":"2020","unstructured":"Amandalynne Paullada , Inioluwa Deborah Raji , Emily M. Bender , Emily Denton , and Alex Hanna . 2020 . Data and its (dis)contents: A survey of dataset development and use in machine learning research . In NeurIPS Workshop on Machine Learning Retrospectives, Surveys, and Meta-analyses. Amandalynne Paullada, Inioluwa Deborah Raji, Emily M. Bender, Emily Denton, and Alex Hanna. 2020. Data and its (dis)contents: A survey of dataset development and use in machine learning research. In NeurIPS Workshop on Machine Learning Retrospectives, Surveys, and Meta-analyses."},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3449081"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00752449"},{"key":"e_1_2_2_51_1","first-page":"109","article-title":"Disparate impact in big data policing","volume":"52","author":"Selbst Andrew D","year":"2017","unstructured":"Andrew D Selbst . 2017 . Disparate impact in big data policing . Ga. L. Rev. , Vol. 52 (2017), 109 . Andrew D Selbst. 2017. Disparate impact in big data policing. Ga. L. Rev., Vol. 52 (2017), 109.","journal-title":"Ga. L. Rev."},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180492"},{"key":"e_1_2_2_53_1","volume-title":"Institutional ecology,translations' and boundary objects: Amateurs and professionals in Berkeley's Museum of Vertebrate Zoology","author":"Star Susan Leigh","year":"1907","unstructured":"Susan Leigh Star and James R Griesemer . 1989. 
Institutional ecology,translations' and boundary objects: Amateurs and professionals in Berkeley's Museum of Vertebrate Zoology , 1907 --39. Social studies of science, Vol. 19 , 3 (1989), 387--420. Susan Leigh Star and James R Griesemer. 1989. Institutional ecology,translations' and boundary objects: Amateurs and professionals in Berkeley's Museum of Vertebrate Zoology, 1907--39. Social studies of science, Vol. 19, 3 (1989), 387--420."},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313129"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1533-8525.1988.tb01249.x"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174014"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818048.2820078"},{"key":"e_1_2_2_58_1","volume-title":"Why It's Wrong To Ask Users What They Want (And What To Ask Instead). Forbes (May","author":"Yeykelis Leo","year":"2018","unstructured":"Leo Yeykelis . 2018. Why It's Wrong To Ask Users What They Want (And What To Ask Instead). Forbes (May 2018 ). https:\/\/www.forbes.com\/sites\/leoyeykelis\/2018\/05\/10\/why-its-wrong-to-ask-users-what-they-want-and-what-to-ask-instead\/'sh=1449b7c91f22 Leo Yeykelis. 2018. Why It's Wrong To Ask Users What They Want (And What To Ask Instead). Forbes (May 2018). 
https:\/\/www.forbes.com\/sites\/leoyeykelis\/2018\/05\/10\/why-its-wrong-to-ask-users-what-they-want-and-what-to-ask-instead\/'sh=1449b7c91f22"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pmed.1002683"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3392826"}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555760","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3555760","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:19Z","timestamp":1750182559000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555760"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,7]]},"references-count":59,"journal-issue":{"issue":"CSCW2","published-print":{"date-parts":[[2022,11,7]]}},"alternative-id":["10.1145\/3555760"],"URL":"https:\/\/doi.org\/10.1145\/3555760","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,7]]},"assertion":[{"value":"2022-11-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}