{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T03:18:52Z","timestamp":1758079132170,"version":"3.44.0"},"reference-count":12,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>Preparing datasets\u2014a critical phase known as data wrangling\u2014constitutes the dominant phase of data science development, consuming upwards of 80% of the total project time. This phase encompasses a myriad of tasks: parsing data, restructuring it for analysis, repairing inaccuracies, merging sources, eliminating duplicates, and ensuring overall data integrity. Traditional approaches, typically through manual coding in languages such as Python or using spreadsheets, are not only laborious but also error-prone. These issues range from missing entries and formatting inconsistencies to data type inaccuracies, all of which can affect the quality of downstream tasks if not properly corrected. To address these challenges, we present Buckaroo, a visualization system to highlight discrepancies in data and enable on-the-spot corrections through direct manipulations of visual objects. Buckaroo (1) automatically finds \"interesting\" data groups that exhibit anomalies compared to the rest of the groups and recommends them for inspection; (2) suggests wrangling actions that the user can choose to repair the anomalies; and (3) allows users to visually manipulate their data by displaying the effects of their wrangling actions and offering the ability to undo or redo these actions, which supports the iterative nature of data wrangling.<\/jats:p>","DOI":"10.14778\/3750601.3750687","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:05Z","timestamp":1758029885000},"page":"5423-5426","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Buckaroo: A Direct Manipulation Visual Data Wrangler"],"prefix":"10.14778","volume":"18","author":[{"given":"Annabelle","family":"Warner","sequence":"first","affiliation":[{"name":"University of Utah"}]},{"given":"Andrew","family":"McNutt","sequence":"additional","affiliation":[{"name":"University of Utah"}]},{"given":"Paul","family":"Rosen","sequence":"additional","affiliation":[{"name":"University of Utah"}]},{"given":"El Kindi","family":"Rezig","sequence":"additional","affiliation":[{"name":"University of Utah"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Jakob Bach. 2025. Using Constraints to Discover Sparse and Alternative Subgroup Descriptions. arXiv:2406.01411 [cs.LG] https:\/\/arxiv.org\/abs\/2406.01411"},{"key":"e_1_2_1_2_1","volume-title":"Dango: A Mixed-Initiative Data Wrangling System using Large Language Model.","author":"Chen Wei-Hao","year":"2025","unstructured":"Wei-Hao Chen, Weixi Tong, Amanda Case, and Tianyi Zhang. 2025. Dango: A Mixed-Initiative Data Wrangling System using Large Language Model. (2025)."},{"key":"e_1_2_1_3_1","volume-title":"Daniel Perelman, Mohammad Raza, Sherry Shi, Danny Simmons, and Ashish Tiwari.","author":"Chopra Bhavya","year":"2023","unstructured":"Bhavya Chopra, Anna Fariha, Sumit Gulwani, Austin Z. Henley, Daniel Perelman, Mohammad Raza, Sherry Shi, Danny Simmons, and Ashish Tiwari. 2023. CoWrangler: Recommender System for Data-Wrangling Scripts (SIGMOD '23)."},{"key":"e_1_2_1_4_1","volume-title":"Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang.","author":"Deng Dong","year":"2017","unstructured":"Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang. 2017. The Data Civilizer System. In CIDR."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2133416.2146416"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-010-0356-2"},{"key":"e_1_2_1_7_1","volume-title":"SIGCHI Conference on Human Factors in Computing Systems. 3363\u20133372","author":"Kandel Sean","year":"2011","unstructured":"Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer. 2011. Wrangler: Interactive visual specification of data transformation scripts. In SIGCHI Conference on Human Factors in Computing Systems. 3363\u20133372."},{"key":"e_1_2_1_8_1","volume-title":"Arquero: JavaScript Library for Data Tables. https:\/\/idl.uw.edu\/arquero\/ Accessed: 2025-03-30.","author":"Interactive Data Lab UW","year":"2025","unstructured":"UW Interactive Data Lab. 2025. Arquero: JavaScript Library for Data Tables. https:\/\/idl.uw.edu\/arquero\/ Accessed: 2025-03-30."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685890"},{"key":"e_1_2_1_10_1","volume-title":"Dagger: A Data (not code) Debugger. In CIDR.","author":"Rezig El Kindi","year":"2020","unstructured":"El Kindi Rezig, Lei Cao, Giovanni Simonini, Maxime Schoemans, Samuel Madden, Nan Tang, Mourad Ouzzani, and Michael Stonebraker. 2020. Dagger: A Data (not code) Debugger. In CIDR."},{"key":"e_1_2_1_11_1","volume-title":"Tasks and visualizations used for data profiling: A survey and interview study","author":"Ruddle Roy A","year":"2023","unstructured":"Roy A Ruddle, James Cheshire, and Sara Johansson Fernstad. 2023. Tasks and visualizations used for data profiling: A survey and interview study. IEEE Transactions on Visualization and Computer Graphics (2023)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2022.3209470"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750687","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:43:25Z","timestamp":1758030205000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750687"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":12,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750687"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750687","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}