{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T01:14:01Z","timestamp":1775610841372,"version":"3.50.1"},"reference-count":69,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2011,9,2]],"date-time":"2011-09-02T00:00:00Z","timestamp":1314921600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Information Visualization"],"published-print":{"date-parts":[[2011,10]]},"abstract":"<jats:p> In spite of advances in technologies for working with data, analysts still spend an inordinate amount of time diagnosing data quality issues and manipulating data into a usable form. This process of \u2018data wrangling\u2019 often constitutes the most tedious and time-consuming aspect of analysis. Though data cleaning and integration are longstanding issues in the database community, relatively little research has explored how interactive visualization can advance the state of the art. In this article, we review the challenges and opportunities associated with addressing data quality issues. We argue that analysts might more effectively wrangle data through new interactive systems that integrate data verification, transformation, and visualization. We identify a number of outstanding research questions, including how appropriate visual encodings can facilitate apprehension of missing data, discrepant values, and uncertainty; how interactive visualizations might facilitate data transform specification; and how recorded provenance and social interaction might enable wider reuse, verification, and modification of data transformations. 
<\/jats:p>","DOI":"10.1177\/1473871611415994","type":"journal-article","created":{"date-parts":[[2011,9,3]],"date-time":"2011-09-03T01:34:38Z","timestamp":1315013678000},"page":"271-288","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":253,"title":["Research directions in data wrangling: Visualizations and transformations for usable and credible data"],"prefix":"10.1177","volume":"10","author":[{"given":"Sean","family":"Kandel","sequence":"first","affiliation":[{"name":"Computer Science Department, Stanford University, USA."}]},{"given":"Jeffrey","family":"Heer","sequence":"additional","affiliation":[{"name":"Computer Science Department, Stanford University, USA."}]},{"given":"Catherine","family":"Plaisant","sequence":"additional","affiliation":[{"name":"Human-Computer Interaction Lab, University of Maryland, USA."}]},{"given":"Jessie","family":"Kennedy","sequence":"additional","affiliation":[{"name":"Institute for Informatics & Digital Innovation, Edinburgh Napier University, UK."}]},{"given":"Frank","family":"van Ham","sequence":"additional","affiliation":[{"name":"Center for Advanced Studies, IBM France."}]},{"given":"Nathalie Henry","family":"Riche","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, USA."}]},{"given":"Chris","family":"Weaver","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Oklahoma, USA."}]},{"given":"Bongshin","family":"Lee","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, USA."}]},{"given":"Dominique","family":"Brodbeck","sequence":"additional","affiliation":[{"name":"University of Applied Sciences Northwestern Switzerland, CH."}]},{"given":"Paolo","family":"Buono","sequence":"additional","affiliation":[{"name":"Dipartimento di Informatica, Universit\u00e0 degli Studi di Bari Aldo Moro, 
Italy."}]}],"member":"179","published-online":{"date-parts":[[2011,9,2]]},"reference":[{"key":"bibr1-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1002\/0471448354"},{"key":"bibr2-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1145\/1217299.1217304"},{"key":"bibr3-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.250581"},{"key":"bibr4-1473871611415994","unstructured":"Gravano L, Ipeirotis PG, Jagadish HV, Koudas N, Muthukrishnan S, Pietarinen L, Srivastava D. Using q-grams in a dbms for approximate string processing, 2001."},{"key":"bibr5-1473871611415994","doi-asserted-by":"crossref","unstructured":"Sarawagi S, Bhamidipaty A. Interactive deduplication using active learning. Proceedings of ACM SIGKDD (Edmonton, Alberta, Canada), 2002.","DOI":"10.1145\/775047.775087"},{"key":"bibr6-1473871611415994","first-page":"431","author":"Robertson GG","year":"2005","journal-title":"ACM CHI"},{"key":"bibr7-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2008.55"},{"key":"bibr8-1473871611415994","unstructured":"Huynh D and Mazzocchi S. Freebase GridWorks. http:\/\/code.google.com\/p\/google-refine\/."},{"key":"bibr9-1473871611415994","first-page":"381","author":"Raman V","year":"2001","journal-title":"VLDB"},{"key":"bibr10-1473871611415994","volume-title":"Illuminating the Path: The Research and Development Agenda for Visual Analytics","author":"Thomas JJ","year":"2005"},{"key":"bibr11-1473871611415994","author":"Li L","year":"2010","journal-title":"12th International Conference on Enterprise Information Systems"},{"key":"bibr12-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1023\/A:1021564703268"},{"key":"bibr13-1473871611415994","volume-title":"Technical Report HUB-IB-164","author":"M\u00fcller H","year":"2003"},{"key":"bibr14-1473871611415994","unstructured":"Lud\u00e4scher B, Lin K, Bowers S, Jaeger-Frank E, Brodaric B, Baru C. 
Managing scientific data: From data integration to scientific workflows, 2005."},{"key":"bibr15-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1109\/2945.841121"},{"key":"bibr16-1473871611415994","first-page":"424","volume":"82","author":"Carr DB","year":"1987","journal-title":"J Am Stat Assoc"},{"key":"bibr17-1473871611415994","volume-title":"Graphics of Large Datasets: Visualizing a Million","author":"Unwin A","year":"2006"},{"key":"bibr18-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1145\/253262.253291"},{"key":"bibr19-1473871611415994","volume-title":"Quantitative Data Cleaning for Large Databases","author":"Hellerstein JM","year":"2008"},{"key":"bibr20-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1023\/B:AIRE.0000045502.10941.a9"},{"key":"bibr21-1473871611415994","first-page":"212","author":"Twiddy JC","year":"2004","journal-title":"IEEE Visualization"},{"key":"bibr22-1473871611415994","first-page":"100","volume-title":"VIS \u201903: 14th IEEE Visualization 2003 (VIS\u201903)","author":"Eaton C","year":"2003"},{"key":"bibr23-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1559\/1523040054738936"},{"key":"bibr24-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1057\/ivs.2009.1"},{"key":"bibr25-1473871611415994","first-page":"51","author":"Correa C","year":"2009","journal-title":"IEEE Visual Analytics Science and Technology"},{"key":"bibr26-1473871611415994","first-page":"143","author":"Griethe H","year":"2006","journal-title":"SimVis"},{"key":"bibr27-1473871611415994","first-page":"37","author":"Olston C","year":"2002","journal-title":"IEEE Symposium on Information Visualization"},{"key":"bibr28-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1007\/s003710050111"},{"key":"bibr29-1473871611415994","volume-title":"Czerwinski M and Parr CS","author":"Lee 
B","year":"2007"},{"key":"bibr30-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2004.30"},{"key":"bibr31-1473871611415994","author":"Gershon ND","journal-title":"Proceedings of the 3rd Conference on Visualization '92, VIS '92"},{"key":"bibr32-1473871611415994","author":"Lodha SK","journal-title":"Proceedings of the 7th Conference on Visualization '96, VIS '96"},{"key":"bibr33-1473871611415994","unstructured":"Kosara R. Semantic Depth of Field Using Blur for Focus+Context Visualization. PhD Thesis, Vienna University of Technology, Vienna, Austria, 2001."},{"key":"bibr34-1473871611415994","first-page":"953","author":"Benjelloun O","year":"2006","journal-title":"VLDB \u201906: 32nd International Conference on Very Large Data Bases"},{"key":"bibr35-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1016\/B978-155860688-3\/50014-2"},{"key":"bibr36-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1145\/1502650.1502692"},{"key":"bibr37-1473871611415994","first-page":"903","author":"Huynh DF","year":"2007","journal-title":"ISWC"},{"key":"bibr38-1473871611415994","first-page":"161","author":"Miller RC","year":"2001","journal-title":"USENIX Technical Conference"},{"key":"bibr39-1473871611415994","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1145\/1378773.1378792","author":"Tuchinda R","year":"2008","journal-title":"ACM IUI"},{"key":"bibr40-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1145\/1502650.1502667"},{"key":"bibr41-1473871611415994","first-page":"1719","author":"Leshed G","year":"2008","journal-title":"ACM CHI"},{"key":"bibr42-1473871611415994","first-page":"337","author":"Arasu A","year":"2003","journal-title":"ACM SIGMOD"},{"key":"bibr43-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007562322031"},{"key":"bibr44-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1145\/335191.336568"},{"key":"bibr45-1473871611415994","author":"Kandel S","year":"2011","journal-title":"ACM Human 
Factors in Computing Systems (CHI)"},{"key":"bibr46-1473871611415994","author":"Benjelloun O","year":"2008","journal-title":"VLDB J"},{"key":"bibr47-1473871611415994","first-page":"538","volume":"1","author":"Cafarella MJ","year":"2008","journal-title":"PVLDB"},{"key":"bibr48-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-003-0104-2"},{"key":"bibr49-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1007\/s007780100057"},{"key":"bibr50-1473871611415994","first-page":"805","author":"Haas LM","year":"2005","journal-title":"ACM SIGMOD"},{"key":"bibr51-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376700"},{"key":"bibr52-1473871611415994","unstructured":"Altova. Data Integration: Opportunities, Challenges, and Altova Mapforce. White Paper. http:\/\/www.altova.com\/whitepapers\/mapforce.pdf, accessed July 2010."},{"key":"bibr53-1473871611415994","unstructured":"CloverETL. Cloveretl Overview. http:\/\/www.cloveretl.com\/products\/designer, accessed July 2010."},{"key":"bibr54-1473871611415994","unstructured":"Informatica. The Informatica Data Quality Methodology: A Framework to Achieve Pervasive Data Quality through Enhanced Business\u2013IT Collaboration. 
http:\/\/www.informatica.com\/downloads\/7130-DQ-Methodology-wp-web.pdf, accessed July 2010."},{"key":"bibr55-1473871611415994","author":"Ives ZG","year":"2009","journal-title":"CIDR"},{"key":"bibr56-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142574"},{"key":"bibr57-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2008.137"},{"key":"bibr58-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2007.28"},{"key":"bibr59-1473871611415994","first-page":"49","author":"Kreuseler M","year":"2004","journal-title":"IEEE InfoVis"},{"key":"bibr60-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2009.87"},{"key":"bibr61-1473871611415994","author":"Chappell D","journal-title":"The Workflow Way: Understanding Windows Workflow Foundation"},{"key":"bibr62-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1007\/s10844-006-0034-8"},{"key":"bibr63-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bth361"},{"key":"bibr64-1473871611415994","first-page":"87","author":"Buneman P","year":"2000","journal-title":"In Foundations of Software Technology and Theoretical Computer Science"},{"key":"bibr65-1473871611415994","volume-title":"The Design of Everyday Things","author":"Norman DA","year":"2002"},{"key":"bibr66-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1145\/503099.503102"},{"key":"bibr67-1473871611415994","first-page":"1029","author":"Heer J","year":"2007","journal-title":"ACM CHI"},{"key":"bibr68-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1109\/HICSS.2008.188"},{"key":"bibr69-1473871611415994","doi-asserted-by":"publisher","DOI":"10.1057\/ivs.2009.16"}],"container-title":["Information 
Visualization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1473871611415994","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1473871611415994","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1473871611415994","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T21:11:43Z","timestamp":1741036303000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1473871611415994"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,9,2]]},"references-count":69,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2011,10]]}},"alternative-id":["10.1177\/1473871611415994"],"URL":"https:\/\/doi.org\/10.1177\/1473871611415994","relation":{},"ISSN":["1473-8716","1473-8724"],"issn-type":[{"value":"1473-8716","type":"print"},{"value":"1473-8724","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,9,2]]}}}