{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T03:18:12Z","timestamp":1777778292070,"version":"3.51.4"},"reference-count":82,"publisher":"SAGE Publications","issue":"2-3","license":[{"start":{"date-parts":[[2021,7,1]],"date-time":"2021-07-01T00:00:00Z","timestamp":1625097600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Information Visualization"],"published-print":{"date-parts":[[2021,7]]},"abstract":"<jats:p>To accommodate the demands of a data-driven society, we have expanded our ability to collect and store data, develop sophisticated algorithms, and generate elaborated visual representations of the data analysis process outcomes. However, data preprocessing, as the activity of transforming the raw data into an appropriate format for subsequent analysis, is still a challenging part of this process. Although we can find studies that address the use of visualization techniques to support the activities in the scope of preprocessing, the current Visual Analytics processes do not consider preprocessing an equally important phase in their processes. Hence, with this paper, we aim to contribute to the discussion of how we can incorporate the preprocessing as a prominent phase in the Visual Analytics process and promote better alternatives to assist the data analysts during the preprocessing activities. To achieve that, we are introducing the Preprocessing Profiling Approach for Visual Analytics (PrAVA), a conceptual Visual Analytics process that includes Preprocessing Profiling as a new phase. It also contemplates a set of guidelines to be considered by new solutions adopting PrAVA. Moreover, we analyze its applicability through use case scenarios that show resourceful methods for data understanding and evaluation of the preprocessing impacts. As a final contribution, we indicate a list of research opportunities in the scope of preprocessing combined with visualization and Visual Analytics to stimulate a shift to visual preprocessing.<\/jats:p>","DOI":"10.1177\/14738716211021591","type":"journal-article","created":{"date-parts":[[2021,7,2]],"date-time":"2021-07-02T09:38:05Z","timestamp":1625218685000},"page":"101-122","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":4,"title":["PrAVA: Preprocessing profiling approach for visual analytics"],"prefix":"10.1177","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8900-4179","authenticated-orcid":false,"given":"Alessandra Maciel Paz","family":"Milani","sequence":"first","affiliation":[{"name":"University of Victoria, Victoria, BC, Canada"},{"name":"Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lucas Angelo","family":"Loges","sequence":"additional","affiliation":[{"name":"Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2316-760X","authenticated-orcid":false,"given":"Fernando Vieira","family":"Paulovich","sequence":"additional","affiliation":[{"name":"Dalhousie University, Halifax, NS, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9446-6757","authenticated-orcid":false,"given":"Isabel Harb","family":"Manssour","sequence":"additional","affiliation":[{"name":"Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2021,7,2]]},"reference":[{"key":"bibr1-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1023\/A:1021564703268"},{"key":"bibr2-14738716211021591","volume-title":"Introduction to data mining","author":"Tan P","year":"2006"},{"key":"bibr3-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1145\/1978942.1979444"},{"key":"bibr4-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1177\/1473871611415994"},{"key":"bibr5-14738716211021591","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v059.i10"},{"key":"bibr6-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1145\/2939502.2939511"},{"key":"bibr7-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2864838"},{"key":"bibr8-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1177\/1473871619896101"},{"key":"bibr9-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1002\/0471448354"},{"key":"bibr10-14738716211021591","first-page":"3","volume":"23","author":"Rahm E","year":"2000","journal-title":"Bull IEEE Comput Soc Tech Committee Data Eng"},{"key":"bibr11-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2016.2598829"},{"key":"bibr12-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2014.62"},{"key":"bibr13-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2003.1207445"},{"key":"bibr14-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1201\/b18379"},{"key":"bibr15-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-016-6028-y"},{"key":"bibr16-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1145\/2254556.2254659"},{"key":"bibr17-14738716211021591","first-page":"39","volume-title":"Proceedings of SIGRAD 2012; interactive visual analysis of data","author":"Bernard J"},{"key":"bibr18-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1145\/2637748.2638423"},{"key":"bibr19-14738716211021591","volume-title":"Mastering the information age: solving problems with visual analytics","author":"Keim D","year":"2010"},{"key":"bibr20-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346481"},{"key":"bibr21-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-39940-9_601"},{"key":"bibr22-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/HICSS.2016.183"},{"key":"bibr23-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/VAST.2017.8585498"},{"key":"bibr24-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13210"},{"key":"bibr25-14738716211021591","first-page":"1","volume-title":"Mensch and computer 2016 \u2014 Workshopband","author":"Seipp K","year":"2016"},{"key":"bibr26-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2015.2467622"},{"key":"bibr27-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2744805"},{"key":"bibr28-14738716211021591","first-page":"1","volume-title":"Proceedings of the biennial conference on innovative data systems research","author":"Heer J","year":"2015"},{"key":"bibr29-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-69459-7_32"},{"key":"bibr30-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1145\/1541880.1541882"},{"key":"bibr31-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750549"},{"key":"bibr32-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1007\/s11634-011-0102-y"},{"key":"bibr33-14738716211021591","first-page":"861","volume-title":"Proceedings of the conference on human-computer interaction","author":"Eaton C","year":"2005"},{"key":"bibr34-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/iV.2017.12"},{"key":"bibr35-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2864914"},{"key":"bibr36-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376420"},{"key":"bibr37-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2743990"},{"key":"bibr38-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2012.219"},{"key":"bibr39-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2005.07.023"},{"key":"bibr40-14738716211021591","unstructured":"Pandas-profiling. Pandas-profiling, 2020. https:\/\/github.com\/pandas-profiling\/pandas-profiling."},{"key":"bibr41-14738716211021591","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00547"},{"key":"bibr42-14738716211021591","doi-asserted-by":"publisher","DOI":"10.21105\/joss.01075"},{"key":"bibr43-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2744938"},{"key":"bibr44-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2744683"},{"key":"bibr45-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2744158"},{"key":"bibr46-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/VAST.2017.8585720"},{"key":"bibr47-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2745158"},{"key":"bibr48-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2016.2598828"},{"key":"bibr49-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2015.2467191"},{"key":"bibr50-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1145\/3092931.3092937"},{"key":"bibr51-14738716211021591","unstructured":"Tableau. Tableau prep. URL https:\/\/www.tableau.com\/products\/prep (2020)."},{"key":"bibr52-14738716211021591","unstructured":"Trifacta. Trifacta data wrangling tools & software. URL https:\/\/www.trifacta.com\/ (2020)"},{"key":"bibr53-14738716211021591","unstructured":"Tableau. Tableau. URL http:\/\/www.tableau.com\/ (2020)"},{"key":"bibr54-14738716211021591","first-page":"13","volume":"5","author":"Shearer C","year":"2000","journal-title":"J Data Warehous"},{"key":"bibr55-14738716211021591","first-page":"1","author":"Turkay C","year":"2018","journal-title":"arXiv preprint"},{"key":"bibr56-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346574"},{"key":"bibr57-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2009.84"},{"key":"bibr58-14738716211021591","first-page":"50","volume":"19","author":"Heer J","year":"2012","journal-title":"Big Data"},{"key":"bibr59-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2019.2934283"},{"key":"bibr60-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/VISUAL.2019.8933542"},{"key":"bibr61-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2009.111"},{"key":"bibr62-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1111\/j.1469-1809.1936.tb02137.x"},{"key":"bibr63-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13972"},{"key":"bibr64-14738716211021591","unstructured":"Dua D, Graff C. The UCI machine learning repository - mammographic mass data set. URL https:\/\/archive.ics.uci.edu\/ml\/datasets\/Mammographic+Mass (2020)."},{"key":"bibr65-14738716211021591","unstructured":"Dua D, Graff C. The UCI machine learning repository - cervical cancer (risk factors) data set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Cervical+cancer+%28Risk+Factors%29 (2020)."},{"key":"bibr66-14738716211021591","first-page":"2825","volume":"12","author":"Pedregosa F","year":"2011","journal-title":"J Mach Learn Res"},{"key":"bibr67-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1118\/1.2786864"},{"key":"bibr68-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-020-02250-1"},{"key":"bibr69-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-020-04737-7"},{"key":"bibr70-14738716211021591","first-page":"1","volume-title":"2020 IST-Africa conference (IST-Africa)","author":"Ahishakiye E","year":"2020"},{"key":"bibr71-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/ICECTE48615.2019.9303554"},{"key":"bibr72-14738716211021591","doi-asserted-by":"publisher","DOI":"10.3390\/s20102809"},{"key":"bibr73-14738716211021591","first-page":"6","volume":"23","author":"D\u2019Ignazio C","year":"2017","journal-title":"Inf Des J"},{"key":"bibr74-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-40700-5_12"},{"key":"bibr75-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/VAST.2009.5332611"},{"key":"bibr76-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2015.2467591"},{"key":"bibr77-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1201\/b17511"},{"key":"bibr78-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1177\/0272989X10373805"},{"key":"bibr79-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/VAST.2012.6400554"},{"key":"bibr80-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2859973"},{"key":"bibr81-14738716211021591","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2016.2640960"},{"key":"bibr82-14738716211021591","first-page":"25","volume-title":"EuroVis workshop on visual analytics (EuroVA)","author":"Angelini M","year":"2019"}],"container-title":["Information Visualization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/14738716211021591","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/14738716211021591","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/14738716211021591","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T19:19:09Z","timestamp":1777490349000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/14738716211021591"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7]]},"references-count":82,"journal-issue":{"issue":"2-3","published-print":{"date-parts":[[2021,7]]}},"alternative-id":["10.1177\/14738716211021591"],"URL":"https:\/\/doi.org\/10.1177\/14738716211021591","relation":{},"ISSN":["1473-8716","1473-8724"],"issn-type":[{"value":"1473-8716","type":"print"},{"value":"1473-8724","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7]]}}}