{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T03:17:16Z","timestamp":1777778236370,"version":"3.51.4"},"reference-count":59,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2020,1,30]],"date-time":"2020-01-30T00:00:00Z","timestamp":1580342400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100009288","name":"Canadian Bureau for International Education","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100009288","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Information Visualization"],"published-print":{"date-parts":[[2020,10]]},"abstract":"<jats:p>The current information age has increasingly required organizations to become data-driven. However, analyzing and managing raw data is still a challenging part of the data mining process. Even though we can find interview studies proposing design implications or recommendations for future visualization solutions in the data mining scope, they cover the entire workflow and do not fully focus on the challenges during the preprocessing phase and on how visualization can support it. Moreover, they do not organize a final list of insights consolidating the findings of other related studies. Hence, to better understand the current practice of enterprise professionals in data mining workflows, in particular, during the preprocessing phase, and how visualization supports this process, we conducted semi-structured interviews with 13 data analysts. The discussion about the challenges and opportunities based on the responses of the interviewees resulted in a list of 10 insights. This list was compared with the closest related works, improving the reliability of our findings and providing background, as a consolidated set of requirements, for future visualization research articles applied to visual data exploration in data mining. Furthermore, we provide greater details on the profile of the data analysts, the main challenges they face, and the opportunities that arise while they are engaged in data mining projects in diverse organizational areas.<\/jats:p>","DOI":"10.1177\/1473871619896101","type":"journal-article","created":{"date-parts":[[2020,1,30]],"date-time":"2020-01-30T06:59:06Z","timestamp":1580367546000},"page":"273-287","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":13,"title":["Visualization in the preprocessing phase: Getting insights from enterprise professionals"],"prefix":"10.1177","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8900-4179","authenticated-orcid":false,"given":"Alessandra Maciel Paz","family":"Milani","sequence":"first","affiliation":[{"name":"Pontif\u00edcia Universidade Cat\u00f3lica do Rio Grande do Sul (PUCRS), Porto Alegre, RS, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2316-760X","authenticated-orcid":false,"given":"Fernando V.","family":"Paulovich","sequence":"additional","affiliation":[{"name":"Dalhousie University, Halifax, NS, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Isabel Harb","family":"Manssour","sequence":"additional","affiliation":[{"name":"Pontif\u00edcia Universidade Cat\u00f3lica do Rio Grande do Sul (PUCRS), Porto Alegre, RS, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2020,1,30]]},"reference":[{"key":"bibr1-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1002\/0471448354"},{"key":"bibr2-1473871619896101","volume-title":"Data mining: concepts and techniques","author":"Han J","year":"2011","edition":"3"},{"key":"bibr3-1473871619896101","volume-title":"Knowledge discovery in databases","author":"Piateski G","year":"1991"},{"issue":"4","key":"bibr4-1473871619896101","first-page":"13","volume":"5","author":"Shearer C.","year":"2000","journal-title":"J Data Warehous"},{"key":"bibr5-1473871619896101","first-page":"3363","volume-title":"Proceedings of the SIGCHI conference on human factors in computing systems, CHI \u201811","author":"Kandel S"},{"key":"bibr6-1473871619896101","unstructured":"Hellerstein JM. Quantitative data cleaning for large databases, 2008, http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.115.6419"},{"key":"bibr7-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1002\/9781118840962"},{"key":"bibr8-1473871619896101","volume-title":"Introduction to Data mining","author":"Tan PN","year":"2005","edition":"1"},{"key":"bibr9-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2003.1207445"},{"key":"bibr10-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1201\/b18379"},{"key":"bibr11-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2743990"},{"key":"bibr12-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2012.219"},{"key":"bibr13-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2865040"},{"key":"bibr14-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2011.279"},{"key":"bibr15-1473871619896101","volume-title":"Research Methods in Human-Computer Interaction","author":"Lazar J","year":"2017","edition":"2"},{"key":"bibr16-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1063\/1.4907823"},{"key":"bibr17-1473871619896101","unstructured":"Altexsoft. Machine learning: bridging between business and data science, 2019, https:\/\/www.altexsoft.com\/whitepapers\/machine-learning-bridging-between-business-and-data-science\/"},{"key":"bibr18-1473871619896101","unstructured":"Python. Python, 2018, https:\/\/www.python.org\/"},{"key":"bibr19-1473871619896101","unstructured":"R. The R project for statistical computing, 2018, https:\/\/www.r-project.org\/"},{"key":"bibr20-1473871619896101","unstructured":"Databricks. Databricks: making big data simple, 2018, https:\/\/databricks.com\/"},{"key":"bibr21-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656280"},{"key":"bibr22-1473871619896101","unstructured":"KNIME. KNIME: open for innovation, 2018, https:\/\/www.knime.com\/"},{"key":"bibr23-1473871619896101","first-page":"361","volume-title":"Proceedings of the Third international AAAI conference on weblogs and social media","author":"Bastian M"},{"key":"bibr24-1473871619896101","unstructured":"Gephi. The Open Graph Viz Platform, 2018, https:\/\/gephi.org\/"},{"key":"bibr25-1473871619896101","first-page":"2349","volume":"14","author":"Dem\u0161ar J","year":"2013","journal-title":"J Mach Learn Res"},{"key":"bibr26-1473871619896101","unstructured":"Orange. Orange: data mining fruitful and fun, 2018, https:\/\/orange.biolab.si\/"},{"key":"bibr27-1473871619896101","volume-title":"Proceedings of the 2nd USENIX conference on hot topics in cloud computing","author":"Zaharia M"},{"key":"bibr28-1473871619896101","unstructured":"Spark A. Apache spark: unified analytics engine for big data, 2018, https:\/\/spark.apache.org\/"},{"key":"bibr29-1473871619896101","unstructured":"Matplotlib. Matplotlib: Python plotting\u2014Matplotlib 3.0.2 documentation, 2018, https:\/\/matplotlib.org\/"},{"key":"bibr30-1473871619896101","unstructured":"Seaborn. seaborn: statistical data visualization - 0.9.0 documentation, 2018, https:\/\/seaborn.pydata.org\/"},{"key":"bibr31-1473871619896101","unstructured":"ggplot2. ggplot2\u2014tidyverse, 2018, https:\/\/ggplot2.tidyverse.org\/"},{"key":"bibr32-1473871619896101","unstructured":"Hadoop A. Apache Hadoop, 2018, https:\/\/hadoop.apache.org\/"},{"key":"bibr33-1473871619896101","unstructured":"SAS. SAS analytics, 2018, https:\/\/www.sas.com\/"},{"key":"bibr34-1473871619896101","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v074.i07"},{"key":"bibr35-1473871619896101","unstructured":"Templ M, Alfons A, Kowarik A, et al. Vim: visualization and imputation of missing values, 2018, https:\/\/cran.r-project.org\/web\/packages\/VIM\/index.html"},{"key":"bibr36-1473871619896101","unstructured":"Tableau. Tableau, 2018, http:\/\/www.tableau.com\/"},{"key":"bibr37-1473871619896101","unstructured":"Qlik. Qlik: data analytics for modern business intelligence, 2018, https:\/\/www.qlik.com"},{"key":"bibr38-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1016\/S0042-6989(00)00003-1"},{"key":"bibr39-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346574"},{"key":"bibr40-1473871619896101","first-page":"336","volume-title":"Proceedings of the 1996 IEEE symposium on visual languages. VL \u201896","author":"Shneiderman B"},{"key":"bibr41-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1007\/s00371-015-1132-9"},{"key":"bibr42-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2016.2598839"},{"key":"bibr43-1473871619896101","unstructured":"Heer J, Hellerstein JM, Kandel S. Predictive interaction for data transformation, http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.692.1613"},{"key":"bibr44-1473871619896101","first-page":"318","volume-title":"Proceedings of the SIGCHI conference on human factors in computing systems, CHI \u201894","author":"Rao R"},{"key":"bibr45-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/2945.841121"},{"key":"bibr46-1473871619896101","first-page":"1157","volume":"3","author":"Guyon I","year":"2003","journal-title":"J Mach Learn Res"},{"key":"bibr47-1473871619896101","first-page":"2579","volume":"9","author":"Maaten L","year":"2008","journal-title":"J Mach Learn Res"},{"key":"bibr48-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7091-2668-4_10"},{"key":"bibr49-1473871619896101","unstructured":"Molnar C. Interpretable machine learning, 2019, https:\/\/christophm.github.io\/interpretable-ml-book\/"},{"key":"bibr50-1473871619896101","unstructured":"TensorFlow. A neural network playground, 2019, https:\/\/playground.tensorflow.org"},{"key":"bibr51-1473871619896101","unstructured":"Smilkov D, Carter S, Sculley D, et al. Direct-manipulation visualization of deep networks, 2017, arXiv:1708.03788"},{"key":"bibr52-1473871619896101","volume-title":"Mastering the information age: solving problems with visual analytics","author":"Keim D","year":"2010"},{"key":"bibr53-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346325"},{"key":"bibr54-1473871619896101","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13210"},{"key":"bibr55-1473871619896101","unstructured":"Commission E. Eu general data protection regulation, 2019, https:\/\/ec.europa.eu\/commission\/priorities\/justice-and-fundamental-rights\/data-protection\/2018-reform-eu-data-protection-rules_en"},{"key":"bibr56-1473871619896101","first-page":"480","volume-title":"Proceedings of the OTM Confederated International Conferences \u201cOn the Move to Meaningful Internet Systems,\u201d","author":"von Zernichow BM"},{"key":"bibr57-1473871619896101","first-page":"547","volume-title":"Proceedings of the international working conference on advanced visual interfaces, AVI \u201812","author":"Kandel S"},{"key":"bibr58-1473871619896101","unstructured":"Trifacta. Trifacta data wrangling tools & software, 2018, https:\/\/www.trifacta.com\/"},{"key":"bibr59-1473871619896101","unstructured":"Tableau. Tableau prep, 2019, https:\/\/www.tableau.com\/products\/prep"}],"container-title":["Information Visualization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1473871619896101","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1473871619896101","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1473871619896101","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T19:19:00Z","timestamp":1777490340000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1473871619896101"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,30]]},"references-count":59,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,10]]}},"alternative-id":["10.1177\/1473871619896101"],"URL":"https:\/\/doi.org\/10.1177\/1473871619896101","relation":{},"ISSN":["1473-8716","1473-8724"],"issn-type":[{"value":"1473-8716","type":"print"},{"value":"1473-8724","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,30]]}}}