{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:35:45Z","timestamp":1760240145575,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,3,25]],"date-time":"2019-03-25T00:00:00Z","timestamp":1553472000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Progressive visualization offers a great deal of promise for big data visualization; however, current progressive visualization systems do not allow for continuous interaction. What if users want to see more confident results on a subset of the visualization? This can happen when users are in exploratory analysis mode but want to ask some directed questions of the data as well. In a progressive visualization system, the online aggregation algorithm determines the database sampling rate and resulting convergence rate, not the user. In this paper, we extend a recent method in online aggregation, called Wander Join, that is optimized for queries that join tables, one of the most computationally expensive operations. This extension leverages importance sampling to enable user-driven sampling when data joins are in the query. We applied user interaction techniques that allow the user to view and adjust the convergence rate, providing more transparency and control over the online aggregation process. By leveraging importance sampling, our extension of Wander Join also allows for stratified sampling of groups when there is data distribution skew. 
We also improve the convergence rate of filtering queries, but with additional overhead costs not needed in the original Wander Join algorithm.<\/jats:p>","DOI":"10.3390\/informatics6010014","type":"journal-article","created":{"date-parts":[[2019,3,25]],"date-time":"2019-03-25T06:56:52Z","timestamp":1553497012000},"page":"14","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Selective Wander Join: Fast Progressive Visualizations for Data Joins"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9518-1259","authenticated-orcid":false,"given":"Marianne","family":"Procopio","sequence":"first","affiliation":[{"name":"Department of Computer Science, Tufts University, Medford, MA 02155, USA"}]},{"given":"Carlos","family":"Scheidegger","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Arizona, Tucson, AZ 85721, USA"}]},{"given":"Eugene","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Columbia University, New York, NY 10027, USA"}]},{"given":"Remco","family":"Chang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Tufts University, Medford, MA 02155, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., and Price, T.G. (June, January 30). Access path selection in a relational database management system. 
Proceedings of the 1979 ACM SIGMOD international conference on Management of data, Boston, MA, USA.","DOI":"10.1145\/582095.582099"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1093\/biomet\/57.1.97","article-title":"Monte Carlo sampling methods using Markov chains and their applications","volume":"57","author":"Hastings","year":"1970","journal-title":"Biometrika"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Li, F., Wu, B., Yi, K., and Zhao, Z. (July, January 26). Wander Join: Online Aggregation via Random Walks. Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA.","DOI":"10.1145\/2882903.2915235"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"903","DOI":"10.14778\/2732951.2732964","article-title":"The case for data visualization management systems: Vision paper","volume":"7","author":"Wu","year":"2014","journal-title":"Proc. VLDB Endow."},{"key":"ref_5","unstructured":"Wu, E., Psallidas, F., Miao, Z., Zhang, H., Rettig, L., Wu, Y., and Sellam, T. (2017, January 8\u201311). Combining Design and Performance in a Data Visualization Management System. Proceedings of the Conference on Innovative Data Systems Research, Chaminade, CA, USA."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1643","DOI":"10.1109\/TVCG.2014.2346578","article-title":"Opening the black box: Strategies for increased user involvement in existing algorithm implementations","volume":"20","author":"Piringer","year":"2014","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Angelini, M., Santucci, G., Schumann, H., and Schulz, H.J. (2018). A Review and Characterization of Progressive Visual Analytics. Inform. Multidiscip. Dig. Publ. 
Inst., 5.","DOI":"10.3390\/informatics5030031"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1653","DOI":"10.1109\/TVCG.2014.2346574","article-title":"Progressive visual analytics: User-driven visual exploration of in-progress analytics","volume":"20","author":"Stolper","year":"2014","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1739","DOI":"10.1109\/TVCG.2016.2570755","article-title":"Approximated and user steerable tsne for progressive visual analytics","volume":"23","author":"Pezzotti","year":"2016","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1109\/TVCG.2016.2598470","article-title":"Designing Progressive and Interactive Analytics Processes for High-Dimensional Data Analysis","volume":"23","author":"Turkay","year":"2017","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Fisher, D., Popov, I., and Drucker, S. (2012, January 5\u201310). Trust me, I\u2019m partially right: Incremental visualization lets analysts explore large datasets faster. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Austin, TX, USA.","DOI":"10.1145\/2207676.2208294"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Fisher, D. (2011, January 23\u201324). Incremental, approximate database queries and uncertainty for exploratory visualization. 
Proceedings of the 2011 IEEE Symposium on Large Data Analysis and Visualization (LDAV), Providence, RI, USA.","DOI":"10.1109\/LDAV.2011.6092320"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1109\/2.781635","article-title":"Interactive data analysis: The control project","volume":"32","author":"Hellerstein","year":"1999","journal-title":"Computer"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1977","DOI":"10.1109\/TVCG.2016.2607714","article-title":"How Progressive Visualizations Affect Exploratory Analysis","volume":"23","author":"Zgraggen","year":"2016","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Moritz, D., Fisher, D., Ding, B., and Wang, C. (2017, January 6\u201311). Trust, but Verify: Optimistic Visualizations of Approximate Queries for Exploring Big Data. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, CO, USA.","DOI":"10.1145\/3025453.3025456"},{"key":"ref_16","unstructured":"Fekete, J.D. (2015, January 17\u201321). Progressivis: A toolkit for steerable progressive analytics and visualization. Proceedings of the 1st Workshop on Data Systems for Interactive Analysis, Chicago, IL, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Rosenbaum, R., and Schumann, H. (2009, January 24). Progressive refinement: More than a means to overcome limited bandwidth. 
Proceedings of the IS&T\/SPIE Electronic Imaging, San Jose, CA, USA.","DOI":"10.1117\/12.810501"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1111\/cgf.13205","article-title":"Steering the craft: UI elements and visualizations for supporting progressive visual analytics","volume":"36","author":"Badam","year":"2017","journal-title":"Computer Graphics Forum"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1109\/2945.981851","article-title":"Polaris: A system for query, analysis, and visualization of multidimensional relational databases","volume":"8","author":"Stolte","year":"2002","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2456","DOI":"10.1109\/TVCG.2013.179","article-title":"Nanocubes for real-time exploration of spatiotemporal datasets","volume":"19","author":"Lins","year":"2013","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1111\/cgf.12129","article-title":"imMens: Real-time Visual Querying of Big Data","volume":"32","author":"Liu","year":"2013","journal-title":"Computer Graphics Forum"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1109\/TVCG.2016.2598624","article-title":"Hashedcubes: Simple, low memory, real-time visual exploration of big data","volume":"23","author":"Pahins","year":"2017","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1109\/TVCG.2016.2598694","article-title":"Gaussian Cubes: Real-Time Modeling for Visual Exploration of Large Multidimensional Datasets","volume":"23","author":"Wang","year":"2017","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Agarwal, S., Mozafari, B., Panda, A., Milner, H., Madden, S., and Stoica, I. (2013, January 15\u201317). 
BlinkDB: Queries with bounded errors and bounded response times on very large data. Proceedings of the 8th ACM European Conference on Computer Systems, Prague, Czech Republic.","DOI":"10.1145\/2465351.2465355"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ding, B., Huang, S., Chaudhuri, S., Chakrabarti, K., and Wang, C. (July, January 26). Sample+ Seek: Approximating Aggregates with Distribution Precision Guarantee. Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA.","DOI":"10.1145\/2882903.2915249"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kamat, N., Jayachandran, P., Tunga, K., and Nandi, A. (April, January 31). Distributed and interactive cube exploration. Proceedings of the 2014 IEEE 30th International Conference on Data Engineering (ICDE), Chicago, IL, USA.","DOI":"10.1109\/ICDE.2014.6816674"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, X., Han, J., Yin, Z., Lee, J.G., and Sun, Y. (2008, January 10\u201312). Sampling cube: A framework for statistical olap over sampling data. Proceedings of the 2008 ACM SIGMOD international conference on Management of data, Vancouver, BC, Canada.","DOI":"10.1145\/1376616.1376695"},{"key":"ref_28","unstructured":"Fekete, J.D., and Primet, R. (arXiv, 2016). Progressive analytics: A computation paradigm for exploratory data analysis, arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Im, J.F., Villegas, F.G., and McGuffin, M.J. (2013, January 6\u20139). Visreduce: Fast and responsive incremental information visualization of large datasets. 
Proceedings of the 2013 IEEE International Conference on Big Data, Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2013.6691710"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1145\/1242524.1242526","article-title":"Optimized stratified sampling for approximate query processing","volume":"32","author":"Chaudhuri","year":"2007","journal-title":"ACM Trans. Database Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Park, Y., Cafarella, M., and Mozafari, B. (2016, January 16\u201320). Visualization-aware sampling for very large databases. Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland.","DOI":"10.1109\/ICDE.2016.7498287"},{"key":"ref_32","unstructured":"Doshi, P.R., Geraldine, E., Rosario, G., Rundensteiner, E., and Ward, M. (2003, January 9\u201311). A strategy selection framework for adaptive prefetching in data visualization. Proceedings of the 15th International Conference on Scientific and Statistical Database Management, Cambridge, MA, USA."},{"key":"ref_33","unstructured":"Chan, S.M., Xiao, L., Gerth, J., and Hanrahan, P. (2008, January 19\u201324). Maintaining interactivity while exploring massive time series. Proceedings of the IEEE Symposium on Visual Analytics Science and Technology, Columbus, OH, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Battle, L., Chang, R., and Stonebraker, M. (July, January 26). Dynamic prefetching of data tiles for interactive visualization. Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA.","DOI":"10.1145\/2882903.2882919"},{"key":"ref_35","unstructured":"Cetintemel, U., Cherniack, M., DeBrabant, J., Diao, Y., Dimitriadou, K., Kalinin, A., Papaemmanouil, O., and Zdonik, S.B. (2013, January 6\u20139). Query Steering for Interactive Data Exploration. 
Proceedings of the Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA."},{"key":"ref_36","unstructured":"Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., and O\u2019Neil, E. (September, January 30). C-store: A column-oriented DBMS. Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Kemper, A., and Neumann, T. (2011, January 11\u201316). HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. Proceedings of the 2011 IEEE 27th International Conference on Data Engineering (ICDE), Hannover, Germany.","DOI":"10.1109\/ICDE.2011.5767867"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2142","DOI":"10.1109\/TKDE.2016.2557324","article-title":"Interactive visualization of large data sets","volume":"28","author":"Godfrey","year":"2016","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1145\/253262.253291","article-title":"Online aggregation","volume":"26","author":"Hellerstein","year":"1997","journal-title":"ACM SIGMOD Rec."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1145\/304181.304208","article-title":"Ripple joins for online aggregation","volume":"28","author":"Haas","year":"1999","journal-title":"ACM SIGMOD Rec."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1198\/jcgs.2011.1de","article-title":"ASA 2009 Data Expo","volume":"20","author":"Wickham","year":"2011","journal-title":"J. Comput. Graph. Stat."},{"key":"ref_42","unstructured":"Shneiderman, B. (1996, January 3\u20136). The eyes have it: A task by data type taxonomy for information visualizations. Proceedings of the IEEE Symposium on Visual Languages, Boulder, CO, USA."},{"key":"ref_43","unstructured":"Alabi, D., and Wu, E. (July, January 26). 
PFunk-H: Approximate query processing using perceptual models. Proceedings of the Workshop on Human-In-the-Loop Data Analytics, San Francisco, CA, USA."},{"key":"ref_44","unstructured":"Wu, E., and Nandi, A. (2015, January 26). Towards Perception-aware Interactive Data Visualization Systems. Proceedings of the DSIA Workshop, Chicago, IL, USA."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/6\/1\/14\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:40:24Z","timestamp":1760186424000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/6\/1\/14"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,25]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["informatics6010014"],"URL":"https:\/\/doi.org\/10.3390\/informatics6010014","relation":{},"ISSN":["2227-9709"],"issn-type":[{"type":"electronic","value":"2227-9709"}],"subject":[],"published":{"date-parts":[[2019,3,25]]}}}