{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:23:54Z","timestamp":1760235834527,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2021,9,29]],"date-time":"2021-09-29T00:00:00Z","timestamp":1632873600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology, Taiwan","doi-asserted-by":"publisher","award":["109B0054"],"award-info":[{"award-number":["109B0054"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The use of distribution-based data representation to handle large-scale scientific datasets is a promising approach. Distribution-based approaches often transform a scientific dataset into many distributions, each of which is calculated from a small number of samples. Most of the proposed parallel algorithms focus on modeling single distributions from many input samples efficiently, but these may not fit the large-scale scientific data processing scenario because they cannot utilize computing resources effectively. Histograms and the Gaussian Mixture Model (GMM) are the most popular distribution representations used to model scientific datasets. Therefore, we propose the use of multi-set histogram and GMM modeling algorithms for the scenario of large-scale scientific data processing. Our algorithms are developed by data-parallel primitives to achieve portability across different hardware architectures. We evaluate the performance of the proposed algorithms in detail and demonstrate use cases for scientific data processing.<\/jats:p>","DOI":"10.3390\/a14100285","type":"journal-article","created":{"date-parts":[[2021,9,30]],"date-time":"2021-09-30T00:03:42Z","timestamp":1632960222000},"page":"285","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Efficient and Portable Distribution Modeling for Large-Scale Scientific Data Processing with Data-Parallel Primitives"],"prefix":"10.3390","volume":"14","author":[{"given":"Hao-Yi","family":"Yang","sequence":"first","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 11677, Taiwan"}]},{"given":"Zhi-Rong","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 11677, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7241-1939","authenticated-orcid":false,"given":"Ko-Chih","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 11677, Taiwan"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1109\/TVCG.2016.2598604","article-title":"In situ distribution guided analysis and visualization of transonic jet engine simulations","volume":"23","author":"Dutta","year":"2016","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Dutta, S., Shen, H.W., and Chen, J.P. (2018, January 10\u201313). In Situ prediction driven feature analysis in jet engine simulations. Proceedings of the 2018 IEEE Pacific Visualization Symposium (PacificVis), Kobe, Japan.","DOI":"10.1109\/PacificVis.2018.00017"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Thompson, D., Levine, J.A., Bennett, J.C., Bremer, P.T., Gyulassy, A., Pascucci, V., and P\u00e9bay, P.P. (2011, January 23\u201324). Analysis of large-scale scalar data using hixels. Proceedings of the 2011 IEEE Symposium on Large Data Analysis and Visualization, Providence, RI, USA.","DOI":"10.1109\/LDAV.2011.6092313"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, K.C., Lu, K., Wei, T.H., Shareef, N., and Shen, H.W. (2017, January 18\u201321). Statistical visualization and analysis of large data using a value-based spatial distribution. Proceedings of the 2017 IEEE Pacific Visualization Symposium (PacificVis), Seoul, Korea.","DOI":"10.1109\/PACIFICVIS.2017.8031590"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Kumar, N.P., Satoor, S., and Buck, I. (2009, January 25\u201327). Fast parallel expectation maximization for gaussian mixture models on gpus using cuda. Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications, Seoul, Korea.","DOI":"10.1109\/HPCC.2009.45"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kwedlo, W. (2014, January 25\u201327). A parallel EM algorithm for Gaussian mixture models implemented on a NUMA system using OpenMP. Proceedings of the 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Seoul, Korea.","DOI":"10.1109\/PDP.2014.77"},{"key":"ref_7","unstructured":"Shams, R., and Kennedy, R. (2007, January 24\u201327). Efficient histogram algorithms for NVIDIA CUDA compatible devices. Proceedings of the ICSPCS 2007, Dubai, United Arab Emirates."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Li, G., Xu, J., Zhang, T., Shan, G., Shen, H.W., Wang, K.C., Liao, S., and Lu, Z. (2020, January 3\u20135). Distribution-based particle data reduction for in-situ analysis and visualization of large-scale n-body cosmological simulations. Proceedings of the 2020 IEEE Pacific Visualization Symposium (PacificVis), Tianjin, China.","DOI":"10.1109\/PacificVis48177.2020.1186"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Bell, N., and Hoberock, J. (2012). Thrust: A productivity-oriented library for CUDA. GPU Computing Gems Jade Edition, Elsevier.","DOI":"10.1016\/B978-0-12-385963-1.00026-5"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1109\/MCG.2016.48","article-title":"Vtk-m: Accelerating the visualization toolkit for massively threaded architectures","volume":"36","author":"Moreland","year":"2016","journal-title":"IEEE Comput. Graph. Appl."},{"key":"ref_11","unstructured":"Sewell, C.M. (2012). Piston: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators, Los Alamos National Lab. (LANL). Technical Report."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2693","DOI":"10.1109\/TVCG.2013.152","article-title":"Efficient local statistical analysis via integral histograms with discrete wavelet transform","volume":"19","author":"Lee","year":"2013","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wei, T.H., Dutta, S., and Shen, H.W. (2018, January 10\u201313). Information guided data sampling and recovery using bitmap indexing. Proceedings of the 2018 IEEE Pacific Visualization Symposium (PacificVis), Kobe, Japan.","DOI":"10.1109\/PacificVis.2018.00016"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wei, T.H., Chen, C.M., Woodring, J., Zhang, H., and Shen, H.W. (2017, January 18\u201321). Efficient distribution-based feature search in multi-field datasets. Proceedings of the 2017 IEEE Pacific Visualization Symposium (PacificVis), Seoul, Korea.","DOI":"10.1109\/PACIFICVIS.2017.8031586"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"3299","DOI":"10.1109\/TVCG.2019.2920130","article-title":"Ray-based exploration of large time-varying volume data using per-ray proxy distributions","volume":"26","author":"Wang","year":"2019","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liu, S., Levine, J.A., Bremer, P.T., and Pascucci, V. (2012, January 14\u201315). Gaussian mixture model based volume visualization. Proceedings of the IEEE Symposium on Large Data Analysis and Visualization (LDAV), Seattle, WA, USA.","DOI":"10.1109\/LDAV.2012.6378978"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Li, C., and Shen, H.W. (2017, January 27\u201330). Winding angle assisted particle tracing in distribution-based vector field. Proceedings of the SIGGRAPH Asia 2017 Symposium on Visualization, Bangkok, Thailand.","DOI":"10.1145\/3139295.3139297"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1109\/TVCG.2015.2467436","article-title":"Distribution driven extraction and tracking of features for time-varying data analysis","volume":"22","author":"Dutta","year":"2015","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, K.C., Xu, J., Woodring, J., and Shen, H.W. (2019, January 23\u201326). Statistical super resolution for data analysis and visualization of large scale cosmological simulations. Proceedings of the 2019 IEEE Pacific Visualization Symposium (PacificVis), Bangkok, Thailand.","DOI":"10.1109\/PacificVis.2019.00043"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chaudhuri, A., Lee, T.Y., Shen, H.W., and Peterka, T. (2013, January 13\u201314). Efficient range distribution query in large-scale scientific data. Proceedings of the 2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV), Atlanta, GA, USA.","DOI":"10.1109\/LDAV.2013.6675171"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chaudhuri, A., Wei, T.H., Lee, T.Y., Shen, H.W., and Peterka, T. (2014, January 4\u20137). Efficient range distribution query for visualizing scientific data. Proceedings of the 2014 IEEE Pacific Visualization Symposium, Yokohama, Japan.","DOI":"10.1109\/PacificVis.2014.60"},{"key":"ref_22","unstructured":"Chen, C.M., Biswas, A., and Shen, H.W. (2015, January 14\u201317). Uncertainty modeling and error reduction for pathline computation in time-varying flow fields. Proceedings of the 2015 IEEE Pacific Visualization Symposium (PacificVis), Hangzhou, China."},{"key":"ref_23","unstructured":"Blelloch, G.E. (1990). Vector Models for Data-Parallel Computing, MIT Press."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lessley, B., Perciano, T., Heinemann, C., Camp, D., Childs, H., and Bethel, E.W. (2018, January 21\u201321). DPP-PMRF: Rethinking optimization for a probabilistic graphical model using data-parallel primitives. Proceedings of the 2018 IEEE 8th Symposium on Large Data Analysis and Visualization (LDAV), Berlin, Germany.","DOI":"10.1109\/LDAV.2018.8739239"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Austin, W., Ballard, G., and Kolda, T.G. (2016, January 23\u201327). Parallel tensor compression for large-scale scientific data. Proceedings of the 2016 IEEE international parallel and distributed processing symposium (IPDPS), Chicago, IL, USA.","DOI":"10.1109\/IPDPS.2016.67"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1016\/j.parco.2003.04.001","article-title":"Distributed frameworks and parallel algorithms for processing large-scale geographic data","volume":"29","author":"Hawick","year":"2003","journal-title":"Parallel Comput."},{"key":"ref_27","unstructured":"Yenpure, A., Childs, H., and Moreland, K.D. (2019). Efficient Point Merging Using Data Parallel Techniques, Sandia National Lab. (SNL-NM). Technical Report."},{"key":"ref_28","unstructured":"Larsen, M., Labasan, S., Navr\u00e1til, P.A., Meredith, J.S., and Childs, H. (2015, January 25\u201326). Volume Rendering Via Data-Parallel Primitives. Proceedings of the 15th Eurographics Symposium on Parallel Graphics and Visualization, Cagliari, Sardinia, Italy."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"376-1","DOI":"10.2352\/ISSN.2470-1173.2020.1.VDA-376","article-title":"HashFight: A Platform-Portable Hash Table for Multi-Core and Many-Core Architectures","volume":"2020","author":"Lessley","year":"2020","journal-title":"Electron. Imaging"},{"key":"ref_30","unstructured":"Li, S., Marsaglia, N., Chen, V., Sewell, C.M., Clyne, J.P., and Childs, H. (2017, January 12\u201313). Achieving Portable Performance for Wavelet Compression Using Data Parallel Primitives. Proceedings of the 17th Eurographics Symposium on Parallel Graphics and Visualization, Barcelona, Spain."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lessley, B., Perciano, T., Mathai, M., Childs, H., and Bethel, E.W. (2017, January 2). Maximal clique enumeration with data-parallel primitives. Proceedings of the IEEE 7th Symposium on Large Data Analysis and Visualization (LDAV), Phoenix, AZ, USA.","DOI":"10.1109\/LDAV.2017.8231847"},{"key":"ref_32","unstructured":"Esbensen, K.H., Guyot, D., Westad, F., and Houmoller, L.P. (2021, September 26). Multivariate Data Analysis: In Practice: An Introduction to Multivariate Data Analysis and Experimental Design. Available online: https:\/\/www.google.com\/books?hl=en&lr=&id=Qsn6yjRXOaMC&oi=fnd&pg=PA1&dq=Multivariate+data+analysis:+in+practice:+an+introduction+to+++multivariate+data+analysis+and+experimental+design&ots=cD1l2TqOT2&sig=1CUTO79G3V3-gGuEODBYBODjDJs."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1049\/el:19920132","article-title":"Improving the accuracy of direct histogram specification","volume":"28","author":"Zhang","year":"1992","journal-title":"Electron. Lett."},{"key":"ref_34","unstructured":"Jones, M., and Viola, P. (2021, September 26). Fast Multi-View Face Detection. Available online: https:\/\/www.researchgate.net\/profile\/Michael-Jones-66\/publication\/228362107_Fast_multi-view_face_detection\/links\/0fcfd50d35f8570d70000000\/Fast-multi-view-face-detection.pdf."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chakravarti, R., and Meng, X. (2009, January 27\u201329). A study of color histogram based image retrieval. Proceedings of the Sixth International Conference on Information Technology: New Generations, Las Vegas, NV, USA.","DOI":"10.1109\/ITNG.2009.126"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"033303","DOI":"10.1103\/PhysRevE.102.033303","article-title":"Extending machine learning classification capabilities with histogram reweighting","volume":"102","author":"Bachtis","year":"2020","journal-title":"Phys. Rev. E"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1109\/TVCG.2017.2744099","article-title":"Uncertainty visualization using copula-based analysis in mixed distribution models","volume":"24","author":"Hazarika","year":"2017","journal-title":"IEEE Trans. Visual Comput. Graphics"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1214","DOI":"10.1109\/TVCG.2018.2864801","article-title":"Codda: A flexible copula-based distribution driven analysis framework for large-scale multivariate data","volume":"25","author":"Hazarika","year":"2018","journal-title":"IEEE Trans. Visual Comput. Graphics"},{"key":"ref_39","unstructured":"(2021, September 26). IEEE Visualization 2004 Contest. Available online: http:\/\/vis.computer.org\/vis2004contest\/."},{"key":"ref_40","unstructured":"(2021, September 26). Nyx Simulation. Available online: https:\/\/amrex-astro.github.io\/Nyx\/."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/10\/285\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:07:34Z","timestamp":1760166454000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/10\/285"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,29]]},"references-count":40,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["a14100285"],"URL":"https:\/\/doi.org\/10.3390\/a14100285","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2021,9,29]]}}}