{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T06:36:43Z","timestamp":1764225403434,"version":"build-2065373602"},"reference-count":53,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2017,12,12]],"date-time":"2017-12-12T00:00:00Z","timestamp":1513036800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm that has the characteristics of being able to discover clusters of any shape, effectively distinguishing noise points and naturally supporting spatial databases. DBSCAN has been widely used in the field of spatial data mining. This paper studies the parallelization design and realization of the DBSCAN algorithm based on the Spark platform, and solves the following problems that arise when computing macro data: the requirement of a great deal of calculation using the single-node algorithm; the low level of resource-utilization with the multi-node algorithm; the large time consumption; and the lack of instantaneity. 
The experimental results indicate that the proposed parallel algorithm design is able to achieve more stable speedup at an increased involved spatial data scale.<\/jats:p>","DOI":"10.3390\/rs9121301","type":"journal-article","created":{"date-parts":[[2017,12,12]],"date-time":"2017-12-12T13:35:00Z","timestamp":1513085700000},"page":"1301","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":36,"title":["Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform"],"prefix":"10.3390","volume":"9","author":[{"given":"Fang","family":"Huang","sequence":"first","affiliation":[{"name":"School of Resources & Environment, University of Electronic Science and Technology of China, 2006 Xiyuan Ave., West Hi-Tech Zone, Chengdu 611731, China"},{"name":"Institute of Remote Sensing Big Data, Big Data Research Center, University of Electronic Science and Technology of China, 2006 Xiyuan Road, West Hi-Tech Zone, Chengdu 611731, China"}]},{"given":"Qiang","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Resources & Environment, University of Electronic Science and Technology of China, 2006 Xiyuan Ave., West Hi-Tech Zone, Chengdu 611731, China"}]},{"given":"Ji","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Resources & Environment, University of Electronic Science and Technology of China, 2006 Xiyuan Ave., West Hi-Tech Zone, Chengdu 611731, China"}]},{"given":"Jian","family":"Tao","sequence":"additional","affiliation":[{"name":"Texas A&M Engineering Experiment Station and High Performance Research Computing, Texas A&M University, College Station, TX 77843, USA"}]},{"given":"Xiaocheng","family":"Zhou","sequence":"additional","affiliation":[{"name":"Key Laboratory of Spatial Data Mining & Information Sharing of Ministry of Education, Fuzhou University, No. 
2 Xueyuan Road, Fuzhou University New District, Fuzhou 350116, China"}]},{"given":"Du","family":"Jin","sequence":"additional","affiliation":[{"name":"School of Resources & Environment, University of Electronic Science and Technology of China, 2006 Xiyuan Ave., West Hi-Tech Zone, Chengdu 611731, China"}]},{"given":"Xicheng","family":"Tan","sequence":"additional","affiliation":[{"name":"International School of Software, Wuhan University, 129 Luoyu Road, Wuhan 430079, China"}]},{"given":"Lizhe","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science, China University of Geosciences, Wuhan 430074, China"},{"name":"Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 10094, China"}]}],"member":"1968","published-online":{"date-parts":[[2017,12,12]]},"reference":[{"key":"ref_1","unstructured":"Ester, M., Kriegel, H.-P., Sander, J., and Xu, X.W. (1996, January 2\u20134). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1007\/s11704-013-3158-3","article-title":"MR-DBSCAN: A scalable MapReduce-based DBSCAN algorithm for heavily skewed data","volume":"8","author":"He","year":"2014","journal-title":"Front. Comput. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ankerst, M., Breunig, M.M., Kriegel, H.-P., and Sander, J. (1999, January 1\u20133). OPTICS: Ordering points to identify the clustering structure. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, USA.","DOI":"10.1145\/304182.304187"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Chen, M., Gao, X., and Li, H. (2010, January 16\u201318). Parallel DBSCAN with Priority R-tree. 
Proceedings of the 2010 2nd IEEE International Conference on Information Management and Engineering, Chengdu, China.","DOI":"10.1109\/ICIME.2010.5477926"},{"key":"ref_5","unstructured":"Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., and Hu, Q. (2010). TI-DBSCAN: Clustering with DBSCAN by Means of the Triangle Inequality. Rough Sets and Current Trends in Computing, Proceedings of the 7th International Conference, RSCTC 2010, Warsaw, Poland, 28\u201330 June 2010, Springer."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1111\/j.1467-8659.2007.01012.x","article-title":"A Survey of General-Purpose Computation on Graphics Hardware","volume":"26","author":"Owens","year":"2007","journal-title":"Comput. Graph. Forum"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1016\/j.future.2012.09.001","article-title":"G-Hadoop: MapReduce across distributed data centers for data-intensive computing","volume":"29","author":"Wang","year":"2013","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1145\/2629581","article-title":"Solving the global atmospheric equations through heterogeneous reconfigurable platforms","volume":"8","author":"Gan","year":"2015","journal-title":"ACM Trans. Reconfig. Technol. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1722","DOI":"10.1002\/cpe.2979","article-title":"A scalable Helmholtz solver in GRAPES over large-scale multicore cluster","volume":"25","author":"Li","year":"2013","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_10","first-page":"197","article-title":"Parallel processing of massive remote sensing images in a GPU architecture","volume":"33","author":"Liu","year":"2014","journal-title":"Comput. 
Inf."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1417","DOI":"10.1109\/TITB.2010.2072963","article-title":"GPGPU-Aided Ensemble Empirical-Mode Decomposition for EEG Analysis during Anesthesia","volume":"14","author":"Chen","year":"2010","journal-title":"IEEE Trans. Inf. Technol. Biomed."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/LGRS.2012.2198790","article-title":"GPU Implementation of an Automatic Target Detection and Classification Algorithm for Hyperspectral Image Analysis","volume":"10","author":"Lopez","year":"2013","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2281","DOI":"10.1109\/JSTARS.2014.2320896","article-title":"Multi-GPU Implementation of the Minimum Volume Simplex Analysis Algorithm for Hyperspectral Unmixing","volume":"7","author":"Agathos","year":"2014","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1007\/s10586-014-0413-9","article-title":"A scalable and fast OPTICS for clustering trajectory big data","volume":"18","author":"Deng","year":"2015","journal-title":"Clust. Comput."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1109\/TC.2013.2295806","article-title":"Fast and Scalable Multi-Way Analysis of Massive Neural Data","volume":"64","author":"Chen","year":"2015","journal-title":"IEEE Trans. Comput."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1016\/j.sysarc.2016.07.002","article-title":"Parallel compressive sampling matching pursuit algorithm for compressed sensing signal reconstruction with OpenCL","volume":"72","author":"Huang","year":"2017","journal-title":"J. Syst. 
Archit."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.1007\/s10586-015-0451-y","article-title":"A data parallel approach to modelling and simulation of large crowd","volume":"18","author":"Yu","year":"2015","journal-title":"Clust. Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MCSE.2012.89","article-title":"DDDAS-based parallel simulation of threat management for urban water distribution systems","volume":"16","author":"Wang","year":"2014","journal-title":"Comput. Sci. Eng."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1016\/j.adhoc.2015.07.011","article-title":"A MapReduce based Parallel Niche Genetic Algorithm for contaminant source identification in water distribution network","volume":"35","author":"Hu","year":"2015","journal-title":"Ad Hoc Netw."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.is.2013.11.002","article-title":"DBCURE-MR: An efficient density-based clustering algorithm for large data using MapReduce","volume":"42","author":"Kim","year":"2014","journal-title":"Inf. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1023\/A:1009884809343","article-title":"A Fast Parallel Clustering Algorithm for Large Spatial Databases","volume":"3","author":"Xu","year":"1999","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"143","DOI":"10.3906\/elk-1202-83","article-title":"M-FDBSCAN: A multicore density-based uncertain data clustering algorithm","volume":"22","author":"Erdem","year":"2014","journal-title":"Turkish J. Electri. Eng. Comput. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"B\u00f6hm, C., Noll, R., Plant, C., and Wackersreuther, B. (2009, January 2\u20136). Density-based clustering using graphics processors. 
Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, China.","DOI":"10.1145\/1645953.1646038"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1016\/j.procs.2013.05.200","article-title":"G-DBSCAN: A GPU Accelerated Algorithm for Density-based Clustering","volume":"18","author":"Andrade","year":"2013","journal-title":"Procedia Comput. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","article-title":"MapReduce: Simplified data processing on large clusters","volume":"51","author":"Dean","year":"2008","journal-title":"Commun. ACM"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"B\u00f6se, J.-H., Andrzejak, A., and H\u00f6gqvist, M. (2010, January 26). Beyond online aggregation: Parallel and incremental data mining with online Map-Reduce. Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, Raleigh, NC, USA.","DOI":"10.1145\/1779599.1779602"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, Y., Tan, H., Luo, W., Mao, H., Ma, D., Feng, S., and Fan, J. (2011, January 7\u20139). MR-DBSCAN: An Efficient Parallel Density-Based Clustering Algorithm Using MapReduce. Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems, Tainan, Taiwan.","DOI":"10.1109\/ICPADS.2011.83"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Dai, B.R., and Lin, I.C. (2012, January 24\u201329). Efficient Map\/Reduce-Based DBSCAN Algorithm with Optimized Data Partition. Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, Honolulu, HI, USA.","DOI":"10.1109\/CLOUD.2012.42"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1133","DOI":"10.4028\/www.scientific.net\/AMR.301-303.1133","article-title":"Research on parallel DBSCAN algorithm design based on mapreduce","volume":"301\u2013303","author":"Fu","year":"2011","journal-title":"Adv. Mater. 
Res."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kumar, A., Kiran, M., and Prathap, B.R. (2013, January 4\u20136). Verification and validation of MapReduce program model for parallel K-means algorithm on Hadoop cluster. Proceedings of the 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), Tiruchengode, India.","DOI":"10.1109\/ICCCNT.2013.6726852"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Anchalia, P.P., Koundinya, A.K., and Srinath, N.K. (2013, January 24\u201326). MapReduce Design of K-Means Clustering Algorithm. Proceedings of the 2013 International Conference on Information Science and Applications (ICISA), Suwon, Korea.","DOI":"10.1109\/ICISA.2013.6579448"},{"key":"ref_32","unstructured":"Xu, Z.Q., and Zhao, D.W. (2012, January 11\u201313). Research on Clustering Algorithm for Massive Data Based on Hadoop Platform. Proceedings of the 2012 International Conference on Computer Science and Service System, Nanjing, China."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Nagpal, A., Jatain, A., and Gaur, D. (2013, January 11\u201312). Review based on data clustering algorithms. Proceedings of the 2013 IEEE Conference on Information & Communication Technologies, Thuckalay, India.","DOI":"10.1109\/CICT.2013.6558109"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lin, X., Wang, P., and Wu, B. (2013, January 17\u201319). Log analysis in cloud computing environment with Hadoop and Spark. Proceedings of the 2013 5th IEEE International Conference on Broadband Network & Multimedia Technology, Guilin, China.","DOI":"10.1109\/ICBNMT.2013.6823956"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Shukla, S., Lease, M., and Tewari, A. (2012, January 12\u201316). Parallelizing ListNet training using spark. 
Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, OR, USA.","DOI":"10.1145\/2348283.2348502"},{"key":"ref_36","unstructured":"Lawson, D. (2014). Alternating Direction Method of Multipliers Implementation Using Apache Spark, Stanford University."},{"key":"ref_37","unstructured":"(2016, December 15). Biglearn. Available online: http:\/\/biglearn.org\/2013\/files\/papers\/biglearning2013_submission_7.pdf."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, B., Yin, J., Hua, Q., Wu, Z., and Cao, J. (2016, January 13\u201316). Parallelizing K-Means-Based Clustering on Spark. Proceedings of the 2016 International Conference on Advanced Cloud and Big Data (CBD), Chengdu, China.","DOI":"10.1109\/CBD.2016.016"},{"key":"ref_39","unstructured":"Jiang, H., and Liu, Z. (2015, January 23\u201325). Parallel FP-Like Algorithm with Spark. Proceedings of the 2015 IEEE 12th International Conference on e-Business Engineering, Beijing, China."},{"key":"ref_40","first-page":"40","article-title":"Spatial overlay analysis of land use vector data based on Spark","volume":"43","author":"Jin","year":"2016","journal-title":"J. Zhejiang Univ."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chen, Y.G., Balke, W.T., Xu, J.L., Xu, W., Jin, P.Q., Lin, X., Tang, T., and Hwang, E.J. (2014). On Massive Spatial Data Retrieval Based on Spark. Web-Age Information Management, Proceedings of the WAIM 2014 International Conference on Web-Age Information Management, Macau, China, 16\u201318 June 2014, Springer.","DOI":"10.1007\/978-3-319-11538-2"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Suchanek, F., and Weikum, G. (2013, January 22\u201327). Knowledge harvesting in the big-data era. 
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA.","DOI":"10.1145\/2463676.2463724"},{"key":"ref_43","first-page":"51","article-title":"DBSCAN spatial clustering algorithm and its application in urban planning","volume":"30","author":"Li","year":"2005","journal-title":"Sci. Surv. Mapp."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Nisa, K.K., Andrianto, H.A., and Mardhiyyah, R. (2014, January 18\u201319). Hotspot clustering using DBSCAN algorithm and shiny web framework. Proceedings of the 2014 International Conference on Advanced Computer Science and Information System, Jakarta, Indonesia.","DOI":"10.1109\/ICACSIS.2014.7065840"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"\u00c7elik, M., Dada\u015fer-\u00c7elik, F., and Dokuz, A.S. (2011, January 15\u201318). Anomaly detection in temperature data using DBSCAN algorithm. Proceedings of the 2011 International Symposium on Innovations in Intelligent Systems and Applications, Istanbul, Turkey.","DOI":"10.1109\/INISTA.2011.5946052"},{"key":"ref_46","unstructured":"Silva, T.L.C.D., Neto, A.C.A., Magalhaes, R.P., Farias, V.A.E.D., Mac\u00eado, J.A.F.D., and Machado, J.C. (2014, January 27\u201330). Efficient and distributed DBScan algorithm using mapreduce to detect density areas on traffic data. Proceedings of the 16th International Conference on Enterprise Information Systems, Lisbon, Portugal."},{"key":"ref_47","unstructured":"Adiba, M.E., and Lindsay, B.G. (1980, January 1\u20133). Database Snapshots. Proceedings of the Sixth International Conference on Very Large Data Bases, Montreal, QC, Canada."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Wang, W., Tao, L., Gao, C., Wang, B.F., Yang, H., and Zhang, Z.A. (2014, January 19\u201321). C-DBSCAN Algorithm for Determining Bus-Stop Locations Based on Taxi GPS Data. 
Proceedings of the 10th International Conference on Advanced Data Mining and Applications, Guilin, China.","DOI":"10.1007\/978-3-319-14717-8_23"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Liu, C.K., Qin, K., and Kang, C.G. (2015, January 8\u201310). Exploring time-dependent traffic congestion patterns from taxi trajectory data. Proceedings of the 2015 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM), Fuzhou, China.","DOI":"10.1109\/ICSDM.2015.7298022"},{"key":"ref_50","unstructured":"Chen, X.W., Lu, Z.H., Jantsch, A., and Chen, S. (2009, January 20\u201323). Speedup analysis of data-parallel applications on Multi-core NoCs. Proceedings of the IEEE 8th International Conference on ASIC, Changsha, China."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Ieda, H. (2010). Development and management of transport systems. Sustainable Urban Transport in an Asian Context, Springer.","DOI":"10.1007\/978-4-431-93954-2"},{"key":"ref_52","unstructured":"Yin, L. (2010, January 7\u20139). The Analysis of Our Urban Transportation Problem and the Research of Road Construction &map Planning Management. Proceedings of the 2010 International Conference on E-Product E-Service and E-Entertainment, Henan, China."},{"key":"ref_53","unstructured":"Shao, Y., and Song, J.H. (2010). Traffic Congestion Management Strategies and Methods in Large Metropolitan Area: A Case Study in Shenzhen. Urban Transp. 
China, 8, (In Chinese)."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/9\/12\/1301\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:53:42Z","timestamp":1760208822000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/9\/12\/1301"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,12]]},"references-count":53,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2017,12]]}},"alternative-id":["rs9121301"],"URL":"https:\/\/doi.org\/10.3390\/rs9121301","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2017,12,12]]}}}