{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T05:26:18Z","timestamp":1772429178239,"version":"3.50.1"},"reference-count":33,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2017,9,8]],"date-time":"2017-09-08T00:00:00Z","timestamp":1504828800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>In the era of big data, Internet-based geospatial information services such as various LBS apps are deployed everywhere, followed by an increasing number of queries against the massive spatial data. As a result, the traditional relational spatial database (e.g., PostgreSQL with PostGIS and Oracle Spatial) cannot adapt well to the needs of large-scale spatial query processing. Spark is an emerging outstanding distributed computing framework in the Hadoop ecosystem. This paper aims to address the increasingly large-scale spatial query-processing requirement in the era of big data, and proposes an effective framework GeoSpark SQL, which enables spatial queries on Spark. On the one hand, GeoSpark SQL provides a convenient SQL interface; on the other hand, GeoSpark SQL achieves both efficient storage management and high-performance parallel computing through integrating Hive and Spark. In this study, the following key issues are discussed and addressed: (1) storage management methods under the GeoSpark SQL framework, (2) the spatial operator implementation approach in the Spark environment, and (3) spatial query optimization methods under Spark. Experimental evaluation is also performed and the results show that GeoSpark SQL is able to achieve real-time query processing. It should be noted that Spark is not a panacea. 
It is observed that the traditional spatial database PostGIS\/PostgreSQL performs better than GeoSpark SQL in some query scenarios, especially for the spatial queries with high selectivity, such as the point query and the window query. In general, GeoSpark SQL performs better when dealing with compute-intensive spatial queries such as the kNN query and the spatial join query.<\/jats:p>","DOI":"10.3390\/ijgi6090285","type":"journal-article","created":{"date-parts":[[2017,9,8]],"date-time":"2017-09-08T11:34:52Z","timestamp":1504870492000},"page":"285","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1255-1913","authenticated-orcid":false,"given":"Zhou","family":"Huang","sequence":"first","affiliation":[{"name":"Institute of Remote Sensing & GIS, Peking University, Beijing 100871, China"},{"name":"Beijing Advanced Innovation Center for Future Internet Technology, Beijing 100124, China"}]},{"given":"Yiran","family":"Chen","sequence":"additional","affiliation":[{"name":"Institute of Remote Sensing & GIS, Peking University, Beijing 100871, China"}]},{"given":"Lin","family":"Wan","sequence":"additional","affiliation":[{"name":"Faculty of Information Engineering, China University of Geosciences, Wuhan 430074, China"}]},{"given":"Xia","family":"Peng","sequence":"additional","affiliation":[{"name":"Collaborative Innovation Center of eTourism, Institute of Tourism, Beijing Union University, Beijing 100101, China"},{"name":"State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2017,9,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zhong, Y., Han, J., Zhang, T., Li, Z., Fang, J., and Chen, G. (2012, January 21\u201325). Towards parallel spatial query processing for big spatial data. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), Shanghai, China.","DOI":"10.1109\/IPDPSW.2012.245"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"121","DOI":"10.14257\/ijdta.2014.7.6.11","article-title":"Newsql: Towards next-generation scalable rdbms for online transaction processing (oltp) for big data management","volume":"7","author":"Moniruzzaman","year":"2014","journal-title":"Int. J. Database Theory Appl."},{"key":"ref_3","first-page":"166","article-title":"Massive geospatial data cloud storage and services based on nosql database technique","volume":"15","author":"Chen","year":"2013","journal-title":"J. Geo-Inf. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1836","DOI":"10.3724\/SP.J.1001.2013.04377","article-title":"Algorithm for processing k-nearest join based on r-tree in mapreduce","volume":"24","author":"Liu","year":"2013","journal-title":"J. Softw."},{"key":"ref_5","unstructured":"(2017, July 14). GIS Tools for Hadoop. Available online: http:\/\/esri.github.io\/gis-tools-for-hadoop\/."},{"key":"ref_6","unstructured":"(2017, July 14). Spatialhadoop. Available online: http:\/\/spatialhadoop.cs.umn.edu\/."},{"key":"ref_7","unstructured":"(2017, July 14). Hadoop-GIS. Available online: http:\/\/bmidb.cs.stonybrook.edu\/hadoopgis\/index."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Tripathy, A., Mishra, L., and Patra, P.K. (2010, January 20\u201322). An efficient approach for distributed spatial query optimization using filters. 
Proceedings of the 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), Chengdu, China.","DOI":"10.1109\/ICACTE.2010.5579413"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Cary, A., Sun, Z., Hristidis, V., and Rishe, N. (2009, January 2\u20134). Experiences on processing spatial data with mapreduce. Proceedings of the International Conference on Scientific and Statistical Database Management, New Orleans, LA, USA.","DOI":"10.1007\/978-3-642-02279-1_24"},{"key":"ref_10","unstructured":"Wang, Y., and Wang, S. (2010, January 28\u201331). Research and implementation on spatial data storage and operation based on hadoop platform. Proceedings of the 2010 Second IITA International Conference on Geoscience and Remote Sensing (IITA-GRS), Qingdao, China."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Yan, B., and Rhodes, P.J. (2011, January 13\u201316). IDEA\u2014An API for parallel computing with large spatial datasets. Proceedings of the 2011 International Conference on Parallel Processing (ICPP), Taipei, Taiwan.","DOI":"10.1109\/ICPP.2011.70"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wan, L., Huang, Z., and Peng, X. (2016). An Effective NoSQL-Based Vector Map Tile Management Approach. ISPRS Int. J. Geo-Inf., 5.","DOI":"10.3390\/ijgi5110215"},{"key":"ref_13","unstructured":"Cui, X. (2010). Distributed Storage Management and Parallel Processing Technologies of Massive Spatial Data. [Master\u2019s Thesis, National University of Defense Technology]."},{"key":"ref_14","unstructured":"Zhong, Y., Zhu, X., Cheng, Z., Liao, H., and Fang, J. (2011, January 26). A high efficiency management method for massive spatial data based on the distributed storage computing architecture. Proceedings of the China National Conference on High Performance Computing, Beijing, China."},{"key":"ref_15","unstructured":"(2017, July 14). HadoopDB. 
Available online: http:\/\/db.cs.yale.edu\/hadoopdb\/hadoopdb.html."},{"key":"ref_16","unstructured":"(2017, July 14). Apache Spark. Available online: http:\/\/spark.apache.org\/docs\/latest\/."},{"key":"ref_17","unstructured":"(2017, July 14). Understanding Spark\u2019s Core RDD. Available online: http:\/\/www.infoq.com\/cn\/articles\/spark-core-rdd\/."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xie, X., Xiong, Z., Hu, X., Zhou, G., and Ni, J. (2014, January 16\u201318). On massive spatial data retrieval based on spark. Proceedings of the International Conference on Web-Age Information Management, Macau, China.","DOI":"10.1007\/978-3-319-11538-2_19"},{"key":"ref_19","first-page":"401","article-title":"A framework of distributed spatial data analysis based on shark\/spark","volume":"17","author":"Wen","year":"2015","journal-title":"J. Geo-Inf. Sci."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"You, S., Zhang, J., and Gruenwald, L. (2015, January 13\u201317). Large-scale spatial join query processing in cloud. Proceedings of the 2015 31st IEEE International Conference on Data Engineering Workshops (ICDEW), Seoul, Korea.","DOI":"10.1109\/ICDEW.2015.7129541"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yu, J., Wu, J., and Sarwat, M. (2015, January 3\u20136). Geospark: A cluster computing framework for processing large-scale spatial data. Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA.","DOI":"10.1145\/2820783.2820860"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Baig, F., Mehrotra, M., Vo, H., Wang, F., Saltz, J., and Kurc, T. (September, January 31). Sparkgis: Efficient comparison and evaluation of algorithm results in tissue image analysis studies. 
Proceedings of the Biomedical Data Management and Graph Online Querying: VLDB 2015 Workshops, Big-O (Q) and DMAH, Waikoloa, HI, USA.","DOI":"10.1007\/978-3-319-41576-5_10"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Gali\u0107, Z. (2016). Spatio-temporal data streams and big data paradigm. Spatio-Temporal Data Streams, Springer.","DOI":"10.1007\/978-1-4939-6575-5"},{"key":"ref_24","unstructured":"Kini, A., and Emanuele, R. (2017, July 14). Geotrellis: Adding Geospatial Capabilities to Spark, Spark Summit 2014. Available online: https:\/\/spark-summit.org\/2014\/geotrellis-adding-geospatial-capabilities-to-spark\/."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1565","DOI":"10.14778\/3007263.3007310","article-title":"Locationspark: A distributed in-memory data management system for big spatial data","volume":"9","author":"Tang","year":"2016","journal-title":"Proc. VLDB Endow."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhang, F., Zhou, J., Liu, R., Du, Z., and Ye, X. (2016). A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability. Sustainability, 8.","DOI":"10.3390\/su8090926"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Du, Z., Zhao, X., Ye, X., Zhou, J., Zhang, F., and Liu, R. (2017). An Effective High-Performance Multiway Spatial Join Algorithm with Spark. ISPRS Int. J. Geo-Inf., 6.","DOI":"10.3390\/ijgi6040096"},{"key":"ref_28","unstructured":"Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., and Stoica, I. (2010, January 22\u201325). Spark: Cluster computing with working sets. Proceedings of the Usenix Conference on Hot Topics in Cloud Computing, Boston, MA, USA."},{"key":"ref_29","unstructured":"(2017, July 14). Apache Drill. Available online: http:\/\/drill.apache.org\/."},{"key":"ref_30","unstructured":"(2017, July 14). Apache Impala. 
Available online: http:\/\/impala.apache.org\/."},{"key":"ref_31","unstructured":"(2017, July 14). Shark, Spark SQL, Hive on Spark, and the Future of SQL on Apache Spark. Available online: https:\/\/databricks.com\/blog\/2014\/07\/01\/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html."},{"key":"ref_32","unstructured":"(2017, July 14). Introduction to Spark SQL. Available online: http:\/\/www.cnblogs.com\/shishanyuan\/p\/4723604.html."},{"key":"ref_33","unstructured":"International Organization for Standardization (2016). Information Technology\u2014Database Languages\u2014SQL Multimedia and Application Packages\u2014Part 3: Spatial, International Organization for Standardization. ISO\/IEC 13249-3:2016."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/6\/9\/285\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:44:29Z","timestamp":1760208269000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/6\/9\/285"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,9,8]]},"references-count":33,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2017,9]]}},"alternative-id":["ijgi6090285"],"URL":"https:\/\/doi.org\/10.3390\/ijgi6090285","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,9,8]]}}}