{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T15:46:14Z","timestamp":1769528774911,"version":"3.49.0"},"reference-count":44,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2018,4,7]],"date-time":"2018-04-07T00:00:00Z","timestamp":1523059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address these challenges, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers were also developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data and GeoSpark\/SpatialHadoop were enhanced from Spark\/Hadoop to handle vector data. However, there are few studies to systematically compare and evaluate the features and performances of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). 
Four major conclusions are offered: (1) no data containers, except ClimateSpark, have good support for the HDF data format used in this paper, requiring time- and resource-consuming data preprocessing to load data; (2) SciDB, Rasdaman, and MongoDB handle small\/medium volumes of data query well, whereas Spark and ClimateSpark can handle large volumes of data with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operation and analytical functions, while the others lack these functions for users; and (4) SciDB, Spark, and Hive have better support of user-defined functions (UDFs) to extend the system capability.<\/jats:p>","DOI":"10.3390\/ijgi7040144","type":"journal-article","created":{"date-parts":[[2018,4,10]],"date-time":"2018-04-10T13:06:08Z","timestamp":1523365568000},"page":"144","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5231-2303","authenticated-orcid":false,"given":"Fei","family":"Hu","sequence":"first","affiliation":[{"name":"NSF Spatiotemporal Innovation Center and Department of Geography and GeoInformation Science, George Mason University, Fairfax, VA 22030, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mengchao","family":"Xu","sequence":"additional","affiliation":[{"name":"NSF Spatiotemporal Innovation Center and Department of Geography and GeoInformation Science, George Mason University, Fairfax, VA 22030, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingchao","family":"Yang","sequence":"additional","affiliation":[{"name":"NSF Spatiotemporal Innovation Center and Department of Geography and GeoInformation Science, George Mason University, Fairfax, VA 22030, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanshou","family":"Liang","sequence":"additional","affiliation":[{"name":"NSF Spatiotemporal Innovation Center and Department of Geography and GeoInformation Science, George Mason University, Fairfax, VA 22030, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kejin","family":"Cui","sequence":"additional","affiliation":[{"name":"NSF Spatiotemporal Innovation Center and Department of Geography and GeoInformation Science, George Mason University, Fairfax, VA 22030, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael M.","family":"Little","sequence":"additional","affiliation":[{"name":"NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher S.","family":"Lynnes","sequence":"additional","affiliation":[{"name":"NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel Q.","family":"Duffy","sequence":"additional","affiliation":[{"name":"NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7768-4066","authenticated-orcid":false,"given":"Chaowei","family":"Yang","sequence":"additional","affiliation":[{"name":"NSF Spatiotemporal Innovation Center and Department of Geography and GeoInformation Science, George Mason University, Fairfax, VA 22030, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,4,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Demchenko, Y., Grosso, P., De Laat, C., and Membrey, P. (2013, January 20\u201324). Addressing big data issues in scientific data infrastructure. 
Proceedings of the 2013 International Conference on Collaboration Technologies and Systems (CTS), San Diego, CA, USA.","DOI":"10.1109\/CTS.2013.6567203"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1038\/455028a","article-title":"Big data: How do your data grow?","volume":"455","author":"Lynch","year":"2008","journal-title":"Nature"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Camara, G., Assis, L.F., Ribeiro, G., Ferreira, K.R., Llapa, E., and Vinhas, L. (2016, January 31). Big earth observation data analytics: Matching requirements to system architectures. Proceedings of the 5th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, Burlingame, CA, USA.","DOI":"10.1145\/3006386.3006393"},{"key":"ref_4","unstructured":"Skytland, N. (2018, April 06). What Is NASA Doing with Big Data Today?, Available online: https:\/\/open.nasa.gov\/blog\/what-is-nasa-doing-with-big-data-today\/."},{"key":"ref_5","unstructured":"Das, K. (2015, January 14\u201318). Evaluation of Big Data Containers for Popular Storage, Retrieval, and Computation Primitives in Earth Science Analysis. Proceedings of the 2015 AGU Fall Meeting Abstracts, San Francisco, CA, USA."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1080\/17538947.2016.1239771","article-title":"Big Data and cloud computing: Innovation opportunities and challenges","volume":"10","author":"Yang","year":"2017","journal-title":"Int. J. Digit. Earth"},{"key":"ref_7","unstructured":"National Research Council (2003). IT Roadmap to a Geospatial Future, National Academies Press."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Baumann, P., and Stamerjohanns, H. (2014). Towards a systematic benchmark for array database systems. Specifying Big Data Benchmarks, Springer.","DOI":"10.1007\/978-3-642-53974-9_9"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Brown, P.G. (2010, January 6\u201311). 
Overview of SciDB: Large scale array storage, processing and analysis. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, IN, USA.","DOI":"10.1145\/1807167.1807271"},{"key":"ref_10","unstructured":"Chodorow, K. (2013). MongoDB: The Definitive Guide: Powerful and Scalable Data Storage, O\u2019Reilly Media, Inc."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1626","DOI":"10.14778\/1687553.1687609","article-title":"Hive: A warehousing solution over a map-reduce framework","volume":"2","author":"Thusoo","year":"2009","journal-title":"Proc. VLDB Endow."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1145\/2934664","article-title":"Apache Spark: A unified engine for big data processing","volume":"59","author":"Zaharia","year":"2016","journal-title":"Commun. ACM"},{"key":"ref_13","unstructured":"Rusu, F., and Cheng, Y. (arXiv, 2013). A survey on array storage, query languages, and systems, arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1109\/TGRS.2008.2002076","article-title":"Fire information for resource management system: Archiving and distributing MODIS active fire data","volume":"47","author":"Davies","year":"2009","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhong, Y., Sun, S., Liao, H., Zhao, Y., and Fang, J. (2011, January 24\u201326). A novel method to manage very large raster data on distributed key-value storage system. Proceedings of the 2011 19th International Conference on Geoinformatics, Shanghai, China.","DOI":"10.1109\/GeoInformatics.2011.5980711"},{"key":"ref_16","unstructured":"(2018, April 06). MySQL Enterprise Scalability. Available online: https:\/\/www.mysql.com\/products\/enterprise\/scalability.html."},{"key":"ref_17","unstructured":"Obe, R.O., and Hsu, L.S. (2015). 
PostGIS in Action, Manning Publications Co."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhong, Y., Han, J., Zhang, T., and Fang, J. (2012, January 15\u201317). A distributed geospatial data storage and processing framework for large-scale WebGIS. Proceedings of the 2012 20th International Conference on Geoinformatics (GEOINFORMATICS), Hong Kong, China.","DOI":"10.1109\/Geoinformatics.2012.6270347"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.cageo.2013.05.001","article-title":"Evaluating open-source cloud computing solutions for geosciences","volume":"59","author":"Huang","year":"2013","journal-title":"Comput. Geosci."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1016\/j.compenvurbsys.2016.10.010","article-title":"Utilizing Cloud Computing to address big geospatial data challenges","volume":"61","author":"Yang","year":"2017","journal-title":"Comput. Environ. Urban Syst."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1109\/ACCESS.2014.2332453","article-title":"Toward scalable systems for big data analytics: A technology tutorial","volume":"2","author":"Hu","year":"2014","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Kersten, M., and Manegold, S. (2013, January 22\u201327). SciQL: Array data processing inside an RDBMS. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA.","DOI":"10.1145\/2463676.2463684"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Geng, Y., Huang, X., Zhu, M., Ruan, H., and Yang, G. (2013, January 16\u201318). SciHive: Array-based query processing with HiveQL. 
Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Melbourne, VIC, Australia.","DOI":"10.1109\/TrustCom.2013.108"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.14778\/2536222.2536227","article-title":"Hadoop GIS: A high performance spatial data warehousing system over mapreduce","volume":"6","author":"Aji","year":"2013","journal-title":"Proc. VLDB Endow."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Palamuttam, R., Mogrovejo, R.M., Mattmann, C., Wilson, B., Whitehall, K., Verma, R., McGibbney, L., and Ramirez, P. (November, January 29). SciSpark: Applying in-memory distributed computing to weather event detection and tracking. Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7363983"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1145\/276305.276386","article-title":"The multidimensional database system RasDaMan","volume":"Volume 27","author":"Baumann","year":"1998","journal-title":"Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1109\/TPAMI.1984.4767553","article-title":"Database structure and manipulation capabilities of a picture database management system (PICDMS)","volume":"6","author":"Chock","year":"1984","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Kersten, M., Zhang, Y., Ivanova, M., and Nes, N. (2011, January 25). SciQL, a query language for science applications. 
Proceedings of the EDBT\/ICDT 2011 Workshop on Array Databases, Uppsala, Sweden.","DOI":"10.1145\/1966895.1966896"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1534","DOI":"10.14778\/1687553.1687584","article-title":"A demonstration of SciDB: A science-oriented DBMS","volume":"2","author":"Kimura","year":"2009","journal-title":"Proc. VLDB Endow."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Planthaber, G., Stonebraker, M., and Frew, J. (2012, January 6). EarthDB: Scalable analysis of MODIS data using SciDB. Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, Redondo Beach, CA, USA.","DOI":"10.1145\/2447481.2447483"},{"key":"ref_31","unstructured":"Amirian, P., Basiri, A., and Winstanley, A. (July, January 30). Evaluation of data management systems for geospatial big data. Proceedings of the International Conference on Computational Science and Its Applications, Guimar\u00e3es, Portugal."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Aniceto, R., Xavier, R., Holanda, M., Walter, M.E., and Lifschitz, S. (2014, January 2\u20135). Genomic data persistency on a NoSQL database system. Proceedings of the 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Belfast, UK.","DOI":"10.1109\/BIBM.2014.6999304"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Ameri, P., Grabowski, U., Meyer, J., and Streit, A. (2014, January 24\u201326). On the application and performance of MongoDB for climate satellite data. Proceedings of the 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Beijing, China.","DOI":"10.1109\/TrustCom.2014.84"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Han, D., and Stroulia, E. (July, January 28). Hgrid: A data model for large geospatial data sets in hbase. 
Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing (CLOUD), Santa Clara, CA, USA.","DOI":"10.1109\/CLOUD.2013.78"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Merticariu, G., Misev, D., and Baumann, P. (2015). Towards a General Array Database Benchmark: Measuring Storage Access. Big Data Benchmarking, Springer.","DOI":"10.1007\/978-3-319-49748-8_3"},{"key":"ref_36","unstructured":"(2018, April 06). Indexes. Available online: https:\/\/docs.mongodb.com\/manual\/indexes\/."},{"key":"ref_37","unstructured":"(2018, April 06). Aggregation. Available online: https:\/\/docs.mongodb.com\/manual\/aggregation\/."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Chevalier, M., El Malki, M., Kopliku, A., Teste, O., and Tournier, R. (2015, January 1\u20134). Implementation of multidimensional databases with document-oriented NoSQL. Proceedings of the International Conference on Big Data Analytics and Knowledge Discovery, Valencia, Spain.","DOI":"10.1007\/978-3-319-22729-0_29"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Gudivada, V.N., Rao, D., and Raghavan, V.V. (July, January 27). NoSQL systems for big data management. Proceedings of the 2014 IEEE World Congress on Services (SERVICES), Anchorage, AK, USA.","DOI":"10.1109\/SERVICES.2014.42"},{"key":"ref_40","unstructured":"(2018, April 06). Compare to Relational Database. Available online: http:\/\/www.paradigm4.com\/try_scidb\/compare-to-relational-databases\/."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1080\/13658816.2015.1131830","article-title":"A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce","volume":"31","author":"Li","year":"2017","journal-title":"Int. J. Geogr. Inf. 
Sci."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1016\/j.compenvurbsys.2013.12.003","article-title":"MERRA analytic services: Meeting the big data challenges of climate science through cloud-enabled climate analytics-as-a-service","volume":"61","author":"Schnase","year":"2017","journal-title":"Comput. Environ. Urban Syst."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1109\/MCSE.2013.19","article-title":"SciDB: A database management system for applications with complex analytics","volume":"15","author":"Stonebraker","year":"2013","journal-title":"Comput. Sci. Eng."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"5498","DOI":"10.1073\/pnas.0909315108","article-title":"Using spatial principles to optimize distributed computing for enabling the physical science discoveries","volume":"14","author":"Yang","year":"2011","journal-title":"Proc. Natl. Acad. Sci."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/7\/4\/144\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T14:59:54Z","timestamp":1760194794000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/7\/4\/144"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,4,7]]},"references-count":44,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2018,4]]}},"alternative-id":["ijgi7040144"],"URL":"https:\/\/doi.org\/10.3390\/ijgi7040144","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,4,7]]}}}