{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T19:09:40Z","timestamp":1771700980741,"version":"3.50.1"},"reference-count":41,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2016,9,27]],"date-time":"2016-09-27T00:00:00Z","timestamp":1474934400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is challenging not only due to the massive data volume but also due to the intrinsic complexity and high dimensions of the geospatial datasets. While traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in geoscience communities for handling big geospatial data. Recently, many studies were carried out to investigate adopting Hadoop for processing big geospatial data, but how to adjust the computing resources to efficiently handle the dynamic geoprocessing workload was barely explored. To bridge this gap, we propose a novel framework to automatically scale the Hadoop cluster in the cloud environment to allocate the right amount of computing resources based on the dynamic geoprocessing workload. The framework and auto-scaling algorithms are introduced, and a prototype system was developed to demonstrate the feasibility and efficiency of the proposed scaling mechanism using Digital Elevation Model (DEM) interpolation as an example. Experimental results show that this auto-scaling framework could (1) significantly reduce the computing resource utilization (by 80% in our example) while delivering similar performance as a full-powered cluster; and (2) effectively handle the spike processing workload by automatically increasing the computing resources to ensure the processing is finished within an acceptable time. Such an auto-scaling approach provides a valuable reference to optimize the performance of geospatial applications to address data- and computational-intensity challenges in GIScience in a more cost-efficient manner.<\/jats:p>","DOI":"10.3390\/ijgi5100173","type":"journal-article","created":{"date-parts":[[2016,9,27]],"date-time":"2016-09-27T05:57:13Z","timestamp":1474955833000},"page":"173","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":40,"title":["Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data"],"prefix":"10.3390","volume":"5","author":[{"given":"Zhenlong","family":"Li","sequence":"first","affiliation":[{"name":"Department of Geography, University of South Carolina, Columbia, SC 29208, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7768-4066","authenticated-orcid":false,"given":"Chaowei","family":"Yang","sequence":"additional","affiliation":[{"name":"Spatiotemporal Innovation Center, George Mason University, Fairfax, VA 22030, USA"}]},{"given":"Kai","family":"Liu","sequence":"additional","affiliation":[{"name":"Spatiotemporal Innovation Center, George Mason University, Fairfax, VA 22030, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5231-2303","authenticated-orcid":false,"given":"Fei","family":"Hu","sequence":"additional","affiliation":[{"name":"Spatiotemporal Innovation Center, George Mason University, Fairfax, VA 22030, USA"}]},{"given":"Baoxuan","family":"Jin","sequence":"additional","affiliation":[{"name":"Yunnan Provincial Geomatics Center, Kunming 650034, China"}]}],"member":"1968","published-online":{"date-parts":[[2016,9,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/j.bdr.2015.01.003","article-title":"Geospatial big data: Challenges and opportunities","volume":"2","author":"Lee","year":"2015","journal-title":"Big Data Res."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"5498","DOI":"10.1073\/pnas.0909315108","article-title":"Using spatial principles to optimize distributed computing for enabling the physical science discoveries","volume":"108","author":"Yang","year":"2011","journal-title":"Proc. Natl. Acad. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1080\/00045601003791243","article-title":"A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis","volume":"100","author":"Wang","year":"2010","journal-title":"Ann. Assoc. Am. Geogr."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Asimakopoulou, E. (2010). Advanced ICTs for Disaster Management and Threat Detection: Collaborative and Distributed Frameworks: Collaborative and Distributed Frameworks, IGI Global.","DOI":"10.4018\/978-1-61520-987-3"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1080\/17538947.2011.587547","article-title":"Spatial cloud computing: How can the geospatial sciences use and help shape cloud computing?","volume":"4","author":"Yang","year":"2011","journal-title":"Int. J. Digit. Earth"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Karimi, H.A. (2014). Big Data: Techniques and Technologies in Geoinformatics, CRC Press.","DOI":"10.1201\/b16524"},{"key":"ref_7","unstructured":"Schnase, J.L., Duffy, D.Q., Tamkin, G.S., Nadeau, D., Thompson, J.H., Grieg, C.M., and Webster, W.P. (2014). MERRA analytic services: Meeting the big data challenges of climate science through cloud-enabled climate analytics-as-a-service. Comput. Environ. Urban Syst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.cageo.2010.05.015","article-title":"Optimizing grid computing configuration and scheduling for geospatial analysis: An example with interpolating DEM","volume":"37","author":"Huang","year":"2011","journal-title":"Comput. Geosci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Buck, J.B., Watkins, N., LeFevre, J., Ioannidou, K., Maltzahn, C., Polyzotis, N., and Brandt, S. (2011, January 12\u201318). SciHadoop: Array-based query processing in Hadoop. Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, Seattle, DC, USA.","DOI":"10.1145\/2063384.2063473"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1230","DOI":"10.14778\/2536274.2536283","article-title":"A demonstration of spatial Hadoop: An efficient MapReduce framework for spatial data","volume":"6","author":"Eldawy","year":"2013","journal-title":"Proc. VLDB Endow."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Li, Z., Hu, F., Schnase, J.L., Duffy, D.Q., Lee, T., Bowen, M.K., and Yang, C. (2016). A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce. Int. J. Geogr. Inf. Sci., 1\u201319.","DOI":"10.1080\/13658816.2015.1131830"},{"key":"ref_12","unstructured":"Gao, S., Li, L., Li, W., Janowicz, K., and Zhang, Y. (2014). Constructing gazetteers from volunteered big geo-data based on Hadoop. Comput. Environ. Urban Syst."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Li, Z., Yang, C., Jin, B., Yu, M., Liu, K., Sun, M., and Zhan, M. (2015). Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework. PLoS ONE.","DOI":"10.1371\/journal.pone.0116781"},{"key":"ref_14","unstructured":"Pierce, M.E., Fox, G.C., Ma, Y., and Wang, J. (2009). Cloud computing and spatial cyberinfrastructure. J. Comput. Sci. Indiana Univ."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1080\/13658810902733682","article-title":"Introduction to distributed geographic information processing research","volume":"23","author":"Yang","year":"2009","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1080\/17538947.2014.929750","article-title":"Adopting cloud computing to optimize spatial web portals for better performance to support Digital Earth and other global geospatial initiatives","volume":"8","author":"Xia","year":"2015","journal-title":"Int. J. Digit. Earth"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tu, S., Flanagin, M., Wu, Y., Abdelguerfi, M., Normand, E., Mahadevan, V., and Shaw, K. (2004, January 5\u20137). Design strategies to improve performance of GIS web services. Proceedings of the International Conference on Information Technology: Coding and Computing, Las Vegas, NV, USA.","DOI":"10.1109\/ITCC.2004.1286692"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1038\/nrg2857","article-title":"Computational solutions to large-scale data management and analysis","volume":"11","author":"Schadt","year":"2010","journal-title":"Nat. Rev. Genet."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","article-title":"MapReduce: Simplified data processing on large clusters","volume":"51","author":"Dean","year":"2008","journal-title":"Commun. ACM"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1007\/s11036-013-0489-0","article-title":"Big data: A survey","volume":"19","author":"Chen","year":"2014","journal-title":"Mob. Netw. Appl."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1080\/15481603.2013.810976","article-title":"Storage and processing of massive remote sensing images using a novel cloud computing platform","volume":"50","author":"Lin","year":"2013","journal-title":"GISci. Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Krishnan, S., Baru, C., and Crosby, C. (2010). Evaluation of MapReduce for gridding LIDAR data. Cloud Comput. Technol. Sci.","DOI":"10.1109\/CloudCom.2010.34"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.14778\/2536222.2536227","article-title":"Hadoop GIS: A high performance spatial data warehousing system over MapReduce","volume":"6","author":"Aji","year":"2013","journal-title":"Proc. VLDB Endow."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1145\/1740390.1740405","article-title":"On the energy (in) efficiency of Hadoop clusters","volume":"44","author":"Leverich","year":"2010","journal-title":"ACM SIGOPS Oper. Syst. Rev."},{"key":"ref_25","unstructured":"Kaushik, R.T., and Bhandarkar, M. (2010, January 23\u201325). GreenHDFS: Towards an energy-conserving storage-efficient, hybrid Hadoop compute cluster. Proceedings of the USENIX Annual Technical Conference, Boston, MA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.future.2011.07.001","article-title":"Dynamic energy efficient data placement and cluster reconfiguration algorithm for MapReduce framework","volume":"28","author":"Maheshwari","year":"2012","journal-title":"Futur. Gener. Comput. Syst."},{"key":"ref_27","first-page":"1","article-title":"The NIST definition of cloud computing","volume":"53","author":"Mell","year":"2009","journal-title":"Natl. Ins. Stand. Technol."},{"key":"ref_28","unstructured":"Getting Started with Hadoop with Amazon\u2019s Elastic MapReduce. Available online: http:\/\/www.slideshare.net\/DrSkippy27\/amazon-elastic-map-reduce-getting-started-with-hadoop."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Baheti, V.K. (2014). Windows azure HDInsight: Where big data meets the cloud. IT Bus. Ind. Gov.","DOI":"10.1109\/CSIBIG.2014.7056928"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Herodotou, H., Dong, F., and Babu, S. (2011, January 26\u201328). No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. Proceedings of the 2nd ACM Symposium on Cloud Computing, Cascais, Portugal.","DOI":"10.1145\/2038916.2038934"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Agrawal, D., Das, S., and Abbadi, A. (2011, January 21\u201325). Big data and cloud computing: Current state and future opportunities. Proceedings of the 14th International Conference on Extending Database Technology, Uppsala, Sweden.","DOI":"10.1145\/1951365.1951432"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wang, S., and Zhou, D. (2009). Retrieving and Indexing Spatial Data in the Cloud Computing Environment, Springer.","DOI":"10.1007\/978-3-642-10665-1_29"},{"key":"ref_33","first-page":"275","article-title":"Handling intensities of data, computation, concurrent access, and spatiotemporal patterns","volume":"Volume 16","author":"Yang","year":"2015","journal-title":"Spatial Cloud Computing: A Practical Approach"},{"key":"ref_34","unstructured":"Li, Z., Yang, C., Huang, Q., Liu, K., Sun, M., and Xia, J. (2014). Building model as a service for supporting geosciences. Comput. Environ. Urban Syst."},{"key":"ref_35","unstructured":"R\u00f6me, T. (2010). Autoscaling Hadoop Clusters. [Master\u2019s Thesis, University of Tartu]."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gandhi, A., Thota, S., Dube, P., Kochut, A., and Zhang, L. (2016, January 16\u201318). Autoscaling for Hadoop clusters. Proceedings of the NSDI 2016, Santa Clara, CA, USA.","DOI":"10.1109\/IC2E.2016.11"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Shvachko, K., Kuang, H., Radia, S., and Chansler, R. (2010). The Hadoop distributed file system. IEEE Comput. Soc.","DOI":"10.1109\/MSST.2010.5496972"},{"key":"ref_38","unstructured":"Amazon EC2 Pricing. Available online: https:\/\/aws.amazon.com\/ec2\/pricing\/."},{"key":"ref_39","first-page":"10","article-title":"Spark: Cluster computing with working sets","volume":"10","author":"Zaharia","year":"2010","journal-title":"HotCloud"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1016\/j.compenvurbsys.2010.04.001","article-title":"Geospatial cyberinfrastructure: Past, present and future","volume":"34","author":"Yang","year":"2010","journal-title":"Comput. Environ. Urban Syst."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1080\/13658810801918509","article-title":"A theoretical approach to the use of cyberinfrastructure in geographical analysis","volume":"23","author":"Wang","year":"2009","journal-title":"Int. J. Geogr. Inf. Sci."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/5\/10\/173\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T19:31:50Z","timestamp":1760211110000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/5\/10\/173"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,27]]},"references-count":41,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2016,10]]}},"alternative-id":["ijgi5100173"],"URL":"https:\/\/doi.org\/10.3390\/ijgi5100173","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,9,27]]}}}