{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T02:44:50Z","timestamp":1778726690732,"version":"3.51.4"},"reference-count":49,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2022,10,5]],"date-time":"2022-10-05T00:00:00Z","timestamp":1664928000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences","award":["XDA19030101"],"award-info":[{"award-number":["XDA19030101"]}]},{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences","award":["2022122"],"award-info":[{"award-number":["2022122"]}]},{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences","award":["guikeAA20302022"],"award-info":[{"award-number":["guikeAA20302022"]}]},{"name":"Youth Innovation Promotion Association","award":["XDA19030101"],"award-info":[{"award-number":["XDA19030101"]}]},{"name":"Youth Innovation Promotion Association","award":["2022122"],"award-info":[{"award-number":["2022122"]}]},{"name":"Youth Innovation Promotion Association","award":["guikeAA20302022"],"award-info":[{"award-number":["guikeAA20302022"]}]},{"name":"China-ASEAN Big Earth Data Platform and Applications","award":["XDA19030101"],"award-info":[{"award-number":["XDA19030101"]}]},{"name":"China-ASEAN Big Earth Data Platform and Applications","award":["2022122"],"award-info":[{"award-number":["2022122"]}]},{"name":"China-ASEAN Big Earth Data Platform and Applications","award":["guikeAA20302022"],"award-info":[{"award-number":["guikeAA20302022"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Data stream partitioning is a fundamental and important mechanism for distributed systems. 
However, an inappropriate partition scheme may cause data skew, which can degrade the execution efficiency of many application tasks. Skewed partitions usually take longer to process, need more computational resources to complete the task, and can even become a performance bottleneck. To solve such data skew issues, this paper proposes a novel partition method that divides image tiles uniformly into partitions on demand. The partitioning problem is transformed into a uniform and compact clustering problem in which the image tiles are regarded as image pixels without spectral or texture information. First, the equal area conversion principle was used to select the seed points of the partitions, and then the image tiles were aggregated in an image layout, yielding an initial partition scheme. Second, the image tiles of the initial partition were finely adjusted in the vertical and horizontal directions in separate steps to achieve a uniform distribution among the partitions. The efficiency of the proposed method was evaluated against two traditional partition methods in image segmentation, data shuffle, and image clipping tests. The results demonstrated that the proposed partition method solved the data skew problem observed in the hash partition method. 
In addition, this method is designed specifically for processing of image tiles and makes the related processing operations for large-scale images faster and more efficient.<\/jats:p>","DOI":"10.3390\/rs14194964","type":"journal-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T03:07:28Z","timestamp":1665371248000},"page":"4964","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A Cluster-Based Partition Method of Remote Sensing Data for Efficient Distributed Image Processing"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7163-3644","authenticated-orcid":false,"given":"Lei","family":"Wang","sequence":"first","affiliation":[{"name":"International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China"},{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bo","family":"Yu","sequence":"additional","affiliation":[{"name":"International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China"},{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fang","family":"Chen","sequence":"additional","affiliation":[{"name":"International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China"},{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Congrong","family":"Li","sequence":"additional","affiliation":[{"name":"International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China"},{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bin","family":"Li","sequence":"additional","affiliation":[{"name":"International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China"},{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ning","family":"Wang","sequence":"additional","affiliation":[{"name":"International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China"},{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,10,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1016\/j.isprsjprs.2020.02.012","article-title":"Segmentation of large-scale remotely sensed images on a Spark platform: A strategy for handling massive image tiles with the MapReduce model","volume":"162","author":"Wang","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Chen, F., Wang, N., Yu, B., Qin, Y.C., and Wang, L. (2021). A Strategy of Parallel Seed-Based Image Segmentation Algorithms for Handling Massive Image Tiles over the Spark Platform. 
Remote Sens., 13.","DOI":"10.3390\/rs13101969"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"158474","DOI":"10.1016\/j.scitotenv.2022.158474","article-title":"High emissions could increase the future risk of maize drought in China by 60\u201370%","volume":"852","author":"Jia","year":"2022","journal-title":"Sci. Total Environ."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"102724","DOI":"10.1016\/j.ijdrr.2021.102724","article-title":"Flood risk management in the Yangtze River basin\u2014Comparison of 1998 and 2020 events","volume":"68","author":"Jia","year":"2022","journal-title":"Int. J. Disaster Risk Reduct."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1016\/j.scib.2021.01.012","article-title":"Big Earth Data: A practice of sustainability science to achieve the Sustainable Development Goals","volume":"66","author":"Guo","year":"2021","journal-title":"Sci. Bull."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/20964471.2017.1405925","article-title":"Big data drives the development of Earth science","volume":"1","author":"Guo","year":"2017","journal-title":"Big Earth Data"},{"key":"ref_7","first-page":"102853","article-title":"HADeenNet: A hierarchical-attention multi-scale deconvolution network for landslide detection","volume":"111","author":"Yu","year":"2022","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_8","first-page":"102930","article-title":"SNNFD, spiking neural segmentation network in frequency domain using high spatial resolution images for building extraction","volume":"112","author":"Yu","year":"2022","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_9","unstructured":"(2022, July 18). Apache Hadoop. Available online: http:\/\/hadoop.apache.org\/."},{"key":"ref_10","unstructured":"(2022, July 18). Apache Spark. Available online: https:\/\/spark.apache.org\/."},{"key":"ref_11","unstructured":"(2022, July 18). Apache Flink. 
Available online: https:\/\/flink.apache.org\/."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1007\/s41060-016-0027-9","article-title":"Big data analytics on Apache Spark","volume":"1","author":"Salloum","year":"2016","journal-title":"Int. J. Data Sci. Anal."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1016\/j.jmsy.2019.11.004","article-title":"Big data and stream processing platforms for Industry 4.0 requirements mapping for a predictive maintenance use case","volume":"54","author":"Sahal","year":"2020","journal-title":"J. Manuf. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1111\/1541-4337.12540","article-title":"Utilization of text mining as a big data analysis tool for food science and nutrition","volume":"19","author":"Tao","year":"2020","journal-title":"Compr. Rev. Food Sci. Food Saf."},{"key":"ref_15","first-page":"41","article-title":"Dynamic fair priority optimization task scheduling algorithm in cloud computing: Concepts and implementations","volume":"8","author":"Saxena","year":"2016","journal-title":"Int. J. Comput. Netw. Inf. Secur."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1007\/s10586-020-03075-5","article-title":"A novel hybrid antlion optimization algorithm for multi-objective task scheduling problems in cloud computing environments","volume":"24","author":"Abualigah","year":"2021","journal-title":"Clust. Comput."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1494","DOI":"10.1109\/JSTARS.2022.3146430","article-title":"Res2-Unet, a New Deep Architecture for Building Detection from High Spatial Resolution Images","volume":"15","author":"Chen","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. 
Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1109\/JPROC.2021.3087029","article-title":"Recent developments in parallel and distributed computing for remotely sensed big data processing","volume":"109","author":"Wu","year":"2021","journal-title":"Proc. IEEE"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"85","DOI":"10.26599\/BDMA.2019.9020015","article-title":"A survey of data partitioning and sampling methods to support big data analysis","volume":"3","author":"Mahmud","year":"2020","journal-title":"Big Data Min. Anal."},{"key":"ref_20","first-page":"431","article-title":"Big Data technologies: A survey","volume":"30","author":"Oussous","year":"2018","journal-title":"J. King Saud Univ. Comput. Inf. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"154300","DOI":"10.1109\/ACCESS.2019.2946884","article-title":"A survey of distributed data stream processing frameworks","volume":"7","author":"Isah","year":"2019","journal-title":"IEEE Access"},{"key":"ref_22","unstructured":"Bertolucci, M., Carlini, E., Dazzi, P., Lulli, A., and Ricci, L. (2016). Static and dynamic big data partitioning on apache spark. Parallel Computing: On the Road to Exascale, IOS Press."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Geetha, J., and Harshit, N. (2019, January 6\u20138). Implementation and performance comparison of partitioning techniques in apache spark. Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India.","DOI":"10.1109\/ICCCNT45670.2019.8944759"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Kwon, Y., Balazinska, M., Howe, B., and Rolia, J. (2012, January 20\u201324). Skewtune: Mitigating skew in mapreduce applications. 
Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, Scottsdale, AZ, USA.","DOI":"10.1145\/2213836.2213840"},{"key":"ref_25","unstructured":"(2022, July 18). Data Skew. Available online: https:\/\/www.ibm.com\/docs\/en\/psfa\/7.2.1?topic=appliance-data-skew."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1080\/20964471.2017.1403062","article-title":"Big Earth data: A new frontier in Earth and information sciences","volume":"1","author":"Guo","year":"2017","journal-title":"Big Earth Data"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"eaax8574","DOI":"10.1126\/sciadv.aax8574","article-title":"The fate of tropical forest fragments","volume":"6","author":"Hansen","year":"2020","journal-title":"Sci. Adv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.future.2014.10.029","article-title":"Remote sensing big data computing: Challenges and opportunities","volume":"51","author":"Ma","year":"2015","journal-title":"Future Gen. Comp. Syst."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1298","DOI":"10.1109\/LGRS.2017.2709700","article-title":"Exploiting different types of parallelism in distributed analysis of remote sensing data","volume":"14","author":"Costa","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"4294","DOI":"10.1109\/TGRS.2018.2890513","article-title":"An efficient and scalable framework for processing remotely sensed big data in cloud computing environments","volume":"57","author":"Sun","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yu, J., Chen, H., and Hu, F. (2015, January 18\u201320). SASM: Improving spark performance with adaptive skew mitigation. 
Proceedings of the 2015 IEEE International Conference on Progress in Informatics and Computing (PIC), Nanjing, China.","DOI":"10.1109\/PIC.2015.7489818"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1016\/j.future.2016.06.027","article-title":"An intermediate data placement algorithm for load balancing in spark computing environment","volume":"78","author":"Tang","year":"2018","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1054","DOI":"10.1016\/j.future.2017.07.014","article-title":"SP-Partitioner: A novel partition method to handle intermediate data skew in spark streaming","volume":"86","author":"Liu","year":"2018","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1109\/TCC.2018.2878838","article-title":"An intermediate data partition algorithm for skew mitigation in spark computing environment","volume":"9","author":"Tang","year":"2018","journal-title":"IEEE Trans. Cloud Comput."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Xiujin, S., and Yueqin, Q. (2020, January 16\u201318). An algorithm of data skew in spark based on partition. Proceedings of the 2020 International Conference on Computers, Information Processing and Advanced Education (CIPAE), Ottawa, ON, Canada.","DOI":"10.1109\/CIPAE51077.2020.00063"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, K., Khan, M.M.H., Nguyen, N., and Gokhale, S. (2019, January 24\u201326). A model driven approach towards improving the performance of apache spark applications. 
Proceedings of the 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Madison, WI, USA.","DOI":"10.1109\/ISPASS.2019.00036"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"102699","DOI":"10.1016\/j.parco.2020.102699","article-title":"ImRP: A Predictive Partition Method for Data Skew Alleviation in Spark Streaming Environment","volume":"100","author":"Fu","year":"2020","journal-title":"Parallel Comput."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"012109","DOI":"10.1088\/1742-6596\/1575\/1\/012109","article-title":"Load Balancing Mechanism Based on Linear Regression Partition Prediction in Spark","volume":"1575","author":"Huang","year":"2020","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"e5637","DOI":"10.1002\/cpe.5637","article-title":"Handling data skew at reduce stage in Spark by ReducePartition","volume":"32","author":"Guo","year":"2020","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1016\/j.jpdc.2021.12.002","article-title":"MiCS-P: Parallel Mutual-information Computation of Big Categorical Data on Spark","volume":"161","author":"Li","year":"2022","journal-title":"J. Parallel Distrib. Comput."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shen, Y., Xiong, J., and Jiang, D. (2020, January 2\u20134). SrSpark: Skew-resilient spark based on adaptive parallel processing. Proceedings of the 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, China.","DOI":"10.1109\/ICPADS51040.2020.00067"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wang, S., Jia, Z., and Wang, W. (2021, January 9\u201310). Research on Optimization of data balancing partition algorithm based on spark platform. 
Proceedings of the International Conference on Artificial Intelligence and Security, Jaipur, India.","DOI":"10.1007\/978-3-030-78612-0_1"},{"key":"ref_43","unstructured":"Yin, R., He, G., Wang, G., and Long, T. (2019). 30-meter Global Mosaic Map of 2018. Sci. Data Bank, 4."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"741","DOI":"10.5194\/essd-13-741-2021","article-title":"Annual 30m dataset for glacial lakes in High Mountain Asia from 2008 to 2017","volume":"13","author":"Chen","year":"2021","journal-title":"Earth Syst. Sci. Data"},{"key":"ref_45","unstructured":"(2022, July 18). HashPartitioner. Available online: https:\/\/spark.apache.org\/docs\/2.3.1\/api\/scala\/index.html#org.apache.spark.HashPartitioner."},{"key":"ref_46","unstructured":"(2022, July 18). RangePartitioner. Available online: https:\/\/spark.apache.org\/docs\/2.3.1\/api\/scala\/index.html#org.apache.spark.RangePartitioner."},{"key":"ref_47","unstructured":"(2022, July 18). Shuffle Operations. Available online: https:\/\/spark.apache.org\/docs\/latest\/rdd-programming-guide.html."},{"key":"ref_48","unstructured":"(2022, July 18). Image Clipping Function. Available online: https:\/\/geotrellis.io\/."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Wang, N., Chen, F., Yu, B., and Wang, L. (2022). A Strategy of Parallel SLIC Superpixels for Handling Large-Scale Images over Apache Spark. 
Remote Sens., 14.","DOI":"10.3390\/rs14071568"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/19\/4964\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:46:55Z","timestamp":1760143615000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/19\/4964"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,5]]},"references-count":49,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["rs14194964"],"URL":"https:\/\/doi.org\/10.3390\/rs14194964","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,5]]}}}