{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T09:10:06Z","timestamp":1749633006187,"version":"3.41.0"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,3,28]],"date-time":"2025-03-28T00:00:00Z","timestamp":1743120000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,28]],"date-time":"2025-03-28T00:00:00Z","timestamp":1743120000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["Grant No. 62225205","Grant No. 92055213","Grant No. 62302160"],"award-info":[{"award-number":["Grant No. 62225205","Grant No. 92055213","Grant No. 62302160"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Science and Technology Program of Changsha","award":["kh2301011"],"award-info":[{"award-number":["kh2301011"]}]},{"DOI":"10.13039\/501100017607","name":"Shenzhen Basic Research Project","doi-asserted-by":"crossref","award":["JCYJ20210324140002006"],"award-info":[{"award-number":["JCYJ20210324140002006"]}],"id":[{"id":"10.13039\/501100017607","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004761","name":"the Natural Science Foundation of Hunan Province","doi-asserted-by":"crossref","award":["2024JJ6154"],"award-info":[{"award-number":["2024JJ6154"]}],"id":[{"id":"10.13039\/501100004761","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["CCF Trans. 
HPC"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Distributed computing frameworks play a crucial role in supporting compute-intensive applications in the era of big data. The growing demand for computing resources has spurred the interconnection of data centers, leading to the formation of supercomputing Internet. MapReduce is a popular distributed computing framework designed for large independent clusters. The original MapReduce framework deployed on supercomputing Internet performs inefficiently due to redundant geo-distributed reduce operations. Nonetheless, its abstraction retains significant potential. This paper proposes an enhanced MapReduce framework for geo-distributed supercomputing Internet to minimize the necessity for data transmission across data centers. Leveraging hierarchical scheduling techniques, the framework optimizes data locality to mitigate network latency and bandwidth consumption during reduce operations, thereby reducing overall job execution times. The paper introduces a mathematical model for task scheduling within supercomputing Internet and formally describes the data transmission process among data centers. In the job scheduling phase, our framework facilitates efficient overlap of transferring and computing through pre-selected data centers. Meanwhile, in the data transmission phase, the framework aggregates data to reduce the frequency of transmission, thus alleviating the adverse effects of the hierarchical network architecture on transmission. Comparative analysis with existing methods demonstrates the efficacy of the proposed framework in addressing similar computational challenges. 
Empirical evaluations underscore the effectiveness of our method in practice.<\/jats:p>","DOI":"10.1007\/s42514-025-00218-1","type":"journal-article","created":{"date-parts":[[2025,3,31]],"date-time":"2025-03-31T08:14:35Z","timestamp":1743408875000},"page":"245-259","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["An optimized hierarchical MapReduce framework in supercomputing Internet environment"],"prefix":"10.1007","volume":"7","author":[{"given":"Yalin","family":"Zhu","sequence":"first","affiliation":[]},{"given":"Youquan","family":"Chang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0364-3568","authenticated-orcid":false,"given":"Jiapeng","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yingjie","family":"Song","sequence":"additional","affiliation":[]},{"given":"Zhuo","family":"Tang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,28]]},"reference":[{"key":"218_CR1","unstructured":"Afrati, F., Dolev, S., Sharma, S., Ullman, J.D.: Meta-mapreduce: a technique for reducing communication in mapreduce computations (2015). arXiv:1508.01171"},{"issue":"1","key":"218_CR2","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1145\/2189750.2150984","volume":"40","author":"F Ahmad","year":"2012","unstructured":"Ahmad, F., Chakradhar, S.T., Raghunathan, A., Vijaykumar, T.: Tarazu: optimizing mapreduce on heterogeneous clusters. ACM SIGARCH Comput. Archit. News 40(1), 61\u201374 (2012)","journal-title":"ACM SIGARCH Comput. Archit. News"},{"key":"218_CR3","doi-asserted-by":"crossref","unstructured":"Cardosa, M., Wang, C., Nangia, A., Chandra, A., Weissman, J.: Exploring mapreduce efficiency with highly-distributed data. In: Proceedings of the Second International Workshop on MapReduce and Its Applications, pp. 
27\u201334 (2011)","DOI":"10.1145\/1996092.1996100"},{"issue":"6","key":"218_CR4","first-page":"1","volume":"6","author":"J Chen","year":"2024","unstructured":"Chen, J.: Construction of supercomputing internet in the context of computing power networks. Electron. Commun. Comput. Sci. 6(6), 1\u20133 (2024)","journal-title":"Electron. Commun. Comput. Sci."},{"key":"218_CR5","unstructured":"Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: osdi, vol. 4, p. 5 (2004)"},{"issue":"1","key":"218_CR6","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1145\/1327452.1327492","volume":"51","author":"J Dean","year":"2008","unstructured":"Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107\u2013113 (2008)","journal-title":"Commun. ACM"},{"issue":"8","key":"218_CR7","doi-asserted-by":"publisher","first-page":"2473","DOI":"10.1002\/cpe.3584","volume":"28","author":"M Duan","year":"2016","unstructured":"Duan, M., Li, K., Tang, Z., Xiao, G., Li, K.: Selection and replacement algorithms for memory performance improvement in spark. Concurr. Comput.: Pract. Exp. 28(8), 2473\u20132486 (2016)","journal-title":"Concurr. Comput.: Pract. Exp."},{"issue":"10","key":"218_CR8","doi-asserted-by":"publisher","first-page":"2406","DOI":"10.1109\/TPDS.2020.2992073","volume":"31","author":"Z Fu","year":"2020","unstructured":"Fu, Z., Tang, Z., Yang, L., Liu, C.: An optimal locality-aware task scheduling algorithm based on bipartite graph modelling for spark applications. IEEE Trans. Parallel Distrib. Syst. 31(10), 2406\u20132420 (2020)","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"issue":"3","key":"218_CR9","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1145\/2160803.2160876","volume":"39","author":"H Gadre","year":"2011","unstructured":"Gadre, H., Rodero, I., Parashar, M.: Investigating mapreduce framework extensions for efficient processing of geographically scattered datasets. 
ACM SIGMETRICS Perform. Eval. Rev. 39(3), 116\u2013118 (2011)","journal-title":"ACM SIGMETRICS Perform. Eval. Rev."},{"key":"218_CR10","doi-asserted-by":"crossref","unstructured":"Guo, Z., Fox, G., Zhou, M.: Investigation of data locality in mapreduce. In: 2012 12th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp. 419\u2013426 IEEE (2012)","DOI":"10.1109\/CCGrid.2012.42"},{"key":"218_CR11","unstructured":"Hadoop, A.: Apache Hadoop (2024). http:\/\/hadoop.apache.org\/"},{"key":"218_CR12","doi-asserted-by":"crossref","unstructured":"Hammoud, M., Rehman, M.S., Sakr, M.F.: Center-of-gravity reduce task scheduling to lower mapreduce network traffic. In: 2012 IEEE Fifth International Conference on Cloud Computing, pp. 49\u201358. IEEE (2012)","DOI":"10.1109\/CLOUD.2012.92"},{"issue":"1","key":"218_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.ejor.2021.05.004","volume":"297","author":"S Hartmann","year":"2022","unstructured":"Hartmann, S., Briskorn, D.: An updated survey of variants and extensions of the resource-constrained project scheduling problem. Eur. J. Oper. Res. 297(1), 1\u201314 (2022)","journal-title":"Eur. J. Oper. Res."},{"key":"218_CR14","doi-asserted-by":"crossref","unstructured":"He, B., Fang, W., Luo, Q., Govindaraju, N.K., Wang, T.: Mars: a mapreduce framework on graphics processors. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 260\u2013269 (2008)","DOI":"10.1145\/1454115.1454152"},{"issue":"3","key":"218_CR15","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1109\/TCC.2014.2355225","volume":"4","author":"B Heintz","year":"2014","unstructured":"Heintz, B., Chandra, A., Sitaraman, R.K., Weissman, J.: End-to-end optimization for geo-distributed mapreduce. IEEE Trans. Cloud Comput. 4(3), 293\u2013306 (2014)","journal-title":"IEEE Trans. 
Cloud Comput."},{"key":"218_CR16","doi-asserted-by":"publisher","first-page":"166","DOI":"10.1016\/j.compeleceng.2015.06.013","volume":"50","author":"X Huang","year":"2016","unstructured":"Huang, X., Zhang, L., Li, R., Wan, L., Li, K.: Novel heuristic speculative execution strategies in heterogeneous distributed environments. Comput. Electr. Eng. 50, 166\u2013179 (2016)","journal-title":"Comput. Electr. Eng."},{"key":"218_CR17","doi-asserted-by":"crossref","unstructured":"Ibrahim, S., Jin, H., Cheng, B., Cao, H., Wu, S., Qi, L.: Cloudlet: towards mapreduce implementation on virtual machines. In: Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, pp. 65\u201366 (2009)","DOI":"10.1145\/1551609.1551624"},{"key":"218_CR18","doi-asserted-by":"crossref","unstructured":"Jin, C., Buyya, R.: Mapreduce programming model for. net-based cloud computing. In: Euro-Par 2009 Parallel Processing: 15th International Euro-Par Conference, Delft, The Netherlands, August 25\u201328, 2009. Proceedings 15, pp. 417\u2013428. Springer (2009)","DOI":"10.1007\/978-3-642-03869-3_41"},{"issue":"3","key":"218_CR19","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1145\/2160803.2160873","volume":"39","author":"S Kim","year":"2011","unstructured":"Kim, S., Won, J., Han, H., Eom, H., Yeom, H.Y.: Improving Hadoop performance in intercloud environments. ACM SIGMETRICS Perform. Eval. Rev. 39(3), 107\u2013109 (2011)","journal-title":"ACM SIGMETRICS Perform. Eval. Rev."},{"issue":"1\u20132","key":"218_CR20","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1002\/nav.3800020109","volume":"2","author":"HW Kuhn","year":"1955","unstructured":"Kuhn, H.W.: The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2(1\u20132), 83\u201397 (1955)","journal-title":"Nav. Res. Logist. 
Q."},{"issue":"6","key":"218_CR21","doi-asserted-by":"publisher","first-page":"3317","DOI":"10.1109\/TSC.2021.3092563","volume":"15","author":"X Li","year":"2021","unstructured":"Li, X., Chen, F., Ruiz, R., Zhu, J.: Mapreduce task scheduling in heterogeneous geo-distributed data centers. IEEE Trans. Serv. Comput. 15(6), 3317\u20133329 (2021)","journal-title":"IEEE Trans. Serv. Comput."},{"key":"218_CR22","doi-asserted-by":"publisher","first-page":"1054","DOI":"10.1016\/j.future.2017.07.014","volume":"86","author":"G Liu","year":"2018","unstructured":"Liu, G., Zhu, X., Wang, J., Guo, D., Bao, W., Guo, H.: Sp-partitioner: a novel partition method to handle intermediate data skew in spark streaming. Future Gener. Comput. Syst. 86, 1054\u20131063 (2018)","journal-title":"Future Gener. Comput. Syst."},{"key":"218_CR23","doi-asserted-by":"crossref","unstructured":"Mattess, M., Calheiros, R.N., Buyya, R.: Scaling mapreduce applications across hybrid clouds to meet soft deadlines. In: 2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA), pp. 629\u2013636. IEEE (2013)","DOI":"10.1109\/AINA.2013.51"},{"key":"218_CR24","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1016\/j.future.2018.07.043","volume":"90","author":"NS Naik","year":"2019","unstructured":"Naik, N.S., Negi, A., Br, T.B., Anitha, R.: A data locality based scheduler to enhance mapreduce performance in heterogeneous environments. Future Gener. Comput. Syst. 90, 423\u2013434 (2019)","journal-title":"Future Gener. Comput. Syst."},{"key":"218_CR25","doi-asserted-by":"crossref","unstructured":"Pallickara, S., Ekanayake, J., Fox, G.: Granules: a lightweight, streaming runtime for cloud computing with support, for map-reduce. In: 2009 IEEE International Conference on Cluster Computing and Workshops, pp. 1\u201310. 
IEEE (2009)","DOI":"10.1109\/CLUSTR.2009.5289160"},{"key":"218_CR26","doi-asserted-by":"crossref","unstructured":"Pan, F., Xiong, J., Shen, Y., Wang, T., Jiang, D.: H-scheduler: storage-aware task scheduling for heterogeneous-storage spark clusters. In: 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), pp. 1\u20139. IEEE (2018)","DOI":"10.1109\/PADSW.2018.8644650"},{"key":"218_CR27","unstructured":"Rabkin, A., Arye, M., Sen, S., Pai, V., Freedman, M.J.: Making every bit count in {Wide-Area} analytics. In: 14th Workshop on Hot Topics in Operating Systems (HotOS XIV) (2013)"},{"key":"218_CR28","doi-asserted-by":"crossref","unstructured":"Ryden, M., Oh, K., Chandra, A., Weissman, J.: Nebula: distributed edge cloud for data intensive computing. In: 2014 IEEE International Conference on Cloud Engineering, pp. 57\u201366. IEEE (2014)","DOI":"10.1109\/IC2E.2014.34"},{"key":"218_CR29","unstructured":"Sciences\u00a0Bulletin, C.A.: \"National Supercomputer Internet\" is to make the national supercomputer network? It\u2019s not that simple (2024). http:\/\/old2022.bulletin.cas.cn\/zgkxyyk\/ch\/reader\/view_news.aspx?id=20230830103959001"},{"key":"218_CR30","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1016\/j.future.2018.06.035","volume":"90","author":"O Selvitopi","year":"2019","unstructured":"Selvitopi, O., Demirci, G.V., Turk, A., Aykanat, C.: Locality-aware and load-balanced static task scheduling for mapreduce. Future Gener. Comput. Syst. 90, 49\u201361 (2019)","journal-title":"Future Gener. Comput. Syst."},{"key":"218_CR31","doi-asserted-by":"crossref","unstructured":"Shan, Y., Wang, B., Yan, J., Wang, Y., Xu, N., Yang, H.: Fpmr: mapreduce framework on fpga. In: Proceedings of the 18th Annual ACM\/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 93\u2013102 (2010)","DOI":"10.1145\/1723112.1723129"},{"key":"218_CR32","unstructured":"Spark, A.: Apache spark (2024). 
http:\/\/spark.apache.org\/"},{"issue":"4","key":"218_CR33","doi-asserted-by":"publisher","first-page":"1149","DOI":"10.1109\/TCC.2016.2607738","volume":"8","author":"Z Tang","year":"2016","unstructured":"Tang, Z., Ma, W., Li, K., Li, K.: A data skew oriented reduce placement algorithm based on sampling. IEEE Trans. Cloud Comput. 8(4), 1149\u20131161 (2016)","journal-title":"IEEE Trans. Cloud Comput."},{"issue":"1","key":"218_CR34","doi-asserted-by":"publisher","first-page":"16","DOI":"10.4018\/IJITSA.2018010102","volume":"11","author":"O Tomarchio","year":"2018","unstructured":"Tomarchio, O., Di Modica, G., Cavallo, M., Polito, C.: A hierarchical Hadoop framework to handle big data in geo-distributed computing environments. Int. J. Inf. Technol. Syst. Approach: IJITSA 11(1), 16\u201347 (2018)","journal-title":"Int. J. Inf. Technol. Syst. Approach: IJITSA"},{"issue":"3","key":"218_CR35","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1016\/j.future.2012.09.001","volume":"29","author":"L Wang","year":"2013","unstructured":"Wang, L., Tao, J., Ranjan, R., Marten, H., Streit, A., Chen, J., Chen, D.: G-hadoop: mapreduce across distributed data centers for data-intensive computing. Future Gener. Comput. Syst. 29(3), 739\u2013750 (2013)","journal-title":"Future Gener. Comput. Syst."},{"key":"218_CR36","doi-asserted-by":"crossref","unstructured":"Xue, R., Gao, S., Ao, L., Guan, Z.: Bolas: bipartite-graph oriented locality-aware scheduling for mapreduce tasks. In: 2015 14th International Symposium on Parallel and Distributed Computing, pp. 37\u201345. IEEE (2015)","DOI":"10.1109\/ISPDC.2015.12"},{"key":"218_CR37","doi-asserted-by":"crossref","unstructured":"Zacheilas, N., Kalogeraki, V.: Chess: cost-effective scheduling across multiple heterogeneous mapreduce clusters. In: 2016 IEEE International Conference on Autonomic Computing (ICAC), pp. 65\u201374. 
IEEE (2016)","DOI":"10.1109\/ICAC.2016.58"},{"key":"218_CR38","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Liu, L., Lee, K., Zhou, Y., Singh, A., Mandagere, N., Gopisetty, S., Alatorre, G.: Improving hadoop service provisioning in a geographically distributed cloud. In: 2014 IEEE 7th International Conference on Cloud Computing, pp. 432\u2013439. IEEE (2014)","DOI":"10.1109\/CLOUD.2014.65"},{"key":"218_CR39","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1016\/j.future.2016.03.008","volume":"62","author":"J Zhang","year":"2016","unstructured":"Zhang, J., Zhang, L., Huang, H., Jiang, Z.L., Wang, X.: Key based data analytics across data centers considering bi-level resource provision in cloud computing. Future Gener. Comput. Syst. 62, 40\u201350 (2016)","journal-title":"Future Gener. Comput. Syst."}],"container-title":["CCF Transactions on High Performance Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42514-025-00218-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42514-025-00218-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42514-025-00218-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T08:49:00Z","timestamp":1749631740000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42514-025-00218-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,28]]},"references-count":39,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["218"],"URL":"https:\/\/doi.org\/10.1007\/s42514-025-00218-1","relation":{},"ISSN":["2524-4922","2524-4930"],"issn-type":[
{"type":"print","value":"2524-4922"},{"type":"electronic","value":"2524-4930"}],"subject":[],"published":{"date-parts":[[2025,3,28]]},"assertion":[{"value":"20 August 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 March 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding authors state that there is no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}