{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T14:54:35Z","timestamp":1754146475747,"version":"3.41.2"},"reference-count":27,"publisher":"National Library of Serbia","issue":"3","license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ComSIS","COMPUT SCI INF SYST","COMPUT SCI INFORM SY","COMPUTER SCI INFORM","COMSIS J"],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:p>For solving the low CPU and network resource utilization in the task scheduler process of the Spark and Flink computing frameworks, this paper proposes a Delay-Aware Resource-Efficient Interleaved Task Scheduling Strategy (DRTS). This algorithm can schedule parallel tasks in a pipelined fashion, effectively improving the system resource utilization and shortening the job completion times. Firstly, based on historical data of task completion times, we stagger the execution of tasks within the stage with the longest completion time. This helps optimize the utilization of system resources and ensures the smooth completion of the entire pipeline job. Secondly, the execution tasks are categorized into CPU-intensive and non-CPU-intensive phases, which include network I\/O and disk I\/O operations. During the non-CPU-intensive phase where tasks involve data fetch, parallel tasks are scheduled at suitable intervals to mitigate resource contention and minimize job completion time. Finally, we implemented DRTS on Spark 2.4.0 and conducted experiments to evaluate its performance. 
The results show that compared to DelayStage, DRTS reduces job execution time by 3.18% to 6.48% and improves CPU and network utilization of the cluster by 6.33% and 7.02%, respectively.<\/jats:p>","DOI":"10.2298\/csis240831018z","type":"journal-article","created":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T09:17:15Z","timestamp":1741166235000},"page":"839-858","source":"Crossref","is-referenced-by-count":0,"title":["Delay-aware resource-efficient interleaved task scheduling strategy in spark"],"prefix":"10.2298","volume":"22","author":[{"given":"Yanhao","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Software, Henan University Kaifeng, Henan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Congyang","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Software, Henan University Kaifeng, Henan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xin","family":"He","sequence":"additional","affiliation":[{"name":"School of Software, Henan University Kaifeng, Henan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junyang","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Software, Henan University Kaifeng, Henan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rui","family":"Zhai","sequence":"additional","affiliation":[{"name":"School of Software, Henan University Kaifeng, Henan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yalin","family":"Song","sequence":"additional","affiliation":[{"name":"School of Software, Henan University Kaifeng, Henan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1078","reference":[{"key":"ref1","unstructured":"Apache flink- stateful computations over data streams. https:\/\/flink.apache.org."},{"key":"ref2","unstructured":"Apache spark - unified engine for large-scale data analytics. 
https:\/\/spark.apache.org."},{"key":"ref3","unstructured":"Apache storm is a free and open source distributed realtime computation system. https:\/\/storm.apache.org."},{"key":"ref4","doi-asserted-by":"crossref","unstructured":"Dhawalia p, kailasam s, janakiram d. chisel: A resource savvy approach for handling skew in mapreduce applications[c]\/\/2013 ieee sixth international conference on cloud computing. ieee, 2013: 652-660.","DOI":"10.1109\/CLOUD.2013.43"},{"key":"ref5","doi-asserted-by":"crossref","unstructured":"Duan y, wang n, wu j. accelerating dag-style job execution via optimizing resource pipeline scheduling[j]. journal of computer science and technology, 2022, 37(4): 852-868.","DOI":"10.1007\/s11390-021-1488-4"},{"key":"ref6","doi-asserted-by":"crossref","unstructured":"Fu z, tang z, yang l, et al. an optimal locality-aware task scheduling algorithm based on bipartite graph modelling for spark applications[j]. ieee transactions on parallel and distributed systems, 2020, 31(10): 2406-2420.","DOI":"10.1109\/TPDS.2020.2992073"},{"key":"ref7","doi-asserted-by":"crossref","unstructured":"Gu r, tang y, tian c, et al. improving execution concurrency of large-scale matrix multiplication on distributed data-parallel platforms[j]. ieee transactions on parallel and distributed systems, 2017, 28(9): 2539-2552.","DOI":"10.1109\/TPDS.2017.2686384"},{"key":"ref8","unstructured":"\"hadoop,\" 2021. [online]. https:\/\/hadoop.apache.org."},{"key":"ref9","doi-asserted-by":"crossref","unstructured":"He x, shenoy p. firebird: Network-aware task scheduling for spark using sdns[c]\/\/2016 25th international conference on computer communication and networks (icccn). ieee, 2016: 1-10.","DOI":"10.1109\/ICCCN.2016.7568524"},{"key":"ref10","doi-asserted-by":"crossref","unstructured":"Hu z, li b, qin z, et al. job scheduling without prior information in big data processing systems[c]\/\/2017 ieee 37th international conference on distributed computing systems (icdcs). 
ieee, 2017: 572-582.","DOI":"10.1109\/ICDCS.2017.105"},{"key":"ref11","doi-asserted-by":"crossref","unstructured":"Hu z, li d. improved heuristic job scheduling method to enhance throughput for big data analytics[j]. tsinghua science and technology, 2021, 27(2): 344-357.","DOI":"10.26599\/TST.2020.9010047"},{"key":"ref12","doi-asserted-by":"crossref","unstructured":"Jiang j, ma s, li b, et al. symbiosis: Network-aware task scheduling in data-parallel frameworks[c]\/\/ieee infocom 2016-the 35th annual ieee international conference on computer communications. ieee, 2016: 1-9.","DOI":"10.1109\/INFOCOM.2016.7524415"},{"key":"ref13","doi-asserted-by":"crossref","unstructured":"Li x, ren f, yang b. modeling and analyzing the performance of high-speed packet i\/o[j]. tsinghua science and technology, 2021, 26(4): 426-439.","DOI":"10.26599\/TST.2019.9010080"},{"key":"ref14","doi-asserted-by":"crossref","unstructured":"Lu s x, zhao m, li c, et al. time-aware data partition optimization and heterogeneous task scheduling strategies in spark clusters[j]. the computer journal, 2023: bxad017.","DOI":"10.1093\/comjnl\/bxad017"},{"key":"ref15","doi-asserted-by":"crossref","unstructured":"Pan f, xiong j, shen y, et al. h-scheduler: Storage-aware task scheduling for heterogeneous-storage spark clusters[c]\/\/2018 ieee 24th international conference on parallel and distributed systems (icpads). ieee, 2018: 1-9.","DOI":"10.1109\/PADSW.2018.8644650"},{"key":"ref16","doi-asserted-by":"crossref","unstructured":"Shao w, xu f, chen l, et al. stage delay scheduling: Speeding up dag-style data analytics jobs with resource interleaving[c]\/\/proceedings of the 48th international conference on parallel processing. 2019: 1-11.","DOI":"10.1145\/3337821.3337872"},{"key":"ref17","doi-asserted-by":"crossref","unstructured":"Tang z, xiao z, yang l, et al. a network load perception based task scheduler for parallel distributed data processing systems[j]. 
ieee transactions on cloud computing, 2021, 11(2): 1352-1364.","DOI":"10.1109\/TCC.2021.3132627"},{"key":"ref18","doi-asserted-by":"crossref","unstructured":"Tang z, zeng a, zhang x, et al. dynamic memory-aware scheduling in spark computing environment[j]. journal of parallel and distributed computing, 2020, 141: 10-22.","DOI":"10.1016\/j.jpdc.2020.03.010"},{"key":"ref19","doi-asserted-by":"crossref","unstructured":"Wang j, gu h, yu j, et al. research on virtual machine consolidation strategy based on combined prediction and energy-aware in cloud computing platform[j]. journal of cloud computing, 2022, 11(1): 50.","DOI":"10.1186\/s13677-022-00309-2"},{"key":"ref20","doi-asserted-by":"crossref","unstructured":"Wang j, yu j, song y, et al. an efficient energy-aware and service quality improvement strategy applied in cloud computing[j]. cluster computing, 2023, 26(6): 4031-4049.","DOI":"10.1007\/s10586-022-03795-w"},{"key":"ref21","doi-asserted-by":"crossref","unstructured":"Wang j, yu j, zhai r, et al. gmpr: a two-phase heuristic algorithm for virtual machine placement in large-scale cloud data centers[j]. ieee systems journal, 2022, 17(1): 1419-1430.","DOI":"10.1109\/JSYST.2022.3187971"},{"key":"ref22","doi-asserted-by":"crossref","unstructured":"Xu g, xu c z, jiang s. prophet: Scheduling executors with time-varying resource demands on data-parallel computation frameworks[c]\/\/2016 ieee international conference on autonomic computing (icac). ieee, 2016: 45-54.","DOI":"10.1109\/ICAC.2016.42"},{"key":"ref23","doi-asserted-by":"crossref","unstructured":"Xu y, liu l, ding z. dag-aware joint task scheduling and cache management in spark clusters[c]\/\/2020 ieee international parallel and distributed processing symposium (ipdps). ieee, 2020: 378-387.","DOI":"10.1109\/IPDPS47924.2020.00047"},{"key":"ref24","doi-asserted-by":"crossref","unstructured":"Zaharia m, borthakur d, sen sarma j, et al. 
delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling[c]\/\/proceedings of the 5th european conference on computer systems. 2010: 265-278.","DOI":"10.1145\/1755913.1755940"},{"key":"ref25","unstructured":"Zhang x, li z, liu g, et al. a spark scheduling strategy for heterogeneous cluster[j]. computers, materials continua, 2018, 55(3)."},{"key":"ref26","doi-asserted-by":"crossref","unstructured":"Dean J, G.S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the 1st International Conference on Preparing ComSIS Articles. pp. 107-113. Communications of the ACM (2008)","DOI":"10.1145\/1327452.1327492"},{"key":"ref27","unstructured":"Islam M T, Karunasekera S, B.R.: Performance and cost-efficient spark job scheduling based on deep reinforcement learning in cloud computing environments. In: IEEE Transactions on Parallel and Distributed Systems. pp. 107-113. Communications of the ACM (2021)"}],"container-title":["Computer Science and Information Systems"],"original-title":[],"language":"en","deposited":{"date-parts":[[2025,7,18]],"date-time":"2025-07-18T09:17:33Z","timestamp":1752830253000},"score":1,"resource":{"primary":{"URL":"https:\/\/doiserbia.nb.rs\/Article.aspx?ID=1820-02142500018Z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":27,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025]]}},"URL":"https:\/\/doi.org\/10.2298\/csis240831018z","relation":{},"ISSN":["1820-0214","2406-1018"],"issn-type":[{"type":"print","value":"1820-0214"},{"type":"electronic","value":"2406-1018"}],"subject":[],"published":{"date-parts":[[2025]]}}}