{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T15:05:11Z","timestamp":1761663911007,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2019,5,5]],"date-time":"2019-05-05T00:00:00Z","timestamp":1557014400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National High Technology Development 863 Program of China","award":["2013AA01A215"],"award-info":[{"award-number":["2013AA01A215"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Improving reliability is one of the major concerns of scientific workflow scheduling in clouds. The ever-growing computational complexity and data size of workflows present challenges to fault-tolerant workflow scheduling. Therefore, it is essential to design a cost-effective fault-tolerant scheduling approach for large-scale workflows. In this paper, we propose a dynamic fault-tolerant workflow scheduling (DFTWS) approach with hybrid spatial and temporal re-execution schemes. First, DFTWS calculates the time attributes of tasks and identifies the critical path of workflow in advance. Then, DFTWS assigns appropriate virtual machine (VM) for each task according to the task urgency and budget quota in the phase of initial resource allocation. Finally, DFTWS performs online scheduling, which makes real-time fault-tolerant decisions based on failure type and task criticality throughout workflow execution. The proposed algorithm is evaluated on real-world workflows. Furthermore, the factors that affect the performance of DFTWS are analyzed. The experimental results demonstrate that DFTWS achieves a trade-off between high reliability and low cost objectives in cloud computing environments.<\/jats:p>","DOI":"10.3390\/info10050169","type":"journal-article","created":{"date-parts":[[2019,5,9]],"date-time":"2019-05-09T11:22:35Z","timestamp":1557400955000},"page":"169","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Dynamic Fault-Tolerant Workflow Scheduling with Hybrid Spatial-Temporal Re-Execution in Clouds"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0544-4834","authenticated-orcid":false,"given":"Na","family":"Wu","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China"}]},{"given":"Decheng","family":"Zuo","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China"}]},{"given":"Zhan","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,5,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1080\/10618600.2017.1384734","article-title":"50 years of data science","volume":"26","author":"Donoho","year":"2017","journal-title":"J. Comput. Graph. Stat."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yu, J., Buyya, R., and Ramamohanarao, K. (2008). Workflow scheduling algorithms for grid computing. Metaheuristics for Scheduling in Distributed Computing Environments, Springer.","DOI":"10.1007\/978-3-540-69277-5_7"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3501","DOI":"10.1109\/TPDS.2016.2543731","article-title":"Fault-tolerant scheduling for real-time scientific workflows with elastic resource provisioning in virtualized clouds","volume":"27","author":"Zhu","year":"2016","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/TNSM.2012.091012.120238","article-title":"QoS guarantees and service differentiation for dynamic cloud applications","volume":"10","author":"Rao","year":"2013","journal-title":"IEEE Trans. Netw. Serv. Manag."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1145\/1721654.1721672","article-title":"A view of cloud computing","volume":"53","author":"Armbrust","year":"2010","journal-title":"Commun. ACM"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1109\/TETCI.2017.2755691","article-title":"Entropy4Cloud: Using Entropy-Based Complexity to Optimize Cloud Service Resource Management","volume":"2","author":"Chen","year":"2018","journal-title":"IEEE Trans. Emerg. Top. Comput. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Poola, D., Salehi, M.A., Ramamohanarao, K., and Buyya, R. (2017). A taxonomy and survey of fault-tolerant workflow management systems in cloud and distributed computing environments. Software Architecture for Big Data and the Cloud, Elsevier.","DOI":"10.1016\/B978-0-12-805467-3.00015-6"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1016\/j.parco.2006.06.006","article-title":"A novel fault-tolerant scheduling algorithm for precedence constrained tasks in real-time heterogeneous systems","volume":"32","author":"Qin","year":"2006","journal-title":"Parallel Comput."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1109\/TCC.2014.2314655","article-title":"Deadline based resource provisioningand scheduling algorithm for scientific workflows on clouds","volume":"2","author":"Rodriguez","year":"2014","journal-title":"IEEE Trans. Cloud Comput."},{"key":"ref_10","unstructured":"Zheng, Q. (2010, January 19\u201323). Improving MapReduce fault tolerance in the cloud. Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), Atlanta, GA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1016\/j.jnca.2016.01.018","article-title":"Towards workflow scheduling in cloud computing: A comprehensive analysis","volume":"66","author":"Masdari","year":"2016","journal-title":"J. Netw. Comput. Appl."},{"key":"ref_12","first-page":"351","article-title":"Ant colony optimization","volume":"8","author":"Yaseen","year":"2008","journal-title":"IJCSNS"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1007\/s10723-015-9344-9","article-title":"Cost-time efficient scheduling plan for executing workflows in the cloud","volume":"13","author":"Verma","year":"2015","journal-title":"J. Grid Comput."},{"key":"ref_14","first-page":"25","article-title":"An optimized scheduling algorithm on a cloud workflow using a discrete particle swarm","volume":"14","author":"Cao","year":"2014","journal-title":"Cybern. Inf. Technol."},{"key":"ref_15","first-page":"21","article-title":"A survey of workflow scheduling algorithms and research issues","volume":"74","author":"Singh","year":"2013","journal-title":"Int. J. Comput. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lin, C., and Lu, S. (2011, January 4\u20139). Scheduling scientific workflows elastically for cloud computing. Proceedings of the 2011 IEEE 4th International Conference on Cloud Computing, Washington, DC, USA.","DOI":"10.1109\/CLOUD.2011.110"},{"key":"ref_17","unstructured":"Wu, H., Tang, Z., and Li, R. (2012, January 5\u201310). A priority constrained scheduling strategy of multiple workflows for cloud computing. Proceedings of the 2012 14th IEEE International Conference on Advanced Communication Technology (ICACT), Washington, DC, USA."},{"key":"ref_18","unstructured":"Verma, A., and Kaushal, S. (2012, January 21\u201323). Deadline and budget distribution based cost-time optimization workflow scheduling algorithm for cloud. Proceedings of the IJCA Proceedings on International Conference on Recent Advances And Future Trends in Information Technology (iRAFIT 2012), Patiala, India. iRAFIT (7)."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"312","DOI":"10.12720\/jcm.9.4.312-321","article-title":"High-throughput scientific workflow scheduling under deadline constraint in clouds","volume":"9","author":"Zhu","year":"2014","journal-title":"J. Commun."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"350934","DOI":"10.1155\/2013\/350934","article-title":"Multi-objective approach for energy-aware workflow scheduling in cloud computing environments","volume":"2013","author":"Yassa","year":"2013","journal-title":"Sci. World J."},{"key":"ref_21","first-page":"27","article-title":"A goal-oriented workflow scheduling in heterogeneous distributed systems","volume":"52","author":"Delavar","year":"2012","journal-title":"Int. J. Comput. Appl."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Shengjun, X., Jie, Z., and Xiaolong, X. (2012). An improved algorithm based on ACO for cloud service PDTs scheduling. Adv. Inf. Sci. Serv. Sci., 4.","DOI":"10.4156\/aiss.vol4.issue18.41"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1039","DOI":"10.1002\/cpe.994","article-title":"Scientific workflow management and the Kepler system","volume":"18","author":"Altintas","year":"2006","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/j.future.2014.10.008","article-title":"Pegasus, a workflow management system for science automation","volume":"46","author":"Deelman","year":"2015","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_25","unstructured":"Mandal, A., Kennedy, K., Koelbel, C., Marin, G., Mellor-Crummey, J., Liu, B., and Johnsson, L. (2005, January 24\u201327). Scheduling strategies for mapping application workflows onto the grid. Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), Research Triangle Park, NC, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Fard, H.M., Prodan, R., Barrionuevo, J.J.D., and Fahringer, T. (2012, January 13\u201316). A multi-objective approach for workflow scheduling in heterogeneous environments. Proceedings of the 2012 12th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), Ottawa, ON, Canada.","DOI":"10.1109\/CCGrid.2012.114"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1109\/TASE.2009.2014643","article-title":"Bi-criteria scheduling of scientific grid workflows","volume":"7","author":"Prodan","year":"2010","journal-title":"IEEE Trans. Autom. Sci. Eng."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Shi, J., Luo, J., Dong, F., and Zhang, J. (2014, January 21\u201323). A budget and deadline aware scientific workflow resource provisioning and scheduling mechanism for cloud. Proceedings of the 2014 IEEE 18th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Hsinchu, Taiwan.","DOI":"10.1109\/CSCWD.2014.6846925"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jss.2015.11.023","article-title":"Cost optimization approaches for scientific workflow scheduling in cloud and grid computing: A review, classifications, and open issues","volume":"113","author":"Alkhanak","year":"2016","journal-title":"J. Syst. Softw."},{"key":"ref_30","unstructured":"Anghel, L., Alexandrescu, D., and Nicolaidis, M. (2000, January 18\u201324). Evaluation of a soft error tolerance technique based on time and\/or space redundancy. Proceedings of the 13th Symposium on Integrated Circuits and Systems Design (Cat. No. PR00843), Manaus, Brazil."},{"key":"ref_31","unstructured":"Hwang, S., and Kesselman, C. (2003, January 22\u201324). Grid workflow: A flexible failure handling framework for the grid. Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing, Seattle, WA, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Gao, Y., Gupta, S.K., Wang, Y., and Pedram, M. (2014, January 24\u201328). An energy-aware fault tolerant scheduling framework for soft error resilient cloud computing systems. Proceedings of the Conference on Design, Automation & Test in Europe. European Design and Automation Association, Dresden, Germany.","DOI":"10.7873\/DATE2014.107"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1016\/j.eswa.2014.09.014","article-title":"Intelligent failure prediction models for scientific workflows","volume":"42","author":"Bala","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1109\/71.584093","article-title":"Fault-tolerance through scheduling of aperiodic tasks in hard real-time multiprocessor systems","volume":"8","author":"Ghosh","year":"1997","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/71.735960","article-title":"A fault-tolerant dynamic scheduling algorithm for multiprocessor real-time systems and its analysis","volume":"9","author":"Manimaran","year":"1998","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1016\/j.jcss.2016.10.010","article-title":"Building a fault tolerant framework with deadline guarantee in big data stream computing environments","volume":"89","author":"Sun","year":"2017","journal-title":"J. Comput. Syst. Sci."},{"key":"ref_37","unstructured":"Qiu, X., Dai, Y., Xiang, Y., and Xing, L. (2017). Correlation modeling and resource optimization for cloud service with fault recovery. IEEE Trans. Cloud Comput."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"772","DOI":"10.1093\/comjnl\/bxp067","article-title":"Multi-criteria scheduling of precedence task graphs on heterogeneous platforms","volume":"53","author":"Benoit","year":"2010","journal-title":"Comput. J."},{"key":"ref_39","unstructured":"Xie, G., Zeng, G., Li, R., and Li, K. (2017). Quantitative fault-tolerance for reliable workflows on heterogeneous IaaS clouds. IEEE Trans. Cloud Comput."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1007\/s10723-015-9331-1","article-title":"Fault-tolerant dynamic rescheduling for heterogeneous computing systems","volume":"13","author":"Mei","year":"2015","journal-title":"J. Grid Comput."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1109\/TPDS.2015.2403861","article-title":"Task scheduling for maximizing performance and reliability considering fault recovery in heterogeneous distributed systems","volume":"27","author":"Chen","year":"2016","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1109\/TC.2013.167","article-title":"NCCloud: A network-coding-based storage system in a cloud-of-clouds","volume":"63","author":"Chen","year":"2014","journal-title":"IEEE Trans. Comput."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/5\/169\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:49:17Z","timestamp":1760186957000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/5\/169"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,5]]},"references-count":42,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2019,5]]}},"alternative-id":["info10050169"],"URL":"https:\/\/doi.org\/10.3390\/info10050169","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2019,5,5]]}}}