{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,7,27]],"date-time":"2022-07-27T16:52:41Z","timestamp":1658940761945},"reference-count":25,"publisher":"IGI Global","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,1]]},"abstract":"<jats:p>To better understand task failures in cloud computing systems, the authors analyze failure frequency of tasks based on Google cluster dataset, and find some frequently failing tasks that suffer from long-term failures and repeated rescheduling, which are called killer tasks as they can be a big concern of cloud systems. Hence there is a need to analyze killer tasks thoroughly and recognize them precisely. In this article, the authors first investigate resource usage pattern of killer tasks and analyze rescheduling strategies of killer tasks in Google cluster to find that repeated rescheduling causes large amount of resource wasting. Based on the above observations, they then propose an online killer task recognition service to recognize killer tasks at the very early stage of their occurrence so as to avoid unnecessary resource wasting. The experiment results show that the proposed service performs a 93.6% accuracy in recognizing killer tasks with an 87% timing advance and 86.6% resource saving for the cloud system averagely.<\/jats:p>","DOI":"10.4018\/ijdst.2018010102","type":"journal-article","created":{"date-parts":[[2017,12,26]],"date-time":"2017-12-26T15:51:26Z","timestamp":1514303486000},"page":"16-38","source":"Crossref","is-referenced-by-count":1,"title":["Analysis of Frequently Failing Tasks and Rescheduling Strategy in the Cloud System"],"prefix":"10.4018","volume":"9","author":[{"given":"Hongyan","family":"Tang","sequence":"first","affiliation":[{"name":"School of Software and Microelectronics, Peking University, Beijing, China"}]},{"given":"Ying","family":"Li","sequence":"additional","affiliation":[{"name":"National Engineering Center of Software Engineering, Peking University, Beijing, China"}]},{"given":"Tong","family":"Jia","sequence":"additional","affiliation":[{"name":"School of Software and Microelectronics, Peking University, Beijing, China"}]},{"given":"Xiaoyong","family":"Yuan","sequence":"additional","affiliation":[{"name":"Department of Computer and Information Science and Engineering, University of Florida, Florida, USA"}]},{"given":"Zhonghai","family":"Wu","sequence":"additional","affiliation":[{"name":"National Engineering Center of Software Engineering, Peking University, Beijing, China"}]}],"member":"2432","reference":[{"key":"IJDST.2018010102-0","doi-asserted-by":"crossref","unstructured":"Alam, M., Shakil, K. A., & Sethi, S. (2015). Analysis and Clustering of Workload in Google Cluster Trace based on Resource Usage. arXiv preprint arXiv:1501.01426.","DOI":"10.1109\/CSE-EUC-DCABES.2016.271"},{"key":"IJDST.2018010102-1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.18"},{"key":"IJDST.2018010102-2","doi-asserted-by":"publisher","DOI":"10.1145\/2506155.2506159"},{"key":"IJDST.2018010102-3","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2012.129"},{"key":"IJDST.2018010102-4","unstructured":"Chen, X. (2014). Failure analysis and prediction in compute clouds (Doctoral dissertation, University of British Columbia)."},{"key":"IJDST.2018010102-5","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE.2014.34"},{"key":"IJDST.2018010102-6","doi-asserted-by":"publisher","DOI":"10.1109\/ISSREW.2014.105"},{"key":"IJDST.2018010102-7","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.56"},{"key":"IJDST.2018010102-8","unstructured":"El-Sayed, N., & Schroeder, B. How reliable are large-scale jobs in parallel clusters?"},{"key":"IJDST.2018010102-9","doi-asserted-by":"publisher","DOI":"10.1109\/CSICC.2009.5349381"},{"key":"IJDST.2018010102-10","doi-asserted-by":"publisher","DOI":"10.1109\/HASE.2014.24"},{"issue":"1","key":"IJDST.2018010102-11","first-page":"52","article-title":"Ensemble of Bayesian predictors and decision trees for proactive failure management in cloud computing systems.","volume":"7","author":"Q.Guan","year":"2012","journal-title":"Journal of Communication"},{"key":"IJDST.2018010102-12","doi-asserted-by":"publisher","DOI":"10.1109\/CC.2014.6827564"},{"key":"IJDST.2018010102-13","doi-asserted-by":"publisher","DOI":"10.1002\/9780470479216.corpsy0524"},{"key":"IJDST.2018010102-14","doi-asserted-by":"publisher","DOI":"10.1145\/1773394.1773400"},{"key":"IJDST.2018010102-15","doi-asserted-by":"publisher","DOI":"10.1109\/e-Science.2009.51"},{"key":"IJDST.2018010102-16","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391236"},{"key":"IJDST.2018010102-17","unstructured":"Reiss, C., Tumanov, A., Ganger, G. R., Katz, R. H., & Kozuch, M. A. (2012b). Towards understanding heterogeneous clouds at scale: Google trace analysis (Tech. Rep, 84). Intel Science and Technology Center for Cloud Computing."},{"key":"IJDST.2018010102-18","unstructured":"Reiss, C., Wilkes, J., & Hellerstein, J. L. (2011). Google cluster-usage traces: format+ schema (White Paper). Google Inc."},{"key":"IJDST.2018010102-19","doi-asserted-by":"crossref","unstructured":"Rosa, A., Chen, L. Y., & Binder, W. (2015a). Understanding the dark side of big data clusters: An analysis beyond failures. In Proceedings of the 2015 45th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (pp. 207-218). IEEE.","DOI":"10.1109\/DSN.2015.37"},{"key":"IJDST.2018010102-20","doi-asserted-by":"publisher","DOI":"10.1109\/CCGrid.2015.139"},{"key":"IJDST.2018010102-21","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC-CSS-ICESS.2015.170"},{"key":"IJDST.2018010102-22","doi-asserted-by":"publisher","DOI":"10.1109\/INM.2011.5990537"},{"key":"IJDST.2018010102-23","doi-asserted-by":"publisher","DOI":"10.1109\/CloudCom.2012.6427566"},{"key":"IJDST.2018010102-24","first-page":"10","article-title":"Spark: Cluster computing with working sets.","volume":"10","author":"M.Zaharia","year":"2010","journal-title":"HotCloud"}],"container-title":["International Journal of Distributed Systems and Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=196265","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T12:27:55Z","timestamp":1651840075000},"score":1,"resource":{"primary":{"URL":"http:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/IJDST.2018010102"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2018,1]]},"references-count":25,"journal-issue":{"issue":"1"},"URL":"https:\/\/doi.org\/10.4018\/ijdst.2018010102","relation":{},"ISSN":["1947-3532","1947-3540"],"issn-type":[{"value":"1947-3532","type":"print"},{"value":"1947-3540","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,1]]}}}