{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T14:20:53Z","timestamp":1776781253814,"version":"3.51.2"},"reference-count":28,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2018,9,28]],"date-time":"2018-09-28T00:00:00Z","timestamp":1538092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Developers of resource-allocation and scheduling algorithms share test datasets (i.e., benchmarks) to enable others to compare the performance of newly developed algorithms. However, it is often hard to acquire real cloud datasets because of users\u2019 data-confidentiality concerns and the policies maintained by Cloud Service Providers (CSPs). Access to large-scale test datasets that depict the realistic high-performance computing requirements of cloud users is very limited. Therefore, a publicly available real cloud dataset will encourage other researchers to compare and benchmark their applications using an open-source benchmark. To meet these objectives, the current state of the art has been scrutinized to explore real workload behavior in Google cluster traces. For small- to moderate-size cloud computing infrastructures, the dataset generation process is demonstrated using the Monte Carlo simulation method to produce a Google Cloud Jobs (GoCJ) dataset based on the analysis of Google cluster traces. With this article, the dataset is made publicly available to enable other researchers in the field to investigate and benchmark their scheduling and resource-allocation schemes for the cloud. 
The GoCJ dataset is archived and available on the Mendeley Data repository.<\/jats:p>","DOI":"10.3390\/data3040038","type":"journal-article","created":{"date-parts":[[2018,9,28]],"date-time":"2018-09-28T10:31:28Z","timestamp":1538130688000},"page":"38","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":81,"title":["GoCJ: Google Cloud Jobs Dataset for Distributed and Cloud Computing Infrastructures"],"prefix":"10.3390","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0558-1380","authenticated-orcid":false,"given":"Altaf","family":"Hussain","sequence":"first","affiliation":[{"name":"Department of Computer Science, Faculty of Computing, Capital University of Science and Technology, Islamabad 44000, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8342-5757","authenticated-orcid":false,"given":"Muhammad","family":"Aleem","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Computing, Capital University of Science and Technology, Islamabad 44000, Pakistan"}]}],"member":"1968","published-online":{"date-parts":[[2018,9,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Makonin, S., Wang, Z.J., and Tumpach, C. (2018). \u2018RAE: The Rainforest Automation Energy Dataset for Smart Grid Meter Data Analysis\u2019. 
Data, 3.","DOI":"10.3390\/data3010008"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1007\/s10586-013-0275-6","article-title":"\u2018HSGA: A hybrid heuristic algorithm for workflow scheduling in cloud systems\u2019","volume":"17","author":"Ghorbannia","year":"2014","journal-title":"Cluster Comput."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1366","DOI":"10.1109\/TPDS.2012.240","article-title":"\u2018Managing Overloaded Hosts for Dynamic Consolidation of Virtual Machines in Cloud Data Centers under Quality of Service Constraints\u2019","volume":"24","author":"Beloglazov","year":"2013","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Yeo, C.S., and Buyya, R. (2005, January 27\u201330). Service Level Agreement based Allocation of Cluster Resources: Handling Penalty to Enhance Utility. Proceedings of the 7th IEEE International Conference on Cluster Computing, Burlington, MA, USA.","DOI":"10.1109\/CLUSTR.2005.347075"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1007\/s11227-014-1089-x","article-title":"A systematic review on cloud computing","volume":"68","author":"Durao","year":"2014","journal-title":"J. Supercomput."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1145\/1925861.1925869","article-title":"Dynamically Scaling Applications in the Cloud","volume":"41","author":"Vaquero","year":"2011","journal-title":"ACM SIGCOMM Comput. Commun. Rev."},{"key":"ref_7","first-page":"21","article-title":"Scheduling in Cloud Computing","volume":"4","author":"Tripathy","year":"2014","journal-title":"Int. J. Cloud Comput. Serv. 
Archit."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"11459","DOI":"10.1007\/s11042-017-5495-y","article-title":"Big network traffic data visualization","volume":"77","author":"Ruan","year":"2018","journal-title":"Multimed. Tools Appl."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.future.2018.04.050","article-title":"Multi-threaded learning control mechanism for neural networks","volume":"87","author":"Wei","year":"2018","journal-title":"Futur. Gener. Comput. Syst."},{"key":"ref_10","first-page":"19","article-title":"Performance tests on merge sort and recursive merge sort for big data processing","volume":"21","year":"2018","journal-title":"Tech. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Marsza\u0142ek, Z. (2017). Parallelization of Modified Merge Sort Algorithm. Symmetry, 9.","DOI":"10.3390\/sym9090176"},{"key":"ref_12","unstructured":"(2018, August 24). Heterogeneous Computing Scheduling Problem (HCSP) Instances. Available online: https:\/\/www.fing.edu.uy\/inco\/grupos\/cecal\/hpc\/HCSP\/HCSP_inst.htm."},{"key":"ref_13","unstructured":"Ali, S., Siegel, H.J., Maheswaran, M., Hensgen, D., and Ali, S. (2000, January 1). Task execution time modeling for heterogeneous computing systems. Proceedings of the 9th Heterogeneous Computing Workshop, Cancun, Mexico."},{"key":"ref_14","unstructured":"(2018, August 24). Google cluster traces. Available online: https:\/\/github.com\/google\/cluster-data."},{"key":"ref_15","unstructured":"(2018, August 24). Yahoo Cluster traces. Available online: https:\/\/webscope.sandbox.yahoo.com\/catalog.php?datatype=s&guccounter=1."},{"key":"ref_16","unstructured":"(2018, August 22). Facebook Hadoop Workload. Available online: https:\/\/github.com\/SWIMProjectUCB\/SWIM\/wiki\/Workloads-repository."},{"key":"ref_17","unstructured":"(2018, August 20). OpenCloud Hadoop workload. 
Available online: http:\/\/ftp.pdl.cmu.edu\/pub\/datasets\/hla\/."},{"key":"ref_18","unstructured":"(2018, August 20). Eucalyptus IaaS cloud Workload. Available online: https:\/\/www.cs.ucsb.edu\/~rich\/workload\/."},{"key":"ref_19","unstructured":"(2018, August 24). GWA-T-12 traces. Available online: http:\/\/gwa.ewi.tudelft.nl\/datasets\/gwa-t-12-bitbrains."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Hussain, A., Aleem, M., Khan, A., Iqbal, M.A., and Islam, M.A. (2018). RALBA: A computation-aware load balancing scheduler for cloud computing. Clust. Comput., 1\u201314.","DOI":"10.1007\/s10586-018-2414-6"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, Z., and Cho, S. (2012, January 10\u201313). Characterizing machines and workloads on a Google cluster. Proceedings of the 41st International Conference on Parallel Processing Workshops, Pittsburgh, PA, USA.","DOI":"10.1109\/ICPPW.2012.57"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Moreno, I.S., Garraghan, P., Townend, P., and Xu, J. (2013, January 25\u201328). An approach for characterizing workloads in google cloud to derive realistic resource utilization models. Proceedings of the 2013 IEEE Seventh International Symposium on Service-Oriented System Engineering, Redwood City, CA, USA.","DOI":"10.1109\/SOSE.2013.24"},{"key":"ref_23","unstructured":"Chen, Y., Ganapathi, A.S., Griffith, R., and Katz, R.H. (2018, August 24). Analysis and Lessons from a Publicly Available Google Cluster Trace. Available online: https:\/\/www2.eecs.berkeley.edu\/Pubs\/TechRpts\/2010\/EECS-2010-95.html."},{"key":"ref_24","unstructured":"Reiss, C., Tumanov, A., Ganger, G.R., Katz, R.H., and Kozuch, M.A. (2018, August 22). Towards understanding heterogeneous clouds at scale: Google trace analysis. 
Available online: http:\/\/www.pdl.cmu.edu\/PDL-FTP\/CloudComputing\/ISTC-CC-TR-12-101.pdf."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kavulya, S., Tan, J., Gandhi, R., and Narasimhan, P. (2010, January 17\u201320). An analysis of traces from a production MapReduce cluster. Proceedings of the 10th IEEE\/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), Melbourne, Australia.","DOI":"10.1109\/CCGRID.2010.112"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/j.jnca.2016.12.017","article-title":"An adaptive prediction approach based on workload pattern discrimination in the cloud","volume":"80","author":"Liu","year":"2017","journal-title":"J. Netw. Comput. Appl."},{"key":"ref_27","unstructured":"Hussain, A., and Aleem, M. (2018, August 24). GoCJ: Google Cloud Jobs Dataset, 2018. Available online: https:\/\/data.mendeley.com\/datasets\/b7bp6xhrcd\/1."},{"key":"ref_28","unstructured":"Mason, S.J., Hill, R.R., M\u00f6nch, L., Rose, O., Jefferson, T., and Fowler, J.W. (2008, January 7\u201310). Introduction to Monte Carlo Simulation. 
Proceedings of the 2008 Winter Simulation Conference, Miami, FL, USA."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/3\/4\/38\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:23:01Z","timestamp":1760196181000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/3\/4\/38"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9,28]]},"references-count":28,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2018,12]]}},"alternative-id":["data3040038"],"URL":"https:\/\/doi.org\/10.3390\/data3040038","relation":{"is-referenced-by":[{"id-type":"doi","id":"10.1007\/s00607-025-01556-2","asserted-by":"object"}]},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,9,28]]}}}