{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T21:44:31Z","timestamp":1775771071789,"version":"3.50.1"},"reference-count":74,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,2,25]],"date-time":"2021-02-25T00:00:00Z","timestamp":1614211200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,2,25]],"date-time":"2021-02-25T00:00:00Z","timestamp":1614211200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In the era of global-scale services, organisations produce huge volumes of data, often distributed across multiple data centres, separated by vast geographical distances. While cluster computing applications, such as MapReduce and Spark, have been widely deployed in data centres to support commercial applications and scientific research, they are not designed for running jobs across geo-distributed data centres. The necessity to utilise such infrastructure introduces new challenges in the data analytics process due to bandwidth limitations of the inter-data-centre communication. In this article, we discuss challenges and survey the latest geo-distributed big-data analytics frameworks and schedulers (based on MapReduce and Spark) with WAN-bandwidth awareness.<\/jats:p>","DOI":"10.1186\/s40537-021-00427-9","type":"journal-article","created":{"date-parts":[[2021,2,25]],"date-time":"2021-02-25T13:03:29Z","timestamp":1614258209000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["A survey on bandwidth-aware geo-distributed frameworks for big-data analytics"],"prefix":"10.1186","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7300-7965","authenticated-orcid":false,"given":"Mohammed","family":"Bergui","sequence":"first","affiliation":[]},{"given":"Said","family":"Najah","sequence":"additional","affiliation":[]},{"given":"Nikola S.","family":"Nikolov","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,2,25]]},"reference":[{"key":"427_CR1","unstructured":"Tudoran R, Antoniu G, Boug\u00e9 L. SAGE: geo-distributed streaming data analysis in clouds. In: 2013 IEEE\ninternational symposium on parallel distributed processing, workshops and Phd Forum. 2013, vol. 2013, p. 2278\u201381."},{"key":"427_CR2","doi-asserted-by":"crossref","unstructured":"Tudoran R, Costan A, Wang R, Boug\u00e9 L, Bridging Antoniu G. Data in the clouds: an environment-aware system for geographically distributed data transfers. In: 2014 14th IEEE\/ACM international symposium on cluster, cloud and grid computing; 2014, p. 92\u2013101.","DOI":"10.1109\/CCGrid.2014.86"},{"key":"427_CR3","doi-asserted-by":"crossref","unstructured":"Cardosa M, Wang C, Nangia A, Chandra A, Weissman J. Exploring mapreduce efficiency with highly-distributed data. In: Proceedings of the second international workshop on mapreduce and its applications. ACM; 2011. p. 27\u201334.","DOI":"10.1145\/1996092.1996100"},{"issue":"3","key":"427_CR4","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1109\/TCC.2014.2355225","volume":"4","author":"B Heintz","year":"2016","unstructured":"Heintz B, Chandra A, Sitaraman RK, Weissman J. End-to-end optimization for geo-distributed mapreduce. IEEE Trans Cloud Comput. 2016;4(3):293\u2013306.","journal-title":"IEEE Trans Cloud Comput"},{"key":"427_CR5","unstructured":"Rabkin A, Arye M, Sen S, Pai V, Freedman MJ. Making every bit count in wide-area analytics. In: Presented as part of the 14th workshop on hot topics in operating systems. USENIX; 2013."},{"issue":"3","key":"427_CR6","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1016\/j.future.2012.09.001","volume":"29","author":"L Wang","year":"2013","unstructured":"Wang L, Tao J, Ranjan R, Marten H, Streit A, Chen J, et al. G-Hadoop: mapreduce across distributed data centers for data-intensive computing. Fut Gener Comput Syst. 2013;29(3):739\u201350.","journal-title":"Fut Gener Comput Syst"},{"issue":"1","key":"427_CR7","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1145\/1327452.1327492","volume":"51","author":"J Dean","year":"2008","unstructured":"Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51(1):107\u201313.","journal-title":"Commun ACM"},{"key":"427_CR8","unstructured":"Apache Hadoop. http:\/\/hadoop.apache.org\/."},{"key":"427_CR9","unstructured":"Apache Spark. http:\/\/spark.apache.org\/."},{"issue":"3","key":"427_CR10","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1145\/1272998.1273005","volume":"41","author":"M Isard","year":"2007","unstructured":"Isard M, Budiu M, Yu Y, Birrell A, Fetterly D. Dryad: distributed data-parallel programs from sequential building blocks. SIGOPS Oper Syst Rev. 2007;41(3):59\u201372.","journal-title":"SIGOPS Oper Syst Rev"},{"key":"427_CR11","doi-asserted-by":"crossref","unstructured":"Vulimiri A, Curino C, Godfrey PB, Jungblut T, Karanasos K, Padhye J, et\u00a0al. WANalytics: geo-distributed analytics for a data intensive world. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. SIGMOD \u201915. ACM; 2015. p. 1087\u20131092.","DOI":"10.1145\/2723372.2735365"},{"issue":"4","key":"427_CR12","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1145\/2829988.2787505","volume":"45","author":"Q Pu","year":"2015","unstructured":"Pu Q, Ananthanarayanan G, Bodik P, Kandula S, Akella A, Bahl P, et al. Low latency geo-distributed data analytics. SIGCOMM Comput Commun Rev. 2015;45(4):421\u201334.","journal-title":"SIGCOMM Comput Commun Rev"},{"issue":"1","key":"427_CR13","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1109\/TC.2013.121","volume":"63","author":"C Jayalath","year":"2014","unstructured":"Jayalath C, Stephen J, Eugster P. From the cloud to the atmosphere: running mapreduce across data centers. IEEE Trans Comput. 2014;63(1):74\u201387.","journal-title":"IEEE Trans Comput"},{"issue":"11","key":"427_CR14","doi-asserted-by":"publisher","first-page":"3229","DOI":"10.1109\/TPDS.2017.2717883","volume":"28","author":"A Jonathan","year":"2017","unstructured":"Jonathan A, Ryden M, Oh K, Chandra A, Weissman J. Nebula: distributed edge cloud for data intensive computing. IEEE Trans Parallel Distrib Syst. 2017;28(11):3229\u201342.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"427_CR15","doi-asserted-by":"crossref","unstructured":"Kettimuthu R, Agrawal G, Sadayappan P, Foster I. Differentiated scheduling of response-critical and best-effort wide-area data transfers. In: 2016 IEEE international parallel and distributed processing symposium (IPDPS); 2016. p. 1113\u201322.","DOI":"10.1109\/IPDPS.2016.97"},{"key":"427_CR16","doi-asserted-by":"crossref","unstructured":"Hu Z, Li B, Luo J. Flutter: scheduling tasks closer to data across geo-distributed datacenters. In: IEEE INFOCOM 2016\u2014The 35th Annual IEEE international conference on computer communications; 2016. p. 1\u20139.","DOI":"10.1109\/INFOCOM.2016.7524469"},{"issue":"4","key":"427_CR17","doi-asserted-by":"publisher","first-page":"134","DOI":"10.1145\/2043164.2018452","volume":"41","author":"S Sundaresan","year":"2011","unstructured":"Sundaresan S, de Donato W, Feamster N, Teixeira R, Crawford S, Pescap\u00e8 A. Broadband internet performance: a view from the gateway. SIGCOMM Comput Commun Rev. 2011;41(4):134\u201345.","journal-title":"SIGCOMM Comput Commun Rev"},{"key":"427_CR18","doi-asserted-by":"crossref","unstructured":"Sitaraman RK, Kasbekar M, Lichtenstein W, Jain M. 16. In: Overlay networks: an akamai perspective. Wiley; 2014. p. 305\u201328.","DOI":"10.1002\/9781118909690.ch16"},{"key":"427_CR19","doi-asserted-by":"crossref","unstructured":"Hung CC, Ananthanarayanan G, Golubchik L, Yu M, Zhang M. Wide-area Analytics with Multiple Resources. In: Proceedings of the thirteenth eurosys conference. EuroSys \u201918. ACM; 2018. p. 12:1\u201312:16.","DOI":"10.1145\/3190508.3190528"},{"key":"427_CR20","doi-asserted-by":"crossref","unstructured":"Zhou AC, Ibrahim S, He B. On achieving efficient data transfer for graph processing in geo-distributed datacenters. In: 2017 IEEE 37th international conference on distributed computing systems (ICDCS); 2017. p. 1397\u20131407.","DOI":"10.1109\/ICDCS.2017.98"},{"key":"427_CR21","doi-asserted-by":"crossref","unstructured":"Vulimiri A, Curino C, Godfrey PB, Jungblut T, Karanasos K, Padhye J, et\u00a0al. WANalytics: geo-distributed analytics for a data intensive world. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. SIGMOD \u201915. New York, NY, USA: Association for Computing Machinery; 2015. p. 1087\u201392.","DOI":"10.1145\/2723372.2735365"},{"issue":"2","key":"427_CR22","doi-asserted-by":"publisher","first-page":"72","DOI":"10.14778\/2850578.2850582","volume":"9","author":"K Kloudas","year":"2015","unstructured":"Kloudas K, Mamede M, Pregui\u00e7a N, Rodrigues R. Pixida: optimizing data parallel jobs in wide-area data analytics. Proc VLDB Endow. 2015;9(2):72\u201383.","journal-title":"Proc VLDB Endow"},{"key":"427_CR23","doi-asserted-by":"crossref","unstructured":"Jonathan A, Chandra A, Weissman J. Awan: locality-aware resource manager for geo-distributed data-intensive applications. In: 2016 IEEE international conference on cloud engineering (IC2E); 2016. p. 32\u201341.","DOI":"10.1109\/IC2E.2016.15"},{"issue":"2","key":"427_CR24","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1109\/TPDS.2019.2938164","volume":"31","author":"L Zhao","year":"2020","unstructured":"Zhao L, Yang Y, Munir A, Liu AX, Li Y, Qu W. Optimizing geo-distributed data analytics with coordinated task scheduling and routing. IEEE Trans Parallel Distrib Syst. 2020;31(2):279\u201393.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"12","key":"427_CR25","doi-asserted-by":"publisher","first-page":"2155","DOI":"10.14778\/3352063.3352132","volume":"12","author":"Y Huang","year":"2019","unstructured":"Huang Y, Shi Y, Zhong Z, Feng Y, Cheng J, Li J, et al. Yugong: geo-distributed data and job placement at scale. Proc VLDB Endow. 2019;12(12):2155\u201369.","journal-title":"Proc VLDB Endow"},{"key":"427_CR26","unstructured":"Iordache A, Morin C, Parlavantzas N, Feller E, Riteau P, Resilin: elastic mapreduce over multiple clouds. In: 13th IEEE\/ACM international symposium on cluster. Cloud, and Grid Computing. 2013;2013:261\u20138."},{"key":"427_CR27","doi-asserted-by":"crossref","unstructured":"Ananthanarayanan R, Basker V, Das S, Gupta A, Jiang H, Qiu T, et\u00a0al. Photon: fault-tolerant and scalable joining of continuous data streams. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data. SIGMOD \u201913. New York, NY, USA: Association for Computing Machinery; 2013. p. 577\u201388.","DOI":"10.1145\/2463676.2465272"},{"key":"427_CR28","unstructured":"Hsieh K, Harlap A, Vijaykumar N, Konomis D, Ganger GR, Gibbons PB, et\u00a0al. Gaia: geo-distributed machine learning approaching LAN speeds. In: 14th USENIX symposium on networked systems design and implementation (NSDI 17). Boston, MA: USENIX Association; 2017. p. 629\u201347."},{"issue":"1","key":"427_CR29","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1109\/TBDATA.2017.2723473","volume":"5","author":"S Dolev","year":"2019","unstructured":"Dolev S, Florissi P, Gudes E, Sharma S, Singer I. A survey on geographically distributed big-data processing using mapreduce. IEEE Trans Big Data. 2019;5(1):60\u201380.","journal-title":"IEEE Trans Big Data"},{"issue":"2","key":"427_CR30","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1109\/TST.2016.7442496","volume":"21","author":"S Ji","year":"2016","unstructured":"Ji S, Li B. Wide area analytics for geographically distributed datacenters. Tsinghua Sci Technol. 2016;21(2):125\u201335.","journal-title":"Tsinghua Sci Technol"},{"key":"427_CR31","unstructured":"WT Hadoop: The definitive guide. 4th ed. Newton: O\u2019Reilly Media, Inc.; 2015."},{"key":"427_CR32","unstructured":"Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, et\u00a0al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on networked systems design and implementation. NSDI\u201912. USENIX Association; 2012. p. 2."},{"key":"427_CR33","doi-asserted-by":"crossref","unstructured":"Zaharia M, Das T, Li H, Hunter T, Shenker S, Stoica I. Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles. SOSP \u201913. ACM; 2013. p. 423\u201338.","DOI":"10.1145\/2517349.2522737"},{"key":"427_CR34","unstructured":"Carbone P, Katsifodimos A, Kth, Sweden S, Ewen S, Markl V, et\u00a0al. Apache flink: stream and batch processing in a single engine. Bull IEEE Comput Soc Tech Committee Data Eng. 2015;38(4):28\u201338."},{"key":"427_CR35","unstructured":"Apache Storm. http:\/\/storm.apache.org\/."},{"key":"427_CR36","unstructured":"European Commission press release. Commission to pursue role as honest broker in future global negotiations on internet governance. http:\/\/tinyurl.com\/k8xcvy4."},{"key":"427_CR37","unstructured":"Zhang X, Qian Z, Zhang S, Li Y, Li X, Wang X, et\u00a0al. Towards reliable (and efficient) job executions in a practical geo-distributed data analytics system. arXiv e-prints. 2018, p. arXiv:1802.00245."},{"key":"427_CR38","unstructured":"Viswanathan R, Ananthanarayanan G, Akella A. CLARINET: WAN-aware optimization for analytics queries. In: Proceedings of the 12th USENIX conference on operating systems design and implementation. OSDI\u201916. USENIX Association; 2016. p. 435\u201350."},{"issue":"4","key":"427_CR39","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1145\/2534169.2486019","volume":"43","author":"S Jain","year":"2013","unstructured":"Jain S, Kumar A, Mandal S, Ong J, Poutievski L, Singh A, et al. B4: experience with a globally-deployed software defined wan. SIGCOMM Comput Commun Rev. 2013;43(4):3\u201314.","journal-title":"SIGCOMM Comput Commun Rev"},{"issue":"4","key":"427_CR40","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1145\/2534169.2486012","volume":"43","author":"CY Hong","year":"2013","unstructured":"Hong CY, Kandula S, Mahajan R, Zhang M, Gill V, Nanduri M, et al. Achieving high utilization with software-driven WAN. SIGCOMM Comput Commun Rev. 2013;43(4):15\u201326.","journal-title":"SIGCOMM Comput Commun Rev"},{"key":"427_CR41","doi-asserted-by":"crossref","unstructured":"Calder M, Fan X, Hu Z, Katz-Bassett E, Heidemann J, Govindan R. Mapping the expansion of Google\u2019s serving infrastructure. In: Proceedings of the 2013 conference on internet measurement conference. IMC \u201913. ACM; 2013. p. 313\u201326.","DOI":"10.1145\/2504730.2504754"},{"key":"427_CR42","unstructured":"Wang H, Li B. Lube: mitigating bottlenecks in wide area data analytics. In: Proceedings of the 9th USENIX conference on hot topics in cloud computing. HotCloud\u201917. USENIX Association; 2017. p. 1."},{"key":"427_CR43","doi-asserted-by":"crossref","unstructured":"Costa PARS, Bai X, Ramos FMV, Correia M. Medusa: an efficient cloud fault-tolerant MapReduce. In: 2016\n16th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid); 2016, p. 443\u201352.","DOI":"10.1109\/CCGrid.2016.20"},{"key":"427_CR44","doi-asserted-by":"crossref","unstructured":"Costa PARS, Ramos FMV, Correia M, Chrysaor: Fine-Grained, Fault-Tolerant Cloud-of-Clouds MapReduce. In: 2017 IEEE International Conference on Computer and Information Technology (CIT); 2017, p. 421\u201330.","DOI":"10.1109\/CCGRID.2017.89"},{"issue":"1","key":"427_CR45","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1007\/s00607-017-0564-7","volume":"100","author":"MW Convolbo","year":"2018","unstructured":"Convolbo MW, Chou J, Hsu CH, Chung YC. GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers. Computing. 2018;100(1):21\u201346.","journal-title":"Computing"},{"issue":"6","key":"427_CR46","doi-asserted-by":"publisher","first-page":"1785","DOI":"10.1109\/TPDS.2016.2626285","volume":"28","author":"P Li","year":"2017","unstructured":"Li P, Guo S, Miyazaki T, Liao X, Jin H, Zomaya AY, et al. Traffic-aware geo-distributed big data analytics with predictable job completion time. IEEE Trans Parallel Distrib Syst. 2017;28(6):1785\u201396.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"427_CR47","doi-asserted-by":"crossref","unstructured":"Zhang G, Wang H, Luan Z, Wu W, Qian D. Improving performance for geo-distributed data process in wide-area. In: 2017 IEEE international conference on computer and information technology (CIT); 2017. p. 162\u20137.","DOI":"10.1109\/CIT.2017.48"},{"key":"427_CR48","doi-asserted-by":"crossref","unstructured":"Zhang H, Ramapantulu L, Teo YM. Harmony: an approach for geo-distributed processing of big-data applications. In: 2019 IEEE international conference on cluster computing (CLUSTER); 2019. p. 1\u201311.","DOI":"10.1109\/CLUSTER.2019.8891053"},{"key":"427_CR49","doi-asserted-by":"crossref","unstructured":"Oh K, Chandra A, Network Weissman J. A system cost-aware geo-distributed data analytics. In: 20th IEEE\/ACM international symposium on cluster. Cloud and Internet Computing (CCGRID). 2020, P. 649\u201358.","DOI":"10.1109\/CCGrid49817.2020.00-28"},{"key":"427_CR50","doi-asserted-by":"crossref","unstructured":"Wu D, Sakr S, Zhu L, Towards WuH, analytics big data, across multiple clusters. In: 17th IEEE\/ACM international symposium on cluster. Cloud and Grid Computing (CCGRID). 2017,p. 218\u201327.","DOI":"10.1109\/CCGRID.2017.73"},{"key":"427_CR51","doi-asserted-by":"crossref","unstructured":"Wang H, Niu D, Li B. Dynamic and decentralized global analytics via machine learning. In: Proceedings of the ACM symposium on cloud computing. SoCC \u201918. ACM; 2018. p. 14\u201325.","DOI":"10.1145\/3267809.3267812"},{"key":"427_CR52","doi-asserted-by":"crossref","unstructured":"Li H, Xu H, Nutanong S. Bohr: similarity aware geo-distributed data analytics. In: 9th USENIX workshop on hot topics in cloud computing (HotCloud 17). USENIX Association; 2017.","DOI":"10.1145\/3281411.3281418"},{"issue":"6","key":"427_CR53","doi-asserted-by":"publisher","first-page":"1434","DOI":"10.1109\/TPDS.2018.2880189","volume":"30","author":"W Li","year":"2019","unstructured":"Li W, Niu D, Liu Y, Liu S, Li B. Wide-area spark streaming: automated routing and batch sizing. IEEE Trans Parallel Distrib Syst. 2019;30(6):1434\u201348.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"427_CR54","doi-asserted-by":"crossref","unstructured":"Jonathan A, Chandra A, Weissman J. Multi-query optimization in wide-area streaming analytics. In: Proceedings of the ACM symposium on cloud computing. SoCC \u201918. ACM; 2018. p. 412\u201325.","DOI":"10.1145\/3267809.3267842"},{"key":"427_CR55","doi-asserted-by":"crossref","unstructured":"Jonathan A, Chandra A, Weissman J. WASP: wide-area adaptive stream processing. In: Proceedings of the 21st international middleware conference. Middleware \u201920. New York, NY, USA: Association for Computing Machinery; 2020. p. 221\u201335.","DOI":"10.1145\/3423211.3425668"},{"key":"427_CR56","unstructured":"Rabkin A, Arye M, Sen S, Pai VS, Freedman MJ. Aggregation and degradation in JetStream: streaming analytics in the wide area. In: Proceedings of the 11th USENIX conference on networked systems design and implementation. NSDI\u201914. USENIX Association; 2014. p. 275\u201388."},{"issue":"4","key":"427_CR57","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1109\/MCC.2017.3791027","volume":"4","author":"PARS Costa","year":"2017","unstructured":"Costa PARS, Ramos FMV, Correia M. On the design of resilient multicloud mapreduce. IEEE Cloud Comput. 2017;4(4):74\u201382.","journal-title":"IEEE Cloud Comput"},{"issue":"3","key":"427_CR58","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1145\/502807.502808","volume":"33","author":"E Ch\u00e1vez","year":"2001","unstructured":"Ch\u00e1vez E, Navarro G, Baeza-Yates R, Marroqu\u00edn JL. Searching in metric spaces. ACM Comput Surv. 2001;33(3):273\u2013321.","journal-title":"ACM Comput Surv"},{"key":"427_CR59","volume-title":"Time series analysis, forecasting and control","author":"GEP Box","year":"1990","unstructured":"Box GEP, Jenkins G. Time series analysis, forecasting and control. San Francisco: Holden-Day Inc.; 1990."},{"key":"427_CR60","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1007\/978-3-642-40725-3_19","volume-title":"Computer performance engineering","author":"T Chis","year":"2013","unstructured":"Chis T. Sliding hidden markov model for evaluating discrete data. In: Balsamo MS, Knottenbelt WJ, Marin A, editors. Computer performance engineering. Berlin Heidelberg: Springer; 2013. p. 251\u201362."},{"key":"427_CR61","doi-asserted-by":"crossref","unstructured":"Ousterhout K, Wendell P, \u00a0 M, Stoica I. Sparrow: distributed, low latency scheduling. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles. SOSP \u201913. ACM; 2013. p. 69\u201384.","DOI":"10.1145\/2517349.2522716"},{"key":"427_CR62","unstructured":"Gurobi Optimization. http:\/\/www.gurobi.com\/."},{"key":"427_CR63","doi-asserted-by":"crossref","unstructured":"Zaharia M, Borthakur D, Sen\u00a0Sarma J, Elmeleegy K, Shenker S, Stoica I. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European conference on computer systems. EuroSys \u201910. ACM; 2010. p. 265\u201378.","DOI":"10.1145\/1755913.1755940"},{"issue":"4","key":"427_CR64","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1145\/2740070.2626334","volume":"44","author":"R Grandl","year":"2014","unstructured":"Grandl R, Ananthanarayanan G, Kandula S, Rao S, Akella A. Multi-resource packing for cluster schedulers. SIGCOMM Comput Commun Rev. 2014;44(4):455\u201366.","journal-title":"SIGCOMM Comput Commun Rev"},{"key":"427_CR65","unstructured":"TPC-DS Decision Support Benchmark. http:\/\/www.tpc.org\/tpcds."},{"key":"427_CR66","unstructured":"Big Data Benchmark. https:\/\/amplab.cs.berkeley.edu\/benchmark\/."},{"key":"427_CR67","doi-asserted-by":"crossref","unstructured":"Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, et\u00a0al. Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing. SOCC \u201913. ACM; 2013. p. 5:1\u20135:16.","DOI":"10.1145\/2523616.2523633"},{"key":"427_CR68","unstructured":"Hunt P, Konar M, Junqueira FP, Reed B. ZooKeeper: wait-free coordination for internet-scale systems. In: Proceedings of the 2010 USENIX conference on USENIX annual technical conference. USENIXATC\u201910. USENIX Association; 2010. p. 11."},{"key":"427_CR69","doi-asserted-by":"crossref","unstructured":"Cuzzocrea A, Bellatreche L, Song IY. Data Warehousing and OLAP over big data: current challenges and future research directions. In: Proceedings of the sixteenth international workshop on data warehousing and OLAP. DOLAP \u201913. ACM; 2013. p. 67\u201370.","DOI":"10.1145\/2513190.2517828"},{"key":"427_CR70","doi-asserted-by":"crossref","unstructured":"Wang L, Tao J, Marten H, Streit A, Khan SU, Kolodziej J, et\u00a0al. MapReduce across distributed clusters for data-intensive applications. In: 2012 IEEE 26th international parallel and distributed processing symposium workshops PhD Forum; 2012. p. 2004\u201311.","DOI":"10.1109\/IPDPSW.2012.249"},{"key":"427_CR71","unstructured":"Conviva. https:\/\/www.conviva.com\/datasheets\/experience-benchmarks."},{"key":"427_CR72","doi-asserted-by":"crossref","unstructured":"Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and trends\u00aein machine learning. 2011;3(1):1\u2013122. 10.1561\/2200000016.","DOI":"10.1561\/2200000016"},{"key":"427_CR73","unstructured":"Twitter Streaming API\u2019s. https:\/\/developer.twitter.com\/en\/docs."},{"key":"427_CR74","doi-asserted-by":"crossref","unstructured":"Fan X, Lang B, Zhou Y, Zang T. Adding network bandwidth resource management to Hadoop YARN. In: 2017 seventh international conference on information science and technology (ICIST); 2017. p. 444\u20139.","DOI":"10.1109\/ICIST.2017.7926801"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00427-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-021-00427-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00427-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,25]],"date-time":"2021-02-25T13:41:34Z","timestamp":1614260494000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-021-00427-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,25]]},"references-count":74,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["427"],"URL":"https:\/\/doi.org\/10.1186\/s40537-021-00427-9","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,25]]},"assertion":[{"value":"18 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 February 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 February 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}],"article-number":"40"}}