{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:31:27Z","timestamp":1772166687608,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,7,6]],"date-time":"2020-07-06T00:00:00Z","timestamp":1593993600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,7,6]],"date-time":"2020-07-06T00:00:00Z","timestamp":1593993600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Hadoop MapReduce is a framework to process vast amounts of data in the cluster of machines in a reliable and fault-tolerant manner. Since being aware of the runtime of a job is crucial to subsequent decisions of this platform and being better management, in this paper we propose a new method to estimate the runtime of a job. For this purpose, after analysis the anatomy of processing a job in Hadoop MapReduce precisely, we consider two cases: when a job runs for the first time or a job has run previously. In the first case, by considering essential and efficient parameters that higher impact on runtime we formulate each phase of the Hadoop execution pipeline and state them by mathematical expressions to calculate runtime of a job. In the second case, by referring to the profile or history of a job in the database and use a weighting system the runtime is estimated. The results show the average error rate is less than 12% in the estimation of runtime for the first run and less than 8.5% when the profile or history of the job has existed.<\/jats:p>","DOI":"10.1186\/s40537-020-00319-4","type":"journal-article","created":{"date-parts":[[2020,7,6]],"date-time":"2020-07-06T06:02:58Z","timestamp":1594015378000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Estimating runtime of a job in Hadoop MapReduce"],"prefix":"10.1186","volume":"7","author":[{"given":"Narges","family":"Peyravi","sequence":"first","affiliation":[]},{"given":"Ali","family":"Moeini","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,7,6]]},"reference":[{"key":"319_CR1","volume-title":"Hadoop: the definitive guide","author":"T White","year":"2015","unstructured":"White T. Hadoop: the definitive guide. 4th ed. Newton: O\u2019Reilly Media, Inc; 2015.","edition":"4"},{"key":"319_CR2","volume-title":"Hadoop MapReduce Cookbook","author":"S Perera","year":"2013","unstructured":"Perera S. Hadoop MapReduce Cookbook. Birmingham: Packt Publishing Ltd; 2013."},{"key":"319_CR3","volume-title":"Expert Hadoop administration: managing, tuning, and securing spark, YARN, and HDFS","author":"SR Alapati","year":"2016","unstructured":"Alapati SR. Expert Hadoop administration: managing, tuning, and securing spark, YARN, and HDFS. Boston: Addison-Wesley Professional; 2016."},{"issue":"1","key":"319_CR4","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1186\/s40537-019-0236-x","volume":"6","author":"S Heidari","year":"2019","unstructured":"Heidari S, Alborzi M, Radfar R, Afsharkazemi MA, Ghatari AR. Big data clustering with varied density based on MapReduce. J Big Data. 2019;6(1):77.","journal-title":"J Big Data"},{"issue":"1","key":"319_CR5","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1186\/s40537-016-0051-6","volume":"3","author":"R Singh","year":"2016","unstructured":"Singh R, Kaur PJ. Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud. J Big Data. 2016;3(1):19.","journal-title":"J Big Data"},{"issue":"3","key":"319_CR6","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1002\/nem.1928","volume":"26","author":"Z Liu","year":"2016","unstructured":"Liu Z, Zhang Q, Boutaba R, Liu Y, Gong Z. ROUTE: run-time robust reducer workload estimation for MapReduce. Int J Network Manage. 2016;26(3):224\u201344.","journal-title":"Int J Network Manage"},{"issue":"2","key":"319_CR7","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1109\/TPDS.2015.2405552","volume":"27","author":"M Khan","year":"2015","unstructured":"Khan M, Jin Y, Li M, Xiang Y, Jiang C. Hadoop performance modeling for job estimation and resource provisioning. IEEE Trans Parallel Distrib Syst. 2015;27(2):441\u201354.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"319_CR8","unstructured":"Khan M. Hadoop performance modeling and job optimization for big data analytics. Doctoral dissertation, Brunel University London."},{"issue":"9","key":"319_CR9","doi-asserted-by":"publisher","first-page":"1386","DOI":"10.3390\/s16091386","volume":"16","author":"Q Liu","year":"2016","unstructured":"Liu Q, Cai W, Jin D, Shen J, Fu Z, Liu X, Linge N. Estimation accuracy on execution time of run-time tasks in a heterogeneous distributed environment. Sensors. 2016;16(9):1386.","journal-title":"Sensors."},{"issue":"6","key":"319_CR10","doi-asserted-by":"publisher","first-page":"14061","DOI":"10.1007\/s10586-018-2234-8","volume":"22","author":"R Ramanathan","year":"2019","unstructured":"Ramanathan R, Latha B. Towards optimal resource provisioning for Hadoop-MapReduce jobs using scale-out strategy and its performance analysis in private cloud environment. Cluster Comput. 2019;22(6):14061\u201371.","journal-title":"Cluster Comput"},{"issue":"4","key":"319_CR11","doi-asserted-by":"publisher","first-page":"3465","DOI":"10.1007\/s11277-016-3786-7","volume":"94","author":"YJ Chen","year":"2017","unstructured":"Chen YJ, Horng GJ, Cheng ST, Wang HC. Forming spn-MapReduce model for estimation job execution time in cloud computing. Wireless Pers Commun. 2017;94(4):3465\u201393.","journal-title":"Wireless Pers Commun"},{"issue":"1","key":"319_CR12","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1134\/S0361768816010059","volume":"42","author":"VP Kozyrev","year":"2016","unstructured":"Kozyrev VP. Estimation of the execution time in real-time systems. Program Comput Softw. 2016;42(1):41\u20138.","journal-title":"Program Comput Softw"},{"key":"319_CR13","doi-asserted-by":"crossref","unstructured":"Amannejad Y, Shah S, Krishnamurthy D, Wang M. Fast and lightweight execution time predictions for spark applications. In: 2019 IEEE 12th international conference on cloud computing (CLOUD). IEEE; 2019, p. 493\u20135.","DOI":"10.1109\/CLOUD.2019.00088"},{"issue":"3","key":"319_CR14","doi-asserted-by":"publisher","first-page":"737","DOI":"10.1007\/s10586-018-2849-9","volume":"22","author":"G Kecskemeti","year":"2019","unstructured":"Kecskemeti G, Nemeth Z, Kertesz A, Ranjan R. Cloud workload prediction based on workflow execution time discrepancies. Cluster Comput. 2019;22(3):737\u201355.","journal-title":"Cluster Comput"},{"issue":"118","key":"319_CR15","doi-asserted-by":"publisher","first-page":"316","DOI":"10.1016\/j.jpdc.2017.11.001","volume":"1","author":"Z Lu","year":"2018","unstructured":"Lu Z, Wang N, Wu J, Qiu M. IoTDeM: an IoT Big Data-oriented MapReduce performance prediction extended model in multiple edge clouds. J Parallel Distrib Comput. 2018;1(118):316\u201327.","journal-title":"J Parallel Distrib Comput"},{"key":"319_CR16","unstructured":"Uvaneshwari M, Kumar NS. Load Balancing and Runtime Prediction using Map Reduce Framework. International Journal of Civil Engineering & Technology (IJCIET); 2017, p. 834\u201342."},{"key":"319_CR17","doi-asserted-by":"crossref","unstructured":"Song G, Meng Z, Huet F, Magoules F, Yu L, Lin X. A hadoop mapreduce performance prediction method. In: 2013 IEEE 10th international conference on high performance computing and communications & 2013 IEEE international conference on embedded and ubiquitous computing. IEEE; 2013, p. 820\u20135.","DOI":"10.1109\/HPCC.and.EUC.2013.118"},{"issue":"10","key":"319_CR18","doi-asserted-by":"publisher","first-page":"216","DOI":"10.1016\/j.ieri.2014.09.080","volume":"1","author":"AM Chirkin","year":"2014","unstructured":"Chirkin AM, Kovalchuk SV. Towards better workflow execution time estimation. IERI Procedia. 2014;1(10):216\u201323.","journal-title":"IERI Procedia"},{"key":"319_CR19","doi-asserted-by":"crossref","unstructured":"Verma A, Cherkasova L, Campbell RH. Resource provisioning framework for mapreduce jobs with performance goals. In: ACM\/IFIP\/USENIX international conference on distributed systems platforms and open distributed processing. Springer, Berlin, Heidelberg; 2011, p. 165\u201386","DOI":"10.1007\/978-3-642-25821-3_9"},{"key":"319_CR20","unstructured":"Li J. Time estimation for large scale of data processing in Hadoop MapReduce scenario. Master\u2019s thesis, Universitetet i Agder\/University of Agder."},{"key":"319_CR21","unstructured":"Wang G. Evaluating mapreduce system performance: a simulation approach. Doctoral dissertation, Virginia Tech."},{"key":"319_CR22","volume-title":"Optimizing Hadoop for MapReduce","author":"K Tannir","year":"2014","unstructured":"Tannir K. Optimizing Hadoop for MapReduce. Birmingham: Packt Publishing Ltd; 2014."},{"key":"319_CR23","unstructured":"Lattyak WJ, Stokes HH. Exponential smoothing forecasting using SCAB34S and SCA WorkBench."},{"key":"319_CR24","unstructured":"https:\/\/en.wikipedia.org\/wiki\/Moving_average; This page was last edited on 19 November 2018."},{"issue":"3","key":"319_CR25","doi-asserted-by":"publisher","first-page":"1247","DOI":"10.5194\/gmd-7-1247-2014","volume":"7","author":"T Chai","year":"2014","unstructured":"Chai T, Draxler RR. Root mean square error (RMSE) or mean absolute error (MAE)?\u2014arguments against avoiding RMSE in the literature. Geosci Model Dev. 2014;7(3):1247\u201350.","journal-title":"Geosci Model Dev"},{"key":"319_CR26","unstructured":"https:\/\/www.statisticshowto.datasciencecentral.com\/mean-absolute-percentage-error-mape\/; This page was last edited on 2019."},{"key":"319_CR27","unstructured":"https:\/\/www.forecastpro.com\/Trends\/forecasting101August2011.html; This page was last edited on 2019."},{"key":"319_CR28","unstructured":"http:\/\/hadoop.apache.org\/docs\/r2.9.1\/hadoop-project-dist\/hadoop-common\/DeprecatedProperties.html;Last. Published: 2018-04-16."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00319-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-020-00319-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00319-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,5]],"date-time":"2021-07-05T19:39:44Z","timestamp":1625513984000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-020-00319-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,6]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["319"],"URL":"https:\/\/doi.org\/10.1186\/s40537-020-00319-4","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.2.20701\/v1","asserted-by":"object"},{"id-type":"doi","id":"10.21203\/rs.2.20701\/v2","asserted-by":"object"}]},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,6]]},"assertion":[{"value":"8 January 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 June 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 July 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"44"}}