{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T18:40:18Z","timestamp":1776278418801,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2019,11,30]],"date-time":"2019-11-30T00:00:00Z","timestamp":1575072000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,11,30]],"date-time":"2019-11-30T00:00:00Z","timestamp":1575072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Due to the advent of new technologies, devices, and communication tools such as social networking sites, the amount of data produced by mankind is growing rapidly every year. Big data is a collection of large datasets that cannot be processed using traditional computing techniques. MapReduce has been introduced to solve large-data computational problems. It is specifically designed to run on commodity hardware, and it depends on dividing and conquering principles. Nowadays, the focus of researchers has shifted towards Hadoop MapReduce. One of the most outstanding characteristics of MapReduce is data locality-aware scheduling. Data locality-aware scheduler is a further efficient solution to optimize one or a set of performance metrics such as data locality, energy consumption and job completion time. Similar to all situations, time and scheduling are the most important aspects of the MapReduce framework. Therefore, many scheduling algorithms have been proposed in the past decades. The main ideas of these algorithms are increasing data locality rate and decreasing the response and completion time. In this paper, a new hybrid scheduling algorithm has been proposed, which uses dynamic priority and localization ID techniques and focuses on increasing data locality rate and decreasing completion time. The proposed algorithm was evaluated and compared with Hadoop default schedulers (FIFO, Fair), by running concurrent workloads consisting of Wordcount and Terasort benchmarks. The experimental results show that the proposed algorithm is faster than FIFO and Fair scheduling, achieves higher data locality rate and avoids wasting resources.<\/jats:p>","DOI":"10.1186\/s40537-019-0253-9","type":"journal-article","created":{"date-parts":[[2019,12,2]],"date-time":"2019-12-02T10:41:27Z","timestamp":1575283287000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework"],"prefix":"10.1186","volume":"6","author":[{"given":"Abolfazl","family":"Gandomi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Midia","family":"Reshadi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ali","family":"Movaghar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ahmad","family":"Khademzadeh","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2019,11,30]]},"reference":[{"issue":"12","key":"253_CR1","doi-asserted-by":"publisher","first-page":"2014","DOI":"10.14778\/2367502.2367562","volume":"5","author":"J Dittrich","year":"2012","unstructured":"Dittrich J, Quian\u00e9-Ruiz JA. Efficient big data processing in Hadoop, MapReduce. Proceedings of the VLDB Endowment. 2012;5(12):2014\u20135. https:\/\/doi.org\/10.14778\/2367502.2367562.","journal-title":"Proceedings of the VLDB Endowment"},{"issue":"1","key":"253_CR2","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1145\/1327452.1327492","volume":"51","author":"J Dean","year":"2008","unstructured":"Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51(1):107\u201313. https:\/\/doi.org\/10.1145\/1327452.1327492.","journal-title":"Commun ACM"},{"key":"253_CR3","doi-asserted-by":"crossref","unstructured":"Babu S. Towards automatic optimization of MapReduce programs. In: Proceedings of the 1st ACM symposium on Cloud computing. 2010; p. 137\u2013142. http:\/\/dx.doi.org\/10.1145\/1807128.1807150.","DOI":"10.1145\/1807128.1807150"},{"issue":"4","key":"253_CR4","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1145\/2094114.2094118","volume":"40","author":"KH Lee","year":"2012","unstructured":"Lee KH, Lee YJ, Choi H, Chung YD, Moon B. Parallel data processing with MapReduce: a survey. ACM SIGMOD Record. 2012;40(4):11\u201320. https:\/\/doi.org\/10.1145\/2094114.2094118.","journal-title":"ACM SIGMOD Record"},{"key":"253_CR5","doi-asserted-by":"crossref","unstructured":"Bu X, Rao J, Xu CZ. Interference and locality-aware task scheduling for MapReduce applications in virtual clusters. In: Proceedings of the 22nd international symposium on High-performance parallel and distributed computing. 2013; p. 227\u2013238. http:\/\/dx.doi.org\/10.1145\/2493123.2462904.","DOI":"10.1145\/2493123.2462904"},{"key":"253_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.jnca.2014.07.022","volume":"46","author":"I Polato","year":"2014","unstructured":"Polato I, R\u00e9 R, Goldman A, Kon F. A comprehensive view of Hadoop research\u2014a systematic literature review. J Netw Comput Appl. 2014;46:1\u201325. https:\/\/doi.org\/10.1016\/j.jnca.2014.07.022.","journal-title":"J Netw Comput Appl"},{"key":"253_CR7","unstructured":"T White (2015) Hadoop: The definitive guide. O\u2019Reilly Media, Inc."},{"key":"253_CR8","doi-asserted-by":"crossref","unstructured":"Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica I. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling In Proceedings of the 5th European conference on Computer systems. 2010; p. 265\u2013278. http:\/\/dx.doi.org\/10.1145\/1755913.1755940.","DOI":"10.1145\/1755913.1755940"},{"key":"253_CR9","doi-asserted-by":"publisher","unstructured":"He C, Lu Y, Swanson D. Matchmaking: a new MapReduce scheduling technique. In: IEEE third international conference on cloud computing technology and science (CloudCom), (2011). 2011; p. 40\u201347. https:\/\/doi.org\/10.1109\/cloudcom.2011.16.","DOI":"10.1109\/cloudcom.2011.16"},{"key":"253_CR10","doi-asserted-by":"crossref","unstructured":"Nguyen P, Simon T, Halem M, Chapman D, Le Q. A hybrid scheduling algorithm for data intensive workloads in a MapReduce environment. In: Proceedings of the 2012 IEEE\/ACM fifth international conference on utility and cloud computing. 2012; p 161\u2013167. http:\/\/dx.doi.org\/10.1109\/UCC.2012.32.","DOI":"10.1109\/UCC.2012.32"},{"issue":"1","key":"253_CR11","first-page":"563","volume":"4","author":"AN Nandakumar","year":"2014","unstructured":"Nandakumar AN, Nandita Y. A survey on data mining algorithms on Apache Hadoop platform. Int J Emerg Technol Adv Eng. 2014;4(1):563\u20135.","journal-title":"Int J Emerg Technol Adv Eng"},{"key":"253_CR12","volume-title":"MapReduce design patterns: building effective algorithms and analytics for Hadoop and other systems","author":"D Miner","year":"2012","unstructured":"Miner D, Shook A. MapReduce design patterns: building effective algorithms and analytics for Hadoop and other systems. Sebastopol: O\u2019Reilly Media, Inc.; 2012."},{"key":"253_CR13","volume-title":"Hadoop in practice","author":"A Holmes","year":"2012","unstructured":"Holmes A. Hadoop in practice. Shelter Island: Manning Publications Co.; 2012."},{"issue":"2","key":"253_CR14","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1109\/TPDS.2015.2405552","volume":"27","author":"M Khan","year":"2016","unstructured":"Khan M, Jin Y, Li M, Xiang Y, Jiang C. Hadoop performance modeling for job estimation and resource provisioning. IEEE Trans Parallel Distrib Syst. 2016;27(2):441\u201354. https:\/\/doi.org\/10.1109\/TPDS.2015.2405552.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"253_CR15","doi-asserted-by":"crossref","unstructured":"Wang K, Zhou X, Li T, Zhao D, Lang M, Raicu I. Optimizing load balancing and data-locality with data-aware scheduling. In\u00a02014 IEEE international conference on Big Data (Big Data). 2014; p. 119\u2013128. http:\/\/dx.doi.org\/10.1109\/BigData.2014.7004220.","DOI":"10.1109\/BigData.2014.7004220"},{"issue":"4","key":"253_CR16","first-page":"7","volume":"8","author":"M Zaharia","year":"2008","unstructured":"Zaharia M, Konwinski A, Joseph AD, Katz RH, Stoica I. Improving MapReduce performance in heterogeneous environments. OSDI. 2008;8(4):7.","journal-title":"OSDI"},{"key":"253_CR17","doi-asserted-by":"crossref","unstructured":"Chen Q, Zhang D, Guo M, Deng Q, Guo S. SAMR: a self-adaptive MapReduce scheduling algorithm in heterogeneous environment. In: IEEE 10th international conference on computer and information technology (CIT). 2010; p 2736\u20132743. http:\/\/dx.doi.org\/10.1109\/CIT.2010.458.","DOI":"10.1109\/CIT.2010.458"},{"key":"253_CR18","doi-asserted-by":"crossref","unstructured":"Lei L, Wo T, Hu C. CREST: towards fast speculation of straggler tasks in MapReduce. In: IEEE 8th international conference on e-business engineering (ICEBE). 2011; p. 311\u2013316. http:\/\/dx.doi.org\/10.1109\/ICEBE.2011.37.","DOI":"10.1109\/ICEBE.2011.37"},{"key":"253_CR19","doi-asserted-by":"crossref","unstructured":"Hammoud M, Sakr MF. Locality-aware reduce task scheduling for MapReduce. In: IEEE third international conference on cloud computing technology and science (CloudCom), 2011. 2011; p. 570\u2013576. http:\/\/dx.doi.org\/10.1109\/CloudCom.2011.87.","DOI":"10.1109\/CloudCom.2011.87"},{"key":"253_CR20","doi-asserted-by":"crossref","unstructured":"Ibrahim S, Jin H, Lu L, He B, Antoniu G, Wu S. Maestro: replica-aware map scheduling for MapReduce. In: Proceedings of the 2012 12th IEEE\/ACM international symposium on cluster, cloud and grid computing (ccgrid 2012). 2012; p. 435\u2013442. http:\/\/dx.doi.org\/10.1109\/CCGrid.2012.122.","DOI":"10.1109\/CCGrid.2012.122"},{"issue":"1","key":"253_CR21","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1109\/TNET.2014.2362745","volume":"24","author":"W Wang","year":"2016","unstructured":"Wang W, Zhu K, Ying L, Tan J, Zhang L. Map task scheduling in MapReduce with data locality: throughput and heavy-traffic optimality. IEEE\/ACM Trans Netw. 2016;24(1):190\u2013203. https:\/\/doi.org\/10.1109\/TNET.2014.2362745.","journal-title":"IEEE\/ACM Trans Netw"},{"key":"253_CR22","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1016\/j.future.2018.07.043","volume":"90","author":"NS Naik","year":"2019","unstructured":"Naik NS, Negi A, Tapas Bapu BR, Anitha R. A data locality based scheduler to enhance MapReduce performance in heterogeneous environments. Future Gener Comput Syst. 2019;90:423\u201334. https:\/\/doi.org\/10.1016\/j.future.2018.07.043.","journal-title":"Future Gener Comput Syst"},{"key":"253_CR23","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2019.01.006","author":"Z Liu","year":"2019","unstructured":"Liu Z, Nath AK, Ding X, Fu H, Khan M, Yu W. Multivariate modeling and two-level scheduling of analytic queries. Parallel Comput. 2019. https:\/\/doi.org\/10.1016\/j.parco.2019.01.006.","journal-title":"Parallel Comput."},{"issue":"2","key":"253_CR24","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1007\/s10723-018-9433-7","volume":"16","author":"XT Tran","year":"2018","unstructured":"Tran XT, Van Do T, Rotter C, Wang D. A new data layout scheme for energy-efficient MapReduce processing tasks. J Grid Comput. 2018;16(2):285\u201398. https:\/\/doi.org\/10.1007\/s10723-018-9433-7.","journal-title":"J Grid Comput"},{"key":"253_CR25","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1016\/j.future.2018.06.035","volume":"90","author":"O Selvitopi","year":"2019","unstructured":"Selvitopi O, Demirci GV, Turk A, Aykanat C. Locality-aware and load-balanced static task scheduling for MapReduce. Future Gener Comput Syst. 2019;90:49\u201361. https:\/\/doi.org\/10.1016\/j.future.2018.06.035.","journal-title":"Future Gener Comput Syst"},{"issue":"4","key":"253_CR26","doi-asserted-by":"publisher","first-page":"3346","DOI":"10.1109\/JSYST.2017.2764481","volume":"12","author":"D Choi","year":"2017","unstructured":"Choi D, Jeon M, Kim N, Lee BD. An enhanced data-locality-aware task scheduling algorithm for hadoop applications. IEEE Syst J. 2017;12(4):3346\u201357. https:\/\/doi.org\/10.1109\/JSYST.2017.2764481.","journal-title":"IEEE Syst J"},{"key":"253_CR27","doi-asserted-by":"publisher","unstructured":"Beaumont O, Lambert T, Marchal L, Thomas B. Data-locality aware dynamic schedulers for independent tasks with replicated inputs. In:\u00a02018 IEEE\u00a0international parallel and distributed processing symposium workshops (IPDPSW). 2018; p. 1206\u20131213. https:\/\/doi.org\/10.1109\/IPDPSW.2018.00187.","DOI":"10.1109\/IPDPSW.2018.00187"},{"issue":"6","key":"253_CR28","doi-asserted-by":"publisher","first-page":"744","DOI":"10.26599\/TST.2018.9010115","volume":"23","author":"J Li","year":"2018","unstructured":"Li J, Wang J, Lyu B, Wu J, Yang X. An improved algorithm for optimizing MapReduce based on locality and overlapping. Tsinghua Sci Technol. 2018;23(6):744\u201353. https:\/\/doi.org\/10.26599\/TST.2018.9010115.","journal-title":"Tsinghua Sci Technol."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-019-0253-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-019-0253-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-019-0253-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,29]],"date-time":"2020-11-29T00:06:05Z","timestamp":1606608365000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-019-0253-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,30]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["253"],"URL":"https:\/\/doi.org\/10.1186\/s40537-019-0253-9","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,30]]},"assertion":[{"value":"21 May 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 September 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 November 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"106"}}