{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T20:15:45Z","timestamp":1771272945440,"version":"3.50.1"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,9,19]],"date-time":"2022-09-19T00:00:00Z","timestamp":1663545600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,9,19]],"date-time":"2022-09-19T00:00:00Z","timestamp":1663545600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cloud Comp"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Scheduling of MapReduce jobs is an integral part of Hadoop and effective job scheduling has a direct impact on Hadoop performance. Data locality is one of the most important factors to be considered in order to improve efficiency, as it affects data transmission through the system. A number of researchers have suggested approaches for improving data locality, but few have considered cache locality. In this paper, we present a state-of-the-art job scheduler, CLQLMRS (Cache Locality with Q-Learning in MapReduce Scheduler) for improving both data locality and cache locality using reinforcement learning. The proposed algorithm is evaluated by various experiments in a heterogeneous environment. Experimental results show significantly decreased execution time compared with FIFO, Delay, and the Adaptive Cache Local scheduler.<\/jats:p>","DOI":"10.1186\/s13677-022-00322-5","type":"journal-article","created":{"date-parts":[[2022,9,19]],"date-time":"2022-09-19T10:02:56Z","timestamp":1663581776000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["CLQLMRS: improving cache locality in MapReduce job scheduling using Q-learning"],"prefix":"10.1186","volume":"11","author":[{"given":"Rana","family":"Ghazali","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sahar","family":"Adabi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ali","family":"Rezaee","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Douglas G.","family":"Down","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ali","family":"Movaghar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,9,19]]},"reference":[{"key":"322_CR1","unstructured":"\u201cApache Hadoop\u201d\u00a0http:\/\/hadoop.apache.org\/"},{"key":"322_CR2","unstructured":"\u201cCentralized Cache Management\u201d\u00a0https:\/\/hadoop.apache.org\/docs\/current\/hadoop-project-dist\/hadoop-hdfs\/CentralizedCacheManagement.html"},{"key":"322_CR3","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1016\/j.dcan.2017.07.008","volume":"3","author":"M Usama","year":"2017","unstructured":"Usama M, Liu M, Chen M (2017) Job schedulers for big data processing in Hadoop environment: testing real-life schedulers using benchmark programs. Digit Commun Netw 3:260\u2013273","journal-title":"Digit Commun Netw"},{"key":"322_CR4","doi-asserted-by":"publisher","first-page":"38","DOI":"10.5539\/mas.v13n7p38","volume":"13","author":"AA Abdallat","year":"2019","unstructured":"Abdallat AA, Alahmad AI, Amimi DAA, AlWidian JA (2019) Hadoop MapReduce job scheduling algorithms survey and use cases. Mod Appl Sci 13:38","journal-title":"Mod Appl Sci"},{"key":"322_CR5","doi-asserted-by":"publisher","first-page":"1101","DOI":"10.1016\/j.asej.2020.06.009","volume":"12","author":"K Kalia","year":"2021","unstructured":"Kalia K, Gupta N (2021) Analysis of Hadoop MapReduce scheduling in a heterogeneous environment. Ain Shams Eng J 12:1101\u20131110","journal-title":"Ain Shams Eng J"},{"key":"322_CR6","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1016\/j.future.2022.04.035","volume":"135","author":"Y Kang","year":"2022","unstructured":"Kang Y, Pan L, Liu S (2022) Job scheduling for big data analytical applications in clouds: A taxonomy study. Futur Gener Comput Syst 135:129\u2013145","journal-title":"Futur Gener Comput Syst"},{"key":"322_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2022.02.021","author":"KL Bawankule","year":"2022","unstructured":"Bawankule KL, Dewang RK, Singh AK (2022) A classification framework for straggler mitigation and management in a heterogeneous Hadoop cluster: A state-of-art survey. J King Saud Univ Comput Inf Sci. https:\/\/doi.org\/10.1016\/j.jksuci.2022.02.021","journal-title":"J King Saud Univ Comput Inf Sci"},{"key":"322_CR8","doi-asserted-by":"publisher","first-page":"3381","DOI":"10.1007\/s10586-021-03339-8","volume":"24","author":"R Ghazali","year":"2021","unstructured":"Ghazali R, Adabi S, Down DG, Movaghar A (2021) A classification of Hadoop job schedulers based on performance optimization approaches. Clust Comput 24:3381\u20133403","journal-title":"Clust Comput"},{"key":"322_CR9","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-019-0253-9","author":"A Gandomi","year":"2019","unstructured":"Gandomi A, Reshadi M, Movaghar A, Khademzadeh A (2019) HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework. J Big Data. https:\/\/doi.org\/10.1186\/s40537-019-0253-9","journal-title":"J Big Data"},{"key":"322_CR10","doi-asserted-by":"crossref","unstructured":"Zhang P, Li C, Zhao Y (2016) An improved task scheduling algorithm based on cache locality and data locality in Hadoop. Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings 0. p 244\u2013249","DOI":"10.1109\/PDCAT.2016.060"},{"key":"322_CR11","doi-asserted-by":"crossref","unstructured":"Pai VS, Aron M, Banga G, Svendsen M, Druschel P, Zwaenepoel W, Nahum E (1998) Locality-aware request distribution in cluster-based network servers. ACM Sigplan Notice 33:205\u2013216. ACM","DOI":"10.1145\/291006.291048"},{"key":"322_CR12","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1145\/2987550.2987553","volume-title":"Proceedings of the 7th ACM Symposium on Cloud Computing, SoCC","author":"A Floratou","year":"2016","unstructured":"Floratou A et al (2016) Adaptive caching in Big SQL using the HDFS cache. Proceedings of the 7th ACM Symposium on Cloud Computing, SoCC. p 321\u2013333. https:\/\/doi.org\/10.1145\/2987550.2987553"},{"key":"322_CR13","doi-asserted-by":"publisher","unstructured":"Zaharia M et al (2010)\u00a0Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European conference on Computer systems. p 265. https:\/\/doi.org\/10.1145\/1755913.1755940","DOI":"10.1145\/1755913.1755940"},{"key":"322_CR14","doi-asserted-by":"publisher","first-page":"3691","DOI":"10.1007\/s10586-017-0920-6","volume":"20","author":"B Lim","year":"2017","unstructured":"Lim B, Kim JW, Chung YD (2017) CATS: cache-aware task scheduling for Hadoop-based systems. Clust Comput 20:3691\u20133705","journal-title":"Clust Comput"},{"key":"322_CR15","doi-asserted-by":"publisher","unstructured":"Hwang E, Kim H, Nam B, Choi YR (2018) CAVA: Exploring memory locality for big data analytics in virtualized clusters. Proceedings - 18th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID. p 21\u201330. https:\/\/doi.org\/10.1109\/CCGRID.2018.00017","DOI":"10.1109\/CCGRID.2018.00017"},{"key":"322_CR16","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1504\/IJBDI.2018.10009526","volume":"5","author":"VN Sastry","year":"2018","unstructured":"Sastry VN, Negi A, Naik NS (2018) Improving straggler task performance in a heterogeneous MapReduce framework using reinforcement learning. Int J Big Data Intell 5:201","journal-title":"Int J Big Data Intell"},{"key":"322_CR17","first-page":"3311","volume":"12","author":"S Rashmi","year":"2017","unstructured":"Rashmi S, Basu A (2017) Q learning-based workflow scheduling in Hadoop. Int J Appl Eng Res 12:3311\u20133317","journal-title":"Int J Appl Eng Res"},{"key":"322_CR18","doi-asserted-by":"publisher","first-page":"292","DOI":"10.1016\/j.jpdc.2017.05.001","volume":"117","author":"AI Orhean","year":"2018","unstructured":"Orhean AI, Pop F, Raicu I (2018) New scheduling approach using reinforcement learning for heterogeneous distributed systems. J Parallel Distrib Comput 117:292\u2013302","journal-title":"J Parallel Distrib Comput"},{"key":"322_CR19","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1016\/j.future.2020.02.018","volume":"108","author":"D Ding","year":"2020","unstructured":"Ding D et al (2020) Q-learning based dynamic task scheduling for energy-efficient cloud computing. Futur Gener Comput Syst 108:361\u2013371","journal-title":"Futur Gener Comput Syst"},{"key":"322_CR20","doi-asserted-by":"publisher","first-page":"2800","DOI":"10.1007\/s11227-020-03364-1","volume":"77","author":"A Asghari","year":"2021","unstructured":"Asghari A, Sohrabi MK, Yaghmaee F (2021) Task scheduling, resource provisioning, and load balancing on scientific workflows using parallel SARSA reinforcement learning agents and genetic algorithm. J Supercomput 77:2800\u20132828","journal-title":"J Supercomput"},{"key":"322_CR21","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"1998","unstructured":"Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge"},{"key":"322_CR22","doi-asserted-by":"publisher","first-page":"209320","DOI":"10.1109\/ACCESS.2020.3038605","volume":"8","author":"M Naeem","year":"2020","unstructured":"Naeem M, Rizvi STH, Coronato A (2020) A Gentle Introduction to Reinforcement Learning and its Application in Different Fields. IEEE Access 8:209320\u2013209344","journal-title":"IEEE Access"},{"key":"322_CR23","volume-title":"Markov decision processes: discrete stochastic dynamic programming","author":"ML Puterman","year":"2014","unstructured":"Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York"},{"key":"322_CR24","volume-title":"Learning from delayed rewards. Ph.D. Diss","author":"CJCH Watkins","year":"1989","unstructured":"Watkins CJCH (1989) Learning from delayed rewards. Ph.D. Diss. King\u2019s College, Cambridge"},{"key":"322_CR25","doi-asserted-by":"publisher","first-page":"133653","DOI":"10.1109\/ACCESS.2019.2941229","volume":"7","author":"B Jang","year":"2019","unstructured":"Jang B, Kim M, Harerimana G, Kim JW (2019) Q-Learning Algorithms: A Comprehensive Classification and Applications. IEEE Access 7:133653\u2013133667","journal-title":"IEEE Access"},{"key":"322_CR26","volume-title":"University Ca\u2019Foscari of Venice, Dept of Economics Research Paper Series No 15","author":"M Corazza","year":"2015","unstructured":"Corazza M, Sangalli (2015) Q-learning, and SARSA: a comparison between two intelligent stochastic control approaches for financial trading. University Ca\u2019Foscari of Venice, Dept of Economics Research Paper Series No 15"},{"key":"322_CR27","unstructured":"\u201cYarn\u201d https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-yarn\/hadoop-yarn-site\/YARN.html"},{"key":"322_CR28","unstructured":"\u201cGym\u201d https:\/\/gym.openai.com"},{"key":"322_CR29","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2010.5452747","volume-title":"The HiBench Benchmark Suite : Characterization of the MapReduce-Based Data Analysis","author":"S Huang","year":"2014","unstructured":"Huang S, Huang J, Dai J, Xie T, Huang B (2014) The HiBench Benchmark Suite\u202f: Characterization of the MapReduce-Based Data Analysis. https:\/\/doi.org\/10.1109\/ICDEW.2010.5452747"},{"key":"322_CR30","unstructured":"\u201cHibench\u201d https:\/\/github.com\/Intel-bigdata\/HiBench"}],"container-title":["Journal of Cloud Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-022-00322-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13677-022-00322-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-022-00322-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,19]],"date-time":"2022-09-19T10:12:15Z","timestamp":1663582335000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofcloudcomputing.springeropen.com\/articles\/10.1186\/s13677-022-00322-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,19]]},"references-count":30,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["322"],"URL":"https:\/\/doi.org\/10.1186\/s13677-022-00322-5","relation":{},"ISSN":["2192-113X"],"issn-type":[{"value":"2192-113X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,19]]},"assertion":[{"value":"13 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 August 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 September 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable. This manuscript does not have any individual person data.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"There are no financial or non-financial competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"45"}}