{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,3]],"date-time":"2026-01-03T15:21:11Z","timestamp":1767453671640,"version":"build-2065373602"},"reference-count":31,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2024,5,9]],"date-time":"2024-05-09T00:00:00Z","timestamp":1715212800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012639","name":"Prince Sultan University","doi-asserted-by":"publisher","award":["PSUG2024-031"],"award-info":[{"award-number":["PSUG2024-031"]}],"id":[{"id":"10.13039\/501100012639","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Single-board computers (SBCs) are emerging as an efficient and economical solution for fog and edge computing, providing localized big data processing with lower energy consumption. Newer and faster SBCs deliver improved performance while still maintaining a compact form factor and cost-effectiveness. In recent times, researchers have addressed scheduling issues in Hadoop-based SBC clusters. Despite their potential, traditional Hadoop configurations struggle to optimize performance in heterogeneous SBC clusters due to disparities in computing resources. Consequently, we propose modifications to the scheduling mechanism to address these challenges. In this paper, we leverage the use of node labels introduced in Hadoop 3+ and define a Frugality Index that categorizes and labels SBC nodes based on their physical capabilities, such as CPU, memory, disk space, etc. Next, an adaptive configuration policy modifies the native fair scheduling policy by dynamically adjusting resource allocation in response to workload and cluster conditions. Furthermore, the proposed frugal configuration policy considers prioritizing the reduced tasks based on the Frugality Index to maximize parallelism. To evaluate our proposal, we construct a 13-node SBC cluster and conduct empirical evaluation using the Hadoop CPU and IO intensive microbenchmarks. The results demonstrate significant performance improvements compared to native Hadoop FIFO and capacity schedulers, with execution times 56% and 22% faster than the best_cap and best_fifo scenarios. Our findings underscore the effectiveness of our approach in managing the heterogeneous nature of SBC clusters and optimizing performance across various hardware configurations.<\/jats:p>","DOI":"10.3390\/computation12050096","type":"journal-article","created":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T05:25:47Z","timestamp":1715318747000},"page":"96","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Optimizing Hadoop Scheduling in Single-Board-Computer-Based Heterogeneous Clusters"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7389-519X","authenticated-orcid":false,"given":"Basit","family":"Qureshi","sequence":"first","affiliation":[{"name":"Department of Computer Science, Prince Sultan University, Riyadh 11586, Saudi Arabia"}]}],"member":"1968","published-online":{"date-parts":[[2024,5,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1016\/j.future.2024.02.013","article-title":"Serverless-like Platform for Container-Based YARN Clusters","volume":"155","author":"Enes","year":"2024","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Warade, M., Schneider, J.-G., and Lee, K. (2022). Measuring the Energy and Performance of Scientific Workflows on Low-Power Clusters. Electronics, 11.","DOI":"10.3390\/electronics11111801"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/j.future.2018.06.048","article-title":"Commodity Single Board Computer Clusters and Their Applications","volume":"89","author":"Johnston","year":"2018","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_4","first-page":"989","article-title":"An Efficient Implementation of Mobile Raspberry Pi Hadoop Clusters for Robust and Augmented Computing Performance","volume":"14","author":"Srinivasan","year":"2018","journal-title":"J. Inf. Process. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"108403","DOI":"10.1016\/j.compeleceng.2022.108403","article-title":"The Development of a Low-Cost Big Data Cluster Using Apache Hadoop and Raspberry Pi. A Complete Guide","volume":"104","author":"Neto","year":"2022","journal-title":"Comput. Electr. Eng."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"142551","DOI":"10.1109\/ACCESS.2021.3120660","article-title":"Big Data Processing on Single Board Computer Clusters: Exploring Challenges and Possibilities","volume":"9","author":"Lee","year":"2021","journal-title":"IEEE Access"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lambropoulos, G., Mitropoulos, S., Douligeris, C., and Maglaras, L. (2024). Implementing Virtualization on Single-Board Computers: A Case Study on Edge Computing. Computers, 13.","DOI":"10.3390\/computers13020054"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"55842","DOI":"10.1109\/ACCESS.2022.3176729","article-title":"Optimizing MapReduce Task Scheduling on Virtualized Heterogeneous Environments Using Ant Colony Optimization","volume":"10","author":"Jeyaraj","year":"2022","journal-title":"IEEE Access"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"e5752","DOI":"10.1002\/cpe.5752","article-title":"Novel Data-placement Scheme for Improving the Data Locality of Hadoop in Heterogeneous Environments","volume":"33","author":"Bae","year":"2021","journal-title":"Concurr. Comput."},{"key":"ref_10","unstructured":"Qureshi, B., and Koubaa, A. (2020). Smart Infrastructure and Applications: Foundations for Smarter Cities and Societies, Springer."},{"key":"ref_11","unstructured":"(2024, May 03). Apache Hadoop YARN. Available online: https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-yarn\/hadoop-yarn-site\/YARN.html."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Qureshi, B., and Koubaa, A. (2019). On Energy Efficiency and Performance Evaluation of Single Board Computer Based Clusters: A Hadoop Case Study. Electronics, 8.","DOI":"10.3390\/electronics8020182"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Thesma, V., Rains, G.C., and Mohammadpour Velni, J. (2024). Development of a Low-Cost Distributed Computing Pipeline for High-Throughput Cotton Phenotyping. Sensors, 24.","DOI":"10.3390\/s24030970"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"19955","DOI":"10.1007\/s11356-021-13248-3","article-title":"Agricultural Irrigation Recommendation and Alert (AIRA) System Using Optimization and Machine Learning in Hadoop for Sustainable Agriculture","volume":"29","author":"Veerachamy","year":"2022","journal-title":"Environ. Sci. Pollut. Res."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"012028","DOI":"10.1088\/1742-6596\/2406\/1\/012028","article-title":"Wireless Engine Diagnostic Tool Based on Internet of Things (IoT) With PiOBD-II Using Raspberry on Honda Jazz VTEC","volume":"2406","author":"Setiyawan","year":"2022","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"79","DOI":"10.3390\/iot5010005","article-title":"Development and Assessment of Internet of Things-Driven Smart Home Security and Automation with Voice Commands","volume":"5","author":"Netinant","year":"2024","journal-title":"IoT"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Chen, I.-T., Tsai, J.-M., Chen, Y.-T., and Lee, C.-H. (2022). Lightweight Mutual Authentication for Healthcare IoT. Sustainability, 14.","DOI":"10.3390\/su142013411"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1016\/j.future.2019.07.040","article-title":"Performance Analysis of Single Board Computer Clusters","volume":"102","author":"Basford","year":"2020","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_19","unstructured":"Lim, S., and Park, D. Improving Hadoop Mapreduce Performance on Heterogeneous Single Board Computer Clusters."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"012070","DOI":"10.1088\/1742-6596\/1517\/1\/012070","article-title":"Designing Parallel Computing Using Raspberry Pi Clusters for IoT Servers on Apache Hadoop","volume":"1517","author":"Nugroho","year":"2020","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"156","DOI":"10.4018\/JITR.20201001.oa1","article-title":"Modelling Virtual Machine Workload in Heterogeneous Cloud Computing Platforms","volume":"13","author":"Fati","year":"2020","journal-title":"J. Inf. Technol. Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2879","DOI":"10.1109\/TPDS.2019.2923197","article-title":"Workload-Adaptive Configuration Tuning for Hierarchical Cloud Schedulers","volume":"30","author":"Han","year":"2019","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2906","DOI":"10.1109\/TPDS.2021.3080582","article-title":"RENDA: Resource and Network Aware Data Placement Algorithm for Periodic Workloads in Cloud","volume":"32","author":"Thakkar","year":"2021","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Han, T., and Yu, W. (2023, January 21\u201324). A Review of Hadoop Resource Scheduling Research. Proceedings of the 2023 8th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Virtually.","DOI":"10.1109\/ICIIBMS60103.2023.10347841"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1158","DOI":"10.1109\/TCC.2019.2894779","article-title":"New Scheduling Algorithms for Improving Performance and Resource Utilization in Hadoop YARN Clusters","volume":"9","author":"Yao","year":"2021","journal-title":"IEEE Trans. Cloud Comput."},{"key":"ref_26","first-page":"1","article-title":"Load Balancing Algorithms for Hadoop Cluster in Unbalanced Environment","volume":"2022","author":"Fu","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Singh, A., Sandhu, R., Mehta, S., Giri, N.C., Kuziakin, O., Leliuk, S., Saprykin, R., and Dobrozhan, A. (2023, January 2\u20136). A Comparative Study of Bigdata Tools: Hadoop Vs Spark Vs Storm. Proceedings of the 2023 IEEE 4th KhPI Week on Advanced Technology (KhPIWeek), Kharkiv, Ukraine.","DOI":"10.1109\/KhPIWeek61412.2023.10311577"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"105578","DOI":"10.1109\/ACCESS.2023.3318553","article-title":"MTD-DHJS: Makespan-Optimized Task Scheduling Algorithm for Cloud Computing With Dynamic Computational Time Prediction","volume":"11","author":"Banerjee","year":"2023","journal-title":"IEEE Access"},{"key":"ref_29","first-page":"101973","article-title":"IDaPS\u2014Improved Data-Locality Aware Data Placement Strategy Based on Markov Clustering to Enhance MapReduce Performance on Hadoop","volume":"36","author":"Vengadeswaran","year":"2024","journal-title":"J. King Saud. Univ. Comput. Inf. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1186\/s40537-021-00499-7","article-title":"A Parallelization Model for Performance Characterization of Spark Big Data Jobs on Hadoop Clusters","volume":"8","author":"Ahmed","year":"2021","journal-title":"J. Big Data"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1016\/j.jpdc.2020.03.010","article-title":"Dynamic Memory-Aware Scheduling in Spark Computing Environment","volume":"141","author":"Tang","year":"2020","journal-title":"J. Parallel Distrib. Comput."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/5\/96\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:43:04Z","timestamp":1760107384000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/5\/96"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,9]]},"references-count":31,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,5]]}},"alternative-id":["computation12050096"],"URL":"https:\/\/doi.org\/10.3390\/computation12050096","relation":{},"ISSN":["2079-3197"],"issn-type":[{"type":"electronic","value":"2079-3197"}],"subject":[],"published":{"date-parts":[[2024,5,9]]}}}