{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T00:11:41Z","timestamp":1777421501587,"version":"3.51.4"},"reference-count":47,"publisher":"MDPI AG","issue":"24","license":[{"start":{"date-parts":[[2021,12,8]],"date-time":"2021-12-08T00:00:00Z","timestamp":1638921600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["101016835"],"award-info":[{"award-number":["101016835"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005416","name":"The Research Council of Norway","doi-asserted-by":"publisher","award":["309691"],"award-info":[{"award-number":["309691"]}],"id":[{"id":"10.13039\/501100005416","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to the edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. 
We compare the proposed solution with Argo workflows and demonstrate a significant performance improvement in the execution speed for processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyze individual aspects affecting the performance of the overall solution.<\/jats:p>","DOI":"10.3390\/s21248212","type":"journal-article","created":{"date-parts":[[2021,12,8]],"date-time":"2021-12-08T23:30:00Z","timestamp":1639006200000},"page":"8212","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Big Data Workflows: Locality-Aware Orchestration Using Software Containers"],"prefix":"10.3390","volume":"21","author":[{"given":"Andrei-Alin","family":"Corodescu","sequence":"first","affiliation":[{"name":"Department of Informatics, University of Oslo, 0373 Oslo, Norway"}]},{"given":"Nikolay","family":"Nikolov","sequence":"additional","affiliation":[{"name":"SINTEF AS, Software and Service Innovation, 0373 Oslo, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8346-780X","authenticated-orcid":false,"given":"Akif Quddus","family":"Khan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Norwegian University of Science and Technology, 2815 Gj\u00f8vik, Norway"}]},{"given":"Ahmet","family":"Soylu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, OsloMet\u2014Oslo Metropolitan University, 0166 Oslo, Norway"}]},{"given":"Mihhail","family":"Matskin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2748-8929","authenticated-orcid":false,"given":"Amir H.","family":"Payberah","sequence":"additional","affiliation":[{"name":"Department of Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, 
Sweden"}]},{"given":"Dumitru","family":"Roman","sequence":"additional","affiliation":[{"name":"SINTEF AS, Software and Service Innovation, 0373 Oslo, Norway"}]}],"member":"1968","published-online":{"date-parts":[[2021,12,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ashabi, A., Sahibuddin, S.B., and Haghighi, M.S. (2020, January 18\u201319). Big Data: Current Challenges and Future Scope. Proceedings of the IEEE 10th Symposium on Computer Applications & Industrial Electronics (ISCAIE 2020), Penang, Malaysia.","DOI":"10.1109\/ISCAIE47305.2020.9108826"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/MCC.2017.55","article-title":"Orchestrating BigData Analysis Workflows","volume":"4","author":"Ranjan","year":"2017","journal-title":"IEEE Cloud Comput."},{"key":"ref_3","first-page":"95:1","article-title":"Orchestrating Big Data Analysis Workflows in the Cloud: Research Challenges, Survey, and Future Directions","volume":"52","author":"Barika","year":"2019","journal-title":"ACM Comput. Surv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhou, B., Svetashova, Y., Pychynski, T., Baimuratov, I., Soylu, A., and Kharlamov, E. (2020, January 19\u201323). SemFE: Facilitating ML Pipeline Development with Semantics. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM 2020), Online.","DOI":"10.1145\/3340531.3417436"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1016\/j.compind.2017.10.001","article-title":"Everything as a resource: Foundations and illustration through Internet-of-things","volume":"94","author":"Baker","year":"2018","journal-title":"Comput. Ind."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Maamar, Z., Cheikhrouhou, S., Asim, M., Qamar, A., Baker, T., and Ugljanin, E. (2019, January 15\u201319). Towards a Resource-aware Thing Composition Approach. 
Proceedings of the 17th International Conference on High Performance Computing & Simulation (HPCS 2019), Dublin, Ireland.","DOI":"10.1109\/HPCS48598.2019.9188186"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/MIC.2021.3050613","article-title":"Cloud, Fog or Edge: Where to Compute?","volume":"25","author":"Kimovski","year":"2021","journal-title":"IEEE Internet Comput."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1016\/j.future.2019.02.050","article-title":"Edge computing: A survey","volume":"97","author":"Khan","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Corodescu, A.A., Nikolov, N., Khan, A.Q., Soylu, A., Matskin, M., Payberah, A.H., and Roman, D. (2021, January 1\u20133). Locality-Aware Workflow Orchestration for Big Data. Proceedings of the 13th International Conference on Management of Digital EcoSystems (MEDES\u201921), Hammamet, Tunisia.","DOI":"10.1145\/3444757.3485106"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Roman, D., Alexiev, V., Paniagua, J., Elves\u00e6ter, B., von Zernichow, B.M., Soylu, A., Simeonov, B., and Taggart, C. (2021). The euBusinessGraph ontology: A lightweight ontology for harmonizing basic company information. Semant. Web, 1\u201328. in press.","DOI":"10.3233\/SW-210424"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Soylu, A., Corcho, O., Elves\u00e6ter, B., Badenes-Olmedo, C., Blount, T., Yedro Mart\u00ednez, F., Kovacic, M., Posinkovic, M., Makgill, I., and Taggart, C. (2021). TheyBuyForYou platform and knowledge graph: Expanding horizons in public procurement with open linked data. Semant. Web, 1\u201327. in press.","DOI":"10.3233\/SW-210442"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Nikolov, N., Dessalk, Y.D., Khan, A.Q., Soylu, A., Matskin, M., Payberah, A.H., and Roman, D. (2021). 
Conceptualization and scalable execution of big data workflows using domain-specific languages and software containers. Internet Things, in press.","DOI":"10.1016\/j.iot.2021.100440"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1159","DOI":"10.1177\/1094342019877383","article-title":"Towards a computing continuum: Enabling edge-to-cloud integration for data-driven workflows","volume":"33","author":"Renart","year":"2019","journal-title":"Int. J. High Perform. Comput. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1109\/MIC.2017.26","article-title":"Challenges and Software Architecture for Fog Computing","volume":"21","author":"Hao","year":"2017","journal-title":"IEEE Internet Comput."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1109\/MCC.2014.51","article-title":"Containers and Cloud: From LXC to Docker to Kubernetes","volume":"1","author":"Bernstein","year":"2014","journal-title":"IEEE Cloud Comput."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Felter, W., Ferreira, A., Rajamony, R., and Rubio, J. (2015, January 29\u201331). An updated performance comparison of virtual machines and Linux containers. Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2015), Philadelphia, PA, USA.","DOI":"10.1109\/ISPASS.2015.7095802"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1109\/TCC.2017.2702586","article-title":"Cloud Container Technologies: A State-of-the-Art Review","volume":"7","author":"Pahl","year":"2017","journal-title":"IEEE Trans. Cloud Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jss.2017.01.001","article-title":"Understanding cloud-native applications after 10 years of cloud computing\u2014A systematic mapping study","volume":"126","author":"Kratzke","year":"2017","journal-title":"J. Syst. 
Softw."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Celesti, A., Mulfari, D., Fazio, M., Villari, M., and Puliafito, A. (2016, January 18\u201320). Exploring Container Virtualization in IoT Clouds. Proceedings of the IEEE International Conference on Smart Computing (SMARTCOMP 2016), St. Louis, MO, USA.","DOI":"10.1109\/SMARTCOMP.2016.7501691"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Bellavista, P., and Zanni, A. (2017, January 5\u20137). Feasibility of Fog Computing Deployment based on Docker Containerization over RaspberryPi. Proceedings of the 18th International Conference on Distributed Computing and Networking (ICDCN 2017), Hyderabad, India.","DOI":"10.1145\/3007748.3007777"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ismail, B.I., Goortani, E.M., Karim, M.B.A., Tat, W.M., Setapa, S., Luke, J.Y., and Hoe, O.H. (2015, January 24\u201326). Evaluation of Docker as Edge computing platform. Proceedings of the IEEE Conference on Open Systems (ICOS 2015), Melaka, Malaysia.","DOI":"10.1109\/ICOS.2015.7377291"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bhimani, J., Yang, Z., Leeser, M., and Mi, N. (2017, January 12\u201314). Accelerating big data applications using lightweight virtualization framework on enterprise cloud. 
Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC 2017), Waltham, MA, USA.","DOI":"10.1109\/HPEC.2017.8091086"},{"key":"ref_23","first-page":"76","article-title":"The Design and Architecture of Microservices","volume":"3","author":"Sill","year":"2016","journal-title":"IEEE Cloud Comput."},{"key":"ref_24","first-page":"6","article-title":"Practical Use of Microservices in Moving Workloads to the Cloud","volume":"3","author":"Linthicum","year":"2016","journal-title":"IEEE Cloud Comput."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1109\/TCC.2017.2754484","article-title":"ODDS: Optimizing Data-Locality Access for Scientific Data Analysis","volume":"8","author":"Wang","year":"2020","journal-title":"IEEE Trans. Cloud Comput."},{"key":"ref_26","first-page":"227","article-title":"Survey on RDMA-Based Distributed Storage Systems","volume":"56","author":"Youmin","year":"2019","journal-title":"J. Comput. Res. Dev."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Elshater, Y., Martin, P., Rope, D., McRoberts, M., and Statchuk, C. (July, January 27). A Study of Data Locality in YARN. Proceedings of the IEEE International Conference on Big Data (Big Data 2015), New York, NY, USA.","DOI":"10.1109\/BigDataCongress.2015.33"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Renner, T., Thamsen, L., and Kao, O. (2016, January 5\u20138). CoLoc: Distributed data and container colocation for data-intensive applications. Proceedings of the IEEE International Conference on Big Data (Big Data 2016), Washington, DC, USA.","DOI":"10.1109\/BigData.2016.7840954"},{"key":"ref_29","unstructured":"Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., and Stoica, I. (2010, January 22\u201325). Spark: Cluster Computing with Working Sets. 
Proceedings of the 2nd USENIX conference on Hot topics in cloud computing (HotCloud 2010) USENIX, Boston, MA, USA."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1016\/j.future.2018.07.043","article-title":"A data locality based scheduler to enhance MapReduce performance in heterogeneous environments","volume":"90","author":"Naik","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1109\/TCC.2018.2794344","article-title":"Locality-Aware Scheduling for Containers in Cloud Computing","volume":"8","author":"Zhao","year":"2020","journal-title":"IEEE Trans. Cloud Comput."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bourhim, E.H., Elbiaze, H., and Dieye, M. (2019, January 21\u201325). Inter-container Communication Aware Container Placement in Fog Computing. Proceedings of the 15th International Conference on Network and Service Management (CNSM 2019), Halifax, NS, Canada.","DOI":"10.23919\/CNSM46954.2019.9012671"},{"key":"ref_33","unstructured":"Abranches, M., Goodarzy, S., Nazari, M., Mishra, S., and Keller, E. (2019, January 9). Shimmy: Shared Memory Channels for High Performance Inter-Container Communication. Proceedings of the Workshop on Hot Topics in Edge Computing (HotEdge 2019) USENIX, Renton, WA, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zheng, C., and Thain, D. (2015, January 15). Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker. Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC 2015), Portland, OR, USA.","DOI":"10.1145\/2755979.2755984"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Hayot-Sasson, V., Brown, S.T., and Glatard, T. (2019, January 14\u201317). Performance Evaluation of Big Data Processing Strategies for Neuroimaging. 
Proceedings of the 19th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2019), Larnaca, Cyprus.","DOI":"10.1109\/CCGRID.2019.00059"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.future.2015.04.006","article-title":"Locality and loading aware virtual machine mapping techniques for optimizing communications in MapReduce applications","volume":"53","author":"Hsu","year":"2015","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ernstsson, A., and Kessler, C. (2019). Extending smart containers for data locality-aware skeleton programming. Concurr. Comput. Pract. Exp., 31.","DOI":"10.1002\/cpe.5003"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bu, X., Rao, J., and Xu, C.Z. (2013, January 17\u201321). Interference and locality-aware task scheduling for MapReduce applications in virtual clusters. Proceedings of the 22nd International Symposium on High-performance Parallel and Distributed Computing (HPDC 2013), New York, NY, USA.","DOI":"10.1145\/2462902.2462904"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1128","DOI":"10.1007\/s10766-016-0463-0","article-title":"Data-locality aware scientific workflow scheduling methods in HPC cloud environments","volume":"45","author":"Choi","year":"2017","journal-title":"Int. J. Parallel Program."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1444","DOI":"10.1109\/TNET.2013.2294111","article-title":"Video-aware scheduling and caching in the radio access network","volume":"22","author":"Ahlehagh","year":"2014","journal-title":"IEEE\/ACM Trans. Netw."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Gu, J., Wang, W., Huang, A., and Shan, H. (2013, January 8\u201311). Proactive storage at caching-enable base stations in cellular networks. 
Proceedings of the 24th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC 2013), London, UK.","DOI":"10.1109\/PIMRC.2013.6666387"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"15","DOI":"10.4018\/IJACI.2018070102","article-title":"An optimal data placement strategy for improving system performance of massive data applications using graph clustering","volume":"9","author":"Vengadeswaran","year":"2018","journal-title":"Int. J. Ambient Comput. Intell. (IJACI)"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"33","DOI":"10.12688\/f1000research.29032.2","article-title":"Sustainable data analysis with Snakemake","volume":"10","author":"Jablonski","year":"2021","journal-title":"F1000Research"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Albrecht, M., Donnelly, P., Bui, P., and Thain, D. (2012, January 20). Makeflow: A portable abstraction for data intensive computing on clusters, clouds, and grids. Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies (SWEET 2012), Scottsdale, AZ, USA.","DOI":"10.1145\/2443416.2443417"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Dessalk, Y.D., Nikolov, N., Matskin, M., Soylu, A., and Roman, D. (2020, January 2\u20134). Scalable Execution of Big Data Workflows using Software Containers. Proceedings of the 12th International Conference on Management of Digital EcoSystems (MEDES 2020), Online.","DOI":"10.1145\/3415958.3433082"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Mitchell, R., Pottier, L., Jacobs, S., Silva, R.F.d., Rynge, M., Vahi, K., and Deelman, E. (2019, January 9\u201312). Exploration of Workflow Management Systems Emerging Features from Users Perspectives. 
Proceedings of the IEEE International Conference on Big Data (Big Data 2019), Los Angeles, CA, USA.","DOI":"10.1109\/BigData47090.2019.9005494"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Martin, P. (2021). Multi-container Pod Design Patterns. Kubernetes: Preparing for the CKA and CKAD Certifications, Apress.","DOI":"10.1007\/978-1-4842-6494-2"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/24\/8212\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:43:32Z","timestamp":1760168612000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/24\/8212"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,8]]},"references-count":47,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["s21248212"],"URL":"https:\/\/doi.org\/10.3390\/s21248212","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,8]]}}}