{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:44:35Z","timestamp":1760147075351,"version":"build-2065373602"},"reference-count":54,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,4]],"date-time":"2023-01-04T00:00:00Z","timestamp":1672790400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"enRichMyData","award":["HE 101070284","H2020 101016835","NFR 309691"],"award-info":[{"award-number":["HE 101070284","H2020 101016835","NFR 309691"]}]},{"name":"DataCloud","award":["HE 101070284","H2020 101016835","NFR 309691"],"award-info":[{"award-number":["HE 101070284","H2020 101016835","NFR 309691"]}]},{"name":"BigDataMine","award":["HE 101070284","H2020 101016835","NFR 309691"],"award-info":[{"award-number":["HE 101070284","H2020 101016835","NFR 309691"]}]},{"name":"SINTEF SEP-DataPipes","award":["HE 101070284","H2020 101016835","NFR 309691"],"award-info":[{"award-number":["HE 101070284","H2020 101016835","NFR 309691"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Big data pipelines are developed to process data characterized by one or more of the three big data features, commonly known as the three Vs (volume, velocity, and variety), through a series of steps (e.g., extract, transform, and move), making the ground work for the use of advanced analytics and ML\/AI techniques. Computing continuum (i.e., cloud\/fog\/edge) allows access to virtually infinite amount of resources, where data pipelines could be executed at scale; however, the implementation of data pipelines on the continuum is a complex task that needs to take computing resources, data transmission channels, triggers, data transfer methods, integration of message queues, etc., into account. 
The task becomes even more challenging when data storage is considered as part of the data pipelines. Local storage is expensive, hard to maintain, and comes with several challenges (e.g., data availability, data security, and backup). The use of cloud storage, i.e., storage-as-a-service (StaaS), instead of local storage has the potential of providing more flexibility in terms of scalability, fault tolerance, and availability. In this article, we propose a generic approach to integrate StaaS with data pipelines, i.e., computation on an on-premise server or on a specific cloud, but integration with StaaS, and develop a ranking method for available storage options based on five key parameters: cost, proximity, network performance, server-side encryption, and user weights\/preferences. The evaluation carried out demonstrates the effectiveness of the proposed approach in terms of data transfer performance, utility of the individual parameters, and feasibility of dynamic selection of a storage option based on four primary user scenarios.<\/jats:p>","DOI":"10.3390\/s23020564","type":"journal-article","created":{"date-parts":[[2023,1,4]],"date-time":"2023-01-04T03:27:44Z","timestamp":1672802864000},"page":"564","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Smart Data Placement Using Storage-as-a-Service Model for Big Data Pipelines"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8346-780X","authenticated-orcid":false,"given":"Akif Quddus","family":"Khan","sequence":"first","affiliation":[{"name":"Department of Computer Science, Norwegian University of Science and Technology\u2014NTNU, 2815 Gj\u00f8vik, Norway"}]},{"given":"Nikolay","family":"Nikolov","sequence":"additional","affiliation":[{"name":"SINTEF Digital, SINTEF AS, 0373 Oslo, Norway"}]},{"given":"Mihhail","family":"Matskin","sequence":"additional","affiliation":[{"name":"Department of Computer 
Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden"}]},{"given":"Radu","family":"Prodan","sequence":"additional","affiliation":[{"name":"Department of Information Technology, University of Klagenfurt, 9020 Klagenfurt, Austria"}]},{"given":"Dumitru","family":"Roman","sequence":"additional","affiliation":[{"name":"SINTEF Digital, SINTEF AS, 0373 Oslo, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2687-3419","authenticated-orcid":false,"given":"Bekir","family":"Sahin","sequence":"additional","affiliation":[{"name":"Logistics Management, National University of Science and Technology, 111 Sohar, Oman"}]},{"given":"Christoph","family":"Bussler","sequence":"additional","affiliation":[{"name":"Robert Bosch LLC, Sunnyvale, CA 94085, USA"}]},{"given":"Ahmet","family":"Soylu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, OsloMet\u2014Oslo Metropolitan University, 0167 Oslo, Norway"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3332301","article-title":"Orchestrating Big Data Analysis Workflows in the Cloud: Research Challenges, Survey, and Future Directions","volume":"52","author":"Barika","year":"2019","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1016\/j.sysarc.2019.02.009","article-title":"All one needs to know about fog computing and related edge computing paradigms: A complete survey","volume":"98","author":"Yousefpour","year":"2019","journal-title":"J. Syst. Archit."},{"key":"ref_3","unstructured":"Robinson, S., and Ferguson, R. (2012). The storage and transfer challenges of big data. MIT Sloan Manag. 
Rev., 7, Available online: https:\/\/sloanreview.mit.edu\/article\/the-storage-and-transfer-challenges-of-big-data\/."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3136623","article-title":"Data Storage Management in Cloud Environments: Taxonomy, Survey, and Future Directions","volume":"50","author":"Mansouri","year":"2017","journal-title":"ACM Comput. Surv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1080\/17538947.2013.769783","article-title":"Redefining the possibility of digital Earth and geosciences with spatial cloud computing","volume":"6","author":"Yang","year":"2013","journal-title":"Int. J. Digit. Earth"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3241737","article-title":"A Manifesto for Future Generation Cloud Computing: Research Directions for the Next Decade","volume":"51","author":"Buyya","year":"2018","journal-title":"ACM Comput. Surv."},{"key":"ref_7","first-page":"2218","article-title":"Big data storage and challenges","volume":"5","author":"Padgavankar","year":"2014","journal-title":"Int. J. Comput. Sci. Inf. Technol."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Khan Quddus, A., Nikolov, N., Matskin, M., Prodan, R., Song, H., Roman, D., and Soylu, A. (2022, January 6\u20139). Smart Data Placement for Big Data Pipelines: An Approach based on the Storage-as-a-Service Model. Proceedings of the UCC 2022, Vancouver, WA, USA.","DOI":"10.1109\/UCC56403.2022.00056"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Khan, A.Q. (2022). Smart Data Placement for Big Data Pipelines with Storage-as-a-Service Integration. 
[Master\u2019s Thesis, Norwegian University of Science and Technology].","DOI":"10.1109\/UCC56403.2022.00056"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/MCOM.2019.1800640","article-title":"Crowd Management: A New Challenge for Urban Big Data Analytics","volume":"57","author":"Celes","year":"2019","journal-title":"IEEE Commun. Mag."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/IOTM.0011.2000071","article-title":"Inferring Latent Patterns in Air Quality from Urban Big Data","volume":"4","author":"De","year":"2021","journal-title":"IEEE Internet Things Mag."},{"key":"ref_12","first-page":"1706","article-title":"Edge of things: The big picture on the integration of edge, IoT and the cloud in a distributed computing environment","volume":"6","author":"Sankar","year":"2017","journal-title":"IEEE Access"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/MIC.2021.3050613","article-title":"Cloud, Fog, or Edge: Where to Compute?","volume":"25","author":"Kimovski","year":"2021","journal-title":"IEEE Internet Comput."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1109\/MCOM.2017.1700120","article-title":"Bringing computation closer toward the user network: Is edge computing the solution?","volume":"55","author":"Ahmed","year":"2017","journal-title":"IEEE Commun. Mag."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1109\/MC.2022.3154148","article-title":"Big Data Pipelines on the Computing Continuum: Tapping the Dark Data","volume":"55","author":"Roman","year":"2022","journal-title":"Computer"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1109\/MCOM.2018.1701095","article-title":"When Mobile Blockchain Meets Edge Computing","volume":"56","author":"Xiong","year":"2018","journal-title":"IEEE Commun. 
Mag."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Corodescu, A.A., Nikolov, N., Khan, A.Q., Soylu, A., Matskin, M., Payberah, A.H., and Roman, D. (2021). Big data workflows: Locality-aware orchestration using software containers. Sensors, 21.","DOI":"10.3390\/s21248212"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"100440","DOI":"10.1016\/j.iot.2021.100440","article-title":"Conceptualization and scalable execution of big data workflows using domain-specific languages and software containers","volume":"16","author":"Nikolov","year":"2021","journal-title":"Internet Things"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Elshater, Y., Martin, P., Rope, D., McRoberts, M., and Statchuk, C. (July, January 27). A Study of Data Locality in YARN. Proceedings of the 2015 IEEE International Congress on Big Data, New York, NY, USA.","DOI":"10.1109\/BigDataCongress.2015.33"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Renner, T., Thamsen, L., and Kao, O. (2016, January 5\u20138). CoLoc: Distributed data and container colocation for data-intensive applications. Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA.","DOI":"10.1109\/BigData.2016.7840954"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1016\/j.future.2018.07.043","article-title":"A data locality based scheduler to enhance MapReduce performance in heterogeneous environments","volume":"90","author":"Naik","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Fei, X., Raicu, I., and Lu, S. (2011, January 10\u201312). Opportunities and Challenges in Running Scientific Workflows on the Cloud. 
Proceedings of the 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Beijing, China.","DOI":"10.1109\/CyberC.2011.80"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Deelman, E., Singh, G., Livny, M., Berriman, B., and Good, J. (2008, January 15\u201321). The cost of doing science on the cloud: The montage example. Proceedings of the SC \u201908: Proceedings of the 2008 ACM\/IEEE Conference on Supercomputing, Austin, TX, USA.","DOI":"10.1109\/SC.2008.5217932"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1109\/TPDS.2011.66","article-title":"Performance analysis of cloud computing services for many-tasks scientific computing","volume":"22","author":"Iosup","year":"2011","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-13-77","article-title":"Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support","volume":"13","author":"Abouelhoda","year":"2012","journal-title":"BMC Bioinform."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1630","DOI":"10.1016\/j.procs.2012.04.179","article-title":"Early cloud experiences with the kepler scientific workflow system","volume":"9","author":"Wang","year":"2012","journal-title":"Procedia Comput. Sci."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.bdr.2019.02.002","article-title":"Towards hybrid multi-cloud storage systems: Understanding how to perform data transfer","volume":"16","author":"Celesti","year":"2019","journal-title":"Big Data Res."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Liu, W., and Song, J. (2012, January 16\u201320). A novel solution of distributed file storage for cloud service. 
Proceedings of the 2012 IEEE 36th Annual Computer Software and Applications Conference Workshops, Izmir, Turkey.","DOI":"10.1109\/COMPSACW.2012.15"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1016\/j.future.2010.02.004","article-title":"A data placement strategy in scientific cloud workflows","volume":"26","author":"Yuan","year":"2010","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.bdr.2014.07.002","article-title":"A dynamic data placement strategy for hadoop in heterogeneous environments","volume":"1","author":"Lee","year":"2014","journal-title":"Big Data Res."},{"key":"ref_31","first-page":"28","article-title":"An improved data placement strategy for Hadoop","volume":"1","year":"2012","journal-title":"J. South China Univ. Technol. (Nat. Sci. Ed.)"},{"key":"ref_32","unstructured":"Xie, J., Yin, S., Ruan, X., Ding, Z., Tian, Y., Majors, J., Manzanares, A., and Qin, X. (2010, January 19\u201323). Improving mapreduce performance through data placement in heterogeneous hadoop clusters. Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), Atlanta, GA, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Er-Dun, Z., Yong-Qiang, Q., Xing-Xing, X., and Yi, C. (2012, January 17\u201318). A data placement strategy based on genetic algorithm for scientific workflows. Proceedings of the 2012 Eighth International Conference on Computational Intelligence and Security, Guangzhou, China.","DOI":"10.1109\/CIS.2012.40"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Milani, O.H., Motamedi, S.A., Sharifian, S., and Nazari-Heris, M. (2021). Intelligent Service Selection in a Multi-Dimensional Environment of Cloud Providers for Internet of Things Stream Data through Cloudlets. 
Energies, 14.","DOI":"10.3390\/en14248601"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"484","DOI":"10.18421\/TEM92-09","article-title":"Cloud service selection as a fuzzy multi-criteria problem","volume":"9","author":"Ilieva","year":"2020","journal-title":"TEM J."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1109\/CC.2018.8290808","article-title":"HASG: Security and efficient frame for accessing cloud storage","volume":"15","author":"Liu","year":"2018","journal-title":"China Commun."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1406","DOI":"10.1587\/transcom.2016EBP3403","article-title":"Cloud provider selection models for cloud storage services to satisfy availability requirements","volume":"E100.B","author":"Oki","year":"2017","journal-title":"IEICE Trans. Commun."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1007\/s00521-016-2364-y","article-title":"Multi-datacenter cloud storage service selection strategy based on AHP and backward cloud generator model","volume":"29","author":"Xiahou","year":"2018","journal-title":"Neural Comput. Appl."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhao, P., Shang, J., Lin, J., Li, B., and Sun, X. (2019, January 16\u201318). A dynamic convergent replica selection strategy based on cloud storage. 
Proceedings of the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), Dublin, Ireland.","DOI":"10.1109\/AIAM48774.2019.00100"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1109\/MITP.2012.84","article-title":"What\u2019s Special about Cloud Security?","volume":"14","author":"Mell","year":"2012","journal-title":"IT Prof."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1109\/TCC.2017.2754484","article-title":"ODDS: Optimizing Data-Locality Access for Scientific Data Analysis","volume":"8","author":"Wang","year":"2020","journal-title":"IEEE Trans. Cloud Comput."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Toledo, K., Breitgand, D., Lorenz, D., and Keslassy, I. (2022, January 13\u201316). CloudPilot: Flow Acceleration in the Cloud. Proceedings of the 2022 IFIP Networking Conference (IFIP Networking), Catania, Italy.","DOI":"10.23919\/IFIPNetworking55013.2022.9829802"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Sahin, B., Yip, T.L., Tseng, P.-H., Kabak, M., and Soylu, A. (2020). An Application of a Fuzzy TOPSIS Multi-Criteria Decision Analysis Algorithm for Dry Bulk Carrier Selection. Information, 11.","DOI":"10.3390\/info11050251"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1016\/j.renene.2020.04.137","article-title":"A review of multi-criteria decision making applications for renewable energy site selection","volume":"157","author":"Shao","year":"2020","journal-title":"Renew. Energy"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Ishizaka, A., and Nemery, P. (2013). 
Multi-Criteria Decision Analysis: Methods and Software, John Wiley & Sons.","DOI":"10.1002\/9781118644898"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.omega.2018.07.004","article-title":"Generalised framework for multi-criteria method selection","volume":"86","author":"Jankowski","year":"2019","journal-title":"Omega"},{"key":"ref_47","unstructured":"Opricovi\u0107, S. (1998). Multicriteria Optimization of Civil Engineering Systems. [Ph.D. Thesis, Faculty of Civil Engineering, University of Belgrade]."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1016\/S0377-2217(03)00020-1","article-title":"Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS","volume":"156","author":"Opricovic","year":"2004","journal-title":"Eur. J. Oper. Res."},{"key":"ref_49","first-page":"126","article-title":"Green supplier selection of a textile manufacturer: A hybrid approach based on AHP and VIKOR","volume":"7","author":"Billur","year":"2019","journal-title":"MANAS J. Eng."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"106793","DOI":"10.1016\/j.knosys.2021.106793","article-title":"Group decision-making based on complex spherical fuzzy VIKOR approach","volume":"216","author":"Akram","year":"2021","journal-title":"Knowl.-Based Syst."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4102\/jtscm.v10i1.230","article-title":"Fuzzy VIKOR approach for selection of big data analyst in procurement management","volume":"10","author":"Bag","year":"2016","journal-title":"J. Transp. Supply Chain Manag."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Kazan\u00e7o\u011flu, Y., Sa\u011fnak, M., Lafc\u0131, \u00c7., Luthra, S., Kumar, A., and Ta\u00e7o\u011flu, C. (2021). Big data-enabled solutions framework to overcoming the barriers to circular economy initiatives in healthcare sector. Int. J. Environ. Res. 
Public Health, 18.","DOI":"10.3390\/ijerph18147513"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Rezaee, S., Sadeghi-Niaraki, A., Shakeri, M., and Choi, S.M. (2021). Personalized Augmented Reality Based Tourism System: Big Data and User Demographic Contexts. Appl. Sci., 11.","DOI":"10.3390\/app11136047"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.cie.2019.01.051","article-title":"Improved decisions for marketing, supply and purchasing: Mining big data through an integration of sentiment analysis and intuitionistic fuzzy multi criteria assessment","volume":"129","author":"Balaman","year":"2019","journal-title":"Comput. Ind. Eng."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/2\/564\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T17:58:18Z","timestamp":1760119098000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/2\/564"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,4]]},"references-count":54,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["s23020564"],"URL":"https:\/\/doi.org\/10.3390\/s23020564","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,1,4]]}}}