{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T20:41:46Z","timestamp":1775162506243,"version":"3.50.1"},"reference-count":51,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T00:00:00Z","timestamp":1704672000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T00:00:00Z","timestamp":1704672000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cloud Comp"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In big data, analysis data is collected from different sources in various formats, transforming into the aspect of cleansing the data, customization, and loading it into a Data Warehouse. Extracting data in other formats and transforming it to the required format requires transformation algorithms. This transformation stage has redundancy issues and is stored across any location in the data warehouse, which increases computation costs. The main issues in big data ETL are handling high-dimensional data and maintaining similar data for effective data warehouse usage. Therefore, Extract, Transform, Load (ETL) plays a vital role in extracting meaningful information from the data warehouse and trying to retain the users. This paper proposes hybrid optimization of Swarm Intelligence with a tabu search algorithm for handling big data in a cloud-based architecture-based ETL process. This proposed work overcomes many issues related to complex data storage and retrieval in the data warehouse. 
Swarm Intelligence algorithms can overcome problems like high dimensional data, dynamical change of huge data and cost optimization in the transformation stage. In this work for the swarm intelligence algorithm, a Grey-Wolf Optimizer (GWO) is implemented to reduce the high dimensionality of data. Tabu Search (TS) is used for clustering the relevant data as a group. Clustering means the segregation of relevant data accurately from the data warehouse. The cluster size in the ETL process can be optimized by the proposed work of (GWO-TS). Therefore, the huge data in the warehouse can be processed within an expected latency.<\/jats:p>","DOI":"10.1186\/s13677-023-00571-y","type":"journal-article","created":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T11:02:35Z","timestamp":1704711755000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["An efficient hybrid optimization of ETL process in data warehouse of cloud architecture"],"prefix":"10.1186","volume":"13","author":[{"given":"Lina","family":"Dinesh","sequence":"first","affiliation":[]},{"given":"K. Gayathri","family":"Devi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,1,8]]},"reference":[{"key":"571_CR1","doi-asserted-by":"crossref","unstructured":"Zdravevski E, Lameski P, Dimitrievski A, Grzegorowski M, Apanowicz C (2019) Cluster-size optimization within a cloud-based ETL framework for Big Data. In: 2019 IEEE International Conference on Big Data (IEEE BigData 2019), at Los Angles, USA, pp 3754\u20133763","DOI":"10.1109\/BigData47090.2019.9006547"},{"key":"571_CR2","doi-asserted-by":"publisher","first-page":"41261","DOI":"10.1109\/ACCESS.2021.3064202","volume":"9","author":"O Aziz","year":"2021","unstructured":"Aziz O, Anees T, Mehmood E (2021) An efficient data access approach with queue and stack in optimized hybrid join. 
IEEE Access 9:41261\u201341274.","journal-title":"IEEE Access"},{"key":"571_CR3","unstructured":"Mehra KK et al (2017) Extract, transform and load (ETL) system and method. U.S. patent no. 9"},{"key":"571_CR4","doi-asserted-by":"publisher","first-page":"676","DOI":"10.1016\/j.procs.2019.09.223","volume":"159","author":"M Souigbui","year":"2019","unstructured":"Souigbui M, Augui F, Zammali S, Cherfi S, Yahia SB (2019) Data quality in ETL process: a preliminary study. Procedia Comput Sci 159:676\u2013687. Elsevier","journal-title":"Procedia Comput Sci"},{"key":"571_CR5","first-page":"387","volume-title":"Advances in data mining: applications and theoretical aspects. 19th Industrial Conference, ICDM 2019","author":"E Zdravevski","year":"2019","unstructured":"Zdravevski E, Apanowicz C, Stencel K, Slezak D (2019) Scalable cloud-based ETL for self-serving analytics. In: Perner P (ed) Advances in data mining: applications and theoretical aspects. 19th Industrial Conference, ICDM 2019. Springer International Publishing, Cham, pp 387\u2013394"},{"issue":"2","key":"571_CR6","doi-asserted-by":"publisher","first-page":"E417","DOI":"10.1016\/j.ijrobp.2016.06.1680","volume":"96","author":"C Mayo","year":"2016","unstructured":"Mayo C et al (2016) Taming big data: implementation of a clinical use-case driven architecture. Int J Radiat Oncol Biol Phys 96(2):E417-8","journal-title":"Int J Radiat Oncol Biol Phys"},{"key":"571_CR7","first-page":"861","volume-title":"Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, (CIT\/IUCC\/DASC\/PICO), IEEE International Conference on","author":"VS Belo","year":"2015","unstructured":"Belo VS (2015) Using relational algebra on the specification of real world ETL processes. 
Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, (CIT\/IUCC\/DASC\/PICO), IEEE International Conference on. IEEE, Liverpool, pp 861\u2013866"},{"key":"571_CR8","doi-asserted-by":"crossref","unstructured":"Parul SN, Teggihalli S (2015) Performance optimization for extraction, transformation, loading and reporting of data. In: Communication Technologies (GCCT), 2015 Global Conference on. IEEE, Thuckalay, pp 516\u2013519","DOI":"10.1109\/GCCT.2015.7342715"},{"issue":"3","key":"571_CR9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.4018\/jdwm.2009070101","volume":"5","author":"P Vassiliadis","year":"2009","unstructured":"Vassiliadis P (2009) A survey of extract - transform - load technology. Int J Data Warehous Min 5(3):1\u201327","journal-title":"Int J Data Warehous Min"},{"key":"571_CR10","doi-asserted-by":"crossref","unstructured":"Vassiliadis P, Simitsis A (2009) Extraction, transformation, and loading. In Encyclopedia of database systems. Springer, pp 1095\u20131101","DOI":"10.1007\/978-0-387-39940-9_158"},{"key":"571_CR11","doi-asserted-by":"crossref","unstructured":"Liu C, Wu T, Li Z, Ma T, Huang J (2022) Robust online tensor completion for IoT streaming data recovery. In: IEEE transactions on neural networks and learning systems","DOI":"10.1109\/TNNLS.2022.3165076"},{"issue":"11","key":"571_CR12","doi-asserted-by":"publisher","first-page":"12556","DOI":"10.1007\/s10489-021-03121-8","volume":"52","author":"X Zhou","year":"2022","unstructured":"Zhou X, Zhang L (2022) SA-FPN: an effective feature pyramid network for crowded human detection. 
Appl Intell 52(11):12556\u201312568","journal-title":"Appl Intell"},{"issue":"8","key":"571_CR13","doi-asserted-by":"publisher","first-page":"837","DOI":"10.3390\/machines11080837","volume":"11","author":"S Li","year":"2023","unstructured":"Li S, Chen H, Chen Y, Xiong Y, Song Z (2023) Hybrid method with parallel-factor theory, a support vector machine, and particle filter optimization for intelligent machinery failure identification. Machines 11(8):837","journal-title":"Machines"},{"issue":"4","key":"571_CR14","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1145\/3230644","volume":"17","author":"X Liang","year":"2018","unstructured":"Liang X, Huang Z, Yang S, Qiu L (2018) Device-free motion & trajectory detection via RFID. ACM Trans Embed Comput Syst 17(4):78","journal-title":"ACM Trans Embed Comput Syst"},{"issue":"8","key":"571_CR15","doi-asserted-by":"publisher","first-page":"5309","DOI":"10.1109\/TII.2019.2961340","volume":"16","author":"B Cao","year":"2020","unstructured":"Cao B, Zhao J, Gu Y, Fan S, Yang P (2020) Security-aware industrial wireless sensor network deployment optimization. IEEE Trans Industr Inform 16(8):5309\u20135316","journal-title":"IEEE Trans Industr Inform"},{"key":"571_CR16","doi-asserted-by":"crossref","unstructured":"Skoutas D, Simitsis A (2006) Designing ETL processes using semantic web technologies. In: Proceedings of the 9th international ACM workshop on data warehousing and OLAP, USA. pp 67\u201374","DOI":"10.1145\/1183512.1183526"},{"key":"571_CR17","doi-asserted-by":"publisher","first-page":"672","DOI":"10.1016\/j.ins.2022.11.101","volume":"621","author":"Y Peng","year":"2023","unstructured":"Peng Y, Zhao Y, Hu J (2023) On the role of community structure in evolution of opinion formation: a new bounded confidence opinion dynamics. 
Inf Sci 621:672\u2013690","journal-title":"Inf Sci"},{"key":"571_CR18","doi-asserted-by":"publisher","first-page":"105860","DOI":"10.1016\/j.engappai.2023.105860","volume":"120","author":"K Zhao","year":"2023","unstructured":"Zhao K, Jia Z, Jia F, Shao H (2023) Multi-scale integrated deep self-attention network for predicting remaining useful life of aero-engine. Eng Appl Artif Intell 120:105860","journal-title":"Eng Appl Artif Intell"},{"key":"571_CR19","doi-asserted-by":"crossref","unstructured":"Mhon GGW, Kham NSM (2020) ETL pre-processing with multiple data sources for academic data analysis. In: IEEE Conference on Computer Applications (ICCA). pp 1\u20135","DOI":"10.1109\/ICCA49400.2020.9022824"},{"key":"571_CR20","doi-asserted-by":"publisher","DOI":"10.1145\/3369740.3372778","volume-title":"Role of machine learning in ETL automation","author":"KC Mondal","year":"2020","unstructured":"Mondal KC, Biswas N, Saha S (2020) Role of machine learning in ETL automation"},{"key":"571_CR21","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1016\/j.dss.2019.03.008","volume":"120","author":"M Ghasemaghaei","year":"2019","unstructured":"Ghasemaghaei M, Calic G (2019) Can big data improve firm decision quality? The role of data quality and data diagnosticity. Decis Support Syst 120:38\u201349","journal-title":"Decis Support Syst"},{"key":"571_CR22","doi-asserted-by":"crossref","unstructured":"Kim S-S, Lee W-R, Go J-H (2019) A study on utilization of spatial information in heterogeneous system based on Apache NiFi. pp. 1117\u20131119","DOI":"10.1109\/ICTC46691.2019.8939734"},{"issue":"February","key":"571_CR23","doi-asserted-by":"publisher","first-page":"113138","DOI":"10.1016\/j.dss.2019.113138","volume":"126","author":"Y Timmerman","year":"2019","unstructured":"Timmerman Y, Bronselaer A (2019) Measuring data quality in information systems research. 
Decis Support Syst 126(February):113138","journal-title":"Decis Support Syst"},{"key":"571_CR24","doi-asserted-by":"crossref","unstructured":"Taleb I, Serhani MA, Dssouli R (2019) Big data quality assessment model for unstructured data. In: 13th International Conference on Innovations in Information Technology, IIT 2018. pp 69\u201374","DOI":"10.1109\/INNOVATIONS.2018.8605945"},{"key":"571_CR25","doi-asserted-by":"publisher","first-page":"24634","DOI":"10.1109\/ACCESS.2019.2899751","volume":"7","author":"C Cichy","year":"2019","unstructured":"Cichy C, Rass S (2019) An overview of data quality framework. IEEE Access 7:24634\u201324648","journal-title":"IEEE Access"},{"key":"571_CR26","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1016\/j.promfg.2019.02.114","volume":"29","author":"LC G\u00fcnther","year":"2019","unstructured":"G\u00fcnther LC, Colangelo E, Wiendahl HH, Bauer C (2019) Data quality assessment for improved decision-making: a methodology for small and medium-sized enterprises. Procedia Manuf 29:583\u2013591","journal-title":"Procedia Manuf"},{"key":"571_CR27","doi-asserted-by":"publisher","first-page":"104840","DOI":"10.1016\/j.cmpb.2019.01.012","volume":"181","author":"Q Tian","year":"2019","unstructured":"Tian Q, Liu M, Min L, An J, Lu X, Duan H (2019) An automated data verification approach for improving data quality in a clinical registry. Comput Methods Programs Biomed 181:104840","journal-title":"Comput Methods Programs Biomed"},{"issue":"1","key":"571_CR28","doi-asserted-by":"publisher","first-page":"018501","DOI":"10.1117\/1.JRS.14.018501","volume":"14","author":"BEB Semlali","year":"2020","unstructured":"Semlali BEB, El Amrani C, Ortiz G (2020) SAT-ETL-Integrator: an extract-transform-load software for satellite big data ingestion. 
J Appl Remote Sens 14(1):018501","journal-title":"J Appl Remote Sens"},{"key":"571_CR29","doi-asserted-by":"publisher","first-page":"148181","DOI":"10.1109\/ACCESS.2020.3012836","volume":"8","author":"RM Terol","year":"2020","unstructured":"Terol RM, Reina AR, Ziaei S, Gil D (2020) A machine learning approach to reduce dimensional space in large datasets. IEEE Access 8:148181\u2013148192","journal-title":"IEEE Access"},{"issue":"4","key":"571_CR30","doi-asserted-by":"publisher","first-page":"204","DOI":"10.3390\/info11040204","volume":"11","author":"R Galici","year":"2020","unstructured":"Galici R, Ordile L, Marchesi M, Pinna A, Tonelli R (2020) Applying the ETL process to blockchain data. Prospect and findings. Information 11(4):204","journal-title":"Information"},{"issue":"1","key":"571_CR31","doi-asserted-by":"publisher","first-page":"10","DOI":"10.3390\/informatics6010010","volume":"6","author":"O Azeroual","year":"2019","unstructured":"Azeroual O, Saake G, Abuosba M (2019) ETL best practices for data quality checks in RIS databases. Informatics 6(1):10","journal-title":"Informatics"},{"key":"571_CR32","doi-asserted-by":"publisher","first-page":"101260","DOI":"10.1016\/j.mex.2021.101260","volume":"8","author":"M Hendayun","year":"2021","unstructured":"Hendayun M, Yulianto E, Rusdi JF, Setiawan A, Ilman B (2021) Extract transform load process in banking reporting system. MethodsX 8:101260","journal-title":"MethodsX"},{"key":"571_CR33","doi-asserted-by":"crossref","unstructured":"Nwokeji JC, Matovu R (2021) A systematic literature review on big data extraction, transformation and loading (etl). In: Intelligent computing: proceedings of the 2021 computing conference, volume 2. 
Springer International Publishing, pp 308\u2013324","DOI":"10.1007\/978-3-030-80126-7_24"},{"issue":"9","key":"571_CR34","doi-asserted-by":"publisher","first-page":"2302","DOI":"10.14778\/3598581.3598600","volume":"16","author":"F Kossmann","year":"2023","unstructured":"Kossmann F, Wu Z, Lai E, Tatbul N, Cao L, Kraska T, Madden S (2023) Extract-transform-load for video streams. Proc VLDB Endow 16(9):2302\u20132315","journal-title":"Proc VLDB Endow"},{"issue":"9","key":"571_CR35","doi-asserted-by":"publisher","first-page":"12","DOI":"10.5539\/mas.v14n9p12","volume":"14","author":"J Alwidian","year":"2020","unstructured":"Alwidian J, Rahman SA, Gnaim M, Al-Taharwah F (2020) Big data ingestion and preparation tools. Mod Appl Sci 14(9):12\u201327","journal-title":"Mod Appl Sci"},{"issue":"12","key":"571_CR36","doi-asserted-by":"publisher","first-page":"2280","DOI":"10.3390\/sym13122280","volume":"13","author":"N Ul Hassan","year":"2021","unstructured":"Ul Hassan N, Bangyal WH, Ali Khan MS, Nisar K, Ag. Ibrahim AA, Rawat DB (2021) Improved opposition-based particle swarm optimization algorithm for global optimization. Symmetry 13(12):2280","journal-title":"Symmetry"},{"key":"571_CR37","doi-asserted-by":"crossref","unstructured":"Fan W, Yang L, Bouguila N (2022) Unsupervised grouped axial data modeling via hierarchical Bayesian nonparametric models with Watson distributions. IEEE Trans Pattern Anal Mach Intell 44:9654-68","DOI":"10.1109\/TPAMI.2021.3128271"},{"key":"571_CR38","doi-asserted-by":"crossref","unstructured":"Zhang X, Wen S, Yan L, Feng J, Xia Y (2022) A hybrid-convolution spatial\u2013temporal recurrent network for traffic flow prediction. 
Comput J c171","DOI":"10.1093\/comjnl\/bxac171"},{"key":"571_CR39","doi-asserted-by":"publisher","first-page":"384","DOI":"10.1016\/j.ins.2022.08.093","volume":"612","author":"B Li","year":"2022","unstructured":"Li B, Zhou X, Ning Z, Guan X, Yiu KC (2022) Dynamic event-triggered security control for networked control systems with cyber-attacks: a model predictive control approach. Inf Sci 612:384\u2013398","journal-title":"Inf Sci"},{"issue":"2","key":"571_CR40","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1007\/s11518-022-5521-0","volume":"31","author":"H Wu","year":"2022","unstructured":"Wu H, Jin S, Yue W (2022) Pricing policy for a dynamic spectrum allocation scheme with batch requests and impatient packets in cognitive radio networks. J Syst Sci Syst Eng 31(2):133\u2013149","journal-title":"J Syst Sci Syst Eng"},{"key":"571_CR41","doi-asserted-by":"crossref","unstructured":"Wang Y, Han X, Jin S (2022) MAP based modeling method and performance study of a task offloading scheme with time-correlated traffic and VM repair in MEC systems. Wireless Networks 29:47-68","DOI":"10.1007\/s11276-022-03099-2"},{"key":"571_CR42","doi-asserted-by":"crossref","unstructured":"Zhang J, Tang Y, Wang H, Xu K (2022) ASRO-DIO: Active subspace random optimization based depth inertial odometry. IEEE Trans Robot 1\u201313","DOI":"10.1109\/TRO.2022.3208503"},{"issue":"3","key":"571_CR43","doi-asserted-by":"publisher","first-page":"1187","DOI":"10.1109\/TNSE.2021.3137353","volume":"9","author":"Q Ni","year":"2022","unstructured":"Ni Q, Guo J, Wu W, Wang H, Wu J (2022) Continuous influence-based community partition for social networks. IEEE Trans Netw Sci Eng 9(3):1187\u20131197","journal-title":"IEEE Trans Netw Sci Eng"},{"key":"571_CR44","doi-asserted-by":"crossref","unstructured":"Xu Y, Chen H, Wang Z, Yin J, Shen Q, Wang D et al (2023) Multi-factor sequential re-ranking with perception-aware diversification. 
Paper presented at the KDD \u201823","DOI":"10.1145\/3580305.3599869"},{"key":"571_CR45","doi-asserted-by":"crossref","unstructured":"Tan J, Jin H, Hu H, Hu R, Zhang H et al (2022) WF-MTD: Evolutionary decision method for moving target defense based on Wright-Fisher process. In: IEEE transactions on dependable and secure computing","DOI":"10.1109\/TDSC.2022.3232537"},{"issue":"4","key":"571_CR46","doi-asserted-by":"publisher","first-page":"2082","DOI":"10.1109\/TNET.2017.2705239","volume":"25","author":"B Cheng","year":"2017","unstructured":"Cheng B, Wang M, Zhao S, Zhai Z, Zhu D et al (2017) Situation-aware dynamic service coordination in an IoT environment. IEEE\/ACM Trans Netw 25(4):2082\u20132095","journal-title":"IEEE\/ACM Trans Netw"},{"key":"571_CR47","unstructured":"Mathew S (2017) Overview of Amazon Web Services. Accessed 6 Apr 2019"},{"key":"571_CR48","doi-asserted-by":"publisher","first-page":"4371","DOI":"10.1109\/JSYST.2023.3263865","volume":"17","author":"J Zhang","year":"2023","unstructured":"Zhang J, Liu Y, Li Z, Lu Y (2023) Forecast-assisted service function chain dynamic deployment for SDN\/NFV-enabled cloud management systems. IEEE Syst J 17:4371\u20134382","journal-title":"IEEE Syst J"},{"issue":"3","key":"571_CR49","doi-asserted-by":"publisher","first-page":"04022008","DOI":"10.1061\/(ASCE)ME.1943-5479.0001015","volume":"38","author":"H Yuan","year":"2022","unstructured":"Yuan H, Yang B (2022) System dynamics approach for evaluating the interconnection performance of cross-border transport infrastructure. J Manag Eng 38(3):04022008","journal-title":"J Manag Eng"},{"key":"571_CR50","doi-asserted-by":"crossref","unstructured":"Guo F, Zhou W, Lu Q, Zhang C (2022) Path extension similarity link prediction method based on matrix algebra in directed networks. 
Comput Commun 187:83\u201392","DOI":"10.1016\/j.comcom.2022.02.002"},{"key":"571_CR51","doi-asserted-by":"crossref","unstructured":"Li Q, Lin H, Tan X, Du S (2020) Consensus for multiagent-based supply chain systems under switching topology and uncertain demands. IEEE Trans Syst Man Cybern 50(12):4905\u201318","DOI":"10.1109\/TSMC.2018.2884510"}],"container-title":["Journal of Cloud Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-023-00571-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13677-023-00571-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-023-00571-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T11:13:16Z","timestamp":1704712396000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofcloudcomputing.springeropen.com\/articles\/10.1186\/s13677-023-00571-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,8]]},"references-count":51,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["571"],"URL":"https:\/\/doi.org\/10.1186\/s13677-023-00571-y","relation":{},"ISSN":["2192-113X"],"issn-type":[{"value":"2192-113X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,8]]},"assertion":[{"value":"10 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 December 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 January 2024","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The authors declare no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"12"}}