{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T04:56:40Z","timestamp":1776833800512,"version":"3.51.2"},"reference-count":41,"publisher":"Sociedade Brasileira de Computacao - SB","issue":"1","license":[{"start":{"date-parts":[[2019,11,20]],"date-time":"2019-11-20T00:00:00Z","timestamp":1574208000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,11,20]],"date-time":"2019-11-20T00:00:00Z","timestamp":1574208000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Internet Serv Appl"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The competitive dynamics of the globalized market demand information on the internal and external reality of corporations. Information is a precious asset and is responsible for establishing key advantages to enable companies to maintain their leadership. However, reliable, rich information is no longer the only goal. The time frame to extract information from data determines its usefulness. This work proposes DOD-ETL, a tool that addresses, in an innovative manner, the main bottleneck in Business Intelligence solutions, the Extract Transform Load process (ETL), providing it in near real-time. DOD-ETL achieves this by combining an on-demand data stream pipeline with a distributed, parallel and technology-independent architecture with in-memory caching and efficient data partitioning. We compared DOD-ETL with other Stream Processing frameworks used to perform near real-time ETL and found DOD-ETL executes workloads up to 10 times faster. We have deployed it in a large steelworks as a replacement for its previous ETL solution, enabling near real-time reports previously unavailable.<\/jats:p>","DOI":"10.1186\/s13174-019-0121-z","type":"journal-article","created":{"date-parts":[[2019,11,20]],"date-time":"2019-11-20T16:03:20Z","timestamp":1574265800000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["DOD-ETL: distributed on-demand ETL for near real-time business intelligence"],"prefix":"10.5753","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6433-171X","authenticated-orcid":false,"given":"Gustavo V.","family":"Machado","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"\u00cdtalo","family":"Cunha","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adriano C. M.","family":"Pereira","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leonardo B.","family":"Oliveira","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"3742","published-online":{"date-parts":[[2019,11,20]]},"reference":[{"key":"121_CR1","doi-asserted-by":"publisher","unstructured":"Malhotra Y. From information management to knowledge management: beyond the\u2019hi-tech hidebound\u2019systems. Knowl Manag Bus Model Innov. 2001;:115\u201334. https:\/\/doi.org\/10.4018\/978-1-878289-98-8.ch007.","DOI":"10.4018\/978-1-878289-98-8.ch007"},{"key":"121_CR2","doi-asserted-by":"publisher","unstructured":"Watson HJ, Wixom BH. The current state of business intelligence. Computer. 2007; 40(9):96\u20139. https:\/\/doi.org\/10.1109\/mc.2007.331.","DOI":"10.1109\/mc.2007.331"},{"key":"121_CR3","doi-asserted-by":"publisher","unstructured":"Sabtu A, Azmi NFM, Sjarif NNA, Ismail SA, Yusop OM, Sarkan H, Chuprat S. The challenges of extract, transform and loading (etl) system implementation for near real-time environment. In: 2017 International Conference On Research and Innovation in Information Systems (ICRIIS). IEEE: 2017. p. 1\u20135. https:\/\/doi.org\/10.1109\/icriis.2017.8002467.","DOI":"10.1109\/icriis.2017.8002467"},{"key":"121_CR4","volume-title":"Real-time Analytics: Techniques to Analyze and Visualize Streaming Data","author":"B Ellis","year":"2014","unstructured":"Ellis B. Real-time Analytics: Techniques to Analyze and Visualize Streaming Data. Konstanz: Wiley; 2014."},{"key":"121_CR5","volume-title":"Extending Database Technology","author":"M Mesiti","year":"2016","unstructured":"Mesiti M, Ferrari L, Valtolina S, Licari G, Galliani G, Dao M, Zettsu K, et al.Streamloader: an event-driven etl system for the on-line processing of heterogeneous sensor data. In: Extending Database Technology. Konstanz: OpenProceedings: 2016. p. 628\u201331."},{"key":"121_CR6","doi-asserted-by":"publisher","unstructured":"Naeem MA, Dobbie G, Webber G. An event-based near real-time data integration architecture. In: 2008 12th Enterprise Distributed Object Computing Conference Workshops. IEEE: 2008. p. 401\u20134. https:\/\/doi.org\/10.1109\/edocw.2008.14.","DOI":"10.1109\/edocw.2008.14"},{"key":"121_CR7","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1016\/j.future.2014.06.009","volume":"43","author":"F Zhang","year":"2015","unstructured":"Zhang F, Cao J, Khan SU, Li K, Hwang K. A task-level adaptive mapreduce framework for real-time streaming data in healthcare applications. Futur Gener Comput Syst. 2015; 43:149\u201360.","journal-title":"Futur Gener Comput Syst"},{"issue":"18","key":"121_CR8","first-page":"24","volume":"46","author":"T Jain","year":"2012","unstructured":"Jain T, Rajasree S, Saluja S. Refreshing datawarehouse in near real-time. Int J Comput Appl. 2012; 46(18):24\u20139.","journal-title":"Int J Comput Appl"},{"key":"121_CR9","unstructured":"Kreps J, Narkhede N, Rao J, et al.Kafka: A distributed messaging system for log processing. In: ACM SIGMOD Workshop on Networking Meets Databases. New York: 2011. p. 1\u20137."},{"key":"121_CR10","unstructured":"Apache. Apache Beam. 2015. https:\/\/beam.apache.org\/. Accessed 22 Mar 2019."},{"key":"121_CR11","first-page":"10","volume":"12","author":"M Zaharia","year":"2012","unstructured":"Zaharia M, Das T, Li H, Shenker S, Stoica I. Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters. HotCloud. 2012; 12:10.","journal-title":"HotCloud"},{"key":"121_CR12","volume-title":"New Trends in Data Warehousing and Data Analysis","author":"P Vassiliadis","year":"2009","unstructured":"Vassiliadis P, Simitsis A. Near real time etl. In: New Trends in Data Warehousing and Data Analysis. New York: Springer: 2009. p. 1\u201331."},{"issue":"3","key":"121_CR13","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1016\/S0169-023X(01)00042-8","volume":"39","author":"T Thalhammer","year":"2001","unstructured":"Thalhammer T, Schrefl M, Mohania M. Active data warehouses: complementing olap with analysis rules. Data Knowl Eng. 2001; 39(3):241\u201369.","journal-title":"Data Knowl Eng"},{"key":"121_CR14","doi-asserted-by":"crossref","unstructured":"Karakasidis A, Vassiliadis P, Pitoura E. Etl queues for active data warehousing. In: Proceedings of the 2nd International Workshop on Information Quality in Information Systems. ACM: 2005. p. 28\u201339.","DOI":"10.1145\/1077501.1077509"},{"key":"121_CR15","volume-title":"E-Commerce Technology, 2006. The 8th IEEE International Conference on and Enterprise Computing, E-Commerce, and E-Services, The 3rd IEEE International Conference On","author":"B Azvine","year":"2006","unstructured":"Azvine B, Cui Z, Nauck DD, Majeed B. Real time business intelligence for the adaptive enterprise. In: E-Commerce Technology, 2006. The 8th IEEE International Conference on and Enterprise Computing, E-Commerce, and E-Services, The 3rd IEEE International Conference On. New York: IEEE: 2006. p. 29."},{"issue":"1","key":"121_CR16","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1108\/09685220810862733","volume":"16","author":"B Sahay","year":"2008","unstructured":"Sahay B, Ranjan J. Real time business intelligence in supply chain analytics. Inf Manag Comput Secur. 2008; 16(1):28\u201348.","journal-title":"Inf Manag Comput Secur"},{"key":"121_CR17","volume-title":"Proceedings of the 8th ACM International Workshop on Data Warehousing and OLAP","author":"TM Nguyen","year":"2005","unstructured":"Nguyen TM, Schiefer J, Tjoa AM. Sense & response service architecture (saresa): an approach towards a real-time business intelligence solution and its use for a fraud detection application. In: Proceedings of the 8th ACM International Workshop on Data Warehousing and OLAP. New York: ACM: 2005. p. 77\u201386."},{"key":"121_CR18","volume-title":"2015 International Seminar On Intelligent Technology and Its Applications (ISITIA)","author":"A Wibowo","year":"2015","unstructured":"Wibowo A. Problems and available solutions on the stage of extract, transform, and loading in near real-time data warehousing (a literature study). In: 2015 International Seminar On Intelligent Technology and Its Applications (ISITIA). New York: IEEE: 2015. p. 345\u201350."},{"issue":"1","key":"121_CR19","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1145\/1327452.1327492","volume":"51","author":"J Dean","year":"2008","unstructured":"Dean J, Ghemawat S. Mapreduce: simplified data processing on large clusters. Commun ACM. 2008; 51(1):107\u201313.","journal-title":"Commun ACM"},{"issue":"2","key":"121_CR20","doi-asserted-by":"publisher","first-page":"21","DOI":"10.4018\/jdwm.2013040102","volume":"9","author":"F Waas","year":"2013","unstructured":"Waas F, Wrembel R, Freudenreich T, Thiele M, Koncilia C, Furtado P. On-demand elt architecture for right-time bi: extending the vision. Int J Data Warehous Mining (IJDWM). 2013; 9(2):21\u201338.","journal-title":"Int J Data Warehous Mining (IJDWM)"},{"key":"121_CR21","volume-title":"Proceedings 20th IEEE International Conference on Distributed Computing Systems","author":"M Wiesmann","year":"2000","unstructured":"Wiesmann M, Pedone F, Schiper A, Kemme B, Alonso G. Understanding replication in databases and distributed systems. In: Proceedings 20th IEEE International Conference on Distributed Computing Systems. New York: IEEE: 2000. p. 464\u201374."},{"key":"121_CR22","volume-title":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","author":"A Thomson","year":"2012","unstructured":"Thomson A, Diamond T, Weng S-C, Ren K, Shao P, Abadi DJ. Calvin: fast distributed transactions for partitioned database systems. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York: ACM: 2012. p. 1\u201312."},{"key":"121_CR23","volume-title":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","author":"Y Sovran","year":"2011","unstructured":"Sovran Y, Power R, Aguilera MK, Li J. Transactional storage for geo-replicated systems. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. New York: ACM: 2011. p. 385\u2013400."},{"key":"121_CR24","volume-title":"Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference On","author":"N Polyzotis","year":"2007","unstructured":"Polyzotis N, Skiadopoulos S, Vassiliadis P, Simitsis A, Frantzell N-E. Supporting streaming updates in an active data warehouse. In: Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference On. New York: IEEE: 2007. p. 476\u201385."},{"issue":"7","key":"121_CR25","doi-asserted-by":"publisher","first-page":"976","DOI":"10.1109\/TKDE.2008.27","volume":"20","author":"N Polyzotis","year":"2008","unstructured":"Polyzotis N, Skiadopoulos S, Vassiliadis P, Simitsis A, Frantzell N. Meshing streaming updates with persistent data in an active data warehouse. IEEE Trans Knowl Data Eng. 2008; 20(7):976\u201391.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"121_CR26","volume-title":"2011 IEEE 27th International Conference On Data Engineering (ICDE)","author":"MA Bornea","year":"2011","unstructured":"Bornea MA, Deligiannakis A, Kotidis Y, Vassalos V. Semi-streamed index join for near-real time execution of etl transformations. In: 2011 IEEE 27th International Conference On Data Engineering (ICDE). New York: IEEE: 2011. p. 159\u201370."},{"key":"121_CR27","volume-title":"Proceedings of the ACM 13th International Workshop on Data Warehousing and OLAP","author":"MA Naeem","year":"2010","unstructured":"Naeem MA, Dobbie G, Weber G, Alam S. R-meshjoin for near-real-time data warehousing. In: Proceedings of the ACM 13th International Workshop on Data Warehousing and OLAP. New York: ACM: 2010. p. 53\u201360."},{"key":"121_CR28","volume-title":"Computer Science and Software Engineering, 2008 International Conference On","author":"J Shi","year":"2008","unstructured":"Shi J, Bao Y, Leng F, Yu G. Study on log-based change data capture and handling mechanism in real-time data warehouse. In: Computer Science and Software Engineering, 2008 International Conference On. New York: IEEE: 2008. p. 478\u201381."},{"key":"121_CR29","doi-asserted-by":"crossref","unstructured":"Neumeyer L, Robbins B, Nair A, Kesari A. S4: Distributed stream computing platform. In: 2010 IEEE International Conference On Data Mining Workshops (ICDMW). IEEE: 2010. p. 170\u20137.","DOI":"10.1109\/ICDMW.2010.172"},{"key":"121_CR30","volume-title":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","author":"A Toshniwal","year":"2014","unstructured":"Toshniwal A, Taneja S, Shukla A, Ramasamy K, Patel JM, Kulkarni S, Jackson J, Gade K, Fu M, Donham J, et al.Storm@ twitter. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. New York: ACM: 2014. p. 147\u201356."},{"issue":"4","key":"121_CR31","first-page":"28","volume":"36","author":"P Carbone","year":"2015","unstructured":"Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K. Apache flink: Stream and batch processing in a single engine. Bull IEEE Comput Soc Tech Comm Data Eng. 2015; 36(4):28\u201338.","journal-title":"Bull IEEE Comput Soc Tech Comm Data Eng"},{"key":"121_CR32","unstructured":"Google. Google Dataflow. 2015. https:\/\/cloud.google.com\/dataflow\/. Accessed 23 Mar 2019."},{"key":"121_CR33","unstructured":"Microsoft. Azure Stream Analytics. 2015. https:\/\/azure.microsoft.com\/en-us\/services\/stream-analytics\/. Accessed 23 Mar 2019."},{"key":"121_CR34","unstructured":"Cutting D. Apache Avro. 2009. https:\/\/avro.apache.org\/. Accessed 10 Aug 2019."},{"key":"121_CR35","volume-title":"USENIX Annual Technical Conference, vol. 8","author":"P Hunt","year":"2010","unstructured":"Hunt P, Konar M, Junqueira FP, Reed B. Zookeeper: Wait-free coordination for internet-scale systems. In: USENIX Annual Technical Conference, vol. 8. Boston: USENIX: 2010. p. 9."},{"key":"121_CR36","unstructured":"Mueller T. H2 Database. 2012. http:\/\/www.h2database.com\/. Accessed 10 Aug 2019."},{"key":"121_CR37","first-page":"24","volume":"32","author":"EF Codd","year":"1993","unstructured":"Codd EF, Codd SB, Salley CT. Providing olap (on-line analytical processing) to user-analysts: An it mandate. Codd Date. 1993; 32:24.","journal-title":"Codd Date"},{"key":"121_CR38","volume-title":"Object-oriented Data Warehouse Design: Building a Star Schema","author":"WA Giovinazzo","year":"2000","unstructured":"Giovinazzo WA. Object-oriented Data Warehouse Design: Building a Star Schema. Upper Saddle River: Prentice Hall PTR; 2000."},{"key":"121_CR39","doi-asserted-by":"crossref","unstructured":"Stamatis DH. The OEE Primer: Understanding Overall Equipment Effectiveness, Reliability, and Maintainability, 1 pap\/cdr edn: Productivity Press; 2010. http:\/\/amazon.com\/o\/ASIN\/1439814066\/. Accessed 12 Aug 2018.","DOI":"10.1201\/EBK1439814062"},{"issue":"5","key":"121_CR40","doi-asserted-by":"publisher","first-page":"495","DOI":"10.1108\/01443579810206334","volume":"18","author":"\u00d5 Ljungberg","year":"1998","unstructured":"Ljungberg \u00d5. Measurement of overall equipment effectiveness as a basis for tpm activities. Int J Oper Prod Manag. 1998; 18(5):495\u2013507.","journal-title":"Int J Oper Prod Manag"},{"key":"121_CR41","unstructured":"International Society of Automation. Enterprise-control system integration American national standard ; ANSI\/ISA-95.00. Research Triangle Park: ISA: 2001."}],"container-title":["Journal of Internet Services and Applications"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13174-019-0121-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13174-019-0121-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13174-019-0121-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,9]],"date-time":"2022-02-09T22:15:29Z","timestamp":1644444929000},"score":1,"resource":{"primary":{"URL":"https:\/\/jisajournal.springeropen.com\/articles\/10.1186\/s13174-019-0121-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,20]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["121"],"URL":"https:\/\/doi.org\/10.1186\/s13174-019-0121-z","relation":{},"ISSN":["1867-4828","1869-0238"],"issn-type":[{"value":"1867-4828","type":"print"},{"value":"1869-0238","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,20]]},"assertion":[{"value":"6 December 2018","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 October 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 November 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"21"}}