{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:26:58Z","timestamp":1760059618619,"version":"build-2065373602"},"reference-count":25,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T00:00:00Z","timestamp":1750636800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Data workflows are an important component of modern analytical systems, enabling structured data extraction, transformation, integration, and delivery across diverse applications. Despite their importance, these workflows are often developed using ad hoc approaches, leading to scalability and maintenance challenges. This paper proposes a structured, three-level methodology\u2014conceptual, logical, and physical\u2014for modeling data workflows using Business Process Model and Notation (BPMN). A custom BPMN metamodel is introduced, along with a tool built on BPMN.io, that enforces modeling constraints and supports translation from high-level workflow designs to executable implementations. Logical models are further enriched through blueprint definitions, specified in a formal, implementation-agnostic JSON schema. The methodology is validated through a case study, demonstrating its applicability across ETL and machine learning domains, promoting clarity, reuse, and automation in data pipeline development.<\/jats:p>","DOI":"10.3390\/data10070097","type":"journal-article","created":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T10:44:41Z","timestamp":1750761881000},"page":"97","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Developing Data Workflows: From Conceptual Blueprints to Physical Implementation"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9138-9143","authenticated-orcid":false,"given":"Bruno","family":"Oliveira","sequence":"first","affiliation":[{"name":"CIICESI, School of Management and Technology, Porto Polytechnic, 4610-156 Felgueiras, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3807-7292","authenticated-orcid":false,"given":"\u00d3scar","family":"Oliveira","sequence":"additional","affiliation":[{"name":"CIICESI, School of Management and Technology, Porto Polytechnic, 4610-156 Felgueiras, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,23]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"van der Aalst, W.M., and van Hee, K. (2002). Workflow Management, The MIT Press.","key":"ref_1","DOI":"10.7551\/mitpress\/7301.001.0001"},{"doi-asserted-by":"crossref","unstructured":"Raj, A., Bosch, J., Olsson, H.H., and Wang, T.J. (2020, January 26\u201328). Modelling Data Pipelines. Proceedings of the 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Portoroz, Slovenia.","key":"ref_2","DOI":"10.1109\/SEAA51224.2020.00014"},{"doi-asserted-by":"crossref","unstructured":"Sullivan, D. (2020). Designing Data Pipelines. Official Google Cloud Certified Professional Data Engineer Study Guide, Wiley.","key":"ref_3","DOI":"10.1002\/9781119618461"},{"doi-asserted-by":"crossref","unstructured":"Wu, J., Wang, H., Ni, C., Zhang, C., and Lu, W. (2024, January 1\u20133). Data Pipeline Training: Integrating AutoML to Optimize the Data Flow of Machine Learning Models. Proceedings of the 2024 7th International Conference on Advanced Algorithms and Control Engineering (ICAACE), Shanghai, China.","key":"ref_4","DOI":"10.1109\/ICAACE61206.2024.10549260"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"724","DOI":"10.1016\/j.procir.2020.04.016","article-title":"A framework for designing data pipelines for manufacturing systems","volume":"93","author":"Oleghe","year":"2020","journal-title":"Procedia CIRP"},{"doi-asserted-by":"crossref","unstructured":"Munappy, A.R., Bosch, J., and Olsson, H.H. (2020). Data Pipeline Management in Practice: Challenges and Opportunities. Product-Focused Software Process Improvement, Proceedings of the 21st International Conference, PROFES 2020, Turin, Italy, 25\u201327 November 2020, Springer. Lecture Notes in Computer Science.","key":"ref_6","DOI":"10.1007\/978-3-030-64148-1_11"},{"doi-asserted-by":"crossref","unstructured":"Kimball, R., and Ross, M. (2016). The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence, Wiley.","key":"ref_7","DOI":"10.1002\/9781119228912"},{"unstructured":"Mu, L. (2008). Modelling ETL Processes of Data Warehouses with UML Activity Diagrams. On the Move to Meaningful Internet Systems OTM 2008 Workshops, Proceedings of the OTM Confederated International Workshops and Posters, ADI, AWeSoMe, COMBEK, EI2N, IWSSA, MONET, OnToContent & QSI, ORM, PerSys, RDDS, SEMELS, and SWWS 2008, Monterrey, Mexico, 9\u201314 November 2008, Springer.","key":"ref_8"},{"unstructured":"Akkaoui, Z.E., and Zimanyi, E. (2009, January 6). Defining ETL worfklows using BPMN and BPEL. Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP DOLAP 09, Hong Kong, China.","key":"ref_9"},{"unstructured":"Theodoratos, D., and Song, I.Y. (2002, January 8). Conceptual modeling for ETL processes. Proceedings of the 5th ACM International Workshop on Data Warehousing and OLAP\u2014DOLAP \u201902, McLean, VA, USA.","key":"ref_10"},{"doi-asserted-by":"crossref","unstructured":"Dupor, S., and Jovanovi, V. (2014, January 26\u201330). An approach to conceptual modelling of ETL processes. Proceedings of the 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia.","key":"ref_11","DOI":"10.1109\/MIPRO.2014.6859801"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"30","DOI":"10.4018\/IJACI.2019010102","article-title":"A New Approach for Conceptual Extraction-Transformation-Loading Process Modeling","volume":"10","author":"Biswas","year":"2019","journal-title":"Int. J. Ambient. Comput. Intell."},{"doi-asserted-by":"crossref","unstructured":"Simitsis, A., Vassiliadis, P., Terrovitis, M., and Skiadopoulos, S. (2005). Graph-Based Modeling of ETL Activities with Multi-level Transformations and Updates. Data Warehousing and Knowledge Discovery, Proceedings of the 7th International Conference, DaWak 2005, Copenhagen, Denmark, 22\u201326 August 2005, Springer.","key":"ref_13","DOI":"10.1007\/11546849_5"},{"key":"ref_14","first-page":"307","article-title":"A UML Based Approach for Modeling ETL Processes in Data Warehouses","volume":"Volume 2813","author":"Trujillo","year":"2003","journal-title":"Conceptual Modeling\u2014ER 2003, Proceedings of the 22nd International Conference on Conceptual Modeling, Chicago, IL, USA, 13\u201316 October 2003"},{"unstructured":"Aalst, V.D. (2021, January 6\u20138). Using BPMN for ETL Conceptual Modelling: A Case Study. Proceedings of the 10th International Conference on Data Science, Technology and Applications (Data), Online.","key":"ref_15"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"262","DOI":"10.5220\/0006807702620269","article-title":"From ETL Conceptual Design to ETL Physical Sketching using Patterns","volume":"Volume 1","author":"Oliveira","year":"2018","journal-title":"Proceedings of the 20th International Conference on Enterprise Information Systems\u2014Volume 1: ICEIS, Funchal"},{"doi-asserted-by":"crossref","unstructured":"Wilkinson, K., Simitsis, A., Castellanos, M., and Dayal, U. (2010). Leveraging Business Process Models for ETL Design. Conceptual Modeling\u2014ER 2010, Proceedings of the 29th International Conference on Conceptual Modeling, Vancouver, BC, Canada, 1\u20134 November 2010, Springer. Lecture Notes in Computer Science.","key":"ref_17","DOI":"10.1007\/978-3-642-16373-9_2"},{"unstructured":"Simitsis, A. (2003, January 12\u201313). Modeling and managing ETL processes. Proceedings of the VLDB 2003 PhD Workshop, Co-Located with the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany.","key":"ref_18"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"101837","DOI":"10.1016\/j.datak.2020.101837","article-title":"Design and implementation of ETL processes using BPMN and relational algebra","volume":"129","author":"Awiti","year":"2020","journal-title":"Data Knowl. Eng."},{"unstructured":"J\u00f6rg, T., and De\u00dfloch, S. (2009, January 2\u20136). Formalizing ETL Jobs for Incremental Loading of Data Warehouses. Proceedings of the Datenbanksysteme f\u00fcr Business, Technologie und Web, M\u00fcnster, Germany.","key":"ref_20"},{"doi-asserted-by":"crossref","unstructured":"L\u2019Esteve, R.C. (2021). Mapping Data Flows for Data Warehouse ETL. The Definitive Guide to Azure Data Engineering, Apress.","key":"ref_21","DOI":"10.1007\/978-1-4842-7182-7_11"},{"doi-asserted-by":"crossref","unstructured":"Ribeiro, A., Oliveira, B., and Oliveira, \u00d3. (2025, January 4\u20136). Business Process Modeling Techniques for Data Integration Conceptual Modeling. Proceedings of the 27th International Conference on Enterprise Information Systems, ICEIS\u2014Proceedings, Porto, Portugal.","key":"ref_22","DOI":"10.5220\/0013497800003929"},{"unstructured":"Silver, B. (2011). Bpmn Method and Style: A Levels-Based Methodology for Bpm Process Modeling and Improvement Using Bpmn 2.0, Cody-Cassidy Press. [2nd ed.].","key":"ref_23"},{"doi-asserted-by":"crossref","unstructured":"Vassiliadis, P., Simitsis, A., and Skiadopoulos, S. (2002). On the Logical Modeling of ETL Processes. Advanced Information Systems Engineering, Proceedings of the 14th International Conference, CAiSE 2002, Toronto, ON, Canada, 27\u201331 May 2002, Springer.","key":"ref_24","DOI":"10.1145\/583890.583893"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1016\/j.dss.2012.09.006","article-title":"A framework for transformation from conceptual to logical workflow models","volume":"54","author":"Fan","year":"2012","journal-title":"Decis. Support Syst."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/7\/97\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:57:18Z","timestamp":1760032638000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/7\/97"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,23]]},"references-count":25,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["data10070097"],"URL":"https:\/\/doi.org\/10.3390\/data10070097","relation":{},"ISSN":["2306-5729"],"issn-type":[{"type":"electronic","value":"2306-5729"}],"subject":[],"published":{"date-parts":[[2025,6,23]]}}}