{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:37:50Z","timestamp":1775745470886,"version":"3.50.1"},"reference-count":26,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T00:00:00Z","timestamp":1750377600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>In modern industrial environments, data-driven decision-making plays a crucial role in ensuring operational efficiency, predictive maintenance, and process optimization. However, the effectiveness of these decisions is highly dependent on the quality of the data. Industrial data is typically generated in real time by sensors integrated into IoT devices and smart manufacturing systems, resulting in high-volume, heterogeneous, and rapidly changing data streams. This paper presents the design and implementation of a data quality pipeline specifically adapted to such industrial contexts. The proposed pipeline includes modular components responsible for data ingestion, profiling, validation, and continuous monitoring, and is guided by a comprehensive set of data quality dimensions, including accuracy, completeness, consistency, and timeliness. For each dimension, appropriate metrics are applied, including accuracy measures based on dynamic intervals and validations based on consistency rules. To evaluate its effectiveness, we conducted a case study in a real manufacturing environment. By continuously monitoring data quality, problems can be proactively identified before they impact downstream processes, resulting in more reliable and timely decisions.<\/jats:p>","DOI":"10.3390\/computers14070241","type":"journal-article","created":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T05:17:42Z","timestamp":1750396662000},"page":"241","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Data Quality Pipeline for Industrial Environments: Architecture and Implementation"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-6136-8602","authenticated-orcid":false,"given":"Teresa","family":"Peixoto","sequence":"first","affiliation":[{"name":"CIICESI, ESTG, Polytechnic of Porto, rua do Curral, 4610-156 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3807-7292","authenticated-orcid":false,"given":"\u00d3scar","family":"Oliveira","sequence":"additional","affiliation":[{"name":"CIICESI, ESTG, Polytechnic of Porto, rua do Curral, 4610-156 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9757-6687","authenticated-orcid":false,"given":"Eliana","family":"Costa e Silva","sequence":"additional","affiliation":[{"name":"CIICESI, ESTG, Polytechnic of Porto, rua do Curral, 4610-156 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9138-9143","authenticated-orcid":false,"given":"Bruno","family":"Oliveira","sequence":"additional","affiliation":[{"name":"CIICESI, ESTG, Polytechnic of Porto, rua do Curral, 4610-156 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8851-9688","authenticated-orcid":false,"given":"Fillipe","family":"Ribeiro","sequence":"additional","affiliation":[{"name":"JPM Industry, 3731-901 Vale de Cambra, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3593043","article-title":"A Systematic Review of Data Quality in CPS and IoT for Industry 4.0","volume":"55","author":"Goknil","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hu, C., Sun, Z., Li, C., Zhang, Y., and Xing, C. (2023). Survey of Time Series Data Generation in IoT. Sensors, 23.","DOI":"10.3390\/s23156976"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Peixoto, T., Oliveira, B., Oliveira, \u00d3., and Ribeiro, F. (2025). Data Quality Assessment in Smart Manufacturing: A Review. Systems, 13.","DOI":"10.3390\/systems13040243"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Liu, C., Peng, G., Kong, Y., Li, S., and Chen, S. (2021). Data Quality Affecting Big Data Analytics in Smart Factories: Research Themes, Issues and Methods. Symmetry, 13.","DOI":"10.3390\/sym13081440"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Kuemper, D., Iggena, T., Toenjes, R., and Pulvermueller, E. (2018, January 12\u201315). Valid.IoT: A framework for sensor data quality analysis and interpolation. Proceedings of the 9th ACM Multimedia Systems Conference, Amsterdam, The Netherlands.","DOI":"10.1145\/3204949.3204972"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Batini, C., and Scannapieco, M. (2016). Data and Information Quality, Springer.","DOI":"10.1007\/978-3-319-24106-7"},{"key":"ref_7","unstructured":"Mahanti, R. (2019). Data Quality: Dimensions, Measurement, Strategy, Management, and Governance, ASQ Quality Press. Available online: https:\/\/asq.org\/quality-press\/display-item?item=H1552."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1080\/07421222.1996.11518099","article-title":"Beyond Accuracy: What Data Quality Means to Data Consumers","volume":"12","author":"Wang","year":"1996","journal-title":"J. Manag. Inf. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhang, L., Jeong, D., and Lee, S. (2021). Data Quality Management in the Internet of Things. Sensors, 21.","DOI":"10.3390\/s21175834"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"24634","DOI":"10.1109\/ACCESS.2019.2899751","article-title":"An Overview of Data Quality Frameworks","volume":"7","author":"Cichy","year":"2019","journal-title":"IEEE Access"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Oliveira, \u00d3., and Oliveira, B. (2022). An Extensible Framework for Data Reliability Assessment, SCITEPRESS\u2014Science and Technology Publications.","DOI":"10.5220\/0010863600003179"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Abideen, Z.u., Mazhar, T., Razzaq, A., Haq, I., Ullah, I., Alasmary, H., and Mohamed, H.G. (2023). Analysis of Enrollment Criteria in Secondary Schools Using Machine Learning and Data Mining Approach. Electronics, 12.","DOI":"10.3390\/electronics12030694"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"5870","DOI":"10.1007\/s11227-023-05685-3","article-title":"Protecting IoT devices from security attacks using effective decision-making strategy of appropriate features","volume":"80","author":"Ullah","year":"2023","journal-title":"J. Supercomput."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Seghezzi, E., Locatelli, M., Pellegrini, L., Pattini, G., Giuda, G.M.D., Tagliabue, L.C., and Boella, G. (2021). Towards an Occupancy-Oriented Digital Twin for Facility Management: Test Campaign and Sensors Assessment. Appl. Sci., 11.","DOI":"10.3390\/app11073108"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1016\/j.ifacol.2020.11.029","article-title":"Enabling predictive analytics for smart manufacturing through an IIoT platform","volume":"53","author":"Cerquitelli","year":"2020","journal-title":"IFAC-PapersOnLine"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Peixoto, T., Oliveira, B., Oliveira, \u00d3., and Ribeiro, F. (2025, January 6\u20138). Real-Time Manufacturing Data Quality: Leveraging Data Profiling and Quality Metrics. Proceedings of the 10th International Conference on Internet of Things, Big Data and Security\u2014IoTBDS, Porto, Portugal.","DOI":"10.5220\/0013242900003944"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Loshin, D. (2011). The Practitioner\u2019s Guide to Data Quality Improvement, Elsevier.","DOI":"10.1016\/B978-0-12-373717-5.00011-7"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Abedjan, Z., Golab, L., Naumann, F., and Papenbrock, T. (2018). Data Profiling, Springer. Synthesis Lectures on Data Management (SLDM).","DOI":"10.1007\/978-3-031-01865-7"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Costa e Silva, E., Peixoto, T., Oliveira, \u00d3., and Oliveira, B. (2025). Data Quality Assessment: A Practical Application. Innovations in Industrial Engineering IV, Springer. Chapter 42.","DOI":"10.1007\/978-3-031-94484-0_42"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ji, C., Shao, Q., Sun, J., Liu, S., Pan, L., Wu, L., and Yang, C. (2016). Device Data Ingestion for Industrial Big Data Platforms with a Case Study. Sensors, 16.","DOI":"10.3390\/s16030279"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Qiao, L., Li, Y., Takiar, S., Liu, Z., Veeramreddy, N., Tu, M., Dai, Y., Buenrostro, I., Surlaker, K., and Das, S. (2015). Gobblin, VLDB Endowment.","DOI":"10.14778\/2824032.2824073"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sawant, N., and Shah, H. (2013). Big Data Ingestion and Streaming Patterns, Apress.","DOI":"10.1007\/978-1-4302-6293-0_3"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Oliveira, B., Oliveira, \u00d3., Peixoto, T., Ribeiro, F., and Pereira, C. (2024, January 3\u20136). Extensible Data Ingestion System for Industry 4.0. Proceedings of the EPIA Conference on Artificial Intelligence, Viana do Castelo, Portugal.","DOI":"10.1007\/978-3-031-73503-5_9"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"407","DOI":"10.22214\/ijraset.2018.6063","article-title":"Conveyor Belt System with 3 Degrees of Freedom","volume":"6","author":"Prabhudesai","year":"2018","journal-title":"Int. J. Res. Appl. Sci. Eng. Technol."},{"key":"ref_25","unstructured":"Shabou, S. (2025, May 22). Outlier Detection in Time Series. Available online: https:\/\/s-ai-f.github.io\/Time-Series\/outlier-detection-in-time-series.html."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bhowmik, S., Jelfs, B., Arjunan, S.P., and Kumar, D.K. (2017, January 13\u201315). Outlier removal in facial surface electromyography through Hampel filtering technique. Proceedings of the 2017 IEEE Life Sciences Conference (LSC), Sydney, NSW, Australia.","DOI":"10.1109\/LSC.2017.8268192"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/7\/241\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:55:28Z","timestamp":1760032528000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/7\/241"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,20]]},"references-count":26,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["computers14070241"],"URL":"https:\/\/doi.org\/10.3390\/computers14070241","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,20]]}}}