{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T00:10:08Z","timestamp":1758931808326,"version":"3.44.0"},"reference-count":52,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T00:00:00Z","timestamp":1758844800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"ADRENALIN (Data-driven smart buildings: data sandbox and competition) project"},{"name":"Energy Technology Development and Demonstration Programme (EUDP) in Denmark","award":["64021-6025"],"award-info":[{"award-number":["64021-6025"]}]}],"content-domain":{"domain":["www.mdpi.com"],"crossmark-restriction":true},"short-container-title":["Information"],"abstract":"<jats:p>Nonintrusive load monitoring (NILM) relies on high-resolution sensor data to disaggregate total building energy into end-use load components, for example HVAC, ventilation, and appliances. On the ADRENALIN corpus, simple NaN handling with forward fill and mean substitution reduced average NMAE from 0.82 to 0.76 for the Bayesian baseline, from 0.71 to 0.64 for BI-LSTM, and from 0.59 to 0.53 for the Time\u2013Frequency Mask (TFM) model, across nine buildings and four temporal resolutions. However, many NILM models still show degraded accuracy due to unresolved data-quality issues, especially missing values, timestamp irregularities, and sensor inconsistencies, a limitation underexplored in current benchmarks. This paper presents a fully automated data-quality assurance pipeline for time-series energy datasets. The pipeline performs multivariate profiling, statistical analysis, and threshold-based diagnostics to compute standardized quality metrics, which are aggregated into an interpretable Building Quality Score (BQS) that predicts NILM performance and supports dataset ranking and selection. 
Explainability is provided by SHAP and a lightweight large language model, which turns visual diagnostics into concise, actionable narratives. The study evaluates practical quality improvement through systematic handling of missing values, linking metric changes to downstream error reduction. Using random-forest surrogates, SHAP identifies missingness and timestamp irregularity as dominant drivers of error across models. Core contributions include the definition and validation of BQS, an interpretable scoring and explanation framework for time-series quality, and an end-to-end evaluation of how quality diagnostics affect NILM performance at scale.<\/jats:p>","DOI":"10.3390\/info16100836","type":"journal-article","created":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T14:50:38Z","timestamp":1758898238000},"page":"836","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["An Automated Domain-Agnostic and Explainable Data Quality Assurance Framework for Energy Analytics and Beyond"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-4183-4340","authenticated-orcid":false,"given":"Bal\u00e1zs Andr\u00e1s","family":"Tolnai","sequence":"first","affiliation":[{"name":"SDU Center for Energy Informatics, Maersk Mc-Kinney Moeller Institute, The Faculty of Engineering, University of Southern Denmark, 5230 Odense, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4049-539X","authenticated-orcid":false,"given":"Zhipeng","family":"Ma","sequence":"additional","affiliation":[{"name":"SDU Center for Energy Informatics, Maersk Mc-Kinney Moeller Institute, The Faculty of Engineering, University of Southern Denmark, 5230 Odense, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5678-6602","authenticated-orcid":false,"given":"Bo N\u00f8rregaard","family":"J\u00f8rgensen","sequence":"additional","affiliation":[{"name":"SDU Center for Energy Informatics, Maersk Mc-Kinney 
Moeller Institute, The Faculty of Engineering, University of Southern Denmark, 5230 Odense, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9134-1032","authenticated-orcid":false,"given":"Zheng Grace","family":"Ma","sequence":"additional","affiliation":[{"name":"SDU Center for Energy Informatics, Maersk Mc-Kinney Moeller Institute, The Faculty of Engineering, University of Southern Denmark, 5230 Odense, Denmark"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1007\/s10791-025-09640-z","article-title":"Amismart an advanced metering infrastructure for power consumption monitoring and forecasting in smart buildings","volume":"28","author":"Hadri","year":"2025","journal-title":"Discov. Comput."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3125","DOI":"10.1109\/TSG.2018.2818167","article-title":"Review of smart meter data analytics: Applications, methodologies, and challenges","volume":"10","author":"Wang","year":"2018","journal-title":"IEEE Trans. Smart Grid"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"109711","DOI":"10.1016\/j.enbuild.2019.109711","article-title":"A comparative analysis of data-driven methods in building energy benchmarking","volume":"209","author":"Ding","year":"2020","journal-title":"Energy Build."},{"key":"ref_4","unstructured":"Liu, Y., Wang, Y., and Ma, J. (2024). Non-Intrusive Load Monitoring in Smart Grids: A Comprehensive Review. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1192","DOI":"10.1016\/j.rser.2017.04.095","article-title":"A review of data-driven building energy consumption prediction studies","volume":"81","author":"Amasyali","year":"2018","journal-title":"Renew. Sustain. 
Energy Rev."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/s40537-020-0285-1","article-title":"Sensor data quality: A systematic review","volume":"7","author":"Teh","year":"2020","journal-title":"J. Big Data"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/j.eng.2024.04.024","article-title":"On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing\u2014A Systematic Review","volume":"45","author":"Xie","year":"2025","journal-title":"Engineering"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"102549","DOI":"10.1016\/j.is.2025.102549","article-title":"The effects of data quality on machine learning performance on tabular data","volume":"132","author":"Mohammed","year":"2025","journal-title":"Inf. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Cicero, S., Guarascio, M., Guerrieri, A., and Mungari, S. (2023). A Deep Anomaly Detection System for IoT-Based Smart Buildings. Sensors, 23.","DOI":"10.3390\/s23239331"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1093\/comjnl\/bxab183","article-title":"IoT Data Quality Issues and Potential Solutions: A Literature Review","volume":"66","author":"Mansouri","year":"2023","journal-title":"Comput. J."},{"key":"ref_11","unstructured":"Sartipi, A., Delgado Fern\u00e1ndez, J., Potenciano Menci, S., and Magitteri, A. (2025). Bridging Smart Meter Gaps: A Benchmark of Statistical, Machine Learning and Time Series Foundation Models for Data Imputation. 
arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"112701","DOI":"10.1016\/j.enbuild.2022.112701","article-title":"Building energy performance monitoring through the lens of data quality: A review","volume":"279","author":"Morewood","year":"2023","journal-title":"Energy Build."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1781","DOI":"10.14778\/3229863.3229867","article-title":"Automating large-scale data quality verification","volume":"11","author":"Schelter","year":"2018","journal-title":"Proc. VLDB Endow."},{"key":"ref_14","unstructured":"Gong, A., and Campbell, J. (2025). Great Expectations. Zenodo, Mar., 19."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Batra, N., Kelly, J., Parson, O., Dutta, H., Knottenbelt, W., Rogers, A., Singh, A., and Srivastava, M. (2014, January 11\u201313). NILMTK: An open source toolkit for non-intrusive load monitoring. Proceedings of the 5th International Conference on Future Energy Systems, Cambridge, UK.","DOI":"10.1145\/2602044.2602051"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"DeMedeiros, K., Hendawi, A., and Alvarez, M. (2023). A Survey of AI-Based Anomaly Detection in IoT and Sensor Networks. Sensors, 23.","DOI":"10.3390\/s23031352"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"100568","DOI":"10.1016\/j.iot.2022.100568","article-title":"IoT anomaly detection methods and applications: A survey","volume":"19","author":"Chatterjee","year":"2022","journal-title":"Internet Things"},{"key":"ref_18","unstructured":"Tolnai, B.A., Ma, Z.G., J\u00f8rgensen, B.N., Sartori, I., Pandiyan, S.V., Amos, M., Bengtsson, G., Lien, S.K., Walnum, H.T., and Hameed, A. (2025). ADRENALIN: Energy Data Preparation and Validation for HVAC Load Disaggregation in Commercial Buildings. Nordic Energy Informatics Academy Conference 2025, Springer. 
Lecture Notes in Computer Science."},{"key":"ref_19","unstructured":"Tolnai, B.A., Zimmermann, R.S., Xie, Y., Tran, N., \u00c7eliker, C.E., Ma, Z.G., J\u00f8rgensen, B.N., Sartori, I., Amos, M., and Bengtsson, G. (2025). Advancing Non-Intrusive Load Monitoring: Insights from the Winning Algorithms in the ADRENALIN 2024 Load Disaggregation Competition. Nordic Energy Informatics Academy Conference 2025, Springer. Lecture Notes in Computer Science."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lavrinovica, I., Judvaitis, J., Laksis, D., Skromule, M., and Ozols, K. (2024). A Comprehensive Review of Sensor-Based Smart Building Monitoring and Data Gathering Techniques. Appl. Sci., 14.","DOI":"10.3390\/app142110057"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"105809","DOI":"10.1016\/j.envsoft.2023.105809","article-title":"System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science","volume":"169","author":"Schmidt","year":"2023","journal-title":"Environ. Model. Softw."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"132979","DOI":"10.1016\/j.energy.2024.132979","article-title":"Evaluating missing data handling methods for developing building energy benchmarking models","volume":"308","author":"Lee","year":"2024","journal-title":"Energy"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/s40537-024-00942-5","article-title":"A systematic data characteristic understanding framework towards physical-sensor big data challenges","volume":"11","author":"Ma","year":"2024","journal-title":"J. Big Data"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1038\/s41597-020-00712-x","article-title":"The building data genome project 2, energy meter data from the ASHRAE great energy predictor III competition","volume":"7","author":"Miller","year":"2020","journal-title":"Sci. 
Data"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1400","DOI":"10.1038\/s41597-024-04244-6","article-title":"A twenty-year dataset of hourly energy generation and consumption from district campus building energy systems","volume":"11","author":"Liao","year":"2024","journal-title":"Sci. Data"},{"key":"ref_26","first-page":"19823","article-title":"Buildingsbench: A large-scale dataset of 900k buildings and benchmark for short-term load forecasting","volume":"36","author":"Emami","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_27","first-page":"2971","article-title":"A Review of NILM Applications with Machine Learning Approaches","volume":"79","author":"Silva","year":"2024","journal-title":"Comput. Mater. Contin."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Maier, M., and Schramm, S. (2025). General NILM Methodology for Algorithm Parametrization, Optimization and Performance Evaluation. Buildings, 15.","DOI":"10.3390\/buildings15050705"},{"key":"ref_29","unstructured":"Shi, D. (2024). Non-intrusive load monitoring with missing data imputation based on tensor decomposition. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"53","DOI":"10.4236\/jcc.2024.1211004","article-title":"Missing data imputation: A comprehensive review","volume":"12","author":"Alwateer","year":"2024","journal-title":"J. Comput. 
Commun."},{"key":"ref_31","first-page":"31","article-title":"Missing data in time series: A review of imputation methods and case study","volume":"19","author":"Ribeiro","year":"2021","journal-title":"Learning and Nonlinear Models-Revista da Sociedade Brasileira de Redes Neurais-Special Issue: Time Series Analysis and Forecasting Using Computational Intelligence"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"121545","DOI":"10.1016\/j.applthermaleng.2023.121545","article-title":"Filling time-series gaps using image techniques: Multidimensional context autoencoder approach for building energy data imputation","volume":"236","author":"Fu","year":"2024","journal-title":"Appl. Therm. Eng."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"e1","DOI":"10.1017\/eds.2023.43","article-title":"Machine learning for smart and energy-efficient buildings","volume":"3","author":"Das","year":"2024","journal-title":"Environ. Data Sci."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"5409","DOI":"10.1109\/TSG.2021.3101831","article-title":"Data-driven copy-paste imputation for energy time series","volume":"12","author":"Weber","year":"2021","journal-title":"IEEE Trans. Smart Grid"},{"key":"ref_35","first-page":"112050","article-title":"Rethinking the diffusion models for missing data imputation: A gradient flow perspective","volume":"37","author":"Chen","year":"2024","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_36","unstructured":"Fang, C., and Wang, C. (2020). Time series data imputation: A survey on deep learning approaches. 
arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"115586","DOI":"10.1016\/j.enbuild.2025.115586","article-title":"Ensuring real-time data integrity in smart building applications: A systematic end-to-end comprehensive pipeline evaluated in numerous real-life cases","volume":"336","author":"Stefanopoulou","year":"2025","journal-title":"Energy Build."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"114071","DOI":"10.1016\/j.enbuild.2024.114071","article-title":"Opening the Black Box: Towards inherently interpretable energy data imputation models using building physics insight","volume":"310","author":"Liguori","year":"2024","journal-title":"Energy Build."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, L. (2020). A pattern-recognition-based ensemble data imputation framework for sensors from building energy systems. Sensors, 20.","DOI":"10.3390\/s20205947"},{"key":"ref_40","unstructured":"Henkel, P., Kasperski, T., Stoffel, P., and M\u00fcller, D. (2024, January 15\u201317). Interpretable data-driven model predictive control of building energy systems using SHAP. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, Oxford, UK."},{"key":"ref_41","unstructured":"de la Pe\u00f1a, M.F., G\u00f3mez, \u00c1.L.P., and Maim\u00f3, L.F. (2025). ShaTS: A Shapley-based Explainability Method for Time Series Artificial Intelligence Models applied to Anomaly Detection in Industrial Internet of Things. arXiv."},{"key":"ref_42","unstructured":"Han, Y., Zhang, C., Chen, X., Yang, X., Wang, Z., Yu, G., Fu, B., and Zhang, H. (2023). Chartllama: A multimodal llm for chart understanding and generation. arXiv."},{"key":"ref_43","unstructured":"Zhang, X., Roy Chowdhury, R., Gupta, R.K., and Shang, J. (2024). Large language models for time series: A survey. 
arXiv."},{"key":"ref_44","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_45","unstructured":"Li, X., Zhao, R., Chia, Y.K., Ding, B., Joty, S., Poria, S., and Bing, L. (2024, January 7\u201311). Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources. Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Castelnovo, A., Depalmas, R., Mercorio, F., Mombelli, N., Potert\u00ec, D., Serino, A., Seveso, A., Sorrentino, S., and Viola, L. (2024). Augmenting XAI with LLMs: A Case Study in Banking Marketing Recommendation. World Conference on Explainable Artificial Intelligence, Springer.","DOI":"10.1007\/978-3-031-63787-2_11"},{"key":"ref_47","unstructured":"Wang, Z., Zhang, H., Li, C.-L., Eisenschlos, J.M., Perot, V., Wang, Z., Miculicich, L., Fujii, Y., Shang, J., and Lee, C.-Y. (2024, January 7\u201311). Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Liu, F., Wang, X., Yao, W., Chen, J., Song, K., Cho, S., Yacoob, Y., and Yu, D. (2024, January 16\u201321). MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico.","DOI":"10.18653\/v1\/2024.naacl-long.70"},{"key":"ref_49","unstructured":"(2025, May 19). Available online: https:\/\/platform.openai.com\/docs\/models\/o4-mini."},{"key":"ref_50","unstructured":"Tolnai, B.A., Ma, Z.G., and J\u00f8rgensen, B.N. (2025). 
Comparison of Three Algorithms for Low-Frequency Temperature-Dependent Load Disaggregation in Buildings Without Submetering. Nordic Energy Informatics Academy Conference 2025, Springer. Lecture Notes in Computer Science."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Hussain, A., Giangrande, P., Franchini, G., Fenili, L., and Messi, S. (2025). Analyzing the Effect of Error Estimation on Random Missing Data Patterns in Mid-Term Electrical Forecasting. Electronics, 14.","DOI":"10.3390\/electronics14071383"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"103455","DOI":"10.1016\/j.mex.2025.103455","article-title":"Missing data imputation of climate time series: A review","volume":"15","year":"2025","journal-title":"MethodsX"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/10\/836\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T14:55:55Z","timestamp":1758898555000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/10\/836"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,26]]},"references-count":52,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["info16100836"],"URL":"https:\/\/doi.org\/10.3390\/info16100836","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,26]]}}}