{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T20:00:41Z","timestamp":1768420841870,"version":"3.49.0"},"reference-count":22,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T00:00:00Z","timestamp":1739923200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union under the Next Generation EU","award":["166,988,013.71"],"award-info":[{"award-number":["166,988,013.71"]}]},{"name":"European Union under the Next Generation EU","award":["97,111,730.27"],"award-info":[{"award-number":["97,111,730.27"]}]},{"name":"Portuguese Republic\u2019s Recovery and Resilience Plan (PRR) Partnership Agreement","award":["166,988,013.71"],"award-info":[{"award-number":["166,988,013.71"]}]},{"name":"Portuguese Republic\u2019s Recovery and Resilience Plan (PRR) Partnership Agreement","award":["97,111,730.27"],"award-info":[{"award-number":["97,111,730.27"]}]},{"name":"Agenda Mobilizadora da Fileira das Tecnologias de Produ\u00e7\u00e3o para a Reindustrializa\u00e7\u00e3o","award":["166,988,013.71"],"award-info":[{"award-number":["166,988,013.71"]}]},{"name":"Agenda Mobilizadora da Fileira das Tecnologias de Produ\u00e7\u00e3o para a Reindustrializa\u00e7\u00e3o","award":["97,111,730.27"],"award-info":[{"award-number":["97,111,730.27"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>The rapid integration of Machine Learning (ML) in organizational practices has driven demand for substantial computational resources, incurring both high economic costs and environmental impact, particularly from energy consumption. This challenge is amplified in dynamic data environments, where ML models must be frequently retrained to adapt to evolving data patterns. To address this, more sustainable Machine Learning Operations (MLOps) pipelines are needed for reducing environmental impacts while maintaining model accuracy. In this paper, we propose a model reuse approach based on data similarity metrics, which allows organizations to leverage previously trained models where applicable. We introduce a tailored set of meta-features to characterize data windows, enabling efficient similarity assessment between historical and new data. The effectiveness of the proposed method is validated across multiple ML tasks using the cosine and Bray\u2013Curtis distance functions, which evaluate both model reuse rates and the performance of reused models relative to newly trained alternatives. The results indicate that the proposed approach can reduce the frequency of model retraining by up to 70% to 90% while maintaining or even improving predictive performance, contributing to more resource-efficient and sustainable MLOps practices.<\/jats:p>","DOI":"10.3390\/bdcc9020047","type":"journal-article","created":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T03:29:53Z","timestamp":1739935793000},"page":"47","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Reusing ML Models in Dynamic Data Environments: Data Similarity-Based Approach for Efficient MLOps"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5998-7960","authenticated-orcid":false,"given":"Eduardo","family":"Peixoto","sequence":"first","affiliation":[{"name":"Escola Superior de Tecnologia e Gest\u00e3o, Instituto Polit\u00e9cnico do Porto, 4610-156 Felgueiras, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-1829-0649","authenticated-orcid":false,"given":"Diogo","family":"Torres","sequence":"additional","affiliation":[{"name":"Escola Superior de Tecnologia e Gest\u00e3o, Instituto Polit\u00e9cnico do Porto, 4610-156 Felgueiras, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6650-0388","authenticated-orcid":false,"given":"Davide","family":"Carneiro","sequence":"additional","affiliation":[{"name":"Escola Superior de Tecnologia e Gest\u00e3o, Instituto Polit\u00e9cnico do Porto, 4610-156 Felgueiras, Portugal"},{"name":"INESC TEC, R. Dr. Roberto Frias, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5139-1994","authenticated-orcid":false,"given":"Bruno","family":"Silva","sequence":"additional","affiliation":[{"name":"Muvu Technologies, 1050-052 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-9261-5841","authenticated-orcid":false,"given":"Ruben","family":"Marques","sequence":"additional","affiliation":[{"name":"Muvu Technologies, 1050-052 Lisboa, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1126\/science.aaa8415","article-title":"Machine learning: Trends, perspectives, and prospects","volume":"349","author":"Jordan","year":"2015","journal-title":"Science"},{"key":"ref_2","unstructured":"Strubell, E., Ganesh, A., and McCallum, A. (2020, January 7\u201312). Energy and policy considerations for modern deep learning research. Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_4","first-page":"1","article-title":"Towards the systematic reporting of the energy and carbon footprints of machine learning","volume":"21","author":"Henderson","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"638","DOI":"10.1109\/TEM.2021.3116187","article-title":"How artificial intelligence drives sustainable frugal innovation: A multitheoretical perspective","volume":"71","author":"Govindan","year":"2022","journal-title":"IEEE Trans. Eng. Manag."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Gupta, P., and Bagchi, A. (2024). MLOps: Machine Learning Operations. Essentials of Python for Artificial Intelligence and Machine Learning, Springer Nature.","DOI":"10.1007\/978-3-031-43725-0"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ruf, P., Madan, M., Reich, C., and Ould-Abdeslam, D. (2021). Demystifying mlops and presenting a recipe for the selection of open-source tools. Appl. Sci., 11.","DOI":"10.3390\/app11198861"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"31866","DOI":"10.1109\/ACCESS.2023.3262138","article-title":"Machine Learning Operations (MLOps): Overview, Definition, and Architecture","volume":"11","author":"Kreuzberger","year":"2023","journal-title":"IEEE Access"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kim, J., Chang, S., and Kwak, N. (2021). PQK: Model compression via pruning, quantization, and knowledge distillation. arXiv.","DOI":"10.21437\/Interspeech.2021-248"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1007\/s11704-016-6903-6","article-title":"Lifelong machine learning: A paradigm for continuous learning","volume":"11","author":"Liu","year":"2017","journal-title":"Front. Comput. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"128096","DOI":"10.1016\/j.neucom.2024.128096","article-title":"A review of green artificial intelligence: Towards a more sustainable future","volume":"599","author":"Cancela","year":"2024","journal-title":"Neurocomputing"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"118934","DOI":"10.1016\/j.eswa.2022.118934","article-title":"A survey on machine learning for recurring concept drifting data streams","volume":"213","author":"Quintana","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"104930","DOI":"10.1016\/j.ijmedinf.2022.104930","article-title":"Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction","volume":"173","author":"Rahmani","year":"2023","journal-title":"Int. J. Med. Inform."},{"key":"ref_14","unstructured":"Rivolli, A., Garcia, L.P., Soares, C., Vanschoren, J., and de Carvalho, A.C. (2018). Characterizing classification datasets: A study of meta-features for meta-learning. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2350011","DOI":"10.1142\/S0129065723500119","article-title":"Algorithm recommendation and performance prediction using meta-learning","volume":"33","author":"Palumbo","year":"2023","journal-title":"Int. J. Neural Syst."},{"key":"ref_16","first-page":"1","article-title":"MFE: Towards reproducible meta-feature extraction","volume":"21","author":"Siqueira","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1469","DOI":"10.1007\/s10994-017-5642-8","article-title":"Adaptive random forests for evolving data stream classification","volume":"106","author":"Gomes","year":"2017","journal-title":"Mach. Learn."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/S0168-1699(99)00046-0","article-title":"Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables","volume":"24","author":"Blackard","year":"2000","journal-title":"Comput. Electron. Agric."},{"key":"ref_19","unstructured":"Harries, M., and Wales, N.S. (1999). Splice-2 Comparative Evaluation: Electricity Pricing, The University of New South Wales."},{"key":"ref_20","unstructured":"Peixoto, E., Torres, D., Carneiro, D., Silva, B., and Novais, P. (2024, January 16). Efficient MLOps: Meta-learning meets Frugal AI. Proceedings of the 2nd European Symposium on Artificial Intelligence in Manufacturing\u2014ESAIM, Athens, Greece."},{"key":"ref_21","unstructured":"Peixoto, E., Carneiro, D., Torres, D., Silva, B., and Novais, P. (2024, January 26\u201328). Reusing past Machine Learning models based on data similarity metrics. Proceedings of the 15th International Symposium on Ambient Intelligence, Salamanca, Spain."},{"key":"ref_22","unstructured":"Losing, V. (2025, January 09). Drift Datasets. Available online: https:\/\/github.com\/vlosing\/driftDatasets\/."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/2\/47\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:37:25Z","timestamp":1760027845000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/2\/47"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,19]]},"references-count":22,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["bdcc9020047"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9020047","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,19]]}}}