{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,2]],"date-time":"2026-07-02T13:17:19Z","timestamp":1782998239098,"version":"3.54.5"},"reference-count":128,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"name":"MUR PRIN 2022 Project \u201cDiscount quality for responsible data science: Human-in-the-Loop for quality data\u201d"},{"name":"Horizon Europe project enRichMyData","award":["HE 101070284"],"award-info":[{"award-number":["HE 101070284"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>Data preparation is crucial for achieving good data management following the four foundational FAIR principles\u2014Findability, Accessibility, Interoperability, and Reusability. Processing datasets to achieve high data (and metadata) quality is mandatory in modern applications. However, the data preparation activities that are needed to reach such levels may easily become unsustainable due to, for example, resource intensity or scalability challenges. Moreover, some preparation efforts may become unnecessary if they result in negligible improvements or duplicate actions. This article examines the sustainability aspects of data preparation through the lens of a circular economy. Within the data landscape, this perspective encourages practices that minimize waste, extend the data life cycle, and maximize reuse in alignment with the FAIR principles. We explore these practices and their impact on selecting and configuring effective data preparation strategies to design sustainable, high-quality pipelines. To this end, we propose an evaluation model that integrates data quality metrics with sustainability parameters for human and computational tasks. Finally, we apply the model in a comparative analysis of key data preparation methods, demonstrating its effectiveness in assessing sustainability and quality tradeoffs.<\/jats:p>","DOI":"10.1145\/3769120","type":"journal-article","created":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T11:12:59Z","timestamp":1760008379000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Sustainable Quality in Data Preparation"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2034-9774","authenticated-orcid":false,"given":"Barbara","family":"Pernici","sequence":"first","affiliation":[{"name":"Politecnico di Milano","place":["Milan, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6062-5174","authenticated-orcid":false,"given":"Cinzia","family":"Cappiello","sequence":"additional","affiliation":[{"name":"Politecnico di Milano","place":["Milan, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5734-1274","authenticated-orcid":false,"given":"Carlo Alberto","family":"Bono","sequence":"additional","affiliation":[{"name":"Politecnico di Milano","place":["Milan, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3820-7870","authenticated-orcid":false,"given":"Camilla","family":"Sancricca","sequence":"additional","affiliation":[{"name":"Politecnico di Milano","place":["Milan, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3578-1121","authenticated-orcid":false,"given":"Tiziana","family":"Catarci","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Roma La Sapienza","place":["Rome, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9051-6972","authenticated-orcid":false,"given":"Marco","family":"Angelini","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Roma La Sapienza","place":["Rome, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8868-7907","authenticated-orcid":false,"given":"Matteo","family":"Filosa","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Roma La Sapienza","place":["Rome, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1801-5118","authenticated-orcid":false,"given":"Matteo","family":"Palmonari","sequence":"additional","affiliation":[{"name":"University of Milan-Bicocca","place":["Milan, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5047-7371","authenticated-orcid":false,"given":"Flavio","family":"De Paoli","sequence":"additional","affiliation":[{"name":"University of Milan-Bicocca","place":["Milan, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8087-6587","authenticated-orcid":false,"given":"Sonia","family":"Bergamaschi","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Modena e Reggio Emilia","place":["Modena, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3466-509X","authenticated-orcid":false,"given":"Giovanni","family":"Simonini","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Modena e Reggio Emilia","place":["Modena, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3465-5027","authenticated-orcid":false,"given":"Angelo","family":"Mozzillo","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Modena e Reggio Emilia","place":["Modena, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4856-0838","authenticated-orcid":false,"given":"Luca","family":"Zecchini","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Modena e Reggio Emilia","place":["Modena, Italy"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,12,5]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.11591\/ijeecs.v14.i3.pp1552-1563"},{"issue":"2","key":"e_1_3_3_3_2","doi-asserted-by":"crossref","first-page":"796","DOI":"10.3390\/app11020796","article-title":"Impact of dataset size on classification performance: An empirical evaluation in the medical domain","volume":"11","author":"Althnian Alhanoof","year":"2021","unstructured":"Alhanoof Althnian, Duaa AlSaeed, Heyam Al-Baity, Amani Samha, Alanoud Bin Dris, Najla Alzakari, Afnan Abou Elwafa, and Heba Kurdi. 2021. Impact of dataset size on classification performance: An empirical evaluation in the medical domain. Applied Sciences 11, 2 (2021), 796.","journal-title":"Applied Sciences"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556567"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850587"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.3390\/INFORMATICS5030031"},{"key":"e_1_3_3_7_2","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024","author":"Asai Akari","year":"2024","unstructured":"Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2024. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In Proceedings of the 12th International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 1\u201330. Retrieved from https:\/\/openreview.net\/forum?id=hSyW5go0v8. Accessed: 2025-06-10."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.dib.2024.110969"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/J.CMPB.2021.106504"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.44.4.462"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/62065.62068"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1541880.1541883"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24106-7"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882919"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389732"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13678"},{"key":"e_1_3_3_17_2","unstructured":"Federico Belotti Fabio Dadda Marco Cremaschi Roberto Avogadro and Matteo Palmonari. 2024. Evaluating LLMs on Entity Disambiguation in Tables. arXiv:2408.06423. Retrieved from https:\/\/arxiv.org\/abs\/2408.06423"},{"key":"e_1_3_3_18_2","first-page":"86","volume-title":"Proceedings of the IFIP Conference on Human-Computer Interaction","author":"Benvenuti Dario","year":"2023","unstructured":"Dario Benvenuti, Matteo Filosa, Tiziana Catarci, and Marco Angelini. 2023. Modeling and assessing user interaction in big data visualization systems. In Proceedings of the IFIP Conference on Human-Computer Interaction. Springer, Cham, 86\u2013109."},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-42283-6_5"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313602"},{"key":"e_1_3_3_21_2","volume-title":"Proceedings of the CIDR","author":"Berti-Equille Laure","year":"2020","unstructured":"Laure Berti-Equille. 2020. Active reinforcement learning for data preparation: Learn2Clean with human-in-the-loop. In Proceedings of the CIDR. CIDR, Amsterdam, The Netherlands, 2. Retrieved from https:\/\/www.cidrdb.org\/cidr2020\/gongshow2020\/gongshow\/abstracts\/cidr2020_abstract59.pdf"},{"key":"e_1_3_3_22_2","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.cosust.2017.01.010","article-title":"Global governance by goal-setting: the novel approach of the UN Sustainable Development Goals","volume":"26","author":"Biermann Frank","year":"2017","unstructured":"Frank Biermann, Norichika Kanie, and Rakhyun E. Kim. 2017. Global governance by goal-setting: the novel approach of the UN Sustainable Development Goals. Current Opinion in Environmental Sustainability 26-27 (2017), 26\u201331.","journal-title":"Current Opinion in Environmental Sustainability"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597305"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1353\/lib.2007.0044"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1201\/9781498710411-35"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSPEC.2024.10491388"},{"issue":"1","key":"e_1_3_3_27_2","article-title":"The FAIR assessment conundrum: Reflections on tools and metrics","volume":"23","author":"Candela Leonardo","year":"2024","unstructured":"Leonardo Candela, Dario Mangione, and Gina Pavone. 2024. The FAIR assessment conundrum: Reflections on tools and metrics. Data Science Journal 23, 1 (2024), 21.","journal-title":"Data Science Journal"},{"key":"e_1_3_3_28_2","first-page":"1","volume-title":"Proceedings of the 2023 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)","author":"Casta\u00f1o Joel","year":"2023","unstructured":"Joel Casta\u00f1o, Silverio Mart\u00ednez-Fern\u00e1ndez, Xavier Franch, and Justus Bogner. 2023. Exploring the carbon footprint of hugging face\u2019s ML models: A repository mining study. In Proceedings of the 2023 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, Piscataway, NJ, 1\u201312."},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3418896"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749431"},{"key":"e_1_3_3_31_2","unstructured":"World Wide Web Consortium. 2025. W3C. Retrieved from https:\/\/www.w3.org\/. (2025). Accessed: 2025-06-10."},{"key":"e_1_3_3_32_2","unstructured":"Wikidata contributors. 2025. Wikidata. Retrieved from https:\/\/www.wikidata.org\/. (2025). Accessed: 2025-06-10."},{"key":"e_1_3_3_33_2","unstructured":"Marco Cremaschi Blerina Spahiu Matteo Palmonari and Ernesto Jimenez-Ruiz. 2024. Survey on Semantic Interpretation of Tabular Data: Challenges and Directions. arXiv:2411.11891. Retrieved from https:\/\/arxiv.org\/abs\/2411.11891"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-30796-7_22"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.3389\/FDATA.2022.850611"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.3389\/frai.2020.00004"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3665252.3665263"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3637528.3671470"},{"key":"e_1_3_3_39_2","volume-title":"Progressive Data Analysis","author":"Fekete Jean-Daniel","year":"2024","unstructured":"Jean-Daniel Fekete, Danyel Fisher, and Michael Sedlmair. 2024. Progressive Data Analysis. Eurographics Association, Eindhoven, The Netherlands."},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/S42979-023-01828-8"},{"key":"e_1_3_3_41_2","first-page":"261:1\u2013261:61","article-title":"Auto-sklearn 2.0: Hands-free AutoML via meta-learning","volume":"23","author":"Feurer Matthias","year":"2022","unstructured":"Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. 2022. Auto-sklearn 2.0: Hands-free AutoML via meta-learning. J. Mach. Learn. Res. 23, 261 (2022), 261:1\u2013261:61. Retrieved from http:\/\/jmlr.org\/papers\/v23\/21-0992.html","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_3_42_2","first-page":"321","volume-title":"Proceedings of the International Conference on Human-Centred Software Engineering","author":"Filosa Matteo","year":"2024","unstructured":"Matteo Filosa, Alexandra Plexousaki, Dario Benvenuti, Tiziana Catarci, and Marco Angelini. 2024. InterView: A system to support interaction-driven visualization systems design. In Proceedings of the International Conference on Human-Centred Software Engineering. Springer, Cham, 321\u2013329."},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.14778\/2876473.2876474"},{"issue":"1","key":"e_1_3_3_44_2","first-page":"2:1\u20132:5","article-title":"Ethical dimensions for data quality","volume":"12","author":"Firmani Donatella","year":"2020","unstructured":"Donatella Firmani, Letizia Tanca, and Riccardo Torlone. 2020. Ethical dimensions for data quality. ACM J. Data Inf. Qual. 12, 1 (2020), 2:1\u20132:5.","journal-title":"ACM J. Data Inf. Qual."},{"key":"e_1_3_3_45_2","unstructured":"International Organization for Standardization. 2025. ISO. Retrieved from https:\/\/www.iso.org\/home.html. (2025). Accessed: 2025-06-10."},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00176"},{"issue":"1","key":"e_1_3_3_47_2","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1038\/s41597-025-04565-0","article-title":"Affording reusable data: Recommendations for researchers from a data-intensive project","volume":"12","author":"Fraga-Gonz\u00e1lez Gorka","year":"2025","unstructured":"Gorka Fraga-Gonz\u00e1lez, Hester van de Wiel, Francesco Garassino, Willy Kuo, Diane de Z\u00e9licourt, Vartan Kurtcuoglu, Leonhard Held, and Eva Furrer. 2025. Affording reusable data: Recommendations for researchers from a data-intensive project. Scientific Data 12, 1 (2025), 258.","journal-title":"Scientific Data"},{"key":"e_1_3_3_48_2","unstructured":"Yunfan Gao Yun Xiong Xinyu Gao Kangxiang Jia Jinliu Pan Yuxi Bi Yi Dai Jiawei Sun and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. arXiv:2312.10997. Retrieved from https:\/\/arxiv.org\/abs\/2312.10997"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2019.11.146"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2004.22"},{"key":"e_1_3_3_51_2","volume-title":"ISO 9241-11:2018 - Ergonomics of human-system interaction\u2014Part 11: Usability: Definitions and concepts","author":"Standardization International Organization for","year":"2018","unstructured":"International Organization for Standardization. 2018. ISO 9241-11:2018 - Ergonomics of human-system interaction\u2014Part 11: Usability: Definitions and concepts. International Organization for Standardization (ISO), Geneva, Switzerland. Retrieved from https:\/\/www.iso.org\/standard\/63500.html"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452760"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","unstructured":"Annika Jacobsen Ricardo de Miranda Azevedo Nick Juty Dominique Batista Simon Coles Ronald Cornet M\u00e9lanie Courtot Merc\u00e8 Crosas Michel Dumontier Chris T. Evelo et\u00a0al. 2020. FAIR principles: Interpretations and implementation considerations. Data Intelligence 2 1-2 (2020) 10\u201329. DOI:10.1162\/dint_r_00024","DOI":"10.1162\/dint_r_00024"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1162\/dint_a_00028"},{"key":"e_1_3_3_55_2","first-page":"3394","volume-title":"Proceedings of the AISTATS (Proceedings of Machine Learning Research)","author":"J\u00e4ger Sebastian","year":"2024","unstructured":"Sebastian J\u00e4ger and Felix Biessmann. 2024. From data imputation to data cleaning - automated cleaning of tabular data improves downstream predictive performance. In Proceedings of the AISTATS (Proceedings of Machine Learning Research). PMLR, proceedings.mlr.press, 3394\u20133402."},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12599-024-00857-8"},{"issue":"6","key":"e_1_3_3_57_2","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1080\/14693062.2022.2083548","article-title":"Overcoming misleading carbon footprints in the financial sector","volume":"22","author":"Janssen Artjom","year":"2022","unstructured":"Artjom Janssen, Wouter Botzen, Justin Dijk, and Patty Duijm. 2022. Overcoming misleading carbon footprints in the financial sector. Climate Policy 22, 6 (2022), 817\u2013822.","journal-title":"Climate Policy"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.495"},{"issue":"1","key":"e_1_3_3_59_2","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1080\/07488008808408783","article-title":"The brundtland report:\u2018Our common future\u2019","volume":"4","author":"Keeble Brian R.","year":"1988","unstructured":"Brian R. Keeble. 1988. The brundtland report:\u2018Our common future\u2019. Medicine and War 4, 1 (1988), 17\u201325.","journal-title":"Medicine and War"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-71080-6_6"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.63"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.resconrec.2017.09.005"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994514"},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00009"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346452"},{"key":"e_1_3_3_66_2","first-page":"1","volume-title":"Proceedings of the Workshop on Tackling Climate Change with Machine Learning at NeurIPS 2019","author":"Lottick Kadan","year":"2019","unstructured":"Kadan Lottick, Silvia Susai, Sorelle A. Friedler, and Jonathan P. Wilson. 2019. Energy usage reports: Environmental awareness as part of algorithmic accountability. In Proceedings of the Workshop on Tackling Climate Change with Machine Learning at NeurIPS 2019. Vancouver Convention Center, British Columbia, Canada, 1\u201312."},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.imu.2022.100911"},{"issue":"253","key":"e_1_3_3_68_2","first-page":"1","article-title":"Estimating the carbon footprint of BLOOM, a 176B parameter language model","volume":"24","author":"Luccioni Alexandra Sasha","year":"2023","unstructured":"Alexandra Sasha Luccioni, Sylvain Viguier, and Anne-Laure Ligozat. 2023. Estimating the carbon footprint of BLOOM, a 176B parameter language model. Journal of Machine Learning Research 24, 253 (2023), 1\u201315.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00069"},{"issue":"1","key":"e_1_3_3_70_2","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1186\/s13174-019-0121-z","article-title":"DOD-ETL: Distributed on-demand ETL for near real-time business intelligence","volume":"10","author":"Machado Gustavo V.","year":"2019","unstructured":"Gustavo V. Machado, \u00cdtalo Cunha, Adriano C. M. Pereira, and Leonardo B. Oliveira. 2019. DOD-ETL: Distributed on-demand ETL for near real-time business intelligence. Journal of Internet Services and Applications 10, 1 (2019), 21.","journal-title":"Journal of Internet Services and Applications"},{"key":"e_1_3_3_71_2","volume-title":"Proceedings of the 11th Conference on Innovative Data Systems Research, CIDR 2021","author":"Mahdavi Mohammad","year":"2021","unstructured":"Mohammad Mahdavi and Ziawasch Abedjan. 2021. Semi-supervised data cleaning with Raha and Baran. In Proceedings of the 11th Conference on Innovative Data Systems Research, CIDR 2021. www.cidrdb.org, online, 7. Retrieved from http:\/\/cidrdb.org\/cidr2021\/papers\/cidr2021_paper14.pdf"},{"key":"e_1_3_3_72_2","first-page":"10","volume-title":"Proceedings of the Conference on \u201cLernen, Wissen, Daten, Analysen\u201d (CEUR Workshop Proceedings)","author":"Mahdavi Mohammad","year":"2019","unstructured":"Mohammad Mahdavi, Felix Neutatz, Larysa Visengeriyeva, and Ziawasch Abedjan. 2019. Towards automated data cleaning workflows. In Proceedings of the Conference on \u201cLernen, Wissen, Daten, Analysen\u201d (CEUR Workshop Proceedings). CEUR-WS.org, Aachen, Germany, 10\u201319. Retrieved from https:\/\/ceur-ws.org\/Vol-2454\/paper_8.pdf"},{"key":"e_1_3_3_73_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-37453-2_43"},{"key":"e_1_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/2656204"},{"issue":"6","key":"e_1_3_3_75_2","first-page":"115:1\u2013115:35","article-title":"A survey on bias and fairness in machine learning","volume":"54","author":"Mehrabi Ninareh","year":"2022","unstructured":"Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2022. A survey on bias and fairness in machine learning. ACM Comput. Surv. 54, 6 (2022), 115:1\u2013115:35.","journal-title":"ACM Comput. Surv."},{"key":"e_1_3_3_76_2","unstructured":"Leonel Aguilar Melgar David Dao Shaoduo Gan Nezihe Merve G\u00fcrel Nora Hollenstein Jiawei Jiang Bojan Karlas Thomas Lemmin Tian Li Yang Li Susie Xi Rao Johannes Rausch C\u00e9dric Renggli Luka Rimanic Maurice Weber Shuai Zhang Zhikuan Zhao Kevin Schawinski Wentao Wu Ce Zhang. 2021. Ease.ML: A lifecycle management system for machine learning. In Proceedings of the CIDR. www.cidrdb.org online 7. Retrieved from https:\/\/www.cidrdb.org\/cidr2021\/papers\/cidr2021_paper26.pdf"},{"key":"e_1_3_3_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578938"},{"key":"e_1_3_3_78_2","doi-asserted-by":"publisher","DOI":"10.48786\/EDBT.2025.27"},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2003.12.005"},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.1007\/S13222-022-00413-2"},{"issue":"6","key":"e_1_3_3_81_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/MS.1996.8740869","article-title":"Usability metrics: Tracking interface improvements","volume":"13","author":"Nielsen Jakob","year":"1996","unstructured":"Jakob Nielsen. 1996. Usability metrics: Tracking interface improvements. IEEE Software 13, 6 (1996), 1\u20132.","journal-title":"IEEE Software"},{"key":"e_1_3_3_82_2","doi-asserted-by":"crossref","unstructured":"Folorunso Y. Osisanwo Joseph E.T. Akinsola Oludele Awodele John O. Hinmikaiye Oluwole Olakanmi and Joseph Akinjobi. 2017. Supervised machine learning algorithms: Classification and comparison. International Journal of Computer Trends and Technology 48 2 (2017) 128\u2013138.","DOI":"10.14445\/22312803\/IJCTT-V48P126"},{"key":"e_1_3_3_83_2","first-page":"702616","volume-title":"Proceedings of the Frontiers in Education","author":"Ouwehand Kim","year":"2021","unstructured":"Kim Ouwehand, Avalon van der Kroef, Jacqueline Wong, and Fred Paas. 2021. Measuring cognitive load: Are there more valid alternatives to Likert rating scales?. In Proceedings of the Frontiers in Education. Frontiers Media SA, Lausanne, Switzerland, 702616."},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-70903-6_12"},{"key":"e_1_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2359666"},{"key":"e_1_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/3493700.3493774"},{"key":"e_1_3_3_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3603709"},{"key":"e_1_3_3_88_2","unstructured":"Ralph Peeters Aaron Steiner and Christian Bizer. 2025. Entity matching using large language models. Proceedings of the 28th International Conference on Extending Database Technology (EDBT). Retrieved from https:\/\/arxiv.org\/abs\/2310.11244"},{"issue":"101","key":"e_1_3_3_89_2","doi-asserted-by":"crossref","first-page":"101","DOI":"10.12688\/openreseurope.17554.1","article-title":"An in-depth analysis of data reduction methods for sustainable deep learning","volume":"4","author":"Perera-Lago Javier","year":"2024","unstructured":"Javier Perera-Lago, Victor Toscano-Duran, Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Miguel A. Guti\u00e9rrez-Naranjo, and Matteo Rucco. 2024. An in-depth analysis of data reduction methods for sustainable deep learning. Open Research Europe 4, 101 (2024), 101.","journal-title":"Open Research Europe"},{"key":"e_1_3_3_90_2","unstructured":"Jos\u00e9 Potting Marko P. Hekkert Ernst Worrell and Aldert Hanemaaije. 2017. Circular economy: Measuring innovation in the product chain. PBL Netherlands Environmental Assessment Agency 2544 (2017) 46."},{"key":"e_1_3_3_91_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11625-018-0627-5"},{"key":"e_1_3_3_92_2","doi-asserted-by":"publisher","unstructured":"I. P\u00e9rez-Messina Marco Angelini Davide Ceneda Christian Tominski and Silvia Miksch. 2025. Coupling guidance and progressiveness in visual analytics. Computer Graphics Forum 44 3 (2025) 1\u201312. DOI:10.1111\/cgf.70115","DOI":"10.1111\/cgf.70115"},{"key":"e_1_3_3_93_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-73194-6_6"},{"key":"e_1_3_3_94_2","doi-asserted-by":"publisher","DOI":"10.1007\/S11390-021-1344-6"},{"key":"e_1_3_3_95_2","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137631"},{"issue":"7933","key":"e_1_3_3_96_2","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1038\/s41586-022-05224-9","article-title":"Comprehensive evidence implies a higher social cost of CO2","volume":"610","author":"Rennert Kevin","year":"2022","unstructured":"Kevin Rennert, Frank Errickson, Brian C. Prest, Lisa Rennels, Richard G. Newell, William Pizer, Cora Kingdon, Jordan Wingenroth, Roger Cooke, Bryan Parthum, et\u00a0al. 2022. Comprehensive evidence implies a higher social cost of CO2. Nature 610, 7933 (2022), 687\u2013692.","journal-title":"Nature"},{"key":"e_1_3_3_97_2","unstructured":"Camilla Sancricca and Cinzia Cappiello. 2025. Lightweight Pipelines: Good Enough is Sometimes Better. In VLDB 2025 Workshop: 2nd International Workshop on Data-Centric AI (DATAI)."},{"key":"e_1_3_3_98_2","doi-asserted-by":"publisher","DOI":"10.1145\/3381831"},{"key":"e_1_3_3_99_2","first-page":"9\/1\u201321","volume-title":"Proceedings of the International Conference on Automated Machine Learning","author":"Shchur Oleksandr","year":"2023","unstructured":"Oleksandr Shchur, Ali Caner T\u00fcrkmen, Nick Erickson, Huibin Shen, Alexander Shirkov, Tony Hu, and Bernie Wang. 2023. AutoGluon-timeseries: AutoML for probabilistic time series forecasting. In Proceedings of the International Conference on Automated Machine Learning. Proceedings of Machine Learning Research, Potsdam, Germany, 9\/1\u201321. Retrieved from https:\/\/arxiv.org\/pdf\/2308.05566"},{"key":"e_1_3_3_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2023.3241071"},{"key":"e_1_3_3_101_2","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1016\/B978-155860915-0\/50046-9","volume-title":"Proceedings of the Craft of Information Visualization","author":"Shneiderman Ben","year":"2003","unstructured":"Ben Shneiderman. 2003. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the Craft of Information Visualization. Elsevier, 364\u2013371."},{"key":"e_1_3_3_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/BIGDATA47090.2019.9006187"},{"key":"e_1_3_3_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00015"},{"key":"e_1_3_3_104_2","doi-asserted-by":"publisher","DOI":"10.14778\/3523210.3523226"},{"key":"e_1_3_3_105_2","unstructured":"Aditi Singh Abul Ehtesham Saket Kumar and Tala Talaei Khoei. 2025. Agentic retrieval-augmented generation: A survey on agentic RAG. arXiv:2501.09136. Retrieved from https:\/\/arxiv.org\/abs\/2501.09136"},{"key":"e_1_3_3_106_2","doi-asserted-by":"publisher","DOI":"10.1016\/J.ASOC.2019.105524"},{"key":"e_1_3_3_107_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11625-016-0412-2"},{"key":"e_1_3_3_108_2","doi-asserted-by":"publisher","unstructured":"John Sweller. 2017. Measuring cognitive load. Perspectives on Medical Education 7 1 (2017) 1\u20132. DOI:10.1007\/s40037-017-0395-4","DOI":"10.1007\/s40037-017-0395-4"},{"key":"e_1_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2006.5"},{"key":"e_1_3_3_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2023.3346641"},{"key":"e_1_3_3_111_2","doi-asserted-by":"publisher","DOI":"10.1007\/s43681-021-00043-6"},{"key":"e_1_3_3_112_2","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1109\/ICT4S55073.2022.00015","volume-title":"Proceedings of the International Conference on ICT for Sustainability (ICT4S)","author":"Verdecchia Roberto","year":"2022","unstructured":"Roberto Verdecchia, Lu\u00eds Cruz, June Sallou, Michelle Lin, James Wickenden, and Estelle Hotellier. 2022. Data-centric green AI an exploratory empirical study. In Proceedings of the International Conference on ICT for Sustainability (ICT4S). IEEE, Piscataway, NJ, 35\u201345."},{"key":"e_1_3_3_113_2","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2021.3102254"},{"key":"e_1_3_3_114_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-019-14108-y"},{"key":"e_1_3_3_115_2","doi-asserted-by":"publisher","unstructured":"Pattaramon Vuttipittayamongkol Eyad Elyan and Andrei Petrovski. 2021. On the class overlap problem in imbalanced data classification. Knowl. Based Syst. 212 (2021) 106631. DOI:10.1016\/j.knosys.2020.106631","DOI":"10.1016\/j.knosys.2020.106631"},{"issue":"2","key":"e_1_3_3_116_2","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1109\/TCC.2015.2453988","article-title":"On achieving energy efficiency and reducing CO \\(_{2}\\)  footprint in cloud computing","volume":"4","author":"Wajid Usman","year":"2015","unstructured":"Usman Wajid, Cinzia Cappiello, Pierluigi Plebani, Barbara Pernici, Nikolay Mehandjiev, Monica Vitali, Michael Gienger, Kostas Kavoussanakis, David Margery, David Garcia Perez, et\u00a0al. 2015. On achieving energy efficiency and reducing CO \\(_{2}\\) footprint in cloud computing. IEEE Transactions on Cloud Computing 4, 2 (2015), 138\u2013151.","journal-title":"IEEE Transactions on Cloud Computing"},{"issue":"2","key":"e_1_3_3_117_2","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1007\/s43615-021-00064-7","article-title":"What is the relation between circular economy and sustainability? Answers from frontrunner companies engaged with circular economy practices","volume":"2","author":"Walker Anna M.","year":"2022","unstructured":"Anna M. Walker, Katelin Opferkuch, Erik Roos Lindgreen, Andrea Raggi, Alberto Simboli, Walter J. V. Vermeulen, Sandra Caeiro, and Roberta Salomone. 2022. What is the relation between circular economy and sustainability? Answers from frontrunner companies engaged with circular economy practices. Circular Economy and Sustainability 2, 2 (2022), 731\u2013758.","journal-title":"Circular Economy and Sustainability"},{"key":"e_1_3_3_118_2","first-page":"239","volume-title":"Proceedings of the IFIP Conference on Human-Computer Interaction","author":"Waloszek Gerd","year":"2009","unstructured":"Gerd Waloszek and Ulrich Kreichgauer. 2009. User-centered evaluation of the responsiveness of applications. In Proceedings of the IFIP Conference on Human-Computer Interaction. Springer, Cham, 239\u2013242."},{"key":"e_1_3_3_119_2","doi-asserted-by":"publisher","DOI":"10.1080\/07421222.1996.11518099"},{"key":"e_1_3_3_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2012.43"},{"key":"e_1_3_3_121_2","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2016.18"},{"key":"e_1_3_3_122_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706418"},{"key":"e_1_3_3_123_2","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389738"},{"issue":"11","key":"e_1_3_3_124_2","doi-asserted-by":"crossref","first-page":"2563","DOI":"10.14778\/3476249.3476303","article-title":"Auto-pipeline: Synthesizing complex data pipelines by-target using reinforcement learning and search","volume":"14","author":"Yang Junwen","year":"2021","unstructured":"Junwen Yang, Yeye He, and Surajit Chaudhuri. 2021. Auto-pipeline: Synthesizing complex data pipelines by-target using reinforcement learning and search. Proceedings of the VLDB Endowment 14, 11 (2021), 2563\u20132575.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_3_125_2","first-page":"1197","volume-title":"Proceedings of the ICMLA","author":"Zagatti Fernando Rezende","year":"2021","unstructured":"Fernando Rezende Zagatti, Lucas Cardoso Silva, Lucas Nildaimon dos Santos Silva, Bruno Silva Sette, Helena de Medeiros Caseli, Daniel Lucr\u00e9dio, and Diego Furtado Silva. 2021. MetaPrep: Data preparation pipelines recommendation via meta-learning. In Proceedings of the ICMLA. IEEE, Piscataway, NJ, 1197\u20131202."},{"key":"e_1_3_3_126_2","doi-asserted-by":"publisher","DOI":"10.1145\/3639303"},{"key":"e_1_3_3_127_2","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611612"},{"key":"e_1_3_3_128_2","first-page":"6024","volume-title":"Proceedings of the NAACL (Volume 1: Long Papers)","author":"Zhang Tianshu","year":"2024","unstructured":"Tianshu Zhang, Xiang Yue, Yifei Li, and Huan Sun. 2024. TableLlama: Towards open large generalist models for tables. In Proceedings of the NAACL (Volume 1: Long Papers). Association for Computational Linguistics, Kerrville, TX, 6024\u20136044."},{"key":"e_1_3_3_129_2","doi-asserted-by":"publisher","DOI":"10.1145\/3446905"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769120","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T12:21:06Z","timestamp":1764937266000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3769120"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,5]]},"references-count":128,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3769120"],"URL":"https:\/\/doi.org\/10.1145\/3769120","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"value":"1936-1955","type":"print"},{"value":"1936-1963","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,5]]},"assertion":[{"value":"2025-02-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-05","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}