{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T03:19:08Z","timestamp":1758079148240,"version":"3.44.0"},"reference-count":15,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>\n            Data practitioners often need to sample their datasets to produce representative subsets for their downstream tasks. Unfortunately, real-world datasets frequently contain duplicates, whose presence biases sampling and impacts the quality of the produced subsets, hence the outcome of downstream tasks. While deduplication is therefore fundamental, performing it on the entire dataset to run sampling on its cleaned version might be prohibitively expensive in terms of time and resources. Thus, we recently introduced RadlER, a solution to perform\n            <jats:italic toggle=\"yes\">deduplicated sampling on-demand<\/jats:italic>\n            , i.e., to produce a clean sample of a dirty dataset incrementally, according to a target distribution of some subpopulations, by focusing the cleaning effort only on entities required to appear in the sample.\n          <\/jats:p>\n          <jats:p>In this demonstration, we interactively show how RadlER can support practitioners in their data science pipelines, allowing them to save a relevant amount of time and resources.<\/jats:p>","DOI":"10.14778\/3750601.3750661","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:05Z","timestamp":1758029885000},"page":"5319-5322","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["RadlER: Deduplicated Sampling On-Demand"],"prefix":"10.14778","volume":"18","author":[{"given":"Luca","family":"Zecchini","sequence":"first","affiliation":[{"name":"BIFOLD &amp; TU Berlin, Berlin, Germany"}]},{"given":"Ziawasch","family":"Abedjan","sequence":"additional","affiliation":[{"name":"BIFOLD &amp; TU Berlin, Berlin, Germany"}]},{"given":"Vasilis","family":"Efthymiou","sequence":"additional","affiliation":[{"name":"Harokopio University, Athens, Greece"}]},{"given":"Giovanni","family":"Simonini","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Italy"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Article 52","author":"Barlaug Nils","year":"2021","unstructured":"Nils Barlaug and Jon Atle Gulla. 2021. Neural Networks for Entity Matching: A Survey. TKDD 15, 3, Article 52 (2021), 37 pages."},{"key":"e_1_2_1_2_1","volume-title":"Data Matching","author":"Christen Peter","unstructured":"Peter Christen. 2012. Data Matching. Springer."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Dong Deng et al. 2019. Unsupervised String Transformation Learning for Entity Consolidation. In ICDE. 196\u2013207.","DOI":"10.1109\/ICDE.2019.00026"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2854006.2854008"},{"key":"e_1_2_1_5_1","unstructured":"Luca Gagliardelli Giovanni Simonini Domenico Beneventano and Sonia Bergamaschi. 2019. SparkER: Scaling Entity Resolution in Spark. In EDBT. 602\u2013605."},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Ninareh Mehrabi et al. 2021. A Survey on Bias and Fairness in Machine Learning. CSUR 54 6 Article 115 (2021) 35 pages.","DOI":"10.1145\/3457607"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"George Papadakis et al. 2020. Blocking and Filtering Techniques for Entity Resolution: A Survey. CSUR 53 2 Article 31 (2020) 42 pages.","DOI":"10.1145\/3377455"},{"key":"e_1_2_1_8_1","unstructured":"Ralph Peeters Aaron Steiner and Christian Bizer. 2025. Entity Matching using Large Language Models. In EDBT. 529\u2013541."},{"key":"e_1_2_1_9_1","first-page":"3279","article-title":"Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching","volume":"16","author":"Nima Shahbazi","year":"2023","unstructured":"Nima Shahbazi et al. 2023. Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching. PVLDB 16, 11 (2023), 3279\u20133292.","journal-title":"PVLDB"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3523210.3523226"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Emma Strubell Ananya Ganesh and Andrew McCallum. 2019. Energy and Policy Considerations for Deep Learning in NLP. In ACL. 3645\u20133650.","DOI":"10.18653\/v1\/P19-1355"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Steven K. Thompson. 2012. Sampling. John Wiley & Sons.","DOI":"10.1002\/9781118162934"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3742728.3742742"},{"key":"e_1_2_1_14_1","unstructured":"Luca Zecchini Giovanni Simonini and Sonia Bergamaschi. 2020. Entity Resolution on Camera Records without Machine Learning. In DI2KG @ VLDB."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611612"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750661","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:44:17Z","timestamp":1758030257000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750661"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":15,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750661"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750661","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}