{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T10:58:58Z","timestamp":1770461938086,"version":"3.49.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:p>\n            Entity Resolution (ER) aims to identify and merge records that refer to the same real-world entity. ER is typically employed as an expensive cleaning step on the entire data before consuming it. Yet, determining which entities are useful once cleaned depends solely on the user's application, which may need only a fraction of them. For instance, when dealing with Web data, we would like to be able to filter the entities of interest gathered from multiple sources without cleaning the entire, continuously-growing data. Similarly, when querying data lakes, we want to transform data on-demand and return the results in a timely manner---a fundamental requirement of ELT (\n            <jats:italic>Extract-Load-Transform<\/jats:italic>\n            ) pipelines.\n          <\/jats:p>\n          <jats:p>\n            We propose\n            <jats:italic>BrewER<\/jats:italic>\n            , a framework to evaluate SQL SP queries on dirty data while progressively returning results as if they were issued on cleaned data.\n            <jats:italic>BrewER<\/jats:italic>\n            tries to focus the cleaning effort on one entity at a time, following an ORDER BY predicate. Thus, it inherently supports\n            <jats:italic>top-k<\/jats:italic>\n            and stop-and-resume execution. For a wide range of applications, a significant amount of resources can be saved. 
We exhaustively evaluate and show the efficacy of\n            <jats:italic>BrewER<\/jats:italic>\n            on four real-world datasets.\n          <\/jats:p>","DOI":"10.14778\/3523210.3523226","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T22:23:21Z","timestamp":1655936601000},"page":"1506-1518","source":"Crossref","is-referenced-by-count":20,"title":["Entity resolution on-demand"],"prefix":"10.14778","volume":"15","author":[{"given":"Giovanni","family":"Simonini","sequence":"first","affiliation":[{"name":"University of Modena and Reggio Emilia, Italy"}]},{"given":"Luca","family":"Zecchini","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Italy"}]},{"given":"Sonia","family":"Bergamaschi","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Italy"}]},{"given":"Felix","family":"Naumann","sequence":"additional","affiliation":[{"name":"University of Potsdam, Germany"}]}],"member":"320","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Altosight. Accessed on 2022-03-11. Altosight Official Website. https:\/\/altosight.com"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2623607"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850587"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442200"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-008-0098-x"},{"key":"e_1_2_1_6_1","volume-title":"Data Fusion. ACM Comput. Surv. 41, 1","author":"Bleiholder Jens","year":"2008","unstructured":"Jens Bleiholder and Felix Naumann. 2008. Data Fusion. ACM Comput. Surv. 
41, 1 (2008), 1:1--1:41."},{"key":"e_1_2_1_7_1","volume-title":"Entity Resolution, and Duplicate Detection","author":"Christen Peter","unstructured":"Peter Christen. 2012. Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.127"},{"key":"e_1_2_1_9_1","volume-title":"An Overview of End-to-end Entity Resolution for Big Data. ACM Comput. Surv. 53, 6","author":"Christophides Vassilis","year":"2021","unstructured":"Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, and Kostas Stefanidis. 2021. An Overview of End-to-end Entity Resolution for Big Data. ACM Comput. Surv. 53, 6 (2021), 127:1--127:42."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687576"},{"key":"e_1_2_1_11_1","volume-title":"Donatella Firmani, Maurizio Mazzei, Paolo Merialdo, Federico Piai, and Divesh Srivastava.","author":"Crescenzi Valter","year":"2021","unstructured":"Valter Crescenzi, Andrea De Angelis, Donatella Firmani, Maurizio Mazzei, Paolo Merialdo, Federico Piai, and Divesh Srivastava. 2021. Alaska: A Flexible Benchmark for Data Integration Tasks. CoRR abs\/2101.11259 (2021)."},{"key":"e_1_2_1_12_1","volume-title":"SIGMOD 2020 Programming Contest Official Website. 
http:\/\/www.inf.uniroma3.it\/db\/sigmod2020contest","author":"Research Database","unstructured":"Database Research Group of the Roma Tre University. Accessed on 2022-03-11. SIGMOD 2020 Programming Contest Official Website. http:\/\/www.inf.uniroma3.it\/db\/sigmod2020contest"},{"key":"e_1_2_1_13_1","volume-title":"SIGMOD 2021 Programming Contest Official Website. https:\/\/dbgroup.ing.unimo.it\/sigmod21contest","author":"DBGroup of the University of Modena and Reggio Emilia and Database Research","unstructured":"DBGroup of the University of Modena and Reggio Emilia and Database Research Group of the Roma Tre University. Accessed on 2022-03-11. SIGMOD 2021 Programming Contest Official Website. https:\/\/dbgroup.ing.unimo.it\/sigmod21contest"},{"key":"e_1_2_1_14_1","volume-title":"Unsupervised String Transformation Learning for Entity Consolidation","author":"Deng Dong","unstructured":"Dong Deng, Wenbo Tao, Ziawasch Abedjan, Ahmed K. Elmagarmid, Ihab F. Ilyas, Guoliang Li, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, and Nan Tang. 2019. Unsupervised String Transformation Learning for Entity Consolidation. In ICDE. 
IEEE, 196--207."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3405476"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1969.10501049"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2876473.2876474"},{"key":"e_1_2_1_18_1","unstructured":"Luca Gagliardelli Giovanni Simonini and Sonia Bergamaschi. 2020. RulER: Scaling Up Record-level Matching Rules. In EDBT. OpenProceedings.org 611--614."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00656-7"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137833"},{"key":"e_1_2_1_21_1","volume-title":"CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks","author":"Li Peng","unstructured":"Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, and Ce Zhang. 2021. CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks. In ICDE. IEEE, 13--24."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421431"},{"key":"e_1_2_1_23_1","article-title":"Knowledge Transfer for Entity Resolution with Siamese Neural Networks","volume":"13","author":"Loster Michael","year":"2021","unstructured":"Michael Loster, Ioannis K. Koumarelas, and Felix Naumann. 2021. Knowledge Transfer for Entity Resolution with Siamese Neural Networks. ACM J. Data Inf. Qual. 13, 1 (2021), 2:1--2:25.","journal-title":"ACM J. 
Data Inf. Qual."},{"key":"e_1_2_1_24_1","volume-title":"Alon Y. Halevy, Shawn R. Jeffery, David Ko, and Cong Yu.","author":"Madhavan Jayant","year":"2007","unstructured":"Jayant Madhavan, Shirley Cohen, Xin Luna Dong, Alon Y. Halevy, Shawn R. Jeffery, David Ko, and Cong Yu. 2007. Web-scale Data Integration: You Can Afford to Pay as You Go. In CIDR. www.cidrdb.org, 342--350."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196926"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352116"},{"key":"e_1_2_1_27_1","volume-title":"A Review of Unsupervised and Semi-supervised Blocking Methods for Record Linkage. Linking and Mining Heterogeneous and Multi-view Data","author":"O'Hare Kevin","year":"2019","unstructured":"Kevin O'Hare, Anna Jurek-Loughrey, and Cassio de Campos. 2019. A Review of Unsupervised and Semi-supervised Blocking Methods for Record Linkage. Linking and Mining Heterogeneous and Multi-view Data (2019), 79--105."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2020.101565"},{"key":"e_1_2_1_29_1","volume-title":"Blocking and Filtering Techniques for Entity Resolution: A Survey. ACM Comput. Surv. 53, 2","author":"Papadakis George","year":"2020","unstructured":"George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, and Themis Palpanas. 2020. 
Blocking and Filtering Techniques for Entity Resolution: A Survey. ACM Comput. Surv. 53, 2 (2020), 31:1--31:42."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947624"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385658.3385664"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2359666"},{"key":"e_1_2_1_33_1","volume-title":"Towards Progressive Search-driven Entity Resolution. In SEBD (CEUR Workshop Proceedings)","volume":"2161","author":"Pietrangelo Alberto","unstructured":"Alberto Pietrangelo, Giovanni Simonini, Sonia Bergamaschi, Felix Naumann, and Ioannis K. Koumarelas. 2018. Towards Progressive Search-driven Entity Resolution. In SEBD (CEUR Workshop Proceedings), Vol. 2161. CEUR-WS.org."},{"key":"e_1_2_1_34_1","unstructured":"Qatar Computing Research Institute (QCRI). Accessed on 2022-03-11. Data Civilizer Address Dataset. https:\/\/raw.githubusercontent.com\/qcri\/data_civilizer_system\/master\/grecord_service\/gr\/data\/address\/address.csv"},{"key":"e_1_2_1_35_1","volume-title":"Bootstrapping Pay-as-you-go Data Integration Systems. In SIGMOD Conference. ACM, 861--874","author":"Sarma Anish Das","unstructured":"Anish Das Sarma, Xin Dong, and Alon Y. Halevy. 2008. Bootstrapping Pay-as-you-go Data Integration Systems. In SIGMOD Conference. 
ACM, 861--874."},{"key":"e_1_2_1_36_1","first-page":"21","article-title":"Entity-based Keyword Search in Web","volume":"21","author":"Sartori Enrico","year":"2016","unstructured":"Enrico Sartori, Yannis Velegrakis, and Francesco Guerra. 2016. Entity-based Keyword Search in Web Documents. Trans. Comput. Collect. Intell. 21 (2016), 21--49.","journal-title":"Documents. Trans. Comput. Collect. Intell."},{"key":"e_1_2_1_37_1","volume-title":"Schema-agnostic Progressive Entity Resolution","author":"Simonini Giovanni","unstructured":"Giovanni Simonini, George Papadakis, Themis Palpanas, and Sonia Bergamaschi. 2018. Schema-agnostic Progressive Entity Resolution. In ICDE. IEEE Computer Society, 53--64."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2852763"},{"key":"e_1_2_1_39_1","unstructured":"Giovanni Simonini Luca Zecchini Sonia Bergamaschi and Felix Naumann. Accessed on 2022-03-11. Entity Resolution On-Demand (Technical Report). https:\/\/github.com\/dbmodena\/BrewER\/blob\/main\/technical_report.pdf"},{"key":"e_1_2_1_40_1","unstructured":"Michael Stonebraker Daniel Bruckner Ihab F. Ilyas George Beskales Mitch Cherniack Stanley B. Zdonik Alexander Pagan and Shan Xu. 2013. 
Data Curation at Scale: The Data Tamer System. In CIDR. www.cidrdb.org."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476294"},{"key":"e_1_2_1_42_1","volume-title":"Joint Entity Resolution","author":"Whang Steven Euijong","unstructured":"Steven Euijong Whang and Hector Garcia-Molina. 2012. Joint Entity Resolution. In ICDE. IEEE Computer Society, 294--305."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2012.43"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389743"},{"key":"e_1_2_1_45_1","volume-title":"DI2KG@VLDB (CEUR Workshop Proceedings)","author":"Zecchini Luca","unstructured":"Luca Zecchini, Giovanni Simonini, and Sonia Bergamaschi. 2020. Entity Resolution on Camera Records without Machine Learning. In DI2KG@VLDB (CEUR Workshop Proceedings), Vol. 2726. CEUR-WS.org."},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Liang Zhu Xu Du Qin Ma Weiyi Meng and Haibo Liu. 2018. Keyword Search with Real-time Entity Resolution in Relational Databases. In ICMLC. 
ACM 134--139.","DOI":"10.1145\/3195106.3195171"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3523210.3523226","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:53:47Z","timestamp":1672224827000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3523210.3523226"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3]]},"references-count":46,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["10.14778\/3523210.3523226"],"URL":"https:\/\/doi.org\/10.14778\/3523210.3523226","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,3]]}}}