{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T13:40:01Z","timestamp":1755870001211,"version":"3.44.0"},"reference-count":135,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T00:00:00Z","timestamp":1701993600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"the National Key R&D Program of China","award":["2021ZD0113903"],"award-info":[{"award-number":["2021ZD0113903"]}]},{"name":"Longhua Science and Technology Innovation Bureau","award":["10162A20220720B12AB12"],"award-info":[{"award-number":["10162A20220720B12AB12"]}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62202313, 62225202"],"award-info":[{"award-number":["62202313, 62225202"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Royal Society Wolfson Research Merit Award","award":["WRM\/R1\/180014"],"award-info":[{"award-number":["WRM\/R1\/180014"]}]},{"DOI":"10.13039\/501100021171","name":"Guangdong Basic and Applied Basic Research Foundation","doi-asserted-by":"crossref","award":["2022A1515010120"],"award-info":[{"award-number":["2022A1515010120"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,12,8]]},"abstract":"<jats:p>There has been a host of work on entity resolution (ER), to identify tuples that refer to the same entity. This paper studies the inverse of ER, to identify tuples to which distinct real-world entities are matched by mistake, and split such tuples into a set of tuples, one for each entity. We formulate the tuple splitting problem. 
We propose a scheme to decide what tuples to split and what tuples to correct without splitting, fix errors\/assign attribute values to the split tuples, and impute missing values. The scheme introduces a class of rules, which embed predicates for aligning entities across relations and knowledge graphs G, assessing correlation between attributes, and extracting data from G. It unifies logic deduction, correlation models, and data extraction by chasing the data with the rules. We train machine learning models to assess attribute correlation and predict missing values. We develop algorithms for the tuple splitting scheme. Using real-life data, we empirically verify that the scheme is efficient and accurate, with F-measure 0.92 on average.<\/jats:p>","DOI":"10.1145\/3626763","type":"journal-article","created":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T14:01:21Z","timestamp":1702389681000},"page":"1-29","source":"Crossref","is-referenced-by-count":1,"title":["Splitting Tuples of Mismatched Entities"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5149-2656","authenticated-orcid":false,"given":"Wenfei","family":"Fan","sequence":"first","affiliation":[{"name":"Beihang University &amp; Shenzhen Institute of Computing Sciences &amp; University of Edinburgh, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6614-3755","authenticated-orcid":false,"given":"Ziyan","family":"Han","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1710-8726","authenticated-orcid":false,"given":"Weilong","family":"Ren","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2937-4590","authenticated-orcid":false,"given":"Ding","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5760-5145","authenticated-orcid":false,"given":"Yaoshu","family":"Wang","sequence":"additional","affiliation":[{"name":"Shenzhen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2356-782X","authenticated-orcid":false,"given":"Min","family":"Xie","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8249-9695","authenticated-orcid":false,"given":"Mengyi","family":"Yan","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,12,12]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2013. Lego Friends. https:\/\/www.imdb.com\/title\/tt4049416\/."},{"key":"e_1_2_2_2_1","unstructured":"2013. Lego Friends. https:\/\/www.imdb.com\/title\/tt9148446\/."},{"key":"e_1_2_2_3_1","unstructured":"2013. Storm. http:\/\/filmstudieren.ch\/en\/storm#1."},{"key":"e_1_2_2_4_1","unstructured":"2022. Colleges. https:\/\/data.world\/dhs\/colleges-and-universities."},{"key":"e_1_2_2_5_1","unstructured":"2022. Colleges KG. https:\/\/nces.ed.gov\/GLOBALLOCATOR\/."},{"key":"e_1_2_2_6_1","unstructured":"2022. DBLP. https:\/\/dblp.org\/rdf\/release\/dblp-2022-05-02.nt.gz."},{"key":"e_1_2_2_7_1","unstructured":"2022. Elected Councillors in Kagawa at-large district of Japan. https:\/\/en.wikipedia.org\/?curid=27298128."},{"key":"e_1_2_2_8_1","unstructured":"2022. Help:Conflation of two people. https:\/\/www.wikidata.org\/wiki\/Help:Conflation_of_two_people."},{"key":"e_1_2_2_9_1","unstructured":"2022. Hirai Tar\u014d (novelist). https:\/\/en.wikipedia.org\/wiki\/Edogawa_Ranpo."},{"key":"e_1_2_2_10_1","unstructured":"2022. Wikimedia. https:\/\/www.kaggle.com\/datasets\/kenshoresearch\/kensho-derived-wikimedia-data."},{"key":"e_1_2_2_11_1","unstructured":"2023. BA film. 
https:\/\/www.zhdk.ch\/en\/degree-programmes\/film\/ba-film."},{"key":"e_1_2_2_12_1","unstructured":"2023. Code datasets and full version. https:\/\/drive.google.com\/drive\/folders\/1-Bc20q3hc26cqW-7zJ3R0xHm-t00CrIu?usp=sharing."},{"key":"e_1_2_2_13_1","unstructured":"2023. DOK.fest. https:\/\/www.dokfest-muenchen.de\/."},{"key":"e_1_2_2_14_1","unstructured":"2023. Dun & Bradstreet. https:\/\/www.dnb.com\/."},{"volume-title":"Filmography by ZHdK. https:\/\/www.swissfilms.ch\/en\/company\/zrcher-hochschule-der-knste-zhdk-departement-darstellende-knste-und-film\/A96DAF3F0CF04DEDBD79404DC793ED02","key":"e_1_2_2_15_1","unstructured":"2023. Filmography by ZHdK. https:\/\/www.swissfilms.ch\/en\/company\/zrcher-hochschule-der-knste-zhdk-departement-darstellende-knste-und-film\/A96DAF3F0CF04DEDBD79404DC793ED02."},{"key":"e_1_2_2_16_1","unstructured":"2023. IMDB. https:\/\/www.imdb.com\/interfaces\/."},{"key":"e_1_2_2_17_1","unstructured":"2023. IMDB Name Split. https:\/\/help.imdb.com\/article\/contribution\/names-biographical-data\/names\/GSA3M6SFHRAERXZ3#."},{"key":"e_1_2_2_18_1","unstructured":"2023. Noemi Schneider (German). https:\/\/www.dokfest-muenchen.de\/films\/walaa?lang=en & https:\/\/de.wikipedia.org\/wiki\/Noemi_Schneider."},{"key":"e_1_2_2_19_1","unstructured":"2023. Noemi Schneider (Swiss). https:\/\/www.swissfilms.ch\/en\/person\/nomi-natascha-schneider\/385CEC7054A64FDC946F008A4432A4B9."},{"key":"e_1_2_2_20_1","unstructured":"2023. US Bureau of Labor Statistics. https:\/\/www.bls.gov\/."},{"key":"e_1_2_2_21_1","unstructured":"2023. Wikidata. https:\/\/www.wikidata.org."},{"key":"e_1_2_2_22_1","unstructured":"2023. Wikipedia. https:\/\/en.wikipedia.org\/."},{"volume-title":"Foundations of Databases","author":"Abiteboul Serge","key":"e_1_2_2_23_1","unstructured":"Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. 
Addison-Wesley."},{"key":"e_1_2_2_24_1","doi-asserted-by":"crossref","unstructured":"Arvind Arasu Michaela G\u00f6tz and Raghav Kaushik. 2010. On active learning of record matching packages. In SIGMOD. 783--794.","DOI":"10.1145\/1807167.1807252"},{"key":"e_1_2_2_25_1","doi-asserted-by":"crossref","unstructured":"Arvind Arasu Christopher R\u00e9 and Dan Suciu. 2009. Large-Scale Deduplication with Constraints Using Dedupalog. In ICDE. 952--963.","DOI":"10.1109\/ICDE.2009.43"},{"key":"e_1_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Marcelo Arenas Leopoldo Bertossi and Jan Chomicki. 1999. Consistent Query Answers in Inconsistent Databases. In PODS. 68--79.","DOI":"10.1145\/303976.303983"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijar.2017.01.003"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476300"},{"volume-title":"Database Repairing and Consistent Query Answering","author":"Bertossi Leopoldo","key":"e_1_2_2_29_1","unstructured":"Leopoldo Bertossi. 2011. Database Repairing and Consistent Query Answering. Morgan & Claypool Publishers."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-012-9402-7"},{"key":"e_1_2_2_31_1","first-page":"1","article-title":"DataWig: Missing Value Imputation for Tables","volume":"20","author":"Biessmann Felix","year":"2019","unstructured":"Felix Biessmann, Tammo Rukat, Philipp Schmidt, Prathik Naidu, Sebastian Schelter, Andrey Taptunov, Dustin Lange, and David Salinas. 2019. DataWig: Missing Value Imputation for Tables. J. Mach. Learn. Res. 20, 175 (2019), 1--6.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_2_32_1","volume-title":"Adaptive Blocking: Learning to Scale Up Record Linkage. In ICDM. 87--96.","author":"Bilenko Mikhail","year":"2006","unstructured":"Mikhail Bilenko, Beena Kamath, and Raymond J Mooney. 2006. Adaptive Blocking: Learning to Scale Up Record Linkage. In ICDM. 
87--96."},{"key":"e_1_2_2_33_1","unstructured":"Cory Bohon. 2022. How to find and merge duplicate contacts in iOS 16. https:\/\/www.techrepublic.com\/article\/merge-duplicate-contacts-ios-16\/."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376746"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242153.3242156"},{"key":"e_1_2_2_36_1","doi-asserted-by":"crossref","unstructured":"Zhaoqiang Chen Qun Chen Boyi Hou Zhanhuai Li and Guoliang Li. 2020. Towards interpretable and learnable risk analysis for entity resolution. In SIGMOD. 1165--1180.","DOI":"10.1145\/3318464.3380572"},{"key":"e_1_2_2_37_1","volume-title":"Database Systems: 65--98","author":"Codd E. F.","year":"1972","unstructured":"E. F. Codd. 1972. Relational Completeness of Data Base Sublanguages. In: R. Rustin (ed.): Database Systems: 65--98, Prentice Hall and IBM Research Report RJ 987, San Jose, California (1972)."},{"key":"e_1_2_2_38_1","unstructured":"Jess Cody. 2022. Where does data come from. https:\/\/clearbit.com\/blog\/where-does-data-come-from."},{"key":"e_1_2_2_39_1","unstructured":"Gao Cong Wenfei Fan Floris Geerts Xibei Jia and Shuai Ma. 2007. Improving Data Quality: Consistency and Accuracy. In VLDB. 315--326."},{"key":"e_1_2_2_40_1","volume-title":"Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services. In SIGMOD. 1431--1446.","author":"Das Sanjib","year":"2017","unstructured":"Sanjib Das, Paul Suganthan G. C., AnHai Doan, Jeffrey F. Naughton, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, Vijay Raghavendra, and Youngchoon Park. 2017. Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services. In SIGMOD. 1431--1446."},{"key":"e_1_2_2_41_1","volume-title":"Yi Chen, and Subbarao Kambhampati.","author":"De Sushovan","year":"2015","unstructured":"Sushovan De, Yuheng Hu, Venkata Vamsikrishna Meduri, Yi Chen, and Subbarao Kambhampati. 2015. 
BayesWipe: A Scalable Probabilistic Framework for Cleaning BigData. CoRR abs\/1506.08908 (2015)."},{"key":"e_1_2_2_42_1","volume-title":"Deep and Collective Entity Resolution in Parallel","author":"Deng Ting","year":"2022","unstructured":"Ting Deng, Wenfei Fan, Ping Lu, Xiaomeng Luo, Xiaoke Zhu, and Wanhe An. 2022. Deep and Collective Entity Resolution in Parallel. In ICDE. IEEE, 2060--2072."},{"key":"e_1_2_2_43_1","doi-asserted-by":"crossref","unstructured":"Daniel Deutch Nave Frost Amir Gilad and Oren Sheffer. 2021. Explanations for Data Repair Through Shapley Values. In CIKM. ACM.","DOI":"10.1145\/3459637.3482341"},{"key":"e_1_2_2_44_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171--4186.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171--4186."},{"key":"e_1_2_2_45_1","volume-title":"Leveraging currency for repairing inconsistent and incomplete data. TKDE","author":"Ding Xiaoou","year":"2020","unstructured":"Xiaoou Ding, Hongzhi Wang, Jiaxuan Su, Muxian Wang, Jianzhong Li, and Hong Gao. 2020. Leveraging currency for repairing inconsistent and incomplete data. TKDE (2020)."},{"key":"e_1_2_2_46_1","first-page":"1454","article-title":"Distributed Representations of Tuples for Entity Resolution","volume":"11","author":"Ebraheem Muhammad","year":"2018","unstructured":"Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq R. Joty, Mourad Ouzzani, and Nan Tang. 2018. Distributed Representations of Tuples for Entity Resolution. PVLDB 11, 11 (2018), 1454--1467.","journal-title":"PVLDB"},{"key":"e_1_2_2_47_1","unstructured":"e_kartoffel. 2015. Names merged in error (by me). 
https:\/\/community-imdb.sprinklr.com\/conversations\/data-issues-policy-discussions\/names-merged-in-error-by-me\/5f4a79838815453dba7fbebc."},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-010-0206-6"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1366102.1366103"},{"volume-title":"Linking Entities across Relations and Graphs","author":"Fan Wenfei","key":"e_1_2_2_50_1","unstructured":"Wenfei Fan, Liang Geng, Ruochun Jin, Ping Lu, Resul Tugey, and Wenyuan Yu. 2022. Linking Entities across Relations and Graphs. In ICDE. IEEE, 634--647."},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Wenfei Fan Ziyan Han Yaoshu Wang and Min Xie. 2022. Parallel Rule Discovery from Large Datasets by Sampling. In SIGMOD. ACM 384--398.","DOI":"10.1145\/3514221.3526165"},{"key":"e_1_2_2_52_1","unstructured":"Wenfei Fan Ziyan Han Yaoshu Wang and Min Xie. 2023. Discovering Top-k Rules using Subjective and Objective Criteria. In SIGMOD. ACM."},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402755.3402774"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-011-0253-7"},{"key":"e_1_2_2_55_1","volume-title":"Unifying Logic Rules and Machine Learning for Entity Enhancing. Sci. China Inf. Sci. 63, 7","author":"Fan Wenfei","year":"2020","unstructured":"Wenfei Fan, Ping Lu, and Chao Tian. 2020. Unifying Logic Rules and Machine Learning for Entity Enhancing. Sci. China Inf. Sci. 63, 7 (2020)."},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3317315.3317318"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/3457390.3457400"},{"key":"e_1_2_2_58_1","doi-asserted-by":"crossref","unstructured":"Cheng Fu Xianpei Han Le Sun Bo Chen Wei Zhang Suhui Wu and Hao Kong. 2019. End-to-end multi-perspective matching for entity resolution. In IJCAI. 
4961--4967.","DOI":"10.24963\/ijcai.2019\/689"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380297"},{"key":"e_1_2_2_60_1","volume-title":"Bondell","author":"Gao Erdun","year":"2022","unstructured":"Erdun Gao, Ignavier Ng, Mingming Gong, Li Shen, Wei Huang, Tongliang Liu, Kun Zhang, and Howard D. Bondell. 2022. MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models. In NeurIPS."},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536360.2536363"},{"key":"e_1_2_2_62_1","doi-asserted-by":"crossref","unstructured":"Stella Giannakopoulou Manos Karpathiotakis and Anastasia Ailamaki. 2020. Cleaning denial constraint violations through relaxation. In SIGMOD. 805--815.","DOI":"10.1145\/3318464.3389775"},{"key":"e_1_2_2_63_1","doi-asserted-by":"crossref","unstructured":"Amir Gilad Daniel Deutch and Sudeepa Roy. 2020. On multiple semantics for declarative database repairs. In SIGMOD. 817--831.","DOI":"10.1145\/3318464.3389721"},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588576"},{"key":"e_1_2_2_65_1","volume-title":"Multiple imputation using deep denoising autoencoders. arXiv preprint arXiv:1705.02737 280","author":"Gondara Lovedeep","year":"2017","unstructured":"Lovedeep Gondara and Ke Wang. 2017. Multiple imputation using deep denoising autoencoders. arXiv preprint arXiv:1705.02737 280 (2017)."},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920897"},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00653-w"},{"key":"e_1_2_2_68_1","unstructured":"IMDb help center. 2023. How can I combine two IMDb name pages? https:\/\/help.imdb.com\/article\/contribution\/names-biographical-data\/how-can-i-combine-two-imdb-name-pages\/G3TNPWSGKZNRU3MP?ref_=helpsrall#."},{"key":"e_1_2_2_69_1","doi-asserted-by":"crossref","unstructured":"Benjamin Hilprecht and Carsten Binnig. 2021. 
ReStore - Neural Data Completion for Relational Databases. In SIGMOD. 710--722.","DOI":"10.1145\/3448016.3457264"},{"volume-title":"Rule learning from knowledge graphs guided by embedding models","author":"Ho Vinh Thinh","key":"e_1_2_2_70_1","unstructured":"Vinh Thinh Ho, Daria Stepanova, Mohamed H Gad-Elrab, Evgeny Kharlamov, and Gerhard Weikum. 2018. Rule learning from knowledge graphs guided by embedding models. In ISWC. Springer, 72--90."},{"key":"e_1_2_2_71_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780."},{"key":"e_1_2_2_72_1","first-page":"347","article-title":"r-HUMO: A Risk-aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees","volume":"32","author":"Hou Boyi","year":"2018","unstructured":"Boyi Hou, Qun Chen, Zhaoqiang Chen, Youcef Nafa, and Zhanhuai Li. 2018. r-HUMO: A Risk-aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees. TKDE 32, 2 (2018), 347--359.","journal-title":"TKDE"},{"key":"e_1_2_2_73_1","unstructured":"Huldra. 2020. Help talk: Conflation of two people. https:\/\/www.wikidata.org\/wiki\/Help_talk:Conflation_of_two_people."},{"key":"e_1_2_2_74_1","unstructured":"Vassilis N Ioannidis Xiang Song Saurav Manchanda Mufei Li Xiaoqin Pan Da Zheng Xia Ning Xiangxiang Zeng and George Karypis. 2020. Drkg-drug repurposing knowledge graph for covid-19. https:\/\/github.com\/gnn4dr\/DRKG\/."},{"key":"e_1_2_2_75_1","doi-asserted-by":"crossref","unstructured":"Robert Isele Anja Jentzsch and Christian Bizer. 2010. Silk server-adding missing links while consuming linked data. In COLD. 
85--96.","DOI":"10.1007\/978-3-031-79432-2_6"},{"key":"e_1_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_2_2_77_1","doi-asserted-by":"publisher","DOI":"10.14778\/3430915.3430920"},{"key":"e_1_2_2_78_1","doi-asserted-by":"publisher","DOI":"10.14778\/3377369.3377383"},{"key":"e_1_2_2_79_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920904"},{"key":"e_1_2_2_80_1","first-page":"712","article-title":"MDedup: Duplicate detection with matching dependencies","volume":"13","author":"Koumarelas Ioannis","year":"2020","unstructured":"Ioannis Koumarelas, Thorsten Papenbrock, and Felix Naumann. 2020. MDedup: Duplicate detection with matching dependencies. PVLDB 13, 5 (2020), 712--725.","journal-title":"PVLDB"},{"key":"e_1_2_2_81_1","volume-title":"MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms. In NeurIPS. 23806--23817.","author":"Kyono Trent","year":"2021","unstructured":"Trent Kyono, Yao Zhang, Alexis Bellot, and Mihaela van der Schaar. 2021. MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms. In NeurIPS. 23806--23817."},{"volume-title":"Robust factorization of real-world tensor streams with patterns, missing values, and outliers","author":"Lee Dongjin","key":"e_1_2_2_82_1","unstructured":"Dongjin Lee and Kijung Shin. 2021. Robust factorization of real-world tensor streams with patterns, missing values, and outliers. In ICDE. IEEE, 840--851."},{"key":"e_1_2_2_83_1","volume-title":"DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web","author":"Lehmann Jens","year":"2015","unstructured":"Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, S\u00f6ren Auer, and Christian Bizer. 2015. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. 
Semantic Web (2015)."},{"key":"e_1_2_2_84_1","unstructured":"Adam Lerer Ledell Wu Jiajun Shen Timoth\u00e9e Lacroix Luca Wehrstedt Abhijit Bose and Alex Peysakhovich. 2019. Pytorch-BigGraph: A Large Scale Graph Embedding System. In MLSys."},{"key":"e_1_2_2_85_1","volume-title":"Mansinghka","author":"Lew Alexander K.","year":"2020","unstructured":"Alexander K. Lew, Monica Agrawal, David A. Sontag, and Vikash K. Mansinghka. 2020. PClean: Bayesian Data Cleaning at Scale with Domain-Specific Probabilistic Programming. CoRR abs\/2007.11838 (2020)."},{"key":"e_1_2_2_86_1","volume-title":"Muhammad Asif Ali, and Yi Wang","author":"Li Bing","year":"2020","unstructured":"Bing Li, Wei Wang, Yifang Sun, Linhan Zhang, Muhammad Asif Ali, and Yi Wang. 2020. GraphER: Token-Centric Entity Resolution with Graph Convolutional Neural Networks.. In AAAI. 8172--8179."},{"key":"e_1_2_2_87_1","volume-title":"CleanML: A Benchmark for Joint Data Cleaning and Machine Learning [Experiments and Analysis]. CoRR abs\/1904.09483","author":"Li Peng","year":"2019","unstructured":"Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, and Ce Zhang. 2019. CleanML: A Benchmark for Joint Data Cleaning and Machine Learning [Experiments and Analysis]. CoRR abs\/1904.09483 (2019)."},{"key":"e_1_2_2_88_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421431"},{"key":"e_1_2_2_89_1","doi-asserted-by":"crossref","unstructured":"Xi Liang Zechao Shang Sanjay Krishnan Aaron J Elmore and Michael J Franklin. 2020. Fast and reliable missing data contingency analysis with predicate-constraints. In SIGMOD. 285--295.","DOI":"10.1145\/3318464.3389785"},{"key":"e_1_2_2_90_1","volume-title":"Picket: Self-supervised Data Diagnostics for ML Pipelines. CoRR abs\/2006.04730","author":"Liu Zifan","year":"2020","unstructured":"Zifan Liu, Zhechun Zhou, and Theodoros Rekatsinas. 2020. Picket: Self-supervised Data Diagnostics for ML Pipelines. 
CoRR abs\/2006.04730 (2020)."},{"key":"e_1_2_2_91_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407801"},{"key":"e_1_2_2_92_1","volume-title":"Samuel Madden, Mourad Ouzzani, Michael Stonebraker, and Nan Tang.","author":"Mahdavi Mohammad","year":"2019","unstructured":"Mohammad Mahdavi, Ziawasch Abedjan, Raul Castro Fernandez, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, and Nan Tang. 2019. Raha: A Configuration-Free Error Detection System. In SIGMOD. 865--882."},{"key":"e_1_2_2_93_1","volume-title":"MIWAE: Deep generative modelling and imputation of incomplete data sets. In ICML. PMLR, 4413--4423.","author":"Mattei Pierre-Alexandre","year":"2019","unstructured":"Pierre-Alexandre Mattei and Jes Frellsen. 2019. MIWAE: Deep generative modelling and imputation of incomplete data sets. In ICML. PMLR, 4413--4423."},{"key":"e_1_2_2_94_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ifacol.2018.09.406"},{"key":"e_1_2_2_95_1","doi-asserted-by":"crossref","unstructured":"Venkata Vamsikrishna Meduri Lucian Popa Prithviraj Sen and Mohamed Sarwat. 2020. A comprehensive benchmark framework for active learning methods in entity matching. In SIGMOD. 1133--1147.","DOI":"10.1145\/3318464.3380597"},{"volume-title":"Capturing Semantics for Imputation with Pre-trained Language Models","author":"Mei Yinan","key":"e_1_2_2_96_1","unstructured":"Yinan Mei, Shaoxu Song, Chenguang Fang, Haifeng Yang, Jingyun Fang, and Jiang Long. 2021. Capturing Semantics for Imputation with Pre-trained Language Models. In ICDE. IEEE, 61--72."},{"key":"e_1_2_2_97_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494143"},{"key":"e_1_2_2_98_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i10.17086"},{"key":"e_1_2_2_99_1","doi-asserted-by":"crossref","unstructured":"Sidharth Mudgal Han Li Theodoros Rekatsinas AnHai Doan Youngchoon Park Ganesh Krishnan Rohit Deep Esteban Arcaute and Vijay Raghavendra. 2018. Deep Learning for Entity Matching: A Design Space Exploration. 
In SIGMOD. 19--34.","DOI":"10.1145\/3183713.3196926"},{"key":"e_1_2_2_100_1","volume-title":"International Conference on Machine Learning. PMLR, 7130--7140","author":"Muzellec Boris","year":"2020","unstructured":"Boris Muzellec, Julie Josse, Claire Boyer, and Marco Cuturi. 2020. Missing data imputation using optimal transport. In International Conference on Machine Learning. PMLR, 7130--7140."},{"key":"e_1_2_2_101_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107501"},{"key":"e_1_2_2_102_1","first-page":"1348","article-title":"An embedding-based approach to rule learning in knowledge graphs","volume":"33","author":"Omran Pouya Ghiasnezhad","year":"2019","unstructured":"Pouya Ghiasnezhad Omran, Kewen Wang, and Zhe Wang. 2019. An embedding-based approach to rule learning in knowledge graphs. TKDE 33, 4 (2019), 1348--1359.","journal-title":"TKDE"},{"key":"e_1_2_2_103_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2020.101565"},{"key":"e_1_2_2_104_1","doi-asserted-by":"publisher","DOI":"10.14778\/3377369.3377377"},{"key":"e_1_2_2_105_1","doi-asserted-by":"crossref","unstructured":"Kun Qian Lucian Popa and Prithviraj Sen. 2017. Active Learning for Large-Scale Entity Resolution. In CIKM. 1379--1388.","DOI":"10.1145\/3132847.3132949"},{"key":"e_1_2_2_106_1","doi-asserted-by":"crossref","unstructured":"Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In EMNLP-IJCNLP. 3980--3990.","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_2_107_1","unstructured":"Florian Reitz. 2020. Corrections in dblp. https:\/\/blog.dblp.org\/2020\/01\/08\/corrections-in-dblp-2019\/."},{"key":"e_1_2_2_108_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137631"},{"key":"e_1_2_2_109_1","doi-asserted-by":"crossref","unstructured":"Weilong Ren Xiang Lian and Kambiz Ghazinour. 2021. Online Topic-Aware Entity Resolution Over Incomplete Data Streams. In SIGMOD. 
1478--1490.","DOI":"10.1145\/3448016.3457238"},{"key":"e_1_2_2_110_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476301"},{"key":"e_1_2_2_111_1","volume-title":"Ullman","author":"Sadri Fereidoon","year":"1980","unstructured":"Fereidoon Sadri and Jeffrey D. Ullman. 1980. The Interaction between Functional Dependencies and Template Dependencies. In SIGMOD."},{"key":"e_1_2_2_112_1","doi-asserted-by":"crossref","unstructured":"Philipp Schirmer Thorsten Papenbrock Ioannis K. Koumarelas and Felix Naumann. 2020. Efficient Discovery of Matching Dependencies. ACM Trans. Database Syst. (2020).","DOI":"10.1145\/3392778"},{"key":"e_1_2_2_113_1","doi-asserted-by":"publisher","DOI":"10.14778\/3149193.3149199"},{"volume-title":"Explaining Missing Data in Graphs: A Constraint-based Approach","author":"Song Qi","key":"e_1_2_2_114_1","unstructured":"Qi Song, Peng Lin, Hanchao Ma, and Yinghui Wu. 2021. Explaining Missing Data in Graphs: A Constraint-based Approach. In ICDE. IEEE, 1476--1487."},{"key":"e_1_2_2_115_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2013.06.003"},{"key":"e_1_2_2_116_1","first-page":"275","article-title":"Enriching data imputation under similarity rule constraints","volume":"32","author":"Song Shaoxu","year":"2018","unstructured":"Shaoxu Song, Yu Sun, Aoqian Zhang, Lei Chen, and Jianmin Wang. 2018. Enriching data imputation under similarity rule constraints. TKDE 32, 2 (2018), 275--287.","journal-title":"TKDE"},{"key":"e_1_2_2_117_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2020.06.005"},{"key":"e_1_2_2_118_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242667"},{"key":"e_1_2_2_119_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218488502001648"},{"key":"e_1_2_2_120_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476294"},{"key":"e_1_2_2_121_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. 
Attention is All you Need. In NeurIPS. 5998--6008."},{"key":"e_1_2_2_122_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-009-0136-3"},{"key":"e_1_2_2_123_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-013-0308-z"},{"key":"e_1_2_2_124_1","doi-asserted-by":"crossref","unstructured":"Renzhi Wu Sanya Chaba Saurabh Sawlani Xu Chu and Saravanan Thirumuruganathan. 2020. ZeroER: Entity Resolution using Zero Labeled Examples. In SIGMOD. 1149--1164.","DOI":"10.1145\/3318464.3389743"},{"key":"e_1_2_2_125_1","volume-title":"MLSys","author":"Wu Richard","year":"2020","unstructured":"Richard Wu, Aoqian Zhang, Ihab F. Ilyas, and Theodoros Rekatsinas. 2020. Attention-based Learning for Missing Data Imputation in HoloClean. In MLSys 2020."},{"key":"e_1_2_2_126_1","volume-title":"Elmagarmid","author":"Yakout Mohamed","year":"2013","unstructured":"Mohamed Yakout, Laure Berti-\u00c9quille, and Ahmed K. Elmagarmid. 2013. Don't Be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes. In SIGMOD. ACM."},{"key":"e_1_2_2_127_1","doi-asserted-by":"crossref","unstructured":"Yan Yan Stephen Meyles Aria Haghighi and Dan Suciu. 2020. Entity matching in the wild: A consistent and versatile framework to unify data in industrial applications. In SIGMOD. 2287--2301.","DOI":"10.1145\/3318464.3386143"},{"key":"e_1_2_2_128_1","volume-title":"GAIN: Missing Data Imputation using Generative Adversarial Nets. In ICML. PMLR, 5675--5684.","author":"Yoon Jinsung","year":"2018","unstructured":"Jinsung Yoon, James Jordon, and Mihaela van der Schaar. 2018. GAIN: Missing Data Imputation using Generative Adversarial Nets. In ICML. PMLR, 5675--5684."},{"key":"e_1_2_2_129_1","unstructured":"zeorb. 2018. How do I split a TV Series into 2 tv series? 
https:\/\/community-imdb.sprinklr.com\/conversations\/data-issues-policy-discussions\/how-do-i-split-a-tv-series-into-2-tv-series\/5f4a79fa8815453dba940741."},{"volume-title":"Learning individual models for imputation","author":"Zhang Aoqian","key":"e_1_2_2_130_1","unstructured":"Aoqian Zhang, Shaoxu Song, Yu Sun, and Jianmin Wang. 2019. Learning individual models for imputation. In ICDE. IEEE, 160--171."},{"volume-title":"A Graph-Theoretic Fusion Framework for Unsupervised Entity Resolution","author":"Zhang Dongxiang","key":"e_1_2_2_131_1","unstructured":"Dongxiang Zhang, Long Guo, Xiangnan He, Jie Shao, Sai Wu, and Heng Tao Shen. 2018. A Graph-Theoretic Fusion Framework for Unsupervised Entity Resolution. In ICDE. IEEE, 713--724."},{"key":"e_1_2_2_132_1","first-page":"1501","article-title":"Unsupervised entity resolution with blocking and graph algorithms","volume":"34","author":"Zhang Dongxiang","year":"2020","unstructured":"Dongxiang Zhang, Dongsheng Li, Long Guo, and Kian-Lee Tan. 2020. Unsupervised entity resolution with blocking and graph algorithms. TKDE 34, 3 (2020), 1501--1515.","journal-title":"TKDE"},{"key":"e_1_2_2_133_1","doi-asserted-by":"crossref","unstructured":"Wen Zhang Bibek Paudel Liang Wang Jiaoyan Chen Hai Zhu Wei Zhang Abraham Bernstein and Huajun Chen. 2019. Iteratively learning embeddings and rules for knowledge graph reasoning. In WWW. 2366--2377.","DOI":"10.1145\/3308558.3313612"},{"key":"e_1_2_2_134_1","volume-title":"Fairness in Missing Data Imputation. CoRR abs\/2110.12002","author":"Zhang Yiliang","year":"2021","unstructured":"Yiliang Zhang and Qi Long. 2021. Fairness in Missing Data Imputation. CoRR abs\/2110.12002 (2021)."},{"key":"e_1_2_2_135_1","doi-asserted-by":"crossref","unstructured":"Chen Zhao and Yeye He. 2019. Auto-EM: End-to-end Fuzzy Entity-Matching using Pre-trained Deep Models and Transfer Learning. In WWW. 
2413--2424.","DOI":"10.1145\/3308558.3313578"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626763","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3626763","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T13:02:10Z","timestamp":1755867730000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626763"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,8]]},"references-count":135,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,8]]}},"alternative-id":["10.1145\/3626763"],"URL":"https:\/\/doi.org\/10.1145\/3626763","relation":{},"ISSN":["2836-6573"],"issn-type":[{"type":"electronic","value":"2836-6573"}],"subject":[],"published":{"date-parts":[[2023,12,8]]}}}