{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,30]],"date-time":"2025-06-30T12:26:31Z","timestamp":1751286391918,"version":"3.28.2"},"reference-count":139,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:p>This paper studies a new problem of relation enrichment. Given a relation <jats:italic>D<\/jats:italic> of schema <jats:italic>R<\/jats:italic> and a knowledge graph <jats:italic>G<\/jats:italic> with overlapping information, it is to identify a small number of relevant features from <jats:italic>G<\/jats:italic>, and extend schema <jats:italic>R<\/jats:italic> with the additional attributes, to maximally improve the accuracy of resolving entities represented by the tuples of <jats:italic>D<\/jats:italic>. We formulate the enrichment problem and show its intractability. Nonetheless, we propose a method to extract features from <jats:italic>G<\/jats:italic> that are diverse from the existing attributes of <jats:italic>R<\/jats:italic>, minimize null values, and moreover, reduce false positives and false negatives of entity resolution (ER) models. The method links tuples and vertices that refer to the same entity, learns a robust policy to extract attributes via reinforcement learning, and jointly trains the policy and ER models. 
Moreover, we develop algorithms for (incrementally) enriching <jats:italic>D<\/jats:italic>. Using real-life data, we experimentally verify that relation enrichment improves the accuracy of ER above 15.4% (percentage points) by adding 5 attributes, up to 33%.<\/jats:p>","DOI":"10.14778\/3681954.3681987","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T16:23:36Z","timestamp":1725035016000},"page":"3109-3123","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Enriching Relations with Additional Attributes for ER"],"prefix":"10.14778","volume":"17","author":[{"given":"Mengyi","family":"Yan","sequence":"first","affiliation":[{"name":"Beihang University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenfei","family":"Fan","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China and University of Edinburgh, United Kingdom and Beihang University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yaoshu","family":"Wang","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Min","family":"Xie","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"unstructured":"2017. Identity fraud's impact on the insurance sector. https:\/\/legal.thomsonreuters.com\/en\/insights\/articles\/identity-frauds-impact-on-the-insurance-sector.","key":"e_1_2_1_1_1"},{"unstructured":"2019. IMDB. https:\/\/www.imdb.com\/interfaces\/.","key":"e_1_2_1_2_1"},{"unstructured":"2020. Knowledge Graphs for Financial Services. 
https:\/\/www2.deloitte.com\/content\/dam\/Deloitte\/nl\/Documents\/risk\/deloitte-nl-risk-knowledge-graphs-financial-services.pdf.","key":"e_1_2_1_3_1"},{"unstructured":"2022. DBpedia. http:\/\/wiki.dbpedia.org.","key":"e_1_2_1_4_1"},{"unstructured":"2022. Fraud detection using knowledge graph: How to detect and visualize fraudulent activities. https:\/\/www.nebula-graph.io\/posts\/fraud-detection-using-knowledge-and-graph-database.","key":"e_1_2_1_5_1"},{"unstructured":"2022. How Fraudsters Create Fake Identities. https:\/\/www.shift-technology.com\/resources\/perspectives\/sme-perspectives-how-fraudsters-create-fake-identities.","key":"e_1_2_1_6_1"},{"unstructured":"2022. Wikimedia. https:\/\/www.kaggle.com\/datasets\/kenshoresearch\/kensho-derived-wikimedia-data.","key":"e_1_2_1_7_1"},{"unstructured":"2022. Wikidata - Recent changes. https:\/\/www.amazon.science\/blog\/combining-knowledge-graphs-quickly-and-accurately.","key":"e_1_2_1_8_1"},{"unstructured":"2022. Wikipedia. https:\/\/www.wikipedia.org.","key":"e_1_2_1_9_1"},{"unstructured":"2023. Code datasets and full version. https:\/\/github.com\/SICS-Fundamental-Research-Center\/Enrichment.","key":"e_1_2_1_10_1"},{"unstructured":"2023. IMDb Non-Commercial Datasets. https:\/\/developer.imdb.com\/non-commercial-datasets.","key":"e_1_2_1_11_1"},{"unstructured":"2023. Leverage Data Enrichment to Ensure You're Dealing with Real People. https:\/\/seon.io\/resources\/online-insurance-fraud\/.","key":"e_1_2_1_12_1"},{"unstructured":"2023. SEON. https:\/\/seon.io\/.","key":"e_1_2_1_13_1"},{"unstructured":"2023. SIFT. https:\/\/sift.com\/.","key":"e_1_2_1_14_1"},{"unstructured":"2023. Social Network Usage and Growth Statistics. https:\/\/backlinko.com\/social-media-users.","key":"e_1_2_1_15_1"},{"unstructured":"2023. STARBUCKS eGIFT. https:\/\/www.starbucks.com\/terms\/gift-card-offer-terms\/.","key":"e_1_2_1_16_1"},{"unstructured":"2023. Wikidata:WikiProject Disambiguation pages. 
https:\/\/www.wikidata.org\/wiki\/Wikidata:WikiProject_Disambiguation_pages.","key":"e_1_2_1_17_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.1016\/j.jnca.2016.04.007"},{"volume-title":"Accelerating Entity Lookups in Knowledge Graphs Through Embeddings","author":"Abuoda Ghadeer","unstructured":"Ghadeer Abuoda, Saravanan Thirumuruganathan, and Ashraf Aboulnaga. 2022. Accelerating Entity Lookups in Knowledge Graphs Through Embeddings. In ICDE. IEEE, 1111--1123.","key":"e_1_2_1_19_1"},{"key":"e_1_2_1_20_1","volume-title":"Bankert","author":"Aha David W.","year":"1995","unstructured":"David W. Aha and Richard L. Bankert. 1995. A Comparative Evaluation of Sequential Feature Selection Algorithms. In Learning from Data - Fifth International Workshop on Artificial Intelligence and Statistics (AISTATS). Springer, 199--206."},{"key":"e_1_2_1_21_1","volume-title":"Ismailcem Budak Arpinar, and Amit P. Sheth","author":"Aleman-Meza Boanerges","year":"2003","unstructured":"Boanerges Aleman-Meza, Christian Halaschek-Wiener, Ismailcem Budak Arpinar, and Amit P. Sheth. 2003. Context-Aware Semantic Association Ranking. In SWDB. 33--50."},{"doi-asserted-by":"crossref","unstructured":"Rohit Ananthakrishna Surajit Chaudhuri and Venkatesh Ganti. 2002. Eliminating Fuzzy Duplicates in Data Warehouses. In VLDB. 586--597.","key":"e_1_2_1_22_1","DOI":"10.1016\/B978-155860869-6\/50058-5"},{"doi-asserted-by":"crossref","unstructured":"Arvind Arasu Michaela G\u00f6tz and Raghav Kaushik. 2010. On active learning of record matching packages. In SIGMOD. 783--794.","key":"e_1_2_1_23_1","DOI":"10.1145\/1807167.1807252"},{"doi-asserted-by":"crossref","unstructured":"Arvind Arasu Christopher R\u00e9 and Dan Suciu. 2009. Large-Scale Deduplication with Constraints Using Dedupalog. In ICDE. 952--963.","key":"e_1_2_1_24_1","DOI":"10.1109\/ICDE.2009.43"},{"doi-asserted-by":"crossref","unstructured":"Marcelo Arenas Leopoldo Bertossi and Jan Chomicki. 1999. 
Consistent Query Answers in Inconsistent Databases. In PODS. 68--79.","key":"e_1_2_1_25_1","DOI":"10.1145\/303976.303983"},{"doi-asserted-by":"crossref","unstructured":"Abolfazl Asudeh Nima Shahbazi Zhongjun Jin and H. V. Jagadish. 2021. Identifying Insufficient Data Coverage for Ordinal Continuous-Valued Attributes. In SIGMOD. 129--141.","key":"e_1_2_1_26_1","DOI":"10.1145\/3448016.3457315"},{"key":"e_1_2_1_27_1","volume-title":"Bertossi","author":"Bahmani Zeinab","year":"2017","unstructured":"Zeinab Bahmani and Leopoldo E. Bertossi. 2017. Enforcing Relational Matching Dependencies with Datalog for Entity Resolution. In FLAIRS."},{"doi-asserted-by":"publisher","key":"e_1_2_1_28_1","DOI":"10.1016\/j.ijar.2017.01.003"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.14778\/3476249.3476300"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.1109\/72.298224"},{"doi-asserted-by":"publisher","key":"e_1_2_1_31_1","DOI":"10.1109\/IJCNN.2019.8852410"},{"doi-asserted-by":"publisher","key":"e_1_2_1_32_1","DOI":"10.1007\/s00224-012-9402-7"},{"key":"e_1_2_1_34_1","volume-title":"International Conference on Enterprise Information Systems","volume":"2","author":"Canalle Gabrielle Karine","year":"2017","unstructured":"Gabrielle Karine Canalle, Bernadette Farias Loscio, and Ana Carolina Salgado. 2017. A strategy for selecting relevant attributes for entity resolution in data integration systems. In International Conference on Enterprise Information Systems, Vol. 2. SCITEPRESS, 80--88."},{"key":"e_1_2_1_35_1","volume-title":"International Conference on Machine Learning. PMLR, 883--892","author":"Chen Jianbo","year":"2018","unstructured":"Jianbo Chen, Le Song, Martin Wainwright, and Michael Jordan. 2018. Learning to explain: An information-theoretic perspective on model interpretation. In International Conference on Machine Learning. 
PMLR, 883--892."},{"doi-asserted-by":"publisher","key":"e_1_2_1_36_1","DOI":"10.1109\/TIP.2019.2910052"},{"doi-asserted-by":"publisher","key":"e_1_2_1_37_1","DOI":"10.14778\/3397230.3397235"},{"key":"e_1_2_1_38_1","volume-title":"An Overview of End-to-End Entity Resolution for Big Data. ACM Comput. Surv. 53, 6","author":"Christophides Vassilis","year":"2021","unstructured":"Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, and Kostas Stefanidis. 2021. An Overview of End-to-End Entity Resolution for Big Data. ACM Comput. Surv. 53, 6 (2021), 127:1--127:42."},{"doi-asserted-by":"publisher","key":"e_1_2_1_39_1","DOI":"10.1145\/320107.320109"},{"doi-asserted-by":"publisher","key":"e_1_2_1_40_1","DOI":"10.1145\/2901737"},{"key":"e_1_2_1_41_1","volume-title":"Deep and Collective Entity Resolution in Parallel","author":"Deng Ting","year":"2022","unstructured":"Ting Deng, Wenfei Fan, Ping Lu, Xiaomeng Luo, Xiaoke Zhu, and Wanhe An. 2022. Deep and Collective Entity Resolution in Parallel. In ICDE. IEEE, 2060--2072."},{"doi-asserted-by":"publisher","key":"e_1_2_1_42_1","DOI":"10.14778\/3430915.3430921"},{"doi-asserted-by":"crossref","unstructured":"Xin Dong Alon Y. Halevy and Jayant Madhavan. 2005. Reference Reconciliation in Complex Information Spaces. In SIGMOD. ACM 85--96.","key":"e_1_2_1_43_1","DOI":"10.1145\/1066157.1066168"},{"volume-title":"Efficient joinable table discovery in data lakes: A high-dimensional similarity-based approach","author":"Dong Yuyang","unstructured":"Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, and Masafumi Oyamada. 2021. Efficient joinable table discovery in data lakes: A high-dimensional similarity-based approach. In ICDE. IEEE, 456--467.","key":"e_1_2_1_44_1"},{"key":"e_1_2_1_45_1","first-page":"1944","article-title":"Distributed Representations of Tuples for Entity Resolution","volume":"16","author":"Ebraheem Muhammad","year":"2018","unstructured":"Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq R. 
Joty, Mourad Ouzzani, and Nan Tang. 2018. Distributed Representations of Tuples for Entity Resolution. PVLDB 16, 8 (2018), 1944--1957.","journal-title":"PVLDB"},{"key":"e_1_2_1_46_1","volume-title":"COCOA: COrrelation COefficient-Aware Data Augmentation. In EDBT. 331--336.","author":"Esmailoghli Mahdi","year":"2021","unstructured":"Mahdi Esmailoghli, Jorge-Arnulfo Quian\u00e9-Ruiz, and Ziawasch Abedjan. 2021. COCOA: COrrelation COefficient-Aware Data Augmentation. In EDBT. 331--336."},{"doi-asserted-by":"publisher","key":"e_1_2_1_47_1","DOI":"10.14778\/3587136.3587146"},{"doi-asserted-by":"publisher","key":"e_1_2_1_48_1","DOI":"10.1007\/s00778-010-0206-6"},{"doi-asserted-by":"publisher","key":"e_1_2_1_49_1","DOI":"10.1145\/1862919.1862924"},{"doi-asserted-by":"publisher","key":"e_1_2_1_50_1","DOI":"10.1145\/1366102.1366103"},{"volume-title":"Linking Entities across Relations and Graphs","author":"Fan Wenfei","unstructured":"Wenfei Fan, Liang Geng, Ruochun Jin, Ping Lu, Resul Tugey, and Wenyuan Yu. 2022. Linking Entities across Relations and Graphs. In ICDE. IEEE, 634--647.","key":"e_1_2_1_51_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_52_1","DOI":"10.1145\/3626763"},{"unstructured":"Wenfei Fan Chunming Hu and Chao Tian. 2017. Incremental Graph Computations: Doable and Undoable. In SIGMOD. 155--169.","key":"e_1_2_1_53_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_54_1","DOI":"10.1007\/s00778-011-0253-7"},{"key":"e_1_2_1_55_1","volume-title":"Unifying Logic Rules and Machine Learning for Entity Enhancing. Sci. China Inf. Sci. 63, 7","author":"Fan Wenfei","year":"2020","unstructured":"Wenfei Fan, Ping Lu, and Chao Tian. 2020. Unifying Logic Rules and Machine Learning for Entity Enhancing. Sci. China Inf. Sci. 63, 7 (2020)."},{"doi-asserted-by":"publisher","key":"e_1_2_1_56_1","DOI":"10.14778\/3457390.3457400"},{"key":"e_1_2_1_57_1","volume-title":"Recursive feature generation for knowledge-based learning. 
arXiv preprint arXiv:1802.00050","author":"Friedman Lior","year":"2018","unstructured":"Lior Friedman and Shaul Markovitch. 2018. Recursive feature generation for knowledge-based learning. arXiv preprint arXiv:1802.00050 (2018)."},{"doi-asserted-by":"publisher","key":"e_1_2_1_58_1","DOI":"10.1145\/3366423.3380297"},{"doi-asserted-by":"publisher","key":"e_1_2_1_59_1","DOI":"10.1109\/ICDMW.2019.00161"},{"volume-title":"Computers and Intractability: A Guide to the Theory of NP-Completeness","author":"Garey Michael","unstructured":"Michael Garey and David Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company.","key":"e_1_2_1_60_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_61_1","DOI":"10.14778\/1920841.1920897"},{"key":"e_1_2_1_62_1","volume-title":"Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. In International Conference on Machine Learning (ICML). Morgan Kaufmann, 359--366","author":"Hall Mark A.","year":"2000","unstructured":"Mark A. Hall. 2000. Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. In International Conference on Machine Learning (ICML). Morgan Kaufmann, 359--366."},{"doi-asserted-by":"publisher","key":"e_1_2_1_63_1","DOI":"10.1016\/j.ins.2021.09.036"},{"volume-title":"Few-Shot Tabular Data Enrichment Using Fine-Tuned Transformer Architectures","author":"Harari Asaf","unstructured":"Asaf Harari and Gilad Katz. 2022. Few-Shot Tabular Data Enrichment Using Fine-Tuned Transformer Architectures. In ACL. Association for Computational Linguistics, 1577--1591.","key":"e_1_2_1_64_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_65_1","DOI":"10.1145\/3366424.3383112"},{"doi-asserted-by":"crossref","unstructured":"Benjamin Hilprecht and Carsten Binnig. 2021. ReStore - Neural Data Completion for Relational Databases. In SIGMOD. 
710--722.","key":"e_1_2_1_66_1","DOI":"10.1145\/3448016.3457264"},{"key":"e_1_2_1_67_1","first-page":"80","article-title":"Ridge Regression: Biased Estimation for Nonorthogonal Problems","volume":"42","author":"Hoerl Arthur E.","year":"2000","unstructured":"Arthur E. Hoerl and Robert W. Kennard. 2000. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 42, 1 (2000), 80--86.","journal-title":"Technometrics"},{"key":"e_1_2_1_68_1","volume-title":"Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan F. Sequeda, Steffen Staab, and Antoine Zimmermann.","author":"Hogan Aidan","year":"2021","unstructured":"Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d'Amato, Gerard de Melo, Claudio Guti\u00e9rrez, Sabrina Kirrane, Jos\u00e9 Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan F. Sequeda, Steffen Staab, and Antoine Zimmermann. 2021. Knowledge Graphs. ACM Comput. Surv. 54, 4 (2021), 71:1--71:37."},{"key":"e_1_2_1_69_1","volume-title":"Yu","author":"Hu Xuming","year":"2023","unstructured":"Xuming Hu, Shen Wang, Xiao Qin, Chuan Lei, Zhengyuan Shen, Christos Faloutsos, Asterios Katsifodimos, George Karypis, Lijie Wen, and Philip S. Yu. 2023. Automatic Table Union Search with Tabular Representation Learning. In Findings of the Association for Computational Linguistics: ACL. Association for Computational Linguistics."},{"doi-asserted-by":"publisher","key":"e_1_2_1_70_1","DOI":"10.32473\/flairs.v35i.130584"},{"unstructured":"Vassilis N Ioannidis Xiang Song Saurav Manchanda Mufei Li Xiaoqin Pan Da Zheng Xia Ning Xiangxiang Zeng and George Karypis. 2020. DRKG-drug repurposing knowledge graph for covid-19. https:\/\/github.com\/gnn4dr\/DRKG\/.","key":"e_1_2_1_71_1"},{"doi-asserted-by":"crossref","unstructured":"Robert Isele Anja Jentzsch and Christian Bizer. 2010. 
Silk server-adding missing links while consuming linked data. In COLD. 85--96.","key":"e_1_2_1_72_1","DOI":"10.1007\/978-3-031-79432-2_6"},{"doi-asserted-by":"publisher","key":"e_1_2_1_73_1","DOI":"10.1016\/j.neucom.2015.01.031"},{"doi-asserted-by":"crossref","unstructured":"Jungo Kasai Kun Qian Sairam Gurajada Yunyao Li and Lucian Popa. 2019. Low-resource Deep Entity Resolution with Transfer and Active Learning. In ACL. 5851--5861.","key":"e_1_2_1_74_1","DOI":"10.18653\/v1\/P19-1586"},{"doi-asserted-by":"publisher","key":"e_1_2_1_75_1","DOI":"10.1037\/xlm0000391"},{"doi-asserted-by":"publisher","key":"e_1_2_1_76_1","DOI":"10.1145\/3588689"},{"doi-asserted-by":"publisher","key":"e_1_2_1_77_1","DOI":"10.14778\/3430915.3430920"},{"doi-asserted-by":"publisher","key":"e_1_2_1_78_1","DOI":"10.1016\/S0004-3702(97)00043-X"},{"key":"e_1_2_1_79_1","first-page":"712","article-title":"MDedup: Duplicate Detection with Matching Dependencies","volume":"13","author":"Koumarelas Ioannis","year":"2020","unstructured":"Ioannis Koumarelas, Thorsten Papenbrock, and Felix Naumann. 2020. MDedup: Duplicate Detection with Matching Dependencies. PVLDB 13, 5 (2020), 712--725.","journal-title":"PVLDB"},{"doi-asserted-by":"crossref","unstructured":"Walter Kropatsch. 1996. Building irregular pyramids by dual-graph contraction. In Vision Image and Signal Processing.","key":"e_1_2_1_80_1","DOI":"10.1049\/ip-vis:19952115"},{"doi-asserted-by":"publisher","key":"e_1_2_1_81_1","DOI":"10.1016\/0304-3975(90)90192-K"},{"doi-asserted-by":"crossref","unstructured":"Arun Kumar Jeffrey Naughton Jignesh M Patel and Xiaojin Zhu. 2016. To join or not to join? Thinking twice about joins before feature selection. In SIGMOD. 
19--34.","key":"e_1_2_1_82_1","DOI":"10.1145\/2882903.2882952"},{"key":"e_1_2_1_83_1","volume-title":"International Conference on Artificial Intelligence and Statistics, (AISTATS) (Proceedings of Machine Learning Research).","author":"Lew Alexander K.","year":"2021","unstructured":"Alexander K. Lew, Monica Agrawal, David A. Sontag, and Vikash Mansinghka. 2021. PClean: Bayesian Data Cleaning at Scale with Domain-Specific Probabilistic Programming. In International Conference on Artificial Intelligence and Statistics, (AISTATS) (Proceedings of Machine Learning Research)."},{"doi-asserted-by":"crossref","unstructured":"Chenjie Li Zhengjie Miao Qitian Zeng Boris Glavic and Sudeepa Roy. 2021. Putting things into context: Rich explanations for query answers using join graphs. In SIGMOD. 1051--1063.","key":"e_1_2_1_84_1","DOI":"10.1145\/3448016.3459246"},{"doi-asserted-by":"publisher","key":"e_1_2_1_85_1","DOI":"10.1109\/ICCV48922.2021.00876"},{"doi-asserted-by":"publisher","key":"e_1_2_1_86_1","DOI":"10.14778\/3421424.3421431"},{"doi-asserted-by":"publisher","key":"e_1_2_1_87_1","DOI":"10.14778\/3384345.3384352"},{"volume-title":"Feature Augmentation with Reinforcement Learning","author":"Liu Jiabin","unstructured":"Jiabin Liu, Chengliang Chai, Yuyu Luo, Yin Lou, Jianhua Feng, and Nan Tang. 2022. Feature Augmentation with Reinforcement Learning. In ICDE. IEEE, 3360--3372.","key":"e_1_2_1_88_1"},{"key":"e_1_2_1_89_1","volume-title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs\/1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
CoRR abs\/1907.11692 (2019)."},{"doi-asserted-by":"publisher","key":"e_1_2_1_90_1","DOI":"10.14778\/3407790.3407801"},{"doi-asserted-by":"publisher","key":"e_1_2_1_91_1","DOI":"10.1145\/1368088.1368150"},{"doi-asserted-by":"publisher","key":"e_1_2_1_92_1","DOI":"10.1145\/3448016.3457258"},{"key":"e_1_2_1_93_1","volume-title":"Cecilia Di Chio, and Riccardo Poli","author":"Moraglio Alberto","year":"2007","unstructured":"Alberto Moraglio, Cecilia Di Chio, and Riccardo Poli. 2007. Geometric Particle Swarm Optimisation. In EuroGP (Lecture Notes in Computer Science, Vol. 4445). Springer, 125--136."},{"doi-asserted-by":"publisher","key":"e_1_2_1_94_1","DOI":"10.1007\/978-3-319-67008-9_13"},{"doi-asserted-by":"crossref","unstructured":"Sidharth Mudgal Han Li Theodoros Rekatsinas AnHai Doan Youngchoon Park Ganesh Krishnan Rohit Deep Esteban Arcaute and Vijay Raghavendra. 2018. Deep learning for entity matching: A design space exploration. In SIGMOD. 19--34.","key":"e_1_2_1_95_1","DOI":"10.1145\/3183713.3196926"},{"doi-asserted-by":"publisher","key":"e_1_2_1_96_1","DOI":"10.14778\/3574245.3574258"},{"doi-asserted-by":"publisher","key":"e_1_2_1_97_1","DOI":"10.14778\/3192965.3192973"},{"doi-asserted-by":"publisher","key":"e_1_2_1_98_1","DOI":"10.1016\/j.is.2020.101565"},{"volume-title":"Encyclopedia of Machine Learning and Data Mining","author":"Peters Jan","unstructured":"Jan Peters and J. Andrew Bagnell. 2017. Policy Gradient Methods. In Encyclopedia of Machine Learning and Data Mining. Springer.","key":"e_1_2_1_99_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_100_1","DOI":"10.14778\/3377369.3377377"},{"doi-asserted-by":"publisher","key":"e_1_2_1_101_1","DOI":"10.1016\/j.knosys.2018.01.005"},{"doi-asserted-by":"crossref","unstructured":"Kun Qian Lucian Popa and Prithviraj Sen. 2017. Active Learning for Large-Scale Entity Resolution. In CIKM. 
1379--1388.","key":"e_1_2_1_102_1","DOI":"10.1145\/3132847.3132949"},{"key":"e_1_2_1_103_1","volume-title":"ELDEN: Improved Entity Linking Using Densified Knowledge Graphs. In Conference of the North American","author":"Radhakrishnan Priya","year":"2018","unstructured":"Priya Radhakrishnan, Partha P. Talukdar, and Vasudeva Varma. 2018. ELDEN: Improved Entity Linking Using Densified Knowledge Graphs. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). Association for Computational Linguistics, 1844--1853."},{"doi-asserted-by":"publisher","key":"e_1_2_1_104_1","DOI":"10.18653\/v1\/D19-1410"},{"doi-asserted-by":"publisher","key":"e_1_2_1_105_1","DOI":"10.14778\/3137628.3137631"},{"doi-asserted-by":"crossref","unstructured":"A\u00e9cio Santos Aline Bessa Fernando Chirigati Christopher Musco and Juliana Freire. 2021. Correlation sketches for approximate join-correlation queries. In SIGMOD. 1531--1544.","key":"e_1_2_1_106_1","DOI":"10.1145\/3448016.3458456"},{"key":"e_1_2_1_107_1","volume-title":"Proximal Policy Optimization Algorithms. CoRR abs\/1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. CoRR abs\/1707.06347 (2017)."},{"doi-asserted-by":"publisher","key":"e_1_2_1_108_1","DOI":"10.3233\/SW-222986"},{"key":"e_1_2_1_109_1","volume-title":"Are key-foreign key joins safe to avoid when learning high-capacity classifiers? arXiv preprint arXiv:1704.00485","author":"Shah Vraj","year":"2017","unstructured":"Vraj Shah, Arun Kumar, and Xiaojin Zhu. 2017. Are key-foreign key joins safe to avoid when learning high-capacity classifiers? arXiv preprint arXiv:1704.00485 (2017)."},{"key":"e_1_2_1_110_1","volume-title":"AugDiff: Diffusion based Feature Augmentation for Multiple Instance Learning in Whole Slide Image. 
arXiv preprint arXiv:2303.06371","author":"Shao Zhuchen","year":"2023","unstructured":"Zhuchen Shao, Liuxi Dai, Yifeng Wang, Haoqian Wang, and Yongbing Zhang. 2023. AugDiff: Diffusion based Feature Augmentation for Multiple Instance Learning in Whole Slide Image. arXiv preprint arXiv:2303.06371 (2023)."},{"doi-asserted-by":"publisher","key":"e_1_2_1_111_1","DOI":"10.1145\/3366424.3391264"},{"doi-asserted-by":"publisher","key":"e_1_2_1_112_1","DOI":"10.1371\/journal.pone.0160005"},{"doi-asserted-by":"publisher","key":"e_1_2_1_113_1","DOI":"10.1145\/3068777.3068781"},{"doi-asserted-by":"publisher","key":"e_1_2_1_114_1","DOI":"10.1016\/0950-5849(93)90027-Z"},{"doi-asserted-by":"publisher","key":"e_1_2_1_115_1","DOI":"10.1016\/S0167-8655(99)00083-5"},{"key":"e_1_2_1_116_1","first-page":"275","article-title":"Enriching data imputation under similarity rule constraints","volume":"32","author":"Song Shaoxu","year":"2018","unstructured":"Shaoxu Song, Yu Sun, Aoqian Zhang, Lei Chen, and Jianmin Wang. 2018. Enriching data imputation under similarity rule constraints. TKDE 32, 2 (2018), 275--287.","journal-title":"TKDE"},{"key":"e_1_2_1_117_1","volume-title":"Missing data imputation with adversarially-trained graph convolutional networks. Neural Networks","author":"Spinelli Indro","year":"2020","unstructured":"Indro Spinelli, Simone Scardapane, and Aurelio Uncini. 2020. Missing data imputation with adversarially-trained graph convolutional networks. Neural Networks (2020)."},{"doi-asserted-by":"publisher","key":"e_1_2_1_118_1","DOI":"10.1109\/AICCSA.2008.4493515"},{"doi-asserted-by":"publisher","key":"e_1_2_1_119_1","DOI":"10.1111\/j.2517-6161.1996.tb02080.x"},{"unstructured":"Chung-Jui Tu Li-Yeh Chuang Jun-Yang Chang and Cheng-Hong Yang. 2006. Feature Selection using PSO-SVM. In International MultiConference of Engineers and Computer Scientists (IMECS) (Lecture Notes in Engineering and Computer Science). 
Newswood Limited 138--143.","key":"e_1_2_1_120_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_121_1","DOI":"10.1016\/j.jbi.2018.07.014"},{"key":"e_1_2_1_122_1","volume-title":"Representation Learning with Contrastive Predictive Coding. CoRR abs\/1807.03748","author":"van den Oord A\u00e4ron","year":"2018","unstructured":"A\u00e4ron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation Learning with Contrastive Predictive Coding. CoRR abs\/1807.03748 (2018)."},{"doi-asserted-by":"publisher","key":"e_1_2_1_123_1","DOI":"10.14778\/3561261.3561267"},{"doi-asserted-by":"publisher","key":"e_1_2_1_124_1","DOI":"10.14778\/3565816.3565836"},{"key":"e_1_2_1_125_1","volume-title":"Policy Gradient Method For Robust Reinforcement Learning. In International Conference on Machine Learning (ICML).","author":"Wang Yue","year":"2022","unstructured":"Yue Wang and Shaofeng Zou. 2022. Policy Gradient Method For Robust Reinforcement Learning. In International Conference on Machine Learning (ICML)."},{"doi-asserted-by":"crossref","unstructured":"Melanie Weis and Felix Naumann. 2005. DogmatiX Tracks down Duplicates in XML. In SIGMOD. ACM 431--442.","key":"e_1_2_1_126_1","DOI":"10.1145\/1066157.1066207"},{"doi-asserted-by":"publisher","key":"e_1_2_1_127_1","DOI":"10.1007\/s00778-013-0308-z"},{"key":"e_1_2_1_128_1","volume-title":"MLSys","author":"Wu Richard","year":"2020","unstructured":"Richard Wu, Aoqian Zhang, Ihab F. Ilyas, and Theodoros Rekatsinas. 2020. Attention-based Learning for Missing Data Imputation in HoloClean. In MLSys 2020."},{"key":"e_1_2_1_129_1","volume-title":"GAIN: Missing Data Imputation using Generative Adversarial Nets. In ICML. PMLR, 5675--5684.","author":"Yoon Jinsung","year":"2018","unstructured":"Jinsung Yoon, James Jordon, and Mihaela van der Schaar. 2018. GAIN: Missing Data Imputation using Generative Adversarial Nets. In ICML. 
PMLR, 5675--5684."},{"key":"e_1_2_1_130_1","volume-title":"On Explaining Confounding Bias","author":"Youngmann Brit","year":"2023","unstructured":"Brit Youngmann, Michael Cafarella, Yuval Moskovitch, and Babak Salimi. 2023. On Explaining Confounding Bias. In ICDE. IEEE, 1846--1859."},{"doi-asserted-by":"publisher","key":"e_1_2_1_131_1","DOI":"10.14778\/3603581.3603602"},{"key":"e_1_2_1_132_1","volume-title":"A Survey of Knowledge-enhanced Text Generation. ACM Comput. Surv. 54, 11s","author":"Yu Wenhao","year":"2022","unstructured":"Wenhao Yu, Chenguang Zhu, Zaitang Li, Zhiting Hu, Qingyun Wang, Heng Ji, and Meng Jiang. 2022. A Survey of Knowledge-enhanced Text Generation. ACM Comput. Surv. 54, 11s (2022), 227:1--227:38."},{"doi-asserted-by":"publisher","key":"e_1_2_1_133_1","DOI":"10.1016\/j.inffus.2015.07.002"},{"doi-asserted-by":"crossref","unstructured":"Dongxiang Zhang Long Guo Xiangnan He Jie Shao Sai Wu and Heng Tao Shen. 2018. A Graph-Theoretic Fusion Framework for Unsupervised Entity Resolution. In ICDE.","key":"e_1_2_1_134_1","DOI":"10.1109\/ICDE.2018.00070"},{"doi-asserted-by":"publisher","key":"e_1_2_1_135_1","DOI":"10.1145\/3318464.3389726"},{"key":"e_1_2_1_136_1","volume-title":"Spectral Feature Augmentation for Graph Contrastive Learning and Beyond. arXiv preprint arXiv:2212.01026","author":"Zhang Yifei","year":"2022","unstructured":"Yifei Zhang, Hao Zhu, Zixing Song, Piotr Koniusz, and Irwin King. 2022. Spectral Feature Augmentation for Graph Contrastive Learning and Beyond. arXiv preprint arXiv:2212.01026 (2022)."},{"key":"e_1_2_1_137_1","volume-title":"Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation. In SIGMOD. ACM, 1504--1517.","author":"Zhao Zixuan","year":"2022","unstructured":"Zixuan Zhao and Raul Castro Fernandez. 2022. Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation. In SIGMOD. 
ACM, 1504--1517."},{"doi-asserted-by":"publisher","key":"e_1_2_1_138_1","DOI":"10.1145\/3299869.3300065"},{"unstructured":"H. Zou and T. Hastie. 2003. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2003).","key":"e_1_2_1_139_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_140_1","DOI":"10.1109\/BMEI.2011.6098568"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3681954.3681987","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,27]],"date-time":"2024-11-27T15:17:43Z","timestamp":1732720663000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3681954.3681987"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":139,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10.14778\/3681954.3681987"],"URL":"https:\/\/doi.org\/10.14778\/3681954.3681987","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2024-08-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}