{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T11:34:55Z","timestamp":1763811295031},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,11]]},"abstract":"<jats:p>Entity resolution (ER) approaches typically consist of a blocker and a matcher. They share the same goal and cooperate in different roles: the blocker first quickly removes obvious non-matches, and the matcher subsequently determines whether the remaining pairs refer to the same real-world entity. Despite the state-of-the-art performance achieved by deep learning methods in ER, these techniques often rely on a large amount of labeled data for training, which can be challenging or costly to obtain. Thus, there is a need to develop effective ER systems under low-resource settings. In this work, we propose an end-to-end iterative Co-learning framework for ER, aimed at jointly training the blocker and the matcher by leveraging their cooperative relationship. In particular, we let the blocker and the matcher share their learned knowledge with each other via iteratively updated pseudo labels, which broaden the supervision signals. To mitigate the impact of noise in pseudo labels, we develop optimization techniques from three aspects: label generation, label selection and model training. Through extensive experiments on benchmark datasets, we demonstrate that our proposed framework outperforms baselines by an average of 9.13--51.55%. 
Furthermore, our analysis confirms that our framework achieves mutual benefits between the blocker and the matcher.<\/jats:p>","DOI":"10.14778\/3632093.3632096","type":"journal-article","created":{"date-parts":[[2024,1,20]],"date-time":"2024-01-20T11:26:31Z","timestamp":1705749991000},"page":"292-304","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity Resolution"],"prefix":"10.14778","volume":"17","author":[{"given":"Shiwen","family":"Wu","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology"}]},{"given":"Qiyu","family":"Wu","sequence":"additional","affiliation":[{"name":"The University of Tokyo"}]},{"given":"Honghua","family":"Dong","sequence":"additional","affiliation":[{"name":"University of Toronto & Vector Institute"}]},{"given":"Wen","family":"Hua","sequence":"additional","affiliation":[{"name":"The Hong Kong Polytechnic University"}]},{"given":"Xiaofang","family":"Zhou","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology"}]}],"member":"320","published-online":{"date-parts":[[2024,1,20]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 17th International Conference on Neural Information Processing Systems","volume":"17","author":"Balcan Maria-Fiorina","year":"2004","unstructured":"Maria-Fiorina Balcan, Avrim Blum, and Ke Yang. 2004. Co-training and expansion: Towards bridging theory and practice. In Proceedings of the 17th International Conference on Neural Information Processing Systems, Vol. 17. 89--96."},{"key":"e_1_2_1_2_1","volume-title":"Neural networks for entity matching: A survey. ACM Transactions on Knowledge Discovery from Data (TKDD) 15, 3","author":"Barlaug Nils","year":"2021","unstructured":"Nils Barlaug and Jon Atle Gulla. 2021. Neural networks for entity matching: A survey. 
ACM Transactions on Knowledge Discovery from Data (TKDD) 15, 3 (2021), 1--37."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279962"},{"key":"e_1_2_1_4_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389742"},{"key":"e_1_2_1_6_1","volume-title":"International conference on machine learning. PMLR, 1597--1607","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PMLR, 1597--1607."},{"key":"e_1_2_1_7_1","volume-title":"Donatella Firmani, Maurizio Mazzei, Paolo Merialdo, Federico Piai, and Divesh Srivastava.","author":"Crescenzi Valter","year":"2021","unstructured":"Valter Crescenzi, Andrea De Angelis, Donatella Firmani, Maurizio Mazzei, Paolo Merialdo, Federico Piai, and Divesh Srivastava. 2021. Alaska: A flexible benchmark for data integration tasks. arXiv preprint arXiv:2101.11259 (2021)."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 
4171--4186."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3269461"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303754"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.552"},{"key":"e_1_2_1_12_1","volume-title":"CollaborEM: a self-supervised entity matching framework using multi-features collaboration","author":"Ge Congcong","year":"2021","unstructured":"Congcong Ge, Pengfei Wang, Lu Chen, Xiaoze Liu, Baihua Zheng, and Yunjun Gao. 2021. CollaborEM: a self-supervised entity matching framework using multi-features collaboration. IEEE Transactions on Knowledge and Data Engineering (2021)."},{"key":"e_1_2_1_13_1","volume-title":"Co-teaching: Robust training of deep neural networks with extremely noisy labels. Advances in neural information processing systems 31","author":"Han Bo","year":"2018","unstructured":"Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, and Masashi Sugiyama. 2018. Co-teaching: Robust training of deep neural networks with extremely noisy labels. Advances in neural information processing systems 31 (2018)."},{"key":"e_1_2_1_14_1","volume-title":"2019 5th International Conference on Web Research (ICWR). IEEE, 41--44","author":"Javdani Delaram","year":"2019","unstructured":"Delaram Javdani, Hossein Rahmani, Milad Allahgholi, and Fatemeh Karimkhani. 2019. Deepblock: A novel blocking approach for entity resolution using deep learning. In 2019 5th International Conference on Web Research (ICWR). IEEE, 41--44."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1586"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007314"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Li Bing","year":"2021","unstructured":"Bing Li, Yukai Miao, Yaoshu Wang, Yifang Sun, and Wei Wang. 2021. 
Improving the efficiency and effectiveness for BERT-based entity resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 13226--13233."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421431"},{"key":"e_1_2_1_19_1","volume-title":"Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457258"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196926"},{"key":"e_1_2_1_23_1","volume-title":"Can Foundation Models Wrangle Your Data? arXiv preprint arXiv:2205.09911","author":"Narayan Avanika","year":"2022","unstructured":"Avanika Narayan, Ines Chami, Laurel Orr, and Christopher R\u00e9. 2022. Can Foundation Models Wrangle Your Data? arXiv preprint arXiv:2205.09911 (2022)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377455"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947624"},{"key":"e_1_2_1_26_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. 
Advances in neural information processing systems 32 (2019)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3467861.3467878"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_9"},{"key":"e_1_2_1_29_1","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1 8 (2019) 9."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476294"},{"key":"e_1_2_1_32_1","volume-title":"Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517870"},{"key":"e_1_2_1_34_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_1_35_1","volume-title":"2023 IEEE 39th International Conference on Data Engineering (ICDE). IEEE, 1502--1515","author":"Wang Runhui","year":"2023","unstructured":"Runhui Wang, Yuliang Li, and Jin Wang. 2023. Sudowoodo: Contrastive self-supervised learning for multi-purpose data integration and preparation. In 2023 IEEE 39th International Conference on Data Engineering (ICDE). 
IEEE, 1502--1515."},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Thomas Wolf Lysandre Debut Victor Sanh Julien Chaumond Clement Delangue Anthony Moi Pierric Cistac Tim Rault Remi Louf Morgan Funtowicz et al. 2019. Huggingface's transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019).","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 12052--12066","author":"Wu Qiyu","year":"2022","unstructured":"Qiyu Wu, Chongyang Tao, Tao Shen, Can Xu, Xiubo Geng, and Daxin Jiang. 2022. PCL: Peer-Contrastive Learning with Diverse Augmentations for Unsupervised Sentence Embeddings. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 12052--12066."},{"key":"e_1_2_1_38_1","first-page":"1","article-title":"Ground Truth Inference for Weakly Supervised Entity Matching","volume":"1","author":"Wu Renzhi","year":"2023","unstructured":"Renzhi Wu, Alexander Bendeck, Xu Chu, and Yeye He. 2023. Ground Truth Inference for Weakly Supervised Entity Matching. Proceedings of the ACM on Management of Data 1, 1 (2023), 1--28.","journal-title":"Proceedings of the ACM on Management of Data"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517872"},{"key":"e_1_2_1_40_1","volume-title":"A comparative survey of deep active learning. arXiv preprint arXiv:2203.13450","author":"Zhan Xueying","year":"2022","unstructured":"Xueying Zhan, Qingzhong Wang, Kuan-hao Huang, Haoyi Xiong, Dejing Dou, and Antoni B Chan. 2022. A comparative survey of deep active learning. 
arXiv preprint arXiv:2203.13450 (2022)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380017"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371813"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313578"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3594512.3594527"},{"key":"e_1_2_1_45_1","volume-title":"A brief introduction to weakly supervised learning. National science review 5, 1","author":"Zhou Zhi-Hua","year":"2018","unstructured":"Zhi-Hua Zhou. 2018. A brief introduction to weakly supervised learning. National science review 5, 1 (2018), 44--53."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3632093.3632096","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,20]],"date-time":"2024-01-20T11:31:00Z","timestamp":1705750260000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3632093.3632096"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11]]},"references-count":45,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,11]]}},"alternative-id":["10.14778\/3632093.3632096"],"URL":"https:\/\/doi.org\/10.14778\/3632093.3632096","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,11]]},"assertion":[{"value":"2024-01-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}