{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T12:16:31Z","timestamp":1775736991723,"version":"3.50.1"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:p>\n            State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as\n            <jats:italic>BERT<\/jats:italic>\n            , for generating highly contex-tualized embeddings of terms. The embeddings are then used to predict whether pairs of entity descriptions refer to the same real-world entity. BERT-based EM models demonstrated to be effective, but act as black-boxes for the users, who have limited insight into the motivations behind their decisions.\n          <\/jats:p>\n          <jats:p>In this paper, we perform a multi-facet analysis of the components of pre-trained and fine-tuned BERT architectures applied to an EM task. The main findings resulting from our extensive experimental evaluation are (1) the fine-tuning process applied to the EM task mainly modifies the last layers of the BERT components, but in a different way on tokens belonging to descriptions of matching \/ non-matching entities; (2) the special structure of the EM datasets, where records are pairs of entity descriptions is recognized by BERT; (3) the pair-wise semantic similarity of tokens is not a key knowledge exploited by BERT-based EM models.<\/jats:p>","DOI":"10.14778\/3529337.3529356","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T22:23:05Z","timestamp":1655936585000},"page":"1726-1738","source":"Crossref","is-referenced-by-count":22,"title":["Analyzing how BERT performs entity matching"],"prefix":"10.14778","volume":"15","author":[{"given":"Matteo","family":"Paganelli","sequence":"first","affiliation":[{"name":"University of Modena and Reggio Emilia, Modena, Italy"}]},{"given":"Francesco Del","family":"Buono","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Modena, Italy"}]},{"given":"Andrea","family":"Baraldi","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Modena, Italy"}]},{"given":"Francesco","family":"Guerra","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Modena, Italy"}]}],"member":"320","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442200"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_2_1_3_1","unstructured":"Gino Brunner Yang Liu Damian Pascual Oliver Richter Massimiliano Ciaramita and Roger Wattenhofer. 2020. On Identifiability in Transformers. In ICLR. Open-Review.net.  Gino Brunner Yang Liu Damian Pascual Oliver Richter Massimiliano Ciaramita and Roger Wattenhofer. 2020. On Identifiability in Transformers. In ICLR. Open-Review.net."},{"key":"e_1_2_1_4_1","unstructured":"Ursin Brunner and Kurt Stockinger. 2020. Entity Matching with Transformer Architectures - A Step Forward in Data Integration. In EDBT. OpenProceedings.org 463--473.  Ursin Brunner and Kurt Stockinger. 2020. Entity Matching with Transformer Architectures - A Step Forward in Data Integration. In EDBT. 
{"key":"e_1_2_1_5_1","volume-title":"Rush","author":"Cao Steven","year":"2021","unstructured":"Steven Cao, Victor Sanh, and Alexander M. Rush. 2021. Low-Complexity Probing via Finding Subnetworks. CoRR abs\/2104.03514 (2021)."},{"key":"e_1_2_1_6_1","volume-title":"Manning","author":"Clark Kevin","year":"2019","unstructured":"Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What Does BERT Look At? An Analysis of BERT's Attention. CoRR abs\/1906.04341 (2019)."},{"key":"e_1_2_1_7_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT (1)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT (1). Association for Computational Linguistics, 4171--4186."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236198"},{"key":"e_1_2_1_9_1","volume-title":"Assessing BERT's Syntactic Abilities. CoRR abs\/1901.05287","author":"Goldberg Yoav","year":"2019","unstructured":"Yoav Goldberg. 2019. Assessing BERT's Syntactic Abilities. CoRR abs\/1901.05287 (2019)."},{"key":"e_1_2_1_10_1","volume-title":"EMNLP\/IJCNLP (1)","author":"Hao Yaru","unstructured":"Yaru Hao, Li Dong, Furu Wei, and Ke Xu. 2019. Visualizing and Understanding the Effectiveness of BERT. In EMNLP\/IJCNLP (1). Association for Computational Linguistics, 4141--4150."},{"key":"e_1_2_1_11_1","volume-title":"Investigating Learning Dynamics of BERT Fine-Tuning","author":"Hao Yaru","unstructured":"Yaru Hao, Li Dong, Furu Wei, and Ke Xu. 2020. Investigating Learning Dynamics of BERT Fine-Tuning. In AACL\/IJCNLP. Association for Computational Linguistics, 87--92."},{"key":"e_1_2_1_12_1","volume-title":"EMNLP\/IJCNLP (1)","author":"Hewitt John","unstructured":"John Hewitt and Percy Liang. 2019. Designing and Interpreting Probes with Control Tasks. In EMNLP\/IJCNLP (1). Association for Computational Linguistics, 2733--2743."},{"key":"e_1_2_1_13_1","volume-title":"Manning","author":"Hewitt John","year":"2019","unstructured":"John Hewitt and Christopher D. Manning. 2019. A Structural Probe for Finding Syntax in Word Representations. In NAACL-HLT (1). Association for Computational Linguistics, 4129--4138."},
{"key":"e_1_2_1_14_1","volume-title":"Bowman","author":"Htut Phu Mon","year":"2019","unstructured":"Phu Mon Htut, Jason Phang, Shikha Bordia, and Samuel R. Bowman. 2019. Do Attention Heads in BERT Track Syntactic Dependencies? CoRR abs\/1911.12246 (2019)."},{"key":"e_1_2_1_15_1","volume-title":"Wallace","author":"Jain Sarthak","year":"2019","unstructured":"Sarthak Jain and Byron C. Wallace. 2019. Attention is not Explanation. CoRR abs\/1902.10186 (2019)."},{"key":"e_1_2_1_16_1","volume-title":"EMNLP\/IJCNLP (1)","author":"Kovaleva Olga","unstructured":"Olga Kovaleva, Alexey Romanov, Anna Rogers, and Anna Rumshisky. 2019. Revealing the Dark Secrets of BERT. In EMNLP\/IJCNLP (1). Association for Computational Linguistics, 4364--4373."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421431"},{"key":"e_1_2_1_18_1","volume-title":"Yi Chern Tan, and Robert Frank","author":"Lin Yongjie","year":"2019","unstructured":"Yongjie Lin, Yi Chern Tan, and Robert Frank. 2019. Open Sesame: Getting Inside BERT's Linguistic Knowledge. CoRR abs\/1906.01698 (2019)."},{"key":"e_1_2_1_19_1","volume-title":"Smith","author":"Liu Nelson F.","year":"2019","unstructured":"Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019. Linguistic Knowledge and Transferability of Contextual Representations. In NAACL-HLT (1). Association for Computational Linguistics, 1073--1094."},{"key":"e_1_2_1_20_1","unstructured":"Paul Michel, Omer Levy, and Graham Neubig. 2019. Are Sixteen Heads Really Better than One?. In NeurIPS. 14014--14024."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196926"},{"key":"e_1_2_1_22_1","volume-title":"Marco Pevarello, Francesco Guerra, and Maurizio Vincini.","author":"Paganelli Matteo","year":"2021","unstructured":"Matteo Paganelli, Francesco Del Buono, Marco Pevarello, Francesco Guerra, and Maurizio Vincini. 2021. Automated Machine Learning for Entity Matching Tasks. In EDBT. OpenProceedings.org, 325--330."},
{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3467861.3467878"},{"key":"e_1_2_1_24_1","volume-title":"Dissecting Contextual Word Embeddings: Architecture and Representation","author":"Peters Matthew E.","unstructured":"Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018. Dissecting Contextual Word Embeddings: Architecture and Representation. In EMNLP. Association for Computational Linguistics, 1499--1509."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00349"},{"key":"e_1_2_1_26_1","volume-title":"Smith","author":"Serrano Sofia","year":"2019","unstructured":"Sofia Serrano and Noah A. Smith. 2019. Is Attention Interpretable?. In ACL (1). Association for Computational Linguistics, 2931--2951."},{"key":"e_1_2_1_27_1","volume-title":"ICML (Proceedings of Machine Learning Research)","volume":"70","author":"Sundararajan Mukund","year":"2017","unstructured":"Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In ICML (Proceedings of Machine Learning Research), Vol. 70. PMLR, 3319--3328."},{"key":"e_1_2_1_28_1","volume-title":"ACL (1)","author":"Tenney Ian","unstructured":"Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. BERT Rediscovers the Classical NLP Pipeline. In ACL (1). Association for Computational Linguistics, 4593--4601."},{"key":"e_1_2_1_29_1","volume-title":"Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick.","author":"Tenney Ian","year":"2019","unstructured":"Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick. 2019. What do you learn from context? Probing for sentence structure in contextualized word representations. In ICLR (Poster). OpenReview.net."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476294"},{"key":"e_1_2_1_31_1","volume-title":"Gaurav Singh Tomar, and Manaal Faruqui","author":"Vashishth Shikhar","year":"2019","unstructured":"Shikhar Vashishth, Shyam Upadhyay, Gaurav Singh Tomar, and Manaal Faruqui. 2019. Attention Interpretability Across NLP Tasks. CoRR abs\/1909.11218 (2019)."},{"key":"e_1_2_1_32_1","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NIPS. 5998--6008."},
{"key":"e_1_2_1_33_1","volume-title":"EMNLP\/IJCNLP (1)","author":"Wiegreffe Sarah","unstructured":"Sarah Wiegreffe and Yuval Pinter. 2019. Attention is not not Explanation. In EMNLP\/IJCNLP (1). Association for Computational Linguistics, 11--20."},{"key":"e_1_2_1_34_1","volume-title":"Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT","author":"Wu Zhiyong","year":"2020","unstructured":"Zhiyong Wu, Yun Chen, Ben Kao, and Qun Liu. 2020. Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT. In ACL. Association for Computational Linguistics, 4166--4176."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3529337.3529356","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:50:22Z","timestamp":1672221022000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3529337.3529356"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4]]},"references-count":34,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["10.14778\/3529337.3529356"],"URL":"https:\/\/doi.org\/10.14778\/3529337.3529356","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,4]]}}}