{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:51:29Z","timestamp":1767340289738,"version":"build-2065373602"},"reference-count":27,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2023,6,29]],"date-time":"2023-06-29T00:00:00Z","timestamp":1687996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"MIUR \u201cFondo Departments of Excellence 2018-2022\u201d of the DII Department at the University of Brescia, Italy"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>In recent years, many studies have been devoted to discovering the inner workings of Transformer-based models, such as BERT, for instance, attempting to identify what information is contained within them. However, little is known about how these models store this information in their millions of parameters and which parts of the architecture are the most important. In this work, we propose an approach to identify self-attention mechanisms, called heads, that contain semantic and real-world factual knowledge in BERT. Our approach includes a metric computed from attention weights and exploits a standard clustering algorithm for extracting the most relevant connections between tokens in a head. In our experimental analysis, we focus on how heads can connect synonyms, antonyms and several types of factual knowledge regarding subjects such as geography and medicine.<\/jats:p>","DOI":"10.3390\/fi15070230","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:51:06Z","timestamp":1688086266000},"page":"230","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Synonyms, Antonyms and Factual Knowledge in BERT Heads"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1835-0442","authenticated-orcid":false,"given":"Lorenzo","family":"Serina","sequence":"first","affiliation":[{"name":"Department of Information Engineering, Universit\u00e0 Degli Studi di Brescia, Via Branze 38, 25100 Brescia, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luca","family":"Putelli","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, Universit\u00e0 Degli Studi di Brescia, Via Branze 38, 25100 Brescia, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alfonso","family":"Gerevini","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, Universit\u00e0 Degli Studi di Brescia, Via Branze 38, 25100 Brescia, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7785-9492","authenticated-orcid":false,"given":"Ivan","family":"Serina","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, Universit\u00e0 Degli Studi di Brescia, Via Branze 38, 25100 Brescia, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,29]]},"reference":[{"key":"ref_1","first-page":"4171","article-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","volume":"Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019","journal-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Peng, Y., Yan, S., and Lu, Z. (2019, January 1). Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. Proceedings of the 2019 Workshop on Biomedical Natural Language Processing (BioNLP 2019), Florence, Italy.","DOI":"10.18653\/v1\/W19-5006"},{"key":"ref_3","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_4","unstructured":"Tenney, I., Das, D., and Pavlick, E. (August, January 28). BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Miaschi, A., Brunato, D., Dell\u2019Orletta, F., and Venturi, G. (2020, January 8\u201313). Linguistic Profiling of a Neural Language Model. Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online).","DOI":"10.18653\/v1\/2020.coling-main.65"},{"key":"ref_6","first-page":"3651","article-title":"What Does BERT Learn about the Structure of Language?","volume":"Volume 1: Long Papers","author":"Jawahar","year":"2019","journal-title":"Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1007\/s10579-021-09575-z","article-title":"A comparative evaluation and analysis of three generations of Distributional Semantic Models","volume":"56","author":"Lenci","year":"2022","journal-title":"Lang. Resour. Eval."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1162\/tacl_a_00324","article-title":"How Can We Know What Language Models Know","volume":"8","author":"Jiang","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_9","unstructured":"Das, D., Hajishirzi, H., McCallum, A., and Singh, S. (2020, January 22\u201324). How Context Affects Language Models\u2019 Factual Predictions. Proceedings of the Conference on Automated Knowledge Base Construction, AKBC 2020, Virtual."},{"key":"ref_10","unstructured":"Inui, K., Jiang, J., Ng, V., and Wan, X. (2019, January 3\u20137). Language Models as Knowledge Bases?. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1162\/tacl_a_00410","article-title":"Measuring and Improving Consistency in Pretrained Language Models","volume":"9","author":"Elazar","year":"2021","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_12","unstructured":"Calzolari, N., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Hasida, K., Isahara, H., Maegaard, B., Mariani, J., and Mazo, H. (2018, January 7\u201312). T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples. Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"842","DOI":"10.1162\/tacl_a_00349","article-title":"A Primer in BERTology: What We Know About How BERT Works","volume":"8","author":"Rogers","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Clark, K., Khandelwal, U., Levy, O., and Manning, C.D. (2019, January 1). What Does BERT Look at? An Analysis of BERT\u2019s Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@ACL 2019, Florence, Italy.","DOI":"10.18653\/v1\/W19-4828"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kovaleva, O., Romanov, A., Rogers, A., and Rumshisky, A. (2019, January 3\u20137). Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China.","DOI":"10.18653\/v1\/D19-1445"},{"key":"ref_16","first-page":"37","article-title":"A Multiscale Visualization of Attention in the Transformer Model","volume":"Volume 3: System Demonstrations","author":"Vig","year":"2019","journal-title":"Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019"},{"key":"ref_17","unstructured":"Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., and Garnett, R. (2017, January 4\u20139). Attention is All you Need. Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA."},{"key":"ref_18","first-page":"457","article-title":"A BERT-Based Scoring System for Workplace Safety Courses in Italian","volume":"Volume 13796","author":"Dovier","year":"2022","journal-title":"Proceedings of the AIxIA 2022\u2014Advances in Artificial Intelligence\u2014XXIst International Conference of the Italian Association for Artificial Intelligence, AIxIA 2022"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Kassner, N., Dufter, P., and Sch\u00fctze, H. (2021). Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models. arXiv.","DOI":"10.18653\/v1\/2021.eacl-main.284"},{"key":"ref_20","unstructured":"Muresan, S., Nakov, P., and Villavicencio, A. (2022, January 22\u201327). Knowledge Neurons in Pretrained Transformers. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Anelli, V.W., Biancofiore, G.M., De Bellis, A., Di Noia, T., and Di Sciascio, E. (2022, January 17\u201321). Interpretability of BERT Latent Space through Knowledge Graphs. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA. CIKM \u201922.","DOI":"10.1145\/3511808.3557617"},{"key":"ref_22","unstructured":"Musto, C., Guidotti, R., Monreale, A., and Semeraro, G. (December, January 28). On the Behaviour of BERT\u2019s Attention for the Classification of Medical Reports. Proceedings of the 3rd Italian Workshop on Explainable Artificial Intelligence Co-Located with 21th International Conference of the Italian Association for Artificial Intelligence (AIxIA 2022), Udine, Italy."},{"key":"ref_23","first-page":"349","article-title":"Deep Learning for Classification of Radiology Reports with a Hierarchical Schema","volume":"Volume 176","author":"Cristani","year":"2020","journal-title":"Proceedings of the Knowledge-Based and Intelligent Information & Engineering Systems: 24th International Conference KES-2020"},{"key":"ref_24","first-page":"367","article-title":"Attention-Based Explanation in a Deep Learning Model For Classifying Radiology Reports","volume":"Volume 12721","author":"Tucker","year":"2021","journal-title":"Proceedings of the Artificial Intelligence in Medicine\u201419th International Conference on Artificial Intelligence in Medicine, AIME 2021"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1109\/34.1000236","article-title":"Mean Shift: A Robust Approach Toward Feature Space Analysis","volume":"24","author":"Comaniciu","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1423","DOI":"10.1016\/j.patrec.2013.05.004","article-title":"On the convergence of the mean shift algorithm in the one-dimensional space","volume":"34","author":"Ghassabeh","year":"2013","journal-title":"Pattern Recognit. Lett."},{"key":"ref_27","unstructured":"(2023, June 26). Princeton University, About Wordnet. Available online: https:\/\/wordnet.princeton.edu\/."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/15\/7\/230\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:03:30Z","timestamp":1760126610000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/15\/7\/230"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,29]]},"references-count":27,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["fi15070230"],"URL":"https:\/\/doi.org\/10.3390\/fi15070230","relation":{},"ISSN":["1999-5903"],"issn-type":[{"type":"electronic","value":"1999-5903"}],"subject":[],"published":{"date-parts":[[2023,6,29]]}}}