{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T13:14:04Z","timestamp":1774444444567,"version":"3.50.1"},"reference-count":31,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,10,31]],"date-time":"2024-10-31T00:00:00Z","timestamp":1730332800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Comput. Sci."],"abstract":"<jats:p>This study investigates the integration of language models for knowledge extraction (KE) from Italian TEI\/XML-encoded texts, focusing on Giacomo Leopardi's works. The objective is to create structured, machine-readable knowledge graphs (KGs) from unstructured texts for better exploration and linkage to external resources. The research introduces a methodology that combines large language models (LLMs) with traditional relation extraction (RE) algorithms to overcome the limitations of current models with Italian literary documents. The process employs a multilingual LLM, ChatGPT, to extract natural language triples from the text. These are then converted into RDF\/XML format using the REBEL model, which maps natural language relations to Wikidata properties. A similarity-based filtering mechanism using SBERT is applied to maintain semantic consistency. The final RDF graph integrates these filtered triples with document metadata, drawing on established ontologies and controlled vocabularies. The research uses a dataset of 41 TEI\/XML files from a semi-diplomatic edition of Leopardi's letters as a case study. The proposed KE pipeline significantly outperformed the baseline model, mREBEL, with marked improvements in semantic accuracy and consistency. 
An ablation study demonstrated that combining LLMs with traditional RE models enhances the quality of KGs extracted from complex texts. The resulting KG had fewer, but semantically richer, relations, predominantly related to Leopardi's literary activities and health, highlighting the extracted knowledge's relevance to understanding his life and work.<\/jats:p>","DOI":"10.3389\/fcomp.2024.1472512","type":"journal-article","created":{"date-parts":[[2024,10,31]],"date-time":"2024-10-31T06:11:04Z","timestamp":1730355064000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Combining language models for knowledge extraction from Italian TEI editions"],"prefix":"10.3389","volume":"6","author":[{"given":"Cristian","family":"Santini","sequence":"first","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,10,31]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"505","DOI":"10.1007\/978-3-319-23201-0_51","article-title":"\u201cDisambiguation of named entities in cultural heritage texts using linked data sets,\u201d","author":"Brando","year":"2015","journal-title":"New Trends in Databases and Information Systems, Communications in Computer and Information Science"},{"key":"B2","first-page":"168","article-title":"\u201cGATE: a framework and graphical development environment for robust NLP tools and applications,\u201d","volume-title":"Proc. 
40th annual meeting of the association for computational linguistics (ACL 2002)","author":"Cunningham","year":"2002"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2109.11406","article-title":"Named entity recognition and classification on historical documents: a survey","author":"Ehrmann","year":"2021","journal-title":"arXiv"},{"key":"B4","article-title":"\u201cText2amr2fred, a tool for transforming text into rdf\/owl knowledge graphs via abstract meaning representation,\u201d","volume-title":"ISWC (Posters\/Demos\/Industry)","author":"Gangemi","year":"2023"},{"key":"B5","unstructured":"\u201cKnowledge extraction from multilingual and historical texts for advanced question answering,\u201d Graciotti A. Proceedings of the Doctoral Consortium at ISWC 2023 co-located with 22nd International Semantic Web Conference (ISWC 2023), Athens, Greece, November 7, 2023, volume 3678 of CEUR Workshop Proceedings 2023"},{"key":"B6","first-page":"10172","article-title":"\u201cLatent vs explicit knowledge representation: how ChatGPT answers questions about low-frequency entities,\u201d","author":"Graciotti","year":"2024","journal-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)"},{"key":"B7","doi-asserted-by":"publisher","first-page":"8","DOI":"10.5334\/johd.21","article-title":"Methods for extracting relational data from unstructured texts prior to network visualization in humanities research","volume":"6","author":"Graham","year":"2020","journal-title":"J. Open Humanit. Data"},{"key":"B8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3447772","article-title":"Knowledge graphs","volume":"71","author":"Hogan","year":"2021","journal-title":"ACM Comput. 
Surv"},{"key":"B9","doi-asserted-by":"publisher","first-page":"2370","DOI":"10.18653\/v1\/2021.findings-emnlp.204","article-title":"\u201cREBEL: relation extraction by end-to-end language generation,\u201d","author":"Huguet Cabot","year":"2021","journal-title":"Findings of the Association for Computational Linguistics: EMNLP 2021"},{"key":"B10","doi-asserted-by":"publisher","first-page":"4326","DOI":"10.18653\/v1\/2023.acl-long.237","article-title":"\u201cREDfm: a filtered and multilingual relation extraction dataset,\u201d","author":"Huguet Cabot","year":"2023","journal-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)"},{"key":"B11","first-page":"52","article-title":"\u201cGenerating domain-specific knowledge graphs: challenges with open information extraction,\u201d","volume-title":"TEXT2KG\/MK@ESWC","author":"Jain","year":"2022"},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.13364","article-title":"A simple but effective approach to improve structured language model output for information extraction","author":"Li","year":"2024","journal-title":"arXiv"},{"key":"B13","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1007\/s00799-021-00319-6","article-title":"MELHISSA: a multilingual entity linking architecture for historical press articles","volume":"23","author":"Linhares Pontes","year":"2022","journal-title":"Int. J. Digit. 
Libr"},{"key":"B14","doi-asserted-by":"crossref","first-page":"10572","DOI":"10.18653\/v1\/2023.findings-emnlp.710","article-title":"\u201cLarge language model is not a good few-shot information extractor, but a good Reranker for Hard Samples!\u201d","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023","author":"Ma","year":"2023"},{"key":"B15","doi-asserted-by":"crossref","first-page":"55","DOI":"10.3115\/v1\/P14-5010","article-title":"\u201cThe Stanford CoreNLP natural language processing toolkit,\u201d","volume-title":"Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations","author":"Manning","year":"2014"},{"key":"B16","doi-asserted-by":"publisher","first-page":"65","DOI":"10.36181\/digitalia-00026","article-title":"Il progetto biblioteca digitale leopardiana: per una catalogazione e digitalizzazione dei manoscritti autografi di Giacomo Leopardi","volume":"16","author":"Melosi","year":"2021","journal-title":"DigItalia"},{"key":"B17","unstructured":"ChatGPT: Optimizing Language Models for Dialogue 2023"},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1908.10084","article-title":"Sentence-BERT: sentence embeddings using siamese BERT-networks","author":"Reimers","year":"2019","journal-title":"arXiv"},{"key":"B19","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1007\/978-3-642-40501-3_16","article-title":"\u201cEntity network extraction based on association finding and relation extraction,\u201d","volume-title":"Research and Advanced Technology for Digital Libraries: International Conference on Theory and Practice of Digital Libraries, TPDL 2013, Valletta, Malta, September 22-26, 2013. Proceedings 3","author":"Reinanda","year":"2013"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.46298\/jdmdh.5044","article-title":"Mapping the Bentham corpus: concept-based navigation","author":"Ruiz","year":"2019","journal-title":"J. 
Data Min. Digit. Humanit"},{"key":"B21","article-title":"\u201cThe art of relations,\u201d","volume-title":"Book of Abstracts DHd 2024","author":"Santini","year":"2024"},{"key":"B22","unstructured":"\u201cKnowledge extraction for art history: the case of Vasari's The Lives of the Artists (1568),\u201d Santini C., Tan M. A., Tietz T., Bruns O., Posthumus E., Sack H. Proceedings of the Third Conference on Digital Curation Technologies (Qurator 2022) Berlin, Germany, Sept. 19th-23rd, 2022, volume 3234 of CEUR Workshop Proceedings 2022"},{"key":"B23","doi-asserted-by":"publisher","first-page":"527","DOI":"10.3233\/SW-222986","article-title":"Neural entity linking: a survey of models based on deep learning","volume":"13","author":"Sevgili","year":"2022","journal-title":"Semant. Web"},{"key":"B24","doi-asserted-by":"publisher","first-page":"100679","DOI":"10.1016\/j.websem.2021.100679","article-title":"A study of the quality of Wikidata","volume":"72","author":"Shenoy","year":"2022","journal-title":"J. Web Semant"},{"key":"B25","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1111\/j.1749-818X.2010.00230.x","article-title":"Natural language processing for cultural heritage domains","volume":"4","author":"Sporleder","year":"2010","journal-title":"Lang. Linguist. 
Compass"},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.04676","article-title":"Enhancing knowledge graph construction using large language models","author":"Trajanoska","year":"2023","journal-title":"arXiv"},{"key":"B27","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1093\/llc\/fqt067","article-title":"Exploring entity recognition and disambiguation for cultural heritage collections","volume":"30","author":"van Hooland","year":"2015","journal-title":"Digit. Scholarsh. Humanit"},{"key":"B28","volume-title":"Natural language processing with Python and spaCy: A practical introduction","author":"Vasiliev","year":"2020"},{"key":"B29","doi-asserted-by":"publisher","first-page":"607","DOI":"10.1016\/j.fmre.2021.09.003","article-title":"Knowledge graph quality control: a survey","volume":"1","author":"Wang","year":"2021","journal-title":"Fundam. Res"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.01555","article-title":"How to unleash the power of large language models for few-shot relation extraction?","author":"Xu","year":"2023","journal-title":"arXiv"},{"key":"B31","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1145\/3674501","article-title":"A comprehensive survey on relation extraction: recent advances and new frontiers","volume":"56","author":"Zhao","year":"2024","journal-title":"ACM Comput. 
Surv"}],"container-title":["Frontiers in Computer Science"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1472512\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,31]],"date-time":"2024-10-31T06:11:12Z","timestamp":1730355072000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1472512\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,31]]},"references-count":31,"alternative-id":["10.3389\/fcomp.2024.1472512"],"URL":"https:\/\/doi.org\/10.3389\/fcomp.2024.1472512","relation":{},"ISSN":["2624-9898"],"issn-type":[{"value":"2624-9898","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,31]]},"article-number":"1472512"}}