{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T17:54:17Z","timestamp":1775066057656,"version":"3.50.1"},"reference-count":33,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T00:00:00Z","timestamp":1774483200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100010801","name":"Xunta de Galicia","doi-asserted-by":"publisher","award":["GPC-ED431B 2024\/26"],"award-info":[{"award-number":["GPC-ED431B 2024\/26"]}],"id":[{"id":"10.13039\/501100010801","id-type":"DOI","asserted-by":"publisher"}]},{"award":["GPC-ED431B 2024\/26"],"award-info":[{"award-number":["GPC-ED431B 2024\/26"]}],"id":[{"id":"https:\/\/ror.org\/0181xnw06","id-type":"ROR","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Scholarly knowledge graphs integrate bibliographic records from heterogeneous sources and therefore require controlled, auditable deduplication. This paper presents OntoDup, an ontology-driven approach that models entity matching as a governed decision process: Matching outcomes are recorded as reified assertions enriched with governance state, evidence, provenance and operational metadata, while a separate operational view is exposed through policy-driven materialization of consumable identity links. We evaluate OntoDup on the DBLP-ACM and DBLP-Scholar benchmarks under two regimes: (i) a pre-blocked setting using the benchmark candidate lists to compare matching methods under a fixed candidate set, and (ii) an end-to-end setting that generates candidates from the graph with DeepBlocker and applies governed triage and materialization. 
We report operational precision\/recall\/F1 computed directly on the graph via SPARQL aggregations, characterize governance workload through state distributions, and quantify inference cost for LLM-based matchers via token and latency metadata attached to assertions. For end-to-end evaluation, we anchor operational links against a full positive reference encoded as idealized validations derived from the benchmark labels, enabling analysis of missed positives in terms of governance status and materialization policy. The experiments show that OntoDup enables evaluation at the level of consumable identity links, review workload, and inference cost, revealing operational trade-offs that are not visible from pairwise matching metrics alone.<\/jats:p>","DOI":"10.3390\/info17040325","type":"journal-article","created":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T15:10:24Z","timestamp":1774537824000},"page":"325","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["OntoDup: Governance-Aware Entity Matching for Scholarly Knowledge Graph Deduplication"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6027-9854","authenticated-orcid":false,"given":"Jorge","family":"Gal\u00e1n-Mena","sequence":"first","affiliation":[{"name":"atlanTTic Research Center for Telecommunication Technologies, Department of Telematics Engineering, Universidade de Vigo, 36310 Vigo, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4802-607X","authenticated-orcid":false,"given":"Mart\u00edn","family":"L\u00f3pez-Nores","sequence":"additional","affiliation":[{"name":"atlanTTic Research Center for Telecommunication Technologies, Department of Telematics Engineering, Universidade de Vigo, 36310 Vigo, 
Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Pulla-S\u00e1nchez","sequence":"additional","affiliation":[{"name":"Artificial Intelligence and Assistive Technologies Research Group (GI-IATa), UNESCO Chair on Support Technologies for Educational Inclusion, Universidad Polit\u00e9cnica Salesiana, Cuenca 170143, Ecuador"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3327-2347","authenticated-orcid":false,"given":"Luis Fernando","family":"Guerrero-V\u00e1squez","sequence":"additional","affiliation":[{"name":"Artificial Intelligence and Assistive Technologies Research Group (GI-IATa), UNESCO Chair on Support Technologies for Educational Inclusion, Universidad Polit\u00e9cnica Salesiana, Cuenca 170143, Ecuador"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3687-3220","authenticated-orcid":false,"given":"Juan Pablo","family":"Salgado-Guerrero","sequence":"additional","affiliation":[{"name":"Economics and Business Management Faculty, Pontificia Universidad Cat\u00f3lica del Ecuador, Quito 170143, Ecuador"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, J., Sun, T., Xian, G., Huang, Y., and Zhao, R. (2022). Scientific knowledge graph-driven research profiling. Proceedings of the 6th International Conference on Computer Science and Application Engineering, ACM.","DOI":"10.1145\/3565387.3565423"},{"key":"ref_2","first-page":"131","article-title":"Topic discovery and hotspot analysis of scientific literature based on fine-gained knowledge graph","volume":"47","author":"Liu","year":"2024","journal-title":"Inf. Stud. Theory Appl."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Vaidhyaraman, R., Sharon Jessika, S., and Sahaaya Arul Mary, S. (2024). 
BERT based citation recommender and impactful nodes identifier in RDF knowledge graphs. Proceedings of the 4th International Conference on Soft Computing for Security Applications (ICSCSA), IEEE.","DOI":"10.1109\/ICSCSA64454.2024.00077"},{"key":"ref_4","first-page":"782","article-title":"Cross-domain co-author recommendation based on Knowledge Graph clustering","volume":"Volume 12672","author":"Munna","year":"2021","journal-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1059","DOI":"10.1007\/s40747-022-00806-6","article-title":"Scholarly knowledge graphs through structuring scholarly communication: A review","volume":"9","author":"Verma","year":"2023","journal-title":"Complex Intell. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"991","DOI":"10.1162\/qss_a_00322","article-title":"Challenges in building scholarly knowledge graphs for research assessment in open science","volume":"5","author":"Manghi","year":"2024","journal-title":"Quant. Sci. Stud."},{"key":"ref_7","first-page":"409","article-title":"Entity deduplication in big data graphs for scholarly communication","volume":"54","author":"Manghi","year":"2020","journal-title":"Data Technol. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"F\u00e4rber, M. (2019). The microsoft academic knowledge graph: A linked data source with 8 billion triples of scholarly data. Proceedings of the International Semantic Web Conference, Springer.","DOI":"10.1007\/978-3-030-30796-7_8"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1162\/qss_a_00183","article-title":"The Microsoft Academic Knowledge Graph enhanced: Author name disambiguation, publication classification, and embeddings","volume":"3","author":"Ao","year":"2022","journal-title":"Quant. Sci. 
Stud."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1609\/icwsm.v9i1.14630","article-title":"Impact of entity disambiguation errors on social network properties","volume":"Volume 9","author":"Diesner","year":"2015","journal-title":"Proceedings of the International AAAI Conference on Web and Social Media"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1660","DOI":"10.1080\/01621459.2015.1105807","article-title":"A Bayesian Approach to Graphical Record Linkage and Deduplication","volume":"111","author":"Steorts","year":"2016","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Mudgal, S., Li, H., Rekatsinas, T., Doan, A., Park, Y., Krishnan, G., Deep, R., Arcaute, E., and Raghavendra, V. (2018). Deep Learning for Entity Matching: A design space exploration. Proceedings of the International Conference on Management of Data (SIGMOD), Association for Computing Machinery.","DOI":"10.1145\/3183713.3196926"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2459","DOI":"10.14778\/3476249.3476294","article-title":"Deep learning for blocking in entity matching: A design space exploration","volume":"14","author":"Thirumuruganathan","year":"2021","journal-title":"Proc. VLDB Endow."},{"key":"ref_14","unstructured":"Zhang, Z., Groth, P., Calixto, I., and Schelter, S. (2025). A Deep Dive Into Cross-Dataset Entity Matching with Large and Small Language Models. Proceedings of the EDBT, Open Proceedings."},{"key":"ref_15","unstructured":"Peeters, R., Steiner, A., and Bizer, C. (2023). Entity matching using large language models. arXiv."},{"key":"ref_16","unstructured":"Arvanitis-Kasinikos, I., and Papadakis, G. (2026, February 26). Entity Matching with 7B LLMs: A Study on Prompting Strategies and Hardware Limitations. In Proceedings of the 27th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP 2025), Co-Located with EDBT\/ICDT 2025. 
Available online: https:\/\/ceur-ws.org\/Vol-3931\/paper4.pdf."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Steiner, A., Peeters, R., and Bizer, C. (2025). Fine-tuning large language models for entity matching. Proceedings of the IEEE 41st International Conference on Data Engineering Workshops (ICDEW), IEEE.","DOI":"10.1109\/ICDEW67478.2025.00006"},{"key":"ref_18","first-page":"3","article-title":"VIVO: Connecting People, Creating a Virtual Life Sciences Community","volume":"13","author":"Devare","year":"2007","journal-title":"D-Lib Mag."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1182","DOI":"10.21105\/joss.01182","article-title":"VIVO: A system for research discovery","volume":"4","author":"Conlon","year":"2019","journal-title":"J. Open Source Softw."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1080\/00031305.2022.2041482","article-title":"A Practical Approach to Proper Inference with Linked Data","volume":"76","author":"Kaplan","year":"2022","journal-title":"Am. Stat."},{"key":"ref_21","unstructured":"Lebo, T., Sahoo, S., McGuinness, D., Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S., and Zhao, J. (2013). PROV-O: The PROV Ontology; W3C Recommendation, World Wide Web Consortium."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1186\/2041-1480-4-37","article-title":"PAV ontology: Provenance, authoring and versioning","volume":"4","author":"Ciccarese","year":"2013","journal-title":"J. Biomed. Semant."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, Y., Li, J., Suhara, Y., Doan, A., and Tan, W.C. (2020). Deep entity matching with pre-trained language models. 
arXiv.","DOI":"10.14778\/3421424.3421431"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3665","DOI":"10.3906\/elk-1806-132","article-title":"Incremental author name disambiguation using author profile models and self-citations","volume":"27","author":"Hussain","year":"2019","journal-title":"Turk. J. Electr. Eng. Comput. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"e1536","DOI":"10.7717\/peerj-cs.1536","article-title":"Graph-based methods for Author Name Disambiguation: A survey","volume":"9","author":"Falchi","year":"2023","journal-title":"PeerJ Comput. Sci."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1016\/j.patrec.2020.05.025","article-title":"Knowledge graph based methods for record linkage","volume":"136","author":"Gautam","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_27","unstructured":"Trivedi, R., Dai, H., Wang, Y., and Song, L. (2017). Know-evolve: Deep temporal reasoning for dynamic knowledge graphs. Proceedings of the International Conference on Machine Learning, PMLR."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3988","DOI":"10.1609\/aaai.v34i04.5815","article-title":"Diachronic embedding for temporal knowledge graph completion","volume":"Volume 34","author":"Goel","year":"2020","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Dasgupta, S.S., Ray, S.N., and Talukdar, P. (2018). Hyte: Hyperplane-based temporally aware knowledge graph embedding. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics.","DOI":"10.18653\/v1\/D18-1225"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Papadakis, G., Skoutas, D., Thanos, E., and Palpanas, T. (2019). A Survey of Blocking and Filtering Techniques for Entity Resolution. 
arXiv.","DOI":"10.1145\/3377455"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, W., Wei, H., Sisman, B., Dong, X.L., Faloutsos, C., and Page, D. (2020). AutoBlock: A Hands-off Blocking Framework for Entity Matching. Proceedings of the 13th ACM International Conference on Web Search and Data Mining (WSDM \u201920), ACM.","DOI":"10.1145\/3336191.3371813"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"484","DOI":"10.14778\/1920841.1920904","article-title":"Evaluation of Entity Resolution Approaches on Real-World Match Problems","volume":"3","author":"Thor","year":"2010","journal-title":"Proc. VLDB Endow."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1093\/logcom\/exi076","article-title":"A six-valued logic to reason about uncertainty and inconsistency in requirements specifications","volume":"16","year":"2006","journal-title":"J. Log. Comput."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/4\/325\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T16:01:31Z","timestamp":1775059291000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/4\/325"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,26]]},"references-count":33,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2026,4]]}},"alternative-id":["info17040325"],"URL":"https:\/\/doi.org\/10.3390\/info17040325","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,26]]}}}