{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T07:23:52Z","timestamp":1777879432209,"version":"3.51.4"},"reference-count":32,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T00:00:00Z","timestamp":1761955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T00:00:00Z","timestamp":1761955200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Semantic Web: \u2013 Interoperability, Usability, Applicability"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:p>Integrating Schema.org markup into web pages has resulted in the generation of billions of RDF triples. However, around 75% of web pages still lack this critical markup. Large language models (LLMs) present a promising solution by automatically generating the missing Schema.org markup. Despite this potential, there is currently no benchmark to evaluate the markup quality produced by LLMs. This article introduces LLM4Schema.org, an innovative approach for assessing the performance of LLMs in generating Schema.org markup. Unlike traditional methods, LLM4Schema.org does not require a predefined ground truth. Instead, it compares the quality of LLM-generated markup against human-generated markup. Our findings reveal that 40%\u201350% of the markup produced by GPT-3.5 and GPT-4 is invalid, non-factual, or non-compliant with the Schema.org ontology. These errors underscore the limitations of LLMs in adhering strictly to structured ontologies like Schema.org without additional filtering and validation mechanisms. 
We demonstrate that specialized LLM-powered agents can effectively identify and eliminate these errors. After applying such filtering for both human and LLM-generated markup, GPT-4 shows notable improvements in quality and outperforms humans. LLM4Schema.org highlights both the potential and the challenges of leveraging LLMs for semantic annotations, emphasizing the critical role of careful curation and validation to achieve reliable results.<\/jats:p>","DOI":"10.1177\/22104968251382172","type":"journal-article","created":{"date-parts":[[2025,11,6]],"date-time":"2025-11-06T18:40:15Z","timestamp":1762454415000},"update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["LLM4Schema.org: Generating Schema.org Markups With Large Language Models"],"prefix":"10.1177","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3531-0132","authenticated-orcid":false,"given":"Minh-Hoang","family":"Dang","sequence":"first","affiliation":[{"name":"Laboratoire des Sciences du Num\u00e9rique de Nantes (LS2N), Nantes Universit\u00e9, Nantes, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0176-2245","authenticated-orcid":false,"given":"Thi Hoang Thi","family":"Pham","sequence":"additional","affiliation":[{"name":"Laboratoire des Sciences du Num\u00e9rique de Nantes (LS2N), Nantes Universit\u00e9, Nantes, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8048-273X","authenticated-orcid":false,"given":"Pascal","family":"Molli","sequence":"additional","affiliation":[{"name":"Laboratoire des Sciences du Num\u00e9rique de Nantes (LS2N), Nantes Universit\u00e9, Nantes, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1062-6659","authenticated-orcid":false,"given":"Hala","family":"Skaf-Molli","sequence":"additional","affiliation":[{"name":"Laboratoire 
des Sciences du Num\u00e9rique de Nantes (LS2N), Nantes Universit\u00e9, Nantes, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3597-8557","authenticated-orcid":false,"given":"Alban","family":"Gaignard","sequence":"additional","affiliation":[{"name":"CNRS, INSERM, l\u2019Institut du Thorax, Nantes Universit\u00e9, Nantes, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2025,11,6]]},"reference":[{"key":"e_1_3_4_2_1","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.1163"},{"key":"e_1_3_4_3_1","unstructured":"Achiam J. Adler S. Agarwal S. Ahmad L. Akkaya I. Aleman F. L. Almeida D. Altenschmidt J. Altman S. Anadkat S. Avila R. Babuschkin I. Balaji S. Balcom V. Baltescu P. Bao H. Bavarian M. Belgum J. Bello I. Berdine J. ... Zoph B. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774."},{"key":"e_1_3_4_4_1","doi-asserted-by":"crossref","unstructured":"Allen B. P. Polat F. Groth P. (2024). Shroom-indelab at semeval-2024 task 6: Zero- and few-shot LLM-based classification for hallucination detection. arXiv preprint arXiv:2404.03732.","DOI":"10.18653\/v1\/2024.semeval-1.120"},{"key":"e_1_3_4_5_1","unstructured":"Allen B. P. Stork L. Groth P. (2023). Knowledge engineering using large language models. arXiv preprint arXiv:2310.00637."},{"key":"e_1_3_4_6_1","doi-asserted-by":"publisher","DOI":"10.1080\/19386389.2024.2392419"},{"key":"e_1_3_4_7_1","unstructured":"Boylan J. Mangla S. Thorn D. Ghalandari D. G. Ghaffari P. Hokamp C. (2024). Kgvalidator: A framework for automatic validation of knowledge graph construction. arXiv preprint arXiv:2404.15923."},{"key":"e_1_3_4_8_1","doi-asserted-by":"crossref","unstructured":"Brinkmann A. Primpeli A. Bizer C. (2023). The web data commons Schema.org data set series. In Companion proceedings of the ACM web conference 2023 WWW 2023 (pp. 136\u2013139). 
ACM.","DOI":"10.1145\/3543873.3587331"},{"key":"e_1_3_4_9_1","unstructured":"Brown T. B. Mann B. Ryder N. Subbiah M. Kaplan J. D. Dhariwal P. Neelakantan A. Shyam P. Sastry G. Askell A. Agarwal S. Herbert-Voss A. Krueger G. Henighan T. Child R. Ramesh A. Ziegler D. Wu J. Winter C. Hesse C. ... Amodei D. (2020). Language models are few-shot learners. In Advances in neural information processing systems 33: Annual conference on neural information processing systems 2020 NeurIPS 2020 December 6\u201312 2020 virtual."},{"key":"e_1_3_4_10_1","unstructured":"Chalkidis I. Dai X. Fergadiotis M. Malakasiotis P. Elliott D. (2022). An exploration of hierarchical attention transformers for efficient long document classification. CoRR abs\/2210.05529. https:\/\/doi.org\/10.48550\/arXiv.2210.05529."},{"key":"e_1_3_4_11_1","unstructured":"Dang M. H. Gaignard A. Skaf-Molli H. Molli P. (2023). Schema.org: How is it used? In 22nd International semantic web conference (ISWC 2023) Posters and Demos track CEUR Workshop Proceedings (Vol. 3632). CEUR-WS.org."},{"key":"e_1_3_4_12_1","unstructured":"Dong Z. Tang T. Li J. Zhao W. X. (2023). A survey on long text modeling with transformers. CoRR abs\/2302.14502. https:\/\/doi.org\/10.48550\/arXiv.2302.14502."},{"key":"e_1_3_4_13_1","doi-asserted-by":"publisher","DOI":"10.1108\/EL-06-2023-0160"},{"key":"e_1_3_4_14_1","unstructured":"Google. (2025). General structured data guidelines. Retrieved June 12 2025 from https:\/\/developers.google.com\/search\/docs\/appearance\/structured-data\/sd-policies#content."},{"key":"e_1_3_4_15_1","unstructured":"Groth P. Lauruhn M. Scerri A. Daniel Jr R. (2018). Open information extraction on scientific text: An evaluation. arXiv preprint arXiv:1802.05574."},{"key":"e_1_3_4_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2844544"},{"key":"e_1_3_4_17_1","doi-asserted-by":"crossref","unstructured":"Han J. Collier N. Buntine W. L. Shareghi E. (2023). Pive: Prompting with iterative verification improving graph-based generative capability of LLMs. 
CoRR abs\/2305.12392.","DOI":"10.18653\/v1\/2024.findings-acl.400"},{"key":"e_1_3_4_18_1","unstructured":"Huang Y. Song J. Wang Z. Chen H. Ma L. (2023). Look before you leap: An exploratory study of uncertainty measurement for large language models. arXiv preprint arXiv:2307.10236."},{"key":"e_1_3_4_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3571730"},{"key":"e_1_3_4_20_1","unstructured":"Jiang A. Q. Sablayrolles A. Roux A. Mensch A. Savary B. Bamford C. Chaplot D. S. de las Casas D. Hanna E. B. Bressand F. Lengyel G. Bour G. Lample G. Lavaud L. R. Saulnier L. Lachaux M. A. Stock P. Subramanian S. Yang S. Antoniak S. ... Sayed W. E. (2024). Mixtral of experts. arXiv preprint arXiv:2401.04088."},{"key":"e_1_3_4_21_1","doi-asserted-by":"crossref","unstructured":"Kumar A. Pandey A. Gadia R. Mishra M. (2020). Building knowledge graph using pre-trained language model for learning entity-aware relationships. In 2020 IEEE international conference on computing power and communication technologies (GUCON) (pp. 310\u2013315). https:\/\/doi.org\/10.1109\/GUCON48875.2020.9231227.","DOI":"10.1109\/GUCON48875.2020.9231227"},{"key":"e_1_3_4_22_1","doi-asserted-by":"crossref","unstructured":"Manakul P. Liusie A. Gales M. J. (2023). Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896.","DOI":"10.18653\/v1\/2023.emnlp-main.557"},{"key":"e_1_3_4_23_1","doi-asserted-by":"crossref","unstructured":"Mehta R. Hoblitzell A. O\u2019Keefe J. Jang H. Varma V. (2024). Metacheckgpt\u2014A multi-task hallucination detection using LLM uncertainty and meta-models. arXiv preprint arXiv:2404.06948.","DOI":"10.18653\/v1\/2024.semeval-1.52"},{"key":"e_1_3_4_24_1","doi-asserted-by":"crossref","unstructured":"Meyer L. P. Stadler C. Frey J. Radtke N. Junghanns K. Meissner R. Dziwis G. Bulert K. Martin M. (2023). LLM-assisted knowledge graph engineering: Experiments with chatgpt. 
In Working conference on artificial intelligence development for a resilient and sustainable tomorrow (pp. 103\u2013115). Springer Fachmedien Wiesbaden Wiesbaden.","DOI":"10.1007\/978-3-658-43705-3_8"},{"key":"e_1_3_4_25_1","doi-asserted-by":"crossref","unstructured":"Mickus T. Zosa E. V\u00e1zquez R. Vahtola T. Tiedemann J. Segonne V. Raganato A. Apidianaki M. (2024). Semeval-2024 shared task 6: Shroom a shared-task on hallucinations and related observable overgeneration mistakes. arXiv preprint arXiv:2403.07726.","DOI":"10.18653\/v1\/2024.semeval-1.273"},{"key":"e_1_3_4_26_1","doi-asserted-by":"crossref","unstructured":"Mihindukulasooriya N. Tiwari S. Enguix C. F. Lata K. (2023). Text2kgbench: A benchmark for ontology-driven knowledge graph generation from text. In 22nd International semantic web conference proceedings Part II Lecture Notes in Computer Science (Vol. 14266 pp. 247\u2013265). Springer.","DOI":"10.1007\/978-3-031-47243-5_14"},{"key":"e_1_3_4_27_1","unstructured":"M\u00fcndler N. He J. Jenko S. Vechev M. (2023). Self-contradictory hallucinations of large language models: Evaluation detection and mitigation. arXiv preprint arXiv:2305.15852."},{"key":"e_1_3_4_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2024.3352100"},{"key":"e_1_3_4_29_1","unstructured":"Schema.org. (2025). Getting started with Schema.org. Retrieved June 12 2025 from https:\/\/schema.org\/docs\/gs.html#schemaorg_expected."},{"key":"e_1_3_4_30_1","doi-asserted-by":"crossref","unstructured":"Shen L. Tan W. Chen S. Chen Y. Zhang J. Xu H. Zheng B. Koehn P. Khashabi D. (2024). The language barrier: Dissecting safety challenges of LLMs in multilingual contexts. arXiv preprint arXiv:2401.13136.","DOI":"10.18653\/v1\/2024.findings-acl.156"},{"key":"e_1_3_4_31_1","doi-asserted-by":"crossref","unstructured":"Wei C. Chen Z. Fang S. He J. Gao M. (2024). Opdai at semeval-2024 task 6: Small LLMs can accelerate hallucination detection with weakly supervised data. 
arXiv preprint arXiv:2402.12913.","DOI":"10.18653\/v1\/2024.semeval-1.104"},{"key":"e_1_3_4_32_1","doi-asserted-by":"crossref","unstructured":"Zhang X. Li S. Hauer B. Shi N. Kondrak G. (2023). Don\u2019t trust chatgpt when your question is not in English: A study of multilingual abilities and types of LLMs. In Proceedings of the 2023 conference on empirical methods in natural language processing (pp. 7915\u20137927).","DOI":"10.18653\/v1\/2023.emnlp-main.491"},{"key":"e_1_3_4_33_1","doi-asserted-by":"crossref","unstructured":"Zhu Y. Wang X. Chen J. Qiao S. Ou Y. Yao Y. Deng S. Chen H. Zhang N. (2023). LLMs for knowledge graph construction and reasoning: Recent capabilities and future opportunities. CoRR abs\/2305.13168.","DOI":"10.1007\/s11280-024-01297-w"}],"container-title":["Semantic Web: \u2013 Interoperability, Usability, Applicability"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/22104968251382172","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/22104968251382172","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/22104968251382172","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T23:43:26Z","timestamp":1777592606000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/22104968251382172"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11]]},"references-count":32,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["10.1177\/22104968251382172"],"URL":"https:\/\/doi.org\/10.1177\/22104968251382172","relation":{},"ISSN":["1570-0844","2210-4968"],"issn-type":[{"value":"1570-0844","type":"print"
},{"value":"2210-4968","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11]]},"article-number":"22104968251382172"}}