{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T08:35:31Z","timestamp":1775032531461,"version":"3.50.1"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,2,21]],"date-time":"2024-02-21T00:00:00Z","timestamp":1708473600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["RM1 HG010860"],"award-info":[{"award-number":["RM1 HG010860"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Institutes of Health Office of the Director","award":["R24 OD011883"],"award-info":[{"award-number":["R24 OD011883"]}]},{"DOI":"10.13039\/100000015","name":"US Department of Energy","doi-asserted-by":"crossref","award":["DE-AC0205CH11231"],"award-info":[{"award-number":["DE-AC0205CH11231"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI\/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM\u2019s native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>SPIRES is available as part of the open source OntoGPT package: https:\/\/github.com\/monarch-initiative\/ontogpt.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae104","type":"journal-article","created":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T01:49:12Z","timestamp":1708566552000},"source":"Crossref","is-referenced-by-count":77,"title":["Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5705-7831","authenticated-orcid":false,"given":"J Harry","family":"Caufield","sequence":"first","affiliation":[{"name":"Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2411-565X","authenticated-orcid":false,"given":"Harshad","family":"Hegde","sequence":"additional","affiliation":[{"name":"Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1501-1082","authenticated-orcid":false,"given":"Vincent","family":"Emonet","sequence":"additional","affiliation":[{"name":"Institute of Data Science, Faculty of Science and Engineering, Maastricht University , 6200 MD Maastricht, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6315-3707","authenticated-orcid":false,"given":"Nomi L","family":"Harris","sequence":"additional","affiliation":[{"name":"Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8175-045X","authenticated-orcid":false,"given":"Marcin P","family":"Joachimiak","sequence":"additional","affiliation":[{"name":"Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7356-1779","authenticated-orcid":false,"given":"Nicolas","family":"Matentzoglu","sequence":"additional","affiliation":[{"name":"Semanticly , Athens, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3002-9838","authenticated-orcid":false,"given":"HyeongSik","family":"Kim","sequence":"additional","affiliation":[{"name":"Robert Bosch LLC , Sunnyvale, CA 94085, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8719-7760","authenticated-orcid":false,"given":"Sierra","family":"Moxon","sequence":"additional","affiliation":[{"name":"Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2170-2250","authenticated-orcid":false,"given":"Justin T","family":"Reese","sequence":"additional","affiliation":[{"name":"Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9114-8737","authenticated-orcid":false,"given":"Melissa A","family":"Haendel","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Colorado, Anschutz Medical Campus , Aurora, CO 80217, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0736-9199","authenticated-orcid":false,"given":"Peter N","family":"Robinson","sequence":"additional","affiliation":[{"name":"Berlin Institute of Health at Charit\u00e9 , 10178 Berlin, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6601-2165","authenticated-orcid":false,"given":"Christopher J","family":"Mungall","sequence":"additional","affiliation":[{"name":"Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,2,21]]},"reference":[{"key":"2024030911150937400_btae104-B1","author":"Ateia","year":"2023"},{"key":"2024030911150937400_btae104-B2","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1007\/978-3-031-47240-4_22","volume-title":"The Semantic Web \u2013 ISWC 2023","author":"Babaei Giglou","year":"2023"},{"key":"2024030911150937400_btae104-B3","first-page":"610","author":"Bender","year":"2021"},{"key":"2024030911150937400_btae104-B4","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1016\/j.websem.2009.07.002","article-title":"DBpedia \u2013 a crystallization point for the web of data","volume":"7","author":"Bizer","year":"2009","journal-title":"J Web Semant"},{"key":"2024030911150937400_btae104-B5","doi-asserted-by":"crossref","first-page":"109","DOI":"10.2165\/00002018-199920020-00002","article-title":"The medical dictionary for regulatory activities (MedDRA)","volume":"20","author":"Brown","year":"1999","journal-title":"Drug Saf"},{"key":"2024030911150937400_btae104-B6","author":"Brown","year":"2020"},{"key":"2024030911150937400_btae104-B7","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1038\/s41538-018-0032-6","article-title":"FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration","volume":"2","author":"Dooley","year":"2018","journal-title":"NPJ Sci Food"},{"key":"2024030911150937400_btae104-B8","author":"Dagdelen"},{"key":"2024030911150937400_btae104-B9","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1162\/tacl_a_00298","article-title":"What BERT is not: lessons from a new suite of psycholinguistic diagnostics for language models","volume":"8","author":"Ettinger","year":"2020","journal-title":"Trans Assoc Comput Linguist"},{"key":"2024030911150937400_btae104-B10","doi-asserted-by":"crossref","first-page":"D649","DOI":"10.1093\/nar\/gkx1132","article-title":"The reactome pathway knowledgebase","volume":"46","author":"Fabregat","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2024030911150937400_btae104-B11","doi-asserted-by":"crossref","first-page":"1838","DOI":"10.1111\/cts.13301","article-title":"Progress toward a universal biomedical data translator","volume":"15","author":"Fecho","year":"2022","journal-title":"Clin Transl Sci"},{"key":"2024030911150937400_btae104-B12","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/s13321-018-0326-3","article-title":"OGER: hybrid multi-type entity recognition","volume":"11","author":"Furrer","year":"2019","journal-title":"J Cheminform"},{"key":"2024030911150937400_btae104-B13","first-page":"836","article-title":"Effects of cromakalim and pinacidil on large epicardial and small coronary arteries in conscious dogs","volume":"255","author":"Giudicelli","year":"1990","journal-title":"J Pharmacol Exp Ther"},{"key":"2024030911150937400_btae104-B14","author":"Graybeal","year":"2019"},{"key":"2024030911150937400_btae104-B15","doi-asserted-by":"crossref","first-page":"vbac034","DOI":"10.1093\/bioadv\/vbac034","article-title":"Gilda: biomedical entity text normalization with machine-learned disambiguation as a service","volume":"2","author":"Gyori","year":"2022","journal-title":"Bioinform Adv"},{"key":"2024030911150937400_btae104-B16","doi-asserted-by":"crossref","first-page":"D1214","DOI":"10.1093\/nar\/gkv1031","article-title":"ChEBI in 2016: improved services and an expanding collection of metabolites","volume":"44","author":"Hastings","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2024030911150937400_btae104-B17","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1038\/s41597-022-01807-3","article-title":"Unifying the identification of biomedical entities with the bioregistry","volume":"9","author":"Hoyt","year":"2022","journal-title":"Sci Data"},{"key":"2024030911150937400_btae104-B18","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1186\/s12859-019-3002-3","article-title":"ROBOT: a tool for automating ontology workflows","volume":"20","author":"Jackson","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2024030911150937400_btae104-B19","author":"Ji"},{"key":"2024030911150937400_btae104-B20","first-page":"56","article-title":"The open biomedical annotator","volume":"2009","author":"Jonquet","year":"2009","journal-title":"Summit Transl Bioinform"},{"key":"2024030911150937400_btae104-B21","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1016\/j.compag.2017.10.012","article-title":"AgroPortal: a vocabulary and ontology repository for agronomy","volume":"144","author":"Jonquet","year":"2018","journal-title":"Comput Electron Agric"},{"key":"2024030911150937400_btae104-B22","first-page":"118","author":"Jupp","year":"2015"},{"key":"2024030911150937400_btae104-B23","author":"Kazakov","year":"2015"},{"key":"2024030911150937400_btae104-B24","first-page":"345","article-title":"Quantification of BERT diagnosis generalizability across medical specialties using semantic dataset distance","volume":"2021","author":"Khambete","year":"2021","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"key":"2024030911150937400_btae104-B25","author":"Kindermann"},{"key":"2024030911150937400_btae104-B26","doi-asserted-by":"crossref","first-page":"baw068","DOI":"10.1093\/database\/baw068","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database (Oxford)"},{"key":"2024030911150937400_btae104-B27","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1097\/00001813-199903000-00004","article-title":"Risk of transient hyperammonemic encephalopathy in cancer patients who received continuous infusion of 5-fluorouracil with the complication of dehydration and infection","volume":"10","author":"Liaw","year":"1999","journal-title":"Anticancer Drugs"},{"key":"2024030911150937400_btae104-B28","first-page":"265","article-title":"Medical subject headings (MeSH)","volume":"88","author":"Lipscomb","year":"2000","journal-title":"Bull Med Libr Assoc"},{"key":"2024030911150937400_btae104-B29","author":"Liu","year":"2023"},{"key":"2024030911150937400_btae104-B30","doi-asserted-by":"crossref","first-page":"bbac409","DOI":"10.1093\/bib\/bbac409","article-title":"BioGPT: generative pre-trained transformer for biomedical text generation and mining","volume":"23","author":"Luo","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024030911150937400_btae104-B31","author":"Luo","year":"2023"},{"key":"2024030911150937400_btae104-B32","author":"Matentzoglu","year":"2023"},{"key":"2024030911150937400_btae104-B33","first-page":"148","author":"Moxon","year":"2021"},{"key":"2024030911150937400_btae104-B34","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/s13326-017-0126-0","article-title":"Dead simple OWL design patterns","volume":"8","author":"Osumi-Sutherland","year":"2017","journal-title":"J Biomed Semantics"},{"key":"2024030911150937400_btae104-B35","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/978-3-030-95481-9_6","volume-title":"Reasoning Web. Declarative Artificial Intelligence","author":"Pareti","year":"2022"},{"key":"2024030911150937400_btae104-B36","author":"Qiang","year":"2023"},{"key":"2024030911150937400_btae104-B37","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1111\/j.1744-6163.2009.00201.x","article-title":"Long-term lithium therapy leading to hyperparathyroidism: a case report","volume":"45","author":"Rizwan","year":"2009","journal-title":"Perspect Psychiatr Care"},{"key":"2024030911150937400_btae104-B38","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1136\/jamia.1999.0060151","article-title":"Units of measure in clinical information systems","volume":"6","author":"Schadow","year":"1999","journal-title":"J Am Med Inform Assoc"},{"key":"2024030911150937400_btae104-B39","doi-asserted-by":"crossref","first-page":"D330","DOI":"10.1093\/nar\/gky1055","article-title":"The gene ontology resource: 20 years and still GOing strong","volume":"47","author":"The Gene Ontology Consortium","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024030911150937400_btae104-B40","author":"Touvron","year":"2023"},{"key":"2024030911150937400_btae104-B41","doi-asserted-by":"crossref","first-page":"1848","DOI":"10.1111\/cts.13302","article-title":"Biolink model: a universal schema for knowledge graphs in clinical, biomedical, and translational science","volume":"15","author":"Unni","year":"2022","journal-title":"Clin Translational Sci"},{"key":"2024030911150937400_btae104-B42","author":"Vaswani","year":"2017"},{"key":"2024030911150937400_btae104-B43","author":"Vrande\u010di\u0107"},{"key":"2024030911150937400_btae104-B44","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1001\/jama.2023.25054","article-title":"Will generative artificial intelligence deliver on its promise in health care?","volume":"331","author":"Wachter","year":"2023","journal-title":"JAMA"},{"key":"2024030911150937400_btae104-B45","doi-asserted-by":"crossref","first-page":"e23375","DOI":"10.2196\/23375","article-title":"The 2019 n2c2\/OHNLP track on clinical semantic textual similarity: overview","volume":"8","author":"Wang","year":"2020","journal-title":"JMIR Med Inform"},{"key":"2024030911150937400_btae104-B46","doi-asserted-by":"crossref","first-page":"W541","DOI":"10.1093\/nar\/gkr469","article-title":"BioPortal: enhanced functionality via new web services from the national center for biomedical ontology to access and use ontologies in software applications","volume":"39","author":"Whetzel","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2024030911150937400_btae104-B47","doi-asserted-by":"crossref","first-page":"D1074","DOI":"10.1093\/nar\/gkx1037","article-title":"DrugBank 5.0: a major update to the DrugBank database for 2018","volume":"46","author":"Wishart","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2024030911150937400_btae104-B48","first-page":"254","author":"Xu","year":"2015"},{"key":"2024030911150937400_btae104-B49","author":"Zhang","year":"2023"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae104\/56732753\/btae104.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/3\/btae104\/56912793\/btae104.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/3\/btae104\/56912793\/btae104.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,9]],"date-time":"2024-03-09T11:15:39Z","timestamp":1709982939000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae104\/7612230"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,2,21]]},"references-count":49,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae104","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,3,1]]},"published":{"date-parts":[[2024,2,21]]},"article-number":"btae104"}}