{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T14:20:17Z","timestamp":1765808417773,"version":"3.41.2"},"reference-count":13,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T00:00:00Z","timestamp":1731628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Large Language Models (LLMs) have provided spectacular results across a wide variety of domains. However, persistent concerns about hallucination and fabrication of authoritative sources raise serious issues for their integral use in scientific research. Retrieval-augmented generation (RAG) is a technique for making data and documents, otherwise unavailable during training, available to the LLM for reasoning tasks. In addition to making dynamic and quantitative data available to the LLM, RAG provides the means by which to carefully control and trace source material, thereby ensuring results are accurate, complete, and authoritative.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we introduce LmRaC, an LLM-based tool capable of answering complex scientific questions in the context of a user\u2019s own experimental results. LmRaC allows users to dynamically build domain specific knowledge-bases from PubMed sources (RAGdom). Answers are drawn solely from this RAG with citations to the paragraph level, virtually eliminating any chance of hallucination or fabrication. These answers can then be used to construct an experimental context (RAGexp) that, along with user supplied documents (e.g. design, protocols) and quantitative results, can be used to answer questions about the user\u2019s specific experiment. Questions about quantitative experimental data are integral to LmRaC and are supported by a user-defined and functionally extensible REST API server (RAGfun).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Detailed documentation for LmRaC along with a sample REST API server for defining user functions can be found at https:\/\/github.com\/dbcraig\/LmRaC. The LmRaC web application image can be pulled from Docker Hub (https:\/\/hub.docker.com) as dbcraig\/lmrac.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae679","type":"journal-article","created":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T17:04:36Z","timestamp":1731690276000},"source":"Crossref","is-referenced-by-count":3,"title":["LmRaC: a functionally extensible tool for LLM interrogation of user experimental results"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7917-2793","authenticated-orcid":false,"given":"Douglas B","family":"Craig","sequence":"first","affiliation":[{"name":"Department of Emergency Medicine Research, Michigan Medicine, University of Michigan , Ann Arbor, MI 48109,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0786-8377","authenticated-orcid":false,"given":"Sorin","family":"Dr\u0103ghici","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Wayne State University , Detroit, MI 48202,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2024,11,15]]},"reference":[{"author":"Beltagy","key":"2024121400584956000_btae679-B1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1371"},{"key":"2024121400584956000_btae679-B2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3458754","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2021","journal-title":"ACM Trans Comput Healthcare"},{"author":"Huang","key":"2024121400584956000_btae679-B3","doi-asserted-by":"publisher","DOI":"10.1145\/3703155"},{"year":"2023","author":"L\u00e1la","key":"2024121400584956000_btae679-B4"},{"key":"2024121400584956000_btae679-B5","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"author":"Lewis","key":"2024121400584956000_btae679-B6"},{"key":"2024121400584956000_btae679-B7","doi-asserted-by":"publisher","first-page":"bbac409","DOI":"10.1093\/bib\/bbac409","article-title":"BioGPT: generative pre-trained transformer for biomedical text generation and mining","volume":"23","author":"Luo","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024121400584956000_btae679-B9","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1038\/d41586-023-00816-5","article-title":"GPT-4 is here: what scientists think","volume":"615","author":"Sanderson","year":"2023","journal-title":"Nature"},{"key":"2024121400584956000_btae679-B10","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pdig.0000568","volume-title":"PLOS Digital Health","author":"Soong","year":"2024"},{"key":"2024121400584956000_btae679-B8","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1038\/nature11252","article-title":"Comprehensive molecular characterization of human colon and rectal cancer","volume":"487","year":"2012","journal-title":"Nature"},{"key":"2024121400584956000_btae679-B11","doi-asserted-by":"publisher","first-page":"2983","DOI":"10.1038\/s41591-023-02594-z","article-title":"Large language models should be used as scientific reasoning engines, not knowledge databases","volume":"29","author":"Truhn","year":"2023","journal-title":"Nat Med"},{"key":"2024121400584956000_btae679-B12","doi-asserted-by":"publisher","DOI":"10.1056\/aioa2300068","article-title":"Almanac\u2014retrieval-augmented language models for clinical medicine","volume":"1","author":"Zakka","year":"2024","journal-title":"NEJM AI"},{"year":"2023","author":"Zhang","key":"2024121400584956000_btae679-B13"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae679\/60686186\/btae679.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/12\/btae679\/60934704\/btae679.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/12\/btae679\/60934704\/btae679.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,14]],"date-time":"2024-12-14T00:58:53Z","timestamp":1734137933000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae679\/7901216"}},"subtitle":[],"editor":[{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,11,15]]},"references-count":13,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,11,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae679","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2024,12]]},"published":{"date-parts":[[2024,11,15]]},"article-number":"btae679"}}