{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T23:14:42Z","timestamp":1772147682223,"version":"3.50.1"},"reference-count":20,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,6,30]],"date-time":"2025-06-30T00:00:00Z","timestamp":1751241600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Hong Kong Research Grants Council grants General Research Fund","award":["17113721"],"award-info":[{"award-number":["17113721"]}]},{"name":"Shenzhen Municipal Government General Program","award":["JCYJ20210324134405015"],"award-info":[{"award-number":["JCYJ20210324134405015"]}]},{"name":"University Research Committee fund from HKU"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Rare diseases affect over 300 million people worldwide and are often caused by genetic variants. While variant detection has become cost-effective, interpreting these variants\u2014particularly collecting literature-based evidence like ACMG\/AMP PM3\u2014remains complex and time-consuming.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present AutoPM3, a method that automates PM3 evidence extraction from literatures using open-source large language models (LLMs). AutoPM3 combines a Text2SQL-based variant extractor and a retrieval-augmented generation (RAG) module, enhanced by a variant-specific retriever and fine-tuned LLM, to separately process tables and text. We curated PM3-Bench, a dataset of 1027 variant-publication evidence pairs from ClinGen. On openly accessible pairs, AutoPM3 achieved 86.1% accuracy for variant hits and 72.5% recall for in trans variants\u2014outperforming other methods, including those using larger models. We uncovered the effectiveness of AutoPM3\u2019s key modules, especially for variant-specific retriever and Text2SQL, through the sequential ablation study. AutoPM3 located evidence in 76\u2009s, demonstrating that open-source LLMs can offer an efficient, cost-effective solution for rare disease diagnosis.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>AutoPM3 is implemented and freely available under the MIT license at https:\/\/github.com\/HKU-BAL\/AutoPM3.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf382","type":"journal-article","created":{"date-parts":[[2025,6,28]],"date-time":"2025-06-28T07:53:16Z","timestamp":1751097196000},"source":"Crossref","is-referenced-by-count":3,"title":["AutoPM3: enhancing variant interpretation via LLM-driven PM3 evidence extraction from scientific literature"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3691-1098","authenticated-orcid":false,"given":"Shumin","family":"Li","sequence":"first","affiliation":[{"name":"Department of Computer Science, School of Computing and Data Science, University of Hong Kong , Hong Kong, 999077,","place":["China"]},{"name":"School of Biomedical Sciences, University of Hong Kong , Hong Kong, 999077,","place":["China"]},{"name":"Hong Kong Genome Institute , Hong Kong, 999077,","place":["China"]}]},{"given":"Yiding","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computing and Data Science, University of Hong Kong , Hong Kong, 999077,","place":["China"]}]},{"given":"Chi-Man","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computing and Data Science, University of Hong Kong , Hong Kong, 999077,","place":["China"]}]},{"given":"Yuanhua","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Biomedical Sciences, University of Hong Kong , Hong Kong, 999077,","place":["China"]},{"name":"Department of Statistics and Actuarial Science, The University of Hong Kong , Hong Kong, 999077,","place":["China"]},{"name":"Center for Translational Stem Cell Biology, Hong Kong Science and Technology Park , Hong Kong, 999077,","place":["China"]}]},{"given":"Tak-Wah","family":"Lam","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computing and Data Science, University of Hong Kong , Hong Kong, 999077,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9711-6533","authenticated-orcid":false,"given":"Ruibang","family":"Luo","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computing and Data Science, University of Hong Kong , Hong Kong, 999077,","place":["China"]},{"name":"Hong Kong Genome Institute , Hong Kong, 999077,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,30]]},"reference":[{"key":"2025071516412238600_btaf382-B1","doi-asserted-by":"crossref","first-page":"W530","DOI":"10.1093\/nar\/gky355","article-title":"LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC","volume":"46","author":"Allot","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2025071516412238600_btaf382-B2","doi-asserted-by":"crossref","first-page":"901","DOI":"10.1038\/s41588-023-01414-x","article-title":"Tracking genetic variants in the biomedical literature using LitVar 2.0","volume":"55","author":"Allot","year":"2023","journal-title":"Nat Genet"},{"key":"2025071516412238600_btaf382-B3","doi-asserted-by":"crossref","first-page":"AIcs2400245","DOI":"10.1056\/AIcs2400245","article-title":"GPT-4 performance, nondeterminism, and drift in genetic literature review","volume":"1","author":"Aronson","year":"2024","journal-title":"NEJM AI"},{"key":"2025071516412238600_btaf382-B4","doi-asserted-by":"crossref","first-page":"bat064","DOI":"10.1093\/database\/bat064","article-title":"BioC: a minimalist approach to interoperability for biomedical text processing","volume":"2013","author":"Comeau","year":"2013","journal-title":"Database"},{"key":"2025071516412238600_btaf382-B5","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1038\/s41467-024-45563-x","article-title":"Structured information extraction from scientific text with large language models","volume":"15","author":"Dagdelen","year":"2024","journal-title":"Nat Commun"},{"key":"2025071516412238600_btaf382-B6","author":"Hu"},{"key":"2025071516412238600_btaf382-B7","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1038\/s41746-024-01079-8","article-title":"A critical assessment of using ChatGPT for extracting structured data from clinical notes","volume":"7","author":"Huang","year":"2024","journal-title":"NPJ Digit Med"},{"key":"2025071516412238600_btaf382-B8","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1016\/j.ajhg.2016.08.016","article-title":"REVEL: an ensemble method for predicting the pathogenicity of rare missense variants","volume":"99","author":"Ioannidis","year":"2016","journal-title":"Am J Hum Genet"},{"key":"2025071516412238600_btaf382-B9","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1038\/s41586-020-2308-7","article-title":"The mutational constraint spectrum quantified from variation in 141,456 humans","volume":"581","author":"Karczewski","year":"2020","journal-title":"Nature"},{"key":"2025071516412238600_btaf382-B10","doi-asserted-by":"crossref","first-page":"1978","DOI":"10.1093\/bioinformatics\/bty897","article-title":"VarSome: the human genomic variant search engine","volume":"35","author":"Kopanos","year":"2019","journal-title":"Bioinformatics"},{"key":"2025071516412238600_btaf382-B11","doi-asserted-by":"crossref","first-page":"2811","DOI":"10.1093\/bioinformatics\/btab051","article-title":"Mutalyzer 2: next generation HGVS nomenclature checker","volume":"37","author":"Lefter","year":"2021","journal-title":"Bioinformatics"},{"key":"2025071516412238600_btaf382-B12","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1038\/s41467-024-45914-8","article-title":"Extracting accurate materials data from research papers with conversational language models and prompt engineering","volume":"15","author":"Polak","year":"2024","journal-title":"Nat Commun"},{"key":"2025071516412238600_btaf382-B13","first-page":"3505","author":"Rasley","year":"2020"},{"key":"2025071516412238600_btaf382-B14","doi-asserted-by":"crossref","first-page":"2235","DOI":"10.1056\/NEJMsr1406261","article-title":"ClinGen\u2014the clinical genome resource","volume":"372","author":"Rehm","year":"2015","journal-title":"N Engl J Med"},{"key":"2025071516412238600_btaf382-B15","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1038\/gim.2015.30","article-title":"Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology","volume":"17","author":"Richards","year":"2015","journal-title":"Genet Med"},{"key":"2025071516412238600_btaf382-B16","doi-asserted-by":"crossref","first-page":"2004","DOI":"10.1038\/nprot.2015.124","article-title":"Next-generation diagnostics and disease-gene discovery with the Exomiser","volume":"10","author":"Smedley","year":"2015","journal-title":"Nat Protoc"},{"key":"2025071516412238600_btaf382-B17","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/j.ajhg.2016.07.005","article-title":"A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease","volume":"99","author":"Smedley","year":"2016","journal-title":"Am J Hum Genet"},{"key":"2025071516412238600_btaf382-B18","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1038\/s41572-024-00505-1","article-title":"Rare diseases: challenges and opportunities for research and public health","volume":"10","author":"Taruscio","year":"2024","journal-title":"Nat Rev Dis Primers"},{"key":"2025071516412238600_btaf382-B19","doi-asserted-by":"crossref","first-page":"W540","DOI":"10.1093\/nar\/gkae235","article-title":"PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge","volume":"52","author":"Wei","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025071516412238600_btaf382-B20","first-page":"6233","author":"Xiong"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf382\/63628233\/btaf382.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf382\/63628233\/btaf382.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf382\/63628233\/btaf382.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T20:41:28Z","timestamp":1752612088000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf382\/8178584"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":20,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf382","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.10.29.621006","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,6,30]]},"article-number":"btaf382"}}