{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T05:35:49Z","timestamp":1769837749635,"version":"3.49.0"},"reference-count":20,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T00:00:00Z","timestamp":1768089600000},"content-version":"vor","delay-in-days":10,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Institute of Health through the National Institute for General Medical Sciences","award":["R35GM142502"],"award-info":[{"award-number":["R35GM142502"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,1,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Scientific software packages impose persistent maintenance costs due to dependency churn, version incompatibilities, and bug triage, even when the underlying algorithms are stable and well described. At the same time, peer-reviewed publications already function as the canonical record of many computational methods, yet translating narrative method descriptions into usable code remains labor-intensive and error-prone. Recent advances in large language models (LLMs) raise the question of whether published articles alone can serve as sufficient specifications for on-demand code generation, potentially reducing reliance on continuously maintained libraries.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We systematically evaluated state-of-the-art LLMs by tasking them with implementing core algorithms using only the original scientific publications as input. Across a diverse benchmark including random forests, batch correction methods, gene regulatory network inference, and gene set enrichment analysis, we show that modern LLMs can frequently reproduce package-level functionality with performance indistinguishable from established libraries. Failures and discrepancies primarily arose when manuscripts underspecified implementation details or data structures, rather than from limitations in model reasoning. These results demonstrate that literature-driven code generation is already feasible for many well-specified algorithms, while also exposing where current publication standards hinder reproducibility.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>All prompts, generated code, evaluation scripts, and benchmark datasets are publicly available at https:\/\/github.com\/xomicsdatascience\/articles-to-code.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btag015","type":"journal-article","created":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T12:38:53Z","timestamp":1768048733000},"source":"Crossref","is-referenced-by-count":0,"title":["From articles to code: on-demand generation of core algorithms from scientific publications"],"prefix":"10.1093","volume":"42","author":[{"given":"Cameron S","family":"Movassaghi","sequence":"first","affiliation":[{"name":"Department of Computational Biomedicine, Cedars Sinai Medical Center , Los Angeles, CA 90048,","place":["United States"]}]},{"given":"Amanda","family":"Momenzadeh","sequence":"additional","affiliation":[{"name":"Department of Computational Biomedicine, Cedars Sinai Medical Center , Los Angeles, CA 90048,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2753-3926","authenticated-orcid":false,"given":"Jesse G","family":"Meyer","sequence":"additional","affiliation":[{"name":"Department of Computational Biomedicine, Cedars Sinai Medical Center , Los Angeles, CA 90048,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2026,1,11]]},"reference":[{"key":"2026013011072290000_btag015-B1","doi-asserted-by":"publisher","first-page":"459","DOI":"10.1186\/s12859-023-05578-5","article-title":"pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods","volume":"24","author":"Behdenna","year":"2023","journal-title":"BMC Bioinformatics"},{"key":"2026013011072290000_btag015-B2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2026013011072290000_btag015-B3","doi-asserted-by":"publisher","author":"Chen","year":"2021","DOI":"10.48550\/arXiv.2107.03374"},{"key":"2026013011072290000_btag015-B4","author":"Context Rot: How Increasing Input Tokens Impacts LLM Performance"},{"key":"2026013011072290000_btag015-B5","doi-asserted-by":"crossref","first-page":"3590","DOI":"10.1021\/acs.analchem.8b05592","article-title":"Systematic error removal using random Forest for normalizing Large-Scale untargeted lipidomics data","volume":"91","author":"Fan","year":"2019","journal-title":"Anal Chem"},{"key":"2026013011072290000_btag015-B6","doi-asserted-by":"crossref","first-page":"btac757","DOI":"10.1093\/bioinformatics\/btac757","article-title":"GSEApy: a comprehensive package for performing gene set enrichment analysis in Python","volume":"39","author":"Fang","year":"2023","journal-title":"Bioinformatics"},{"key":"2026013011072290000_btag015-B7","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2026013011072290000_btag015-B8","doi-asserted-by":"publisher","author":"Lewis","year":"2021","DOI":"10.48550\/arXiv.2005.11401"},{"key":"2026013011072290000_btag015-B9","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1126\/science.abq1158","article-title":"Competition-level code generation with AlphaCode","volume":"378","author":"Li","year":"2022","journal-title":"Science"},{"key":"2026013011072290000_btag015-B10","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1016\/j.cels.2021.05.015","article-title":"Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges","volume":"12","author":"Meyer","year":"2021","journal-title":"Cell Syst"},{"key":"2026013011072290000_btag015-B11","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1007\/s10827-018-0702-z","article-title":"Replicability or reproducibility? On the replication crisis in computational neuroscience and sharing only relevant detail","volume":"45","author":"Mi\u0142kowski","year":"2018","journal-title":"J Comput Neurosci"},{"key":"2026013011072290000_btag015-B12","doi-asserted-by":"crossref","first-page":"783","DOI":"10.1016\/j.csbj.2024.01.013","article-title":"Augusta: from RNA-Seq to gene regulatory networks and Boolean models","volume":"23","author":"Musilova","year":"2024","journal-title":"Comput Struct Biotechnol J"},{"key":"2026013011072290000_btag015-B13","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2026013011072290000_btag015-B14","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1126\/science.1213847","article-title":"Reproducible research in computational science","volume":"334","author":"Peng","year":"2011","journal-title":"Science"},{"key":"2026013011072290000_btag015-B15","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc Natl Acad Sci USA"},{"key":"2026013011072290000_btag015-B16","author":"The New Code\u2013Sean Grove, OpenAI","year":"2025"},{"key":"2026013011072290000_btag015-B17","first-page":"313","author":"Tsakpinis","year":"2023"},{"key":"2026013011072290000_btag015-B18","doi-asserted-by":"crossref","first-page":"1279","DOI":"10.1038\/s41467-024-45659-4","article-title":"Pick-up single-cell proteomic analysis for quantifying up to 3000 proteins in a mammalian cell","volume":"15","author":"Wang","year":"2024","journal-title":"Nat Commun"},{"key":"2026013011072290000_btag015-B19","doi-asserted-by":"publisher","author":"Yang","year":"2025","DOI":"10.48550\/arXiv.2505.13360"},{"key":"2026013011072290000_btag015-B20","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbad375","article-title":"The five pillars of computational reproducibility: bioinformatics and beyond","volume":"24","author":"Ziemann","year":"2023","journal-title":"Brief Bioinform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btag015\/66342292\/btag015.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/1\/btag015\/66342292\/btag015.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/1\/btag015\/66342292\/btag015.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T16:07:31Z","timestamp":1769789251000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btag015\/8419967"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2026,1]]},"references-count":20,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btag015","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,1]]},"published":{"date-parts":[[2026,1]]},"article-number":"btag015"}}