{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T05:35:49Z","timestamp":1769837749365,"version":"3.49.0"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T00:00:00Z","timestamp":1767916800000},"content-version":"vor","delay-in-days":8,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Leverhulme Trust Research Fellowship","award":["108290"],"award-info":[{"award-number":["108290"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,1,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The presence or absence of some genes in a genome can influence whether other genes are likely to be present or absent. Understanding these gene co-occurrence and avoidance patterns reveals fundamental principles of genome organization, with applications ranging from evolutionary reconstruction to rational design of synthetic genomes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>PanForest, presented here, uses random forest classifiers to predict the presence and absence of genes in genomes from the set of other genes present. Performance statistics output by PanForest reveal how predictable each gene\u2019s presence or absence is, based on the presence or absence of other genes in the genome. Further, PanForest produces statistics indicating the importance of each gene in predicting the presence or absence of each other gene. 
The PanForest software can run serially or in parallel, thereby facilitating the analysis of pangenomes at Network of Life scale.<\/jats:p>\n                    <jats:p>A pangenome of 12\u00a0741 accessory genes in 1000 Escherichia coli genomes was analysed in around 5\u2009h using eight processors. To demonstrate PanForest\u2019s utility, we present a case study and show that certain genes associated with resistance to antimicrobial drugs reliably predict the presence or absence of other genes associated with resistance to the same drug. Further, we highlight several associations between those genes and others not known to be associated with antimicrobial resistance (AMR), or associated with resistance to other drugs. We envisage PanForest\u2019s use in studies from multiple disciplines concerning the dynamics of gene distributions in pangenomes ranging from biomedical science and synthetic biology to molecular ecology.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The software is freely available with a full manual and can be found at www.github.com\/alanbeavan\/PanForest DOI: https:\/\/doi.org\/10.5281\/zenodo.17865482.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btag005","type":"journal-article","created":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T12:39:35Z","timestamp":1767875975000},"source":"Crossref","is-referenced-by-count":0,"title":["PanForest: predicting genes in genomes using random forests"],"prefix":"10.1093","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8219-6742","authenticated-orcid":false,"given":"Alan J S","family":"Beavan","sequence":"first","affiliation":[{"name":"University of Manchester School of Biological Science, Faculty of Biology, Medicine & Health, , Manchester, M13 9PL,","place":["United Kingdom"]},{"name":"School of Life Sciences, The 
University of Nottingham , Nottingham NG7 2UH,","place":["United Kingdom"]}]},{"given":"Maria Rosa","family":"Domingo-Sananes","sequence":"additional","affiliation":[{"name":"School of Science and Technology, Nottingham Trent University , Nottingham NG1 4FQ,","place":["United Kingdom"]}]},{"given":"James O","family":"McInerney","sequence":"additional","affiliation":[{"name":"Department of Evolution, Ecology and Behaviour, University of Liverpool , Liverpool L69 3BX,","place":["United Kingdom"]}]}],"member":"286","published-online":{"date-parts":[[2026,1,9]]},"reference":[{"key":"2026013011072067100_btag005-B1","doi-asserted-by":"crossref","first-page":"D690","DOI":"10.1093\/nar\/gkac920","article-title":"CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database","volume":"51","author":"Alcock","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2026013011072067100_btag005-B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2026013011072067100_btag005-B3","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2026013011072067100_btag005-B4","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1609\/icwsm.v3i1.13937","article-title":"Gephi: an open source software for exploring and manipulating networks","volume":"3","author":"Bastian","year":"2009","journal-title":"ICWSM"},{"key":"2026013011072067100_btag005-B5","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2304934120","article-title":"Contingency, repeatability, and predictability in the evolution of a prokaryotic 
pangenome","volume":"121","author":"Beavan","year":"2024","journal-title":"Proc Natl Acad Sci USA"},{"key":"2026013011072067100_btag005-B6","doi-asserted-by":"crossref","first-page":"1510","DOI":"10.1038\/s41564-022-01231-8","article-title":"Gene essentiality evolves across a pangenome","volume":"7","author":"Beavan","year":"2022","journal-title":"Nat Microbiol"},{"key":"2026013011072067100_btag005-B7","volume-title":"Classification and Regression Trees","author":"Breiman","year":"1984"},{"key":"2026013011072067100_btag005-B8","doi-asserted-by":"crossref","first-page":"e2204206119","DOI":"10.1073\/pnas.2204206119","article-title":"Loss-of-function mutation survey revealed that genes with background-dependent fitness are rare and functionally related in yeast","volume":"119","author":"Caudal","year":"2022","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2026013011072067100_btag005-B9","doi-asserted-by":"crossref","first-page":"1042","DOI":"10.1111\/j.1523-1739.2010.01455.x","article-title":"Selectivity in mammalian extinction risk and threat types: a new measure of phylogenetic signal strength in binary traits","volume":"24","author":"Fritz","year":"2010","journal-title":"Conserv Biol"},{"key":"2026013011072067100_btag005-B10","author":"Gavriilidou","year":"2024"},{"key":"2026013011072067100_btag005-B11","author":"Hipp","year":"2020"},{"key":"2026013011072067100_btag005-B12","doi-asserted-by":"crossref","first-page":"100947","DOI":"10.1016\/j.lanmic.2024.07.010","article-title":"Antimicrobial resistance: a concise update","volume":"6","author":"Ho","year":"2025","journal-title":"Lancet Microbe"},{"key":"2026013011072067100_btag005-B13","first-page":"278","author":"Ho","year":"1995"},{"key":"2026013011072067100_btag005-B14","doi-asserted-by":"crossref","first-page":"17040","DOI":"10.1038\/nmicrobiol.2017.40","article-title":"Why prokaryotes have pangenomes","volume":"2","author":"McInerney","year":"2017","journal-title":"Nat 
Microbiol"},{"key":"2026013011072067100_btag005-B15","doi-asserted-by":"crossref","first-page":"3691","DOI":"10.1093\/bioinformatics\/btv421","article-title":"Roary: rapid large-scale prokaryote pan genome analysis","volume":"31","author":"Page","year":"2015","journal-title":"Bioinformatics"},{"key":"2026013011072067100_btag005-B16","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2026013011072067100_btag005-B17","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1186\/s12915-014-0066-4","article-title":"Genomes in turmoil: quantification of genome dynamics in prokaryote supergenomes","volume":"12","author":"Puigb\u00f2","year":"2014","journal-title":"BMC Biol"},{"key":"2026013011072067100_btag005-B18","doi-asserted-by":"crossref","first-page":"1580","DOI":"10.1038\/s41564-022-01208-7","article-title":"A bacterial pan-genome makes gene essentiality strain-dependent and evolvable","volume":"7","author":"Rosconi","year":"2022","journal-title":"Nat Microbiol"},{"key":"2026013011072067100_btag005-B19","doi-asserted-by":"crossref","first-page":"2498","DOI":"10.1101\/gr.1239303","article-title":"Cytoscape: a software environment for integrated models of biomolecular interaction networks","volume":"13","author":"Shannon","year":"2003","journal-title":"Genome Res"},{"key":"2026013011072067100_btag005-B20","doi-asserted-by":"crossref","first-page":"3135","DOI":"10.1093\/bioinformatics\/btq596","article-title":"GLay: community structure analysis of biological networks","volume":"26","author":"Su","year":"2010","journal-title":"Bioinformatics"},{"key":"2026013011072067100_btag005-B21","doi-asserted-by":"crossref","first-page":"13950","DOI":"10.1073\/pnas.0506758102","article-title":"Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial 
\u201cpan-genome\u201d","volume":"102","author":"Tettelin","year":"2005","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2026013011072067100_btag005-B22","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1186\/s13059-020-02090-4","article-title":"Producing polished prokaryotic pangenomes with the Panaroo pipeline","volume":"21","author":"Tonkin-Hill","year":"2020","journal-title":"Genome Biol"},{"key":"2026013011072067100_btag005-B23","doi-asserted-by":"crossref","first-page":"e1001284","DOI":"10.1371\/journal.pgen.1001284","article-title":"Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes","volume":"7","author":"Treangen","year":"2011","journal-title":"PLoS Genet"},{"key":"2026013011072067100_btag005-B24","volume-title":"Information Retrieval","author":"Van Rijsbergen","year":"1979"},{"key":"2026013011072067100_btag005-B25","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1016\/j.tim.2015.07.006","article-title":"Rates of lateral gene transfer in prokaryotes: high but why?","volume":"23","author":"Vos","year":"2015","journal-title":"Trends Microbiol"},{"key":"2026013011072067100_btag005-B26","article-title":"Coinfinder: detecting significant associations and dissociations in pangenomes","volume":"6","author":"Whelan","year":"2020","journal-title":"Microb Genom"},{"key":"2026013011072067100_btag005-B27","doi-asserted-by":"crossref","first-page":"1553","DOI":"10.1038\/s41467-022-29283-8","article-title":"Assessment of global health risk of antibiotic resistance genes","volume":"13","author":"Zhang","year":"2022","journal-title":"Nat 
Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btag005\/66327117\/btag005.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/1\/btag005\/66327117\/btag005.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/1\/btag005\/66327117\/btag005.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T16:07:29Z","timestamp":1769789249000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btag005\/8418381"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2026,1]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btag005","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,1]]},"published":{"date-parts":[[2026,1]]},"article-number":"btag005"}}