{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T16:41:13Z","timestamp":1775839273002,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2023,4,22]],"date-time":"2023-04-22T00:00:00Z","timestamp":1682121600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["9678241"],"award-info":[{"award-number":["9678241"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["7005866"],"award-info":[{"award-number":["7005866"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["7005453"],"award-info":[{"award-number":["7005453"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,5,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>As prevalent extrachromosomal replicons in many bacteria, plasmids play an essential role in their hosts\u2019 evolution and adaptation. The host range of a plasmid refers to the taxonomic range of bacteria in which it can replicate and thrive. Understanding host ranges of plasmids sheds light on studying the roles of plasmids in bacterial evolution and adaptation. Metagenomic sequencing has become a major means to obtain new plasmids and derive their hosts. However, host prediction for assembled plasmid contigs still needs to tackle several challenges: different sequence compositions and copy numbers between plasmids and the hosts, high diversity in plasmids, and limited plasmid annotations. Existing tools have not yet achieved an ideal tradeoff between sensitivity and precision on metagenomic assembled contigs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this work, we construct a hierarchical classification tool named HOTSPOT, whose backbone is a phylogenetic tree of the bacterial hosts from phylum to species. By incorporating the state-of-the-art language model, Transformer, in each node\u2019s taxon classifier, the top-down tree search achieves an accurate host taxonomy prediction for the input plasmid contigs. We rigorously tested HOTSPOT on multiple datasets, including RefSeq complete plasmids, artificial contigs, simulated metagenomic data, mock metagenomic data, the Hi-C dataset, and the CAMI2 marine dataset. All experiments show that HOTSPOT outperforms other popular methods.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code of HOTSPOT is available via: https:\/\/github.com\/Orin-beep\/HOTSPOT<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad283","type":"journal-article","created":{"date-parts":[[2023,4,22]],"date-time":"2023-04-22T17:37:42Z","timestamp":1682185062000},"source":"Crossref","is-referenced-by-count":23,"title":["HOTSPOT: hierarchical host prediction for assembled plasmid contigs with transformer"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3664-9608","authenticated-orcid":false,"given":"Yongxin","family":"Ji","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR), China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5974-4985","authenticated-orcid":false,"given":"Jiayu","family":"Shang","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR), China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1304-6983","authenticated-orcid":false,"given":"Xubo","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR), China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1373-8023","authenticated-orcid":false,"given":"Yanni","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR), China"}]}],"member":"286","published-online":{"date-parts":[[2023,4,22]]},"reference":[{"key":"2023050421471450400_btad283-B1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-16282-w","article-title":"Large-scale network analysis captures biological features of bacterial plasmids","volume":"11","author":"Acman","year":"2020","journal-title":"Nat Commun"},{"key":"2023050421471450400_btad283-B2","doi-asserted-by":"crossref","first-page":"e01180\u201321","DOI":"10.1128\/msystems.01180-21","article-title":"Plasmidhostfinder: prediction of plasmid hosts using random Forest","volume":"7","author":"Aytan-Aktug","year":"2022","journal-title":"Msystems"},{"key":"2023050421471450400_btad283-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1875-0","article-title":"Dashing: fast and accurate genomic distances with hyperloglog","volume":"20","author":"Baker","year":"2019","journal-title":"Genome Biol"},{"key":"2023050421471450400_btad283-B4","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nbt.4037","article-title":"Metagenomic binning and association of plasmids with bacterial host genomes using dna methylation","volume":"36","author":"Beaulaurier","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2023050421471450400_btad283-B5","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using diamond","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat Methods"},{"key":"2023050421471450400_btad283-B6","doi-asserted-by":"crossref","first-page":"3895","DOI":"10.1128\/AAC.02412-14","article-title":"In silico detection and typing of plasmids using plasmidfinder and plasmid multilocus sequence typing","volume":"58","author":"Carattoli","year":"2014","journal-title":"Antimicrob Agents Chemother"},{"key":"2023050421471450400_btad283-B7","doi-asserted-by":"crossref","first-page":"483","DOI":"10.3389\/fmicb.2020.00483","article-title":"Analysis of COMPASS, a new comprehensive plasmid database revealed prevalence of multireplicon and extensive diversity of IncF plasmids","volume":"11","author":"Douarre","year":"2020","journal-title":"Front Microbiol"},{"key":"2023050421471450400_btad283-B8","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/nar\/30.7.1575","article-title":"An efficient algorithm for large-scale detection of protein families","volume":"30","author":"Enright","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023050421471450400_btad283-B9","first-page":"1050","author":"Gal","year":"2016"},{"key":"2023050421471450400_btad283-B10","doi-asserted-by":"crossref","first-page":"D195","DOI":"10.1093\/nar\/gky1050","article-title":"PLSDB: a resource of complete bacterial plasmids","volume":"47","author":"Galata","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023050421471450400_btad283-B11","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1007\/978-1-4939-9877-7_21","article-title":"MOBscan: automated annotation of MOB relaxases","volume":"2075","author":"Garcill\u00e1n-Barcia","year":"2020","journal-title":"Methods Mol Biol"},{"key":"2023050421471450400_btad283-B12","doi-asserted-by":"crossref","first-page":"1635","DOI":"10.1093\/molbev\/msw046","article-title":"Ete 3: reconstruction, analysis, and visualization of phylogenomic data","volume":"33","author":"Huerta-Cepas","year":"2016","journal-title":"Mol Biol Evol"},{"key":"2023050421471450400_btad283-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-11-119","article-title":"Prodigal: prokaryotic gene recognition and translation initiation site identification","volume":"11","author":"Hyatt","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023050421471450400_btad283-B14","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1038\/255504a0","article-title":"Plasmid rp4 as a vector replicon in genetic engineering","volume":"255","author":"Jacob","year":"1975","journal-title":"Nature"},{"key":"2023050421471450400_btad283-B15","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1111\/1574-6968.12241","article-title":"Broad host range plasmids","volume":"348","author":"Jain","year":"2013","journal-title":"FEMS Microbiol Lett"},{"key":"2023050421471450400_btad283-B16","volume-title":"Entrez programming utilities help [internet]","author":"Kans","year":"2022"},{"key":"2023050421471450400_btad283-B17","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1038\/ismej.2014.191","article-title":"Broad host range plasmids can invade an unexpectedly diverse fraction of a soil bacterial community","volume":"9","author":"Kl\u00fcmper","year":"2015","journal-title":"ISME J"},{"key":"2023050421471450400_btad283-B18","doi-asserted-by":"crossref","first-page":"e35","DOI":"10.1093\/nar\/gkx1321","article-title":"Plasflow: predicting plasmid sequences in metagenomic data using genome signatures","volume":"46","author":"Krawczyk","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023050421471450400_btad283-B19","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1152\/physrev.1952.32.4.403","article-title":"Cell genetics and hereditary symbiosis","volume":"32","author":"Lederberg","year":"1952","journal-title":"Physiol Rev"},{"key":"2023050421471450400_btad283-B20","doi-asserted-by":"crossref","first-page":"1674","DOI":"10.1093\/bioinformatics\/btv033","article-title":"Megahit: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de bruijn graph","volume":"31","author":"Li","year":"2015","journal-title":"Bioinformatics"},{"key":"2023050421471450400_btad283-B21","doi-asserted-by":"crossref","first-page":"mgen000436","DOI":"10.1099\/mgen.0.000436","article-title":"Metagenome-assembled genome binning methods with short reads disproportionately fail for plasmids and genomic islands","volume":"6","author":"Maguire","year":"2020","journal-title":"Microbial Genomics"},{"key":"2023050421471450400_btad283-B22","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1038\/s41592-022-01431-4","article-title":"Critical assessment of metagenome interpretation: the second round of challenges","volume":"19","author":"Meyer","year":"2022","journal-title":"Nat Methods"},{"key":"2023050421471450400_btad283-B23","doi-asserted-by":"crossref","first-page":"e121\u2013e121","DOI":"10.1093\/nar\/gkt263","article-title":"Challenges in homology search: hmmer3 and convergent evolution of coiled-coil regions","volume":"41","author":"Mistry","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023050421471450400_btad283-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using minhash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2023050421471450400_btad283-B25","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1111\/j.1574-6941.1996.tb00304.x","article-title":"Monitoring the spread of broad host and narrow host range plasmids in soil microcosms","volume":"20","author":"Pukall","year":"1996","journal-title":"FEMS Microbiol Ecol"},{"key":"2023050421471450400_btad283-B26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/hdy.2010.24","article-title":"What traits are carried on mobile genetic elements, and why?","volume":"106","author":"Rankin","year":"2011","journal-title":"Heredity (Edinb)"},{"key":"2023050421471450400_btad283-B27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-17278-2","article-title":"Pathways for horizontal gene transfer in bacteria revealed by a global map of their plasmids","volume":"11","author":"Redondo-Salvo","year":"2020","journal-title":"Nat Commun"},{"key":"2023050421471450400_btad283-B28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-021-04299-x","article-title":"Copla, a taxonomic classifier of plasmids","volume":"22","author":"Redondo-Salvo","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2023050421471450400_btad283-B29","doi-asserted-by":"crossref","first-page":"e000206","DOI":"10.1099\/mgen.0.000206","article-title":"Mob-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies","volume":"4","author":"Robertson","year":"2018","journal-title":"Microb Genomics"},{"key":"2023050421471450400_btad283-B30","doi-asserted-by":"crossref","first-page":"mgen000435","DOI":"10.1099\/mgen.0.000435","article-title":"Universal whole-sequence-based plasmid typing and its utility to prediction of host range and epidemiological surveillance","volume":"6","author":"Robertson","year":"2020","journal-title":"Microb Genomics"},{"key":"2023050421471450400_btad283-B31","doi-asserted-by":"crossref","first-page":"mgen000398","DOI":"10.1099\/mgen.0.000398","article-title":"Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores","volume":"6","author":"Schwengers","year":"2020","journal-title":"Microb Genomics"},{"key":"2023050421471450400_btad283-B32","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1007\/978-981-13-3411-5_6","volume-title":"DNA Traffic in the Environment","author":"Shintani","year":"2019"},{"key":"2023050421471450400_btad283-B33","doi-asserted-by":"crossref","first-page":"242","DOI":"10.3389\/fmicb.2015.00242","article-title":"Genomics of microbial plasmids: classification and identification based on replication and transfer systems and host taxonomy","volume":"6","author":"Shintani","year":"2015","journal-title":"Front Microbiol"},{"key":"2023050421471450400_btad283-B34","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1111\/j.1574-6968.2011.02432.x","article-title":"Mobilizable narrow host range plasmids as natural suicide vectors enabling horizontal gene transfer among distantly related bacterial species","volume":"326","author":"Smorawinska","year":"2012","journal-title":"FEMS Microbiol Lett"},{"key":"2023050421471450400_btad283-B35","doi-asserted-by":"crossref","first-page":"2437","DOI":"10.1038\/s41396-019-0446-4","article-title":"Linking the resistome and plasmidome to the microbiome","volume":"13","author":"Stalder","year":"2019","journal-title":"ISME J"},{"key":"2023050421471450400_btad283-B36","doi-asserted-by":"crossref","first-page":"e67","DOI":"10.1093\/nar\/gku138","article-title":"Strain\/species identification in metagenomes using genome-specific markers","volume":"42","author":"Tu","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023050421471450400_btad283-B37","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023050421471450400_btad283-B38","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13059-019-1891-0","article-title":"Improved metagenomic analysis with kraken 2","volume":"20","author":"Wood","year":"2019","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad283\/50066637\/btad283.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/5\/btad283\/50204890\/btad283.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/5\/btad283\/50204890\/btad283.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,4]],"date-time":"2023-05-04T21:48:10Z","timestamp":1683236890000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad283\/7136643"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,4,22]]},"references-count":38,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,5,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad283","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,5,1]]},"published":{"date-parts":[[2023,4,22]]},"article-number":"btad283"}}