{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T07:07:54Z","timestamp":1771484874458,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T00:00:00Z","timestamp":1721001600000},"content-version":"vor","delay-in-days":53,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Center of Synthetic Biology and Integrated Bioengineering","award":["WU2022A008"],"award-info":[{"award-number":["WU2022A008"]}]},{"name":"Research Center for Industries of the Future","award":["WU2022C030"],"award-info":[{"award-number":["WU2022C030"]}]},{"name":"Research Center for Industries of the Future","award":["WU2023C019"],"award-info":[{"award-number":["WU2023C019"]}]},{"name":"\u2018Pioneer\u2019 and \u2018Leading Goose\u2019 Key R&D Program of Zhejiang","award":["2024SSYS0032"],"award-info":[{"award-number":["2024SSYS0032"]}]},{"name":"Zhejiang Provincial Natural Science Foundation of China","award":["LR22D010001"],"award-info":[{"award-number":["LR22D010001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,5,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>High-throughput DNA sequencing technologies decode tremendous amounts of microbial protein-coding gene sequences. However, accurately assigning protein functions to novel gene sequences remains a challenge. To this end, we developed FunGeneTyper, an extensible framework with two new deep learning models (i.e., FunTrans and FunRep), structured databases, and supporting resources for achieving highly accurate (Accuracy\u2009&gt;\u20090.99, F1-score\u2009&gt;\u20090.97) and fine-grained classification of antibiotic resistance genes (ARGs) and virulence factor genes. 
Using an experimentally confirmed dataset of ARGs comprising remote homologous sequences as the test set, our framework achieves by-far-the-best performance in the discovery of new ARGs from human gut (F1-score: 0.6948), wastewater (0.6072), and soil (0.5445) microbiomes, beating the state-of-the-art bioinformatics tools and sequence alignment-based (F1-score: 0.0556\u20130.5065) and domain-based (F1-score: 0.2630\u20130.5224) annotation approaches. Furthermore, our framework is implemented as a lightweight, privacy-preserving, and plug-and-play neural network module, facilitating its versatility and accessibility to developers and users worldwide. We anticipate widespread utilization of FunGeneTyper (https:\/\/github.com\/emblab-westlake\/FunGeneTyper) for precise classification of protein-coding gene functions and the discovery of numerous valuable enzymes. This advancement will have a significant impact on various fields, including microbiome research, biotechnology, metagenomics, and bioinformatics.<\/jats:p>","DOI":"10.1093\/bib\/bbae319","type":"journal-article","created":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T11:14:54Z","timestamp":1721042094000},"source":"Crossref","is-referenced-by-count":4,"title":["Highly accurate classification and discovery of microbial protein-coding gene functions using FunGeneTyper: an extensible deep learning framework"],"prefix":"10.1093","volume":"25","author":[{"given":"Guoqing","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Environmental and Resource Sciences, Zhejiang University , Hangzhou, Zhejiang 310058 , China"},{"name":"Key Laboratory of Coastal Environment and Resources of Zhejiang Province , School of Engineering, , Hangzhou, Zhejiang 310030 , China"},{"name":"Westlake University , School of Engineering, , Hangzhou, Zhejiang 310030 , China"},{"name":"Center of Synthetic Biology and Integrated Bioengineering, Westlake University , Hangzhou, Zhejiang 310030 , 
China"}]},{"given":"Hui","family":"Wang","sequence":"additional","affiliation":[{"name":"Representation Learning Laboratory , School of Engineering, , Hangzhou, Zhejiang 310030 , China"},{"name":"Westlake University , School of Engineering, , Hangzhou, Zhejiang 310030 , China"}]},{"given":"Zhiguo","family":"Zhang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Coastal Environment and Resources of Zhejiang Province , School of Engineering, , Hangzhou, Zhejiang 310030 , China"},{"name":"Westlake University , School of Engineering, , Hangzhou, Zhejiang 310030 , China"}]},{"given":"Lu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Coastal Environment and Resources of Zhejiang Province , School of Engineering, , Hangzhou, Zhejiang 310030 , China"},{"name":"Westlake University , School of Engineering, , Hangzhou, Zhejiang 310030 , China"}]},{"given":"Guibing","family":"Guo","sequence":"additional","affiliation":[{"name":"Software College, Northeastern University , Shenyang, Liaoning 110169 , China"}]},{"given":"Jian","family":"Yang","sequence":"additional","affiliation":[{"name":"Westlake Laboratory of Life Sciences and Biomedicine , School of Life Sciences, , Hangzhou, Zhejiang 310024 , China"},{"name":"Westlake University , School of Life Sciences, , Hangzhou, Zhejiang 310024 , China"}]},{"given":"Fajie","family":"Yuan","sequence":"additional","affiliation":[{"name":"Representation Learning Laboratory , School of Engineering, , Hangzhou, Zhejiang 310030 , China"},{"name":"Westlake University , School of Engineering, , Hangzhou, Zhejiang 310030 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4137-5928","authenticated-orcid":false,"given":"Feng","family":"Ju","sequence":"additional","affiliation":[{"name":"Key Laboratory of Coastal Environment and Resources of Zhejiang Province , School of Engineering, , Hangzhou, Zhejiang 310030 , China"},{"name":"Westlake University , School of Engineering, , Hangzhou, 
Zhejiang 310030 , China"},{"name":"Center of Synthetic Biology and Integrated Bioengineering, Westlake University , Hangzhou, Zhejiang 310030 , China"},{"name":"Westlake Laboratory of Life Sciences and Biomedicine , School of Life Sciences, , Hangzhou, Zhejiang 310024 , China"},{"name":"Westlake University , School of Life Sciences, , Hangzhou, Zhejiang 310024 , China"}]}],"member":"286","published-online":{"date-parts":[[2024,7,15]]},"reference":[{"key":"2024071511143423300_ref1","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1038\/s41396-018-0277-8","article-title":"Wastewater treatment plant resistomes are shaped by bacterial composition, genetic exchange, and upregulated expression in the effluent microbiomes","volume":"13","author":"Ju","year":"2019","journal-title":"ISME J"},{"key":"2024071511143423300_ref2","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1016\/j.cell.2019.01.001","article-title":"Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle","volume":"176","author":"Pasolli","year":"2019","journal-title":"Cell"},{"key":"2024071511143423300_ref3","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1038\/s41587-020-0718-6","article-title":"A genomic catalog of Earth\u2019s microbiomes","volume":"39","author":"Nayfach","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2024071511143423300_ref4","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2024071511143423300_ref5","doi-asserted-by":"crossref","first-page":"2460","DOI":"10.1093\/bioinformatics\/btq461","article-title":"Search and clustering orders of magnitude faster than 
BLAST","volume":"26","author":"Edgar","year":"2010","journal-title":"Bioinformatics"},{"key":"2024071511143423300_ref6","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using DIAMOND","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat Methods"},{"key":"2024071511143423300_ref7","first-page":"D517","article-title":"CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database","volume":"48","author":"Alcock","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2024071511143423300_ref8","doi-asserted-by":"crossref","first-page":"2346","DOI":"10.1093\/bioinformatics\/btw136","article-title":"ARGs-OAP: online analysis pipeline for antibiotic resistance genes detection from metagenomic data using an integrated structured ARG-database","volume":"32","author":"Yang","year":"2016","journal-title":"Bioinformatics"},{"key":"2024071511143423300_ref9","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/s40168-018-0401-z","article-title":"DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data","volume":"6","author":"Arango-Argoty","year":"2018","journal-title":"Microbiome"},{"key":"2024071511143423300_ref10","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1186\/s40168-020-00993-9","article-title":"PathoFact: a pipeline for the prediction of virulence factors and antimicrobial resistance genes in metagenomic data","volume":"9","author":"Nies","year":"2021","journal-title":"Microbiome"},{"key":"2024071511143423300_ref11","doi-asserted-by":"crossref","first-page":"8452","DOI":"10.1038\/ncomms9452","article-title":"Limited dissemination of the wastewater treatment plant core resistome","volume":"6","author":"Munck","year":"2015","journal-title":"Nat 
Commun"},{"key":"2024071511143423300_ref12","doi-asserted-by":"crossref","first-page":"612","DOI":"10.1038\/nature13377","article-title":"Bacterial phylogeny structures soil resistomes across habitats","volume":"509","author":"Forsberg","year":"2014","journal-title":"Nature"},{"key":"2024071511143423300_ref13","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1038\/s43705-022-00176-7","article-title":"Novel bacterial taxa in a minimal lignocellulolytic consortium and their potential for lignin and plastics transformation","volume":"2","author":"D\u00edaz Rodr\u00edguez","year":"2022","journal-title":"ISME Communications"},{"key":"2024071511143423300_ref14","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1038\/s41564-021-00979-9","article-title":"Compendium of 530 metagenome-assembled bacterial and archaeal genomes from the polar Arctic Ocean","volume":"6","author":"Royo-Llonch","year":"2021","journal-title":"Nat Microbiol"},{"key":"2024071511143423300_ref15","doi-asserted-by":"crossref","DOI":"10.1128\/AAC.00483-19","article-title":"Validating the AMRFinder tool and resistance gene database by using antimicrobial resistance genotype-phenotype correlations in a collection of isolates","volume":"63","author":"Feldgarden","year":"2019","journal-title":"Antimicrob Agents Chemother"},{"key":"2024071511143423300_ref16","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/s13104-021-05531-w","article-title":"Hidden Markov model: a shortest unique representative approach to detect the protein toxins, virulence factors and antibiotic resistance genes","volume":"14","author":"Xie","year":"2021","journal-title":"BMC Res Notes"},{"key":"2024071511143423300_ref17","first-page":"356","article-title":"Sequencing-based methods and resources to study antimicrobial resistance","volume":"20","author":"Boolchandani","year":"2019","journal-title":"Nat Rev 
Genet"},{"key":"2024071511143423300_ref18","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/ismej.2014.106","article-title":"Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology","volume":"9","author":"Gibson","year":"2015","journal-title":"ISME J"},{"key":"2024071511143423300_ref19","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/s41592-021-01100-y","article-title":"Low-N protein engineering with data-efficient deep learning","volume":"18","author":"Biswas","year":"2021","journal-title":"Nat Methods"},{"key":"2024071511143423300_ref20","doi-asserted-by":"crossref","first-page":"921","DOI":"10.1038\/s41587-022-01226-0","article-title":"Identification of antimicrobial peptides from the human gut microbiome using deep learning","volume":"40","author":"Ma","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2024071511143423300_ref21","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat Methods"},{"key":"2024071511143423300_ref22","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024071511143423300_ref23","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: NLP, machine learning & protein sequences","volume":"19","author":"Ofer","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2024071511143423300_ref24","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1038\/s42256-022-00457-9","article-title":"Learning functional properties of proteins with language 
models","volume":"4","author":"Unsal","year":"2022","journal-title":"Nature Machine Intelligence"},{"key":"2024071511143423300_ref25","doi-asserted-by":"crossref","first-page":"932","DOI":"10.1038\/s41587-021-01179-w","article-title":"Using deep learning to annotate the protein universe","volume":"40","author":"Bileschi","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2024071511143423300_ref26","volume-title":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21)","author":"Dohan"},{"key":"2024071511143423300_ref27","doi-asserted-by":"crossref","article-title":"Transformer protein language models are unsupervised structure learners","author":"Rao","DOI":"10.1101\/2020.12.15.422761"},{"key":"2024071511143423300_ref28","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin"},{"key":"2024071511143423300_ref29","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Reimers"},{"key":"2024071511143423300_ref30","volume-title":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20)","author":"Yuan"},{"key":"2024071511143423300_ref31","volume-title":"PMLR","author":"Houlsby"},{"key":"2024071511143423300_ref32","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1016\/S0140-6736(21)02724-0","article-title":"Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis","volume":"399","author":"Murray","year":"2022","journal-title":"The Lancet"},{"key":"2024071511143423300_ref33","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1038\/s41579-018-0048-6","article-title":"Multidrug efflux pumps: structure, function and regulation","volume":"16","author":"Du","year":"2018","journal-title":"Nat Rev 
Microbiol"},{"key":"2024071511143423300_ref34","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1128\/CMR.19.2.382-402.2006","article-title":"Clinically relevant chromosomally encoded multidrug resistance efflux pumps in bacteria","volume":"19","author":"Piddock","year":"2006","journal-title":"Clin Microbiol Rev"},{"key":"2024071511143423300_ref35","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1186\/s40168-021-01002-3","article-title":"HMD-ARG: hierarchical multi-task deep learning for annotating antibiotic resistance genes","volume":"9","author":"Li","year":"2021","journal-title":"Microbiome"},{"key":"2024071511143423300_ref36","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/protein\/12.2.85","article-title":"Twilight zone of protein sequence alignments","volume":"12","author":"Rost","year":"1999","journal-title":"Protein Eng"},{"key":"2024071511143423300_ref37","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1016\/j.cels.2021.05.017","article-title":"Learning the protein language: evolution, structure, and function","volume":"12","author":"Bepler","year":"2021","journal-title":"Cell Syst"},{"key":"2024071511143423300_ref38","doi-asserted-by":"crossref","first-page":"1128","DOI":"10.1126\/science.1176950","article-title":"Functional characterization of the antibiotic resistance reservoir in the human microflora","volume":"325","author":"Sommer","year":"2009","journal-title":"Science"},{"key":"2024071511143423300_ref39","article-title":"Novel soil-derived Beta-lactam, chloramphenicol, Fosfomycin and trimethoprim resistance genes revealed by functional metagenomics","volume":"10","author":"Willms","year":"2021","journal-title":"Antibiotics (Basel)"},{"key":"2024071511143423300_ref40","doi-asserted-by":"crossref","first-page":"1406","DOI":"10.3389\/fmicb.2017.01406","article-title":"Tetracycline resistance genes identified from distinct soil environments in China by functional 
metagenomics","volume":"8","author":"Wang","year":"2017","journal-title":"Front Microbiol"},{"key":"2024071511143423300_ref41","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1038\/ismej.2008.86","article-title":"Functional metagenomics reveals diverse beta-lactamases in a remote Alaskan soil","volume":"3","author":"Allen","year":"2009","journal-title":"ISME J"},{"key":"2024071511143423300_ref42","doi-asserted-by":"crossref","first-page":"4396","DOI":"10.1128\/AEM.01763-09","article-title":"Metagenomic analysis of apple orchard soil reveals antibiotic resistance genes encoding predicted bifunctional proteins","volume":"76","author":"Donato","year":"2010","journal-title":"Appl Environ Microbiol"},{"key":"2024071511143423300_ref43","doi-asserted-by":"crossref","first-page":"3693","DOI":"10.1093\/bioinformatics\/btaa230","article-title":"Learning transferable deep convolutional neural networks for the classification of bacterial virulence factors","volume":"36","author":"Zheng","year":"2020","journal-title":"Bioinformatics"},{"key":"2024071511143423300_ref44","doi-asserted-by":"crossref","first-page":"1634","DOI":"10.1038\/s41598-018-37647-8","article-title":"Bioinformatic discovery of a toxin family in Chryseobacterium piperi with sequence similarity to botulinum neurotoxins","volume":"9","author":"Mansfield","year":"2019","journal-title":"Sci Rep"},{"key":"2024071511143423300_ref45"},{"key":"2024071511143423300_ref46","doi-asserted-by":"crossref","first-page":"lqab066","DOI":"10.1093\/nargab\/lqab066","article-title":"ARG-SHINE: improve antibiotic resistance class prediction by integrating sequence homology, functional information and deep convolutional neural network","volume":"3","author":"Wang","year":"2021","journal-title":"NAR Genom Bioinform"},{"key":"2024071511143423300_ref47","volume-title":"Proceedings of the AAAI Conference on Artificial 
Intelligence","author":"Chen"},{"key":"2024071511143423300_ref48","doi-asserted-by":"crossref","first-page":"1358","DOI":"10.1126\/science.adf2465","article-title":"Enzyme function prediction using contrastive learning","volume":"379","author":"Yu","year":"2023","journal-title":"Science"},{"key":"2024071511143423300_ref49","doi-asserted-by":"crossref","article-title":"Ultra-accurate classification and discovery of functional protein-coding genes from microbiomes using FunGeneTyper: an expandable deep learning-based framework","author":"Zhang","DOI":"10.1101\/2022.12.28.522150"},{"key":"2024071511143423300_ref50","doi-asserted-by":"crossref","first-page":"291","DOI":"10.3389\/fmicb.2013.00291","article-title":"FunGene: the functional gene pipeline and repository","volume":"4","author":"Fish","year":"2013","journal-title":"Front Microbiol"},{"key":"2024071511143423300_ref51","doi-asserted-by":"crossref","first-page":"3181","DOI":"10.1021\/acssynbio.0c00558","article-title":"Engineering microbiomes-looking ahead","volume":"9","author":"Lee","year":"2020","journal-title":"ACS Synth Biol"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/4\/bbae319\/58544246\/bbae319.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/4\/bbae319\/58544246\/bbae319.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T11:15:30Z","timestamp":1721042130000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae319\/7713721"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,23]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,5,23]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae319","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,5,23]]},"article-number":"bbae319"}}