{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T23:37:43Z","timestamp":1774568263029,"version":"3.50.1"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T00:00:00Z","timestamp":1700697600000},"content-version":"vor","delay-in-days":22,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Shanghai Municipal Science and Technology","award":["2017SHZDZX01"],"award-info":[{"award-number":["2017SHZDZX01"]}]},{"name":"Shanghai Municipal Science and Technology","award":["20692191500"],"award-info":[{"award-number":["20692191500"]}]},{"name":"Open Research Fund of Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE"},{"name":"Key Laboratory of MEA"},{"DOI":"10.13039\/100009122","name":"Ministry of Education","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100009122","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004106","name":"East China Normal University","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004106","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Antibiotic resistance presents a formidable global challenge to public health and the environment. While considerable endeavors have been dedicated to identify antibiotic resistance genes (ARGs) for assessing the threat of antibiotic resistance, recent extensive investigations using metagenomic and metatranscriptomic approaches have unveiled a noteworthy concern. 
A significant fraction of proteins defies annotation through conventional sequence similarity-based methods, an issue that extends to ARGs, potentially leading to their under-recognition due to dissimilarities at the sequence level.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Herein, we proposed an Artificial Intelligence-powered ARG identification framework using a pretrained large protein language model, enabling ARG identification and resistance category classification simultaneously. The proposed PLM-ARG was developed based on the most comprehensive ARG and related resistance category information (&gt;28K ARGs and associated 29 resistance categories), yielding Matthew\u2019s correlation coefficients (MCCs) of 0.983\u2009\u00b1\u20090.001 by using a 5-fold cross-validation strategy. Furthermore, the PLM-ARG model was verified using an independent validation set and achieved an MCC of 0.838, outperforming other publicly available ARG prediction tools with an improvement range of 51.8%\u2013107.9%. 
Moreover, the utility of the proposed PLM-ARG model was demonstrated by annotating resistance in the UniProt database and evaluating the impact of ARGs on the Earth's environmental microbiota.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>PLM-ARG is available for academic purposes at https:\/\/github.com\/Junwu302\/PLM-ARG, and a user-friendly webserver (http:\/\/www.unimd.org\/PLM-ARG) is also provided.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad690","type":"journal-article","created":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T01:03:00Z","timestamp":1700701380000},"source":"Crossref","is-referenced-by-count":32,"title":["PLM-ARG: antibiotic resistance gene identification using a pretrained protein language model"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3381-2561","authenticated-orcid":false,"given":"Jun","family":"Wu","sequence":"first","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, and The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University , Shanghai 200241, China"}]},{"given":"Jian","family":"Ouyang","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, and The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University , Shanghai 200241, China"}]},{"given":"Haipeng","family":"Qin","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, and The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University , Shanghai 200241, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2696-2809","authenticated-orcid":false,"given":"Jiajia","family":"Zhou","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, 
and The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University , Shanghai 200241, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7763-7558","authenticated-orcid":false,"given":"Ruth","family":"Roberts","sequence":"additional","affiliation":[{"name":"ApconiX Ltd, Alderley Park , Alderley Edge SK10 4TG, United Kingdom"},{"name":"University of Birmingham , Birmingham B15 2TT, United Kingdom"}]},{"given":"Rania","family":"Siam","sequence":"additional","affiliation":[{"name":"Biology Department, School of Sciences and Engineering, The American University in Cairo , New Cairo 11835, Egypt"}]},{"given":"Lan","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Architecture and Urban Planning, Tongji University , Shanghai 200092, China"}]},{"given":"Weida","family":"Tong","sequence":"additional","affiliation":[{"name":"National Center for Toxicological Research, Food and Drug Administration , Jefferson, AR 72079, United States"}]},{"given":"Zhichao","family":"Liu","sequence":"additional","affiliation":[{"name":"Nonclinical Drug Safety, Boehringer Ingelheim Pharmaceuticals, Inc , Ridgefield, CT 06877, United States"}]},{"given":"Tieliu","family":"Shi","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, and The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University , Shanghai 200241, China"},{"name":"School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University , Shanghai 200062, China"}]}],"member":"286","published-online":{"date-parts":[[2023,11,23]]},"reference":[{"key":"2023112609174748800_btad690-B1","first-page":"D517","article-title":"CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database","volume":"48","author":"Alcock","year":"2020","journal-title":"Nucleic Acids 
Res"},{"key":"2023112609174748800_btad690-B2","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1038\/s41587-020-0603-3","article-title":"A unified catalog of 204,938 reference genomes from the human gut microbiome","volume":"39","author":"Almeida","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2023112609174748800_btad690-B3","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2023112609174748800_btad690-B4","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/s40168-018-0401-z","article-title":"DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data","volume":"6","author":"Arango-Argoty","year":"2018","journal-title":"Microbiome"},{"key":"2023112609174748800_btad690-B5","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1016\/j.cels.2021.05.017","article-title":"Learning the protein language: evolution, structure, and function","volume":"12","author":"Bepler","year":"2021","journal-title":"Cell Syst"},{"key":"2023112609174748800_btad690-B6","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using DIAMOND","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat Methods"},{"key":"2023112609174748800_btad690-B7","doi-asserted-by":"crossref","first-page":"1222","DOI":"10.1007\/s11427-021-1996-x","article-title":"Genomic and transcriptomic dissection of Theionarchaea in marine ecosystem","volume":"65","author":"Cai","year":"2022","journal-title":"Sci China Life Sci"},{"key":"2023112609174748800_btad690-B8","doi-asserted-by":"crossref","first-page":"2210","DOI":"10.1007\/s11427-020-1926-0","article-title":"Identification of antibiotic resistance genes and associated mobile genetic elements in 
permafrost","volume":"64","author":"Cao","year":"2021","journal-title":"Sci China Life Sci"},{"key":"2023112609174748800_btad690-B9","doi-asserted-by":"crossref","first-page":"14487","DOI":"10.1038\/s41598-019-50686-z","article-title":"Antimicrobial resistance prediction for gram-negative bacteria via game theory-based feature evaluation","volume":"9","author":"Chowdhury","year":"2019","journal-title":"Sci Rep"},{"key":"2023112609174748800_btad690-B10","doi-asserted-by":"crossref","first-page":"11033","DOI":"10.1038\/s41598-020-67949-9","article-title":"PARGT: a software tool for predicting antimicrobial resistance in bacteria","volume":"10","author":"Chowdhury","year":"2020","journal-title":"Sci Rep"},{"key":"2023112609174748800_btad690-B11","doi-asserted-by":"crossref","first-page":"3903","DOI":"10.2147\/IDR.S234610","article-title":"Antimicrobial resistance: implications and costs","volume":"12","author":"Dadgostar","year":"2019","journal-title":"Infect Drug Resist"},{"key":"2023112609174748800_btad690-B12","doi-asserted-by":"crossref","first-page":"3376","DOI":"10.1016\/j.cell.2021.05.002","article-title":"A global metagenomic map of urban microbiomes and antimicrobial resistance","volume":"184","author":"Danko","year":"2021","journal-title":"Cell"},{"key":"2023112609174748800_btad690-B18","author":"Drugs for Neglected Diseases Initiative"},{"key":"2023112609174748800_btad690-B13","doi-asserted-by":"crossref","first-page":"5634","DOI":"10.1038\/s41596-021-00628-9","article-title":"The trRosetta server for fast and accurate protein structure prediction","volume":"16","author":"Du","year":"2021","journal-title":"Nat Protoc"},{"key":"2023112609174748800_btad690-B14","doi-asserted-by":"crossref","first-page":"2435","DOI":"10.1038\/s41467-021-22757-1","article-title":"Forecasting the dissemination of antibiotic resistance genes across bacterial genomes","volume":"12","author":"Ellabaan","year":"2021","journal-title":"Nat 
Commun"},{"key":"2023112609174748800_btad690-B15","doi-asserted-by":"crossref","first-page":"e00483-19","DOI":"10.1128\/AAC.00483-19","article-title":"Validating the AMRFinder tool and resistance gene database by using antimicrobial resistance genotype\u2013phenotype correlations in a collection of isolates","volume":"63","author":"Feldgarden","year":"2019","journal-title":"Antimicrob Agents Chem"},{"key":"2023112609174748800_btad690-B16","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1038\/s42256-020-0207-0","article-title":"Clinical interpretation of an interpretable prognostic model for patients with COVID-19","volume":"3","author":"Giacobbe","year":"2020","journal-title":"Nat Mach Intell"},{"key":"2023112609174748800_btad690-B17","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/ismej.2014.106","article-title":"Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology","volume":"9","author":"Gibson","year":"2015","journal-title":"ISME J"},{"key":"2023112609174748800_btad690-B19","doi-asserted-by":"crossref","first-page":"D566","DOI":"10.1093\/nar\/gkw1004","article-title":"CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database","volume":"45","author":"Jia","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023112609174748800_btad690-B20","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023112609174748800_btad690-B21","doi-asserted-by":"crossref","first-page":"D574","DOI":"10.1093\/nar\/gkw1009","article-title":"MEGARes: an antimicrobial resistance database for high throughput sequencing","volume":"45","author":"Lakin","year":"2017","journal-title":"Nucleic Acids 
Res"},{"key":"2023112609174748800_btad690-B22","doi-asserted-by":"crossref","first-page":"e2100916119","DOI":"10.1073\/pnas.2100916119","article-title":"The dynamic trophic architecture of open-ocean protist communities revealed through machine-guided metatranscriptomics","volume":"119","author":"Lambert","year":"2022","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023112609174748800_btad690-B23","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023112609174748800_btad690-B24","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1186\/s40168-021-01002-3","article-title":"HMD-ARG: hierarchical multi-task deep learning for annotating antibiotic resistance genes","volume":"9","author":"Li","year":"2021","journal-title":"Microbiome"},{"key":"2023112609174748800_btad690-B25","doi-asserted-by":"crossref","first-page":"2593","DOI":"10.1016\/j.drudis.2021.06.009","article-title":"AI-based language models powering drug discovery and development","volume":"26","author":"Liu","year":"2021","journal-title":"Drug Discov Today"},{"key":"2023112609174748800_btad690-B26","doi-asserted-by":"crossref","first-page":"3348","DOI":"10.1128\/AAC.00419-13","article-title":"The comprehensive antibiotic resistance database","volume":"57","author":"McArthur","year":"2013","journal-title":"Antimicrob Agents Chemother"},{"key":"2023112609174748800_btad690-B27","doi-asserted-by":"crossref","first-page":"325","DOI":"10.7196\/SAMJ.9644","article-title":"The World Health Organization global action plan for antimicrobial resistance","volume":"105","author":"Mendelson","year":"2015","journal-title":"S Afr Med J"},{"key":"2023112609174748800_btad690-B28","doi-asserted-by":"crossref","first-page":"e1006258","DOI":"10.1371\/journal.pcbi.1006258","article-title":"Prediction of 
antibiotic resistance in Escherichia coli from large-scale pan-genome data","volume":"14","author":"Moradigaravand","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2023112609174748800_btad690-B29","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1016\/S0140-6736(21)02724-0","article-title":"Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis","volume":"399","author":"Murray","year":"2022","journal-title":"Lancet"},{"key":"2023112609174748800_btad690-B30","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: NLP, machine learning & protein sequences","volume":"19","author":"Ofer","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2023112609174748800_btad690-B31","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2023112609174748800_btad690-B32","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023112609174748800_btad690-B33","doi-asserted-by":"crossref","first-page":"1976","DOI":"10.1016\/S0140-6736(18)31117-6","article-title":"Global governance of antimicrobial resistance","volume":"391","author":"Rochford","year":"2018","journal-title":"Lancet"},{"key":"2023112609174748800_btad690-B34","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1038\/s41564-018-0292-6","article-title":"Prediction of the intestinal resistome by a three-dimensional structure-based method","volume":"4","author":"Rupp\u00e9","year":"2019","journal-title":"Nat Microbiol"},{"key":"2023112609174748800_btad690-B35","first-page":"84","volume-title":"Information 
Fusion","author":"Shwartz-Ziv"},{"key":"2023112609174748800_btad690-B36","doi-asserted-by":"crossref","first-page":"662","DOI":"10.1377\/hlthaff.2017.1153","article-title":"Antibiotic-resistant infection treatment costs have doubled since 2002, now exceeding $2 billion annually","volume":"37","author":"Thorpe","year":"2018","journal-title":"Health Aff (Millwood)"},{"key":"2023112609174748800_btad690-B37","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1038\/s42256-022-00457-9","article-title":"Learning functional properties of proteins with language models","volume":"4","author":"Unsal","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2023112609174748800_btad690-B38","doi-asserted-by":"crossref","first-page":"128048","DOI":"10.1016\/j.ufug.2023.128048","article-title":"The effect of greenness on ESKAPE pathogen reduction and its heterogeneity across global climate zones and urbanization gradient","volume":"87","author":"Wang","year":"2023","journal-title":"Urban Urban Gree"},{"key":"2023112609174748800_btad690-B39","doi-asserted-by":"crossref","first-page":"3574","DOI":"10.1093\/bioinformatics\/btac351","article-title":"Prior knowledge facilitates low homologous protein secondary structure prediction with DSM distillation","volume":"38","author":"Wang","year":"2022","journal-title":"Bioinformatics"},{"key":"2023112609174748800_btad690-B40","volume-title":"Global Antimicrobial Resistance Surveillance System (GLASS): The Detection and Reporting of Colistin Resistance","author":"World Health Organization","year":"2018"},{"key":"2023112609174748800_btad690-B41","doi-asserted-by":"crossref","first-page":"112183","DOI":"10.1016\/j.envres.2021.112183","article-title":"Annotating unknown species of urban microorganisms on a global scale unveils novel functional diversity and local environment association","volume":"207","author":"Wu","year":"2022","journal-title":"Environ 
Res"},{"key":"2023112609174748800_btad690-B42","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13073-021-00945-4","article-title":"X-CNV: genome-wide prediction of the pathogenicity of copy number variations","volume":"13","author":"Zhang","year":"2021","journal-title":"Genome Med"},{"key":"2023112609174748800_btad690-B43","doi-asserted-by":"crossref","first-page":"1547","DOI":"10.1007\/s11427-021-2037-x","article-title":"Genomic insights into versatile lifestyle of three new bacterial candidate phyla","volume":"65","author":"Zhang","year":"2022","journal-title":"Sci China Life Sci"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad690\/53703108\/btad690.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad690\/53811978\/btad690.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad690\/53811978\/btad690.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,26]],"date-time":"2023-11-26T09:18:33Z","timestamp":1700990313000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad690\/7443986"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,11,1]]},"references-count":43,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad690","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-
4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,11,1]]},"article-number":"btad690"}}