{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T16:13:25Z","timestamp":1775664805405,"version":"3.50.1"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T00:00:00Z","timestamp":1634256000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100003399","name":"Science and Technology Commission of Shanghai Municipality","doi-asserted-by":"publisher","award":["19JC1413000"],"award-info":[{"award-number":["19JC1413000"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003399","name":"Science and Technology Commission of Shanghai Municipality","doi-asserted-by":"publisher","award":["19430750600"],"award-info":[{"award-number":["19430750600"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32070572"],"award-info":[{"award-number":["32070572"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Medicine and Engineering Interdisciplinary Research Fund of Shanghai Jiao Tong University","award":["19X190020171"],"award-info":[{"award-number":["19X190020171"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Bacterial type IV secretion systems (T4SSs) are versatile and membrane-spanning apparatuses, which mediate both genetic exchange and delivery of effector proteins to target eukaryotic cells. 
The secreted effectors (T4SEs) can affect gene expression and signal transduction of the host cells. As such, they often function as virulence factors and play an important role in bacterial pathogenesis. Nowadays, T4SE prediction tools have utilized various machine learning algorithms, but the accuracy and speed of these tools remain to be improved. In this study, we apply a sequence embedding strategy from a pre-trained language model of protein sequences (TAPE) to the classification task of T4SEs. The training dataset is mainly derived from our updated type IV secretion system database SecReT4 with newly experimentally verified T4SEs. An online web server termed T4SEfinder is developed using TAPE and a multi-layer perceptron (MLP) for T4SE prediction after a comprehensive performance comparison with several candidate models, which achieves a slightly higher level of accuracy than the existing prediction tools. It only takes about 3\u00a0minutes to make a classification for 5000 protein sequences by T4SEfinder so that the computational speed is qualified for whole genome-scale T4SEs detection in pathogenic bacteria. T4SEfinder might contribute to meet the increasing demands of re-annotating secretion systems and effector proteins in sequenced bacterial genomes. 
T4SEfinder is freely accessible at https:\/\/tool2-mml.sjtu.edu.cn\/T4SEfinder_TAPE\/.<\/jats:p>","DOI":"10.1093\/bib\/bbab420","type":"journal-article","created":{"date-parts":[[2021,9,14]],"date-time":"2021-09-14T11:11:44Z","timestamp":1631617904000},"source":"Crossref","is-referenced-by-count":26,"title":["T4SEfinder: a bioinformatics tool for genome-scale prediction of bacterial type IV secreted effectors using pre-trained protein language model"],"prefix":"10.1093","volume":"23","author":[{"given":"Yumeng","family":"Zhang","sequence":"first","affiliation":[{"name":"State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, China"}]},{"given":"Yangming","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2910-6725","authenticated-orcid":false,"given":"Yi","family":"Xiong","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, China"}]},{"given":"Hui","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Pathogens and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China"}]},{"given":"Zixin","family":"Deng","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8031-9086","authenticated-orcid":false,"given":"Jiangning","family":"Song","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9439-1660","authenticated-orcid":false,"given":"Hong-Yu","family":"Ou","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, China"},{"name":"Shanghai Key Laboratory of Veterinary Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China"}]}],"member":"286","published-online":{"date-parts":[[2021,10,15]]},"reference":[{"key":"2022011921220741800_ref1","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1111\/mmi.13896","article-title":"Type IV secretion in Gram-negative and Gram-positive bacteria","volume":"107","author":"Grohmann","year":"2018","journal-title":"Mol Microbiol"},{"key":"2022011921220741800_ref2","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1038\/nrmicro753","article-title":"The versatile bacterial type IV secretion systems","volume":"1","author":"Cascales","year":"2003","journal-title":"Nat Rev Microbiol"},{"key":"2022011921220741800_ref3","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1038\/nrmicro2382","article-title":"Integrative and conjugative elements: mosaic mobile genetic elements enabling dynamic lateral gene flow","volume":"8","author":"Wozniak","year":"2010","journal-title":"Nat Rev Microbiol"},{"key":"2022011921220741800_ref4","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1128\/MMBR.00023-09","article-title":"Biological diversity of prokaryotic type IV secretion 
systems","volume":"73","author":"Alvarez-Martinez","year":"2009","journal-title":"Microbiol Mol Biol Rev"},{"key":"2022011921220741800_ref5","doi-asserted-by":"crossref","first-page":"1203","DOI":"10.1111\/j.1462-5822.2010.01499.x","article-title":"Type IV secretion systems: versatility and diversity in function","volume":"12","author":"Wallden","year":"2010","journal-title":"Cell Microbiol"},{"key":"2022011921220741800_ref6","doi-asserted-by":"crossref","first-page":"450","DOI":"10.1016\/j.tim.2016.02.003","article-title":"Subversion of retrograde trafficking by translocated pathogen effectors","volume":"24","author":"Personnic","year":"2016","journal-title":"Trends Microbiol"},{"key":"2022011921220741800_ref7","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1146\/annurev-micro-102215-095557","article-title":"Autophagy evasion and endoplasmic reticulum subversion: the yin and yang of Legionella intracellular infection","volume":"70","author":"Sherwood","year":"2016","journal-title":"Annu Rev Microbiol"},{"key":"2022011921220741800_ref8","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1016\/j.mib.2020.04.002","article-title":"Mapping bacterial effector arsenals: in vivo and in silico approaches to defining the protein features dictating effector secretion by bacteria","volume":"57","author":"Lee","year":"2020","journal-title":"Curr Opin Microbiol"},{"key":"2022011921220741800_ref9","doi-asserted-by":"crossref","first-page":"3135","DOI":"10.1093\/bioinformatics\/btt554","article-title":"Accurate prediction of bacterial type IV secreted effectors using amino acid composition and PSSM profiles","volume":"29","author":"Zou","year":"2013","journal-title":"Bioinformatics"},{"key":"2022011921220741800_ref10","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1186\/1471-2164-15-50","article-title":"Prediction of bacterial type IV secreted effectors by C-terminal features","volume":"15","author":"Wang","year":"2014","journal-title":"BMC 
Genomics"},{"key":"2022011921220741800_ref11","first-page":"148","article-title":"Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI","volume":"19","author":"An","year":"2018","journal-title":"Brief Bioinform"},{"key":"2022011921220741800_ref12","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.chemolab.2018.11.002","article-title":"A deep learning framework for sequence-based bacteria type IV secreted effectors prediction","volume":"183","author":"Xue","year":"2018","journal-title":"Chemometrics Intellig Lab Syst"},{"key":"2022011921220741800_ref13","doi-asserted-by":"crossref","first-page":"2571","DOI":"10.3389\/fmicb.2018.02571","article-title":"PredT4SE-stack: prediction of bacterial type IV secreted effectors from protein sequences using a stacked ensemble method","volume":"9","author":"Xiong","year":"2018","journal-title":"Front Microbiol"},{"key":"2022011921220741800_ref14","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.3389\/fmicb.2019.01391","article-title":"Prediction of T4SS effector proteins for Anaplasma phagocytophilum using OPT4e, a new software tool","volume":"10","author":"Esna Ashari","year":"2019","journal-title":"Front Microbiol"},{"key":"2022011921220741800_ref15","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1093\/bib\/bbx164","article-title":"Systematic analysis and prediction of type IV secreted effector proteins by machine learning approaches","volume":"20","author":"Wang","year":"2019","journal-title":"Brief Bioinform"},{"key":"2022011921220741800_ref16","doi-asserted-by":"crossref","first-page":"1825","DOI":"10.1093\/bib\/bbz120","article-title":"Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery","volume":"21","author":"Hong","year":"2020","journal-title":"Brief 
Bioinform"},{"key":"2022011921220741800_ref17","doi-asserted-by":"crossref","first-page":"580382","DOI":"10.3389\/fmicb.2020.580382","article-title":"T4SE-XGB: interpretable sequence-based prediction of type IV secreted effectors using eXtreme gradient boosting algorithm","volume":"11","author":"Chen","year":"2020","journal-title":"Front Microbiol"},{"key":"2022011921220741800_ref18","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2022011921220741800_ref19","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach Learn"},{"key":"2022011921220741800_ref20","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2022011921220741800_ref21","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","article-title":"Deep learning in neural networks: an overview","volume":"61","author":"Schmidhuber","year":"2015","journal-title":"Neural Netw"},{"key":"2022011921220741800_ref22","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","article-title":"UniRef: comprehensive and non-redundant UniProt reference clusters","volume":"23","author":"Suzek","year":"2007","journal-title":"Bioinformatics"},{"key":"2022011921220741800_ref23","doi-asserted-by":"crossref","first-page":"e1900119","DOI":"10.1002\/pmic.201900119","article-title":"Protein function prediction: from traditional classifier to deep 
learning","volume":"19","author":"Lv","year":"2019","journal-title":"Proteomics"},{"key":"2022011921220741800_ref24","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.specom.2003.08.002","article-title":"Statistical language model adaptation: review and perspectives","volume":"42","author":"Bellegarda","year":"2004","journal-title":"Speech Commun"},{"key":"2022011921220741800_ref25","first-page":"03762","article-title":"Attention is all you need","volume":"1706","author":"Vaswani","year":"2017","journal-title":"arXiv preprint arXiv"},{"key":"2022011921220741800_ref26","first-page":"04805","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","volume":"1810","author":"Devlin","year":"2018","journal-title":"arXiv preprint arXiv"},{"key":"2022011921220741800_ref27","first-page":"06823","article-title":"Incorporating BERT into neural machine translation","volume":"2002","author":"Zhu","year":"2020","journal-title":"arXiv preprint arXiv"},{"key":"2022011921220741800_ref28","first-page":"11942","article-title":"ALBERT: a Lite BERT for self-supervised learning of language representations","volume":"1909","author":"Lan","year":"2019","journal-title":"arXiv preprint arXiv"},{"key":"2022011921220741800_ref29","first-page":"9689","article-title":"Evaluating protein transfer learning with TAPE","volume":"32","author":"Rao","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2022011921220741800_ref30","first-page":"05625","article-title":"Pre-training of deep bidirectional protein sequence representations with structural information","volume":"1912","author":"Min","year":"2019","journal-title":"arXiv preprint arXiv"},{"key":"2022011921220741800_ref31","first-page":"06225","article-title":"ProtTrans: towards cracking the language of Life\u2019s code through self-supervised deep learning and high performance computing","volume":"2007","author":"Elnaggar","year":"2020","journal-title":"arXiv preprint 
arXiv"},{"key":"2022011921220741800_ref32","doi-asserted-by":"publisher","DOI":"10.1101\/2020.12.15.422761","volume-title":"bioRxiv","author":"Rao","year":"2020"},{"key":"2022011921220741800_ref33","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2022011921220741800_ref34","doi-asserted-by":"publisher","DOI":"10.1101\/2021.02.12.430858","article-title":"MSA transformer","author":"Rao","year":"2021","journal-title":"bioRxiv"},{"key":"2022011921220741800_ref35","doi-asserted-by":"crossref","first-page":"D660","DOI":"10.1093\/nar\/gks1248","article-title":"SecReT4: a web-based bacterial type IV secretion system resource","volume":"41","author":"Bi","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2022011921220741800_ref36","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2022011921220741800_ref37","doi-asserted-by":"crossref","first-page":"1029","DOI":"10.1007\/s10822-017-0080-z","article-title":"Effective prediction of bacterial type IV secreted effectors by combined features of both C-termini and N-termini","volume":"31","author":"Wang","year":"2017","journal-title":"J Comput Aided Mol Des"},{"key":"2022011921220741800_ref38","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"UniProt: a worldwide hub of protein knowledge","volume":"47","author":"UniProt Consortium","year":"2019","journal-title":"Nucleic Acids 
Res"},{"key":"2022011921220741800_ref39","doi-asserted-by":"crossref","first-page":"9218","DOI":"10.1093\/nar\/gkt718","article-title":"Searching algorithm for type IV secretion system effectors 1.0: a tool for predicting type IV effectors and exploring their genomic context","volume":"41","author":"Meyer","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2022011921220741800_ref40","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1016\/S0140-6736(03)12659-1","article-title":"Genome sequence of Vibrio parahaemolyticus: a pathogenic mechanism distinct from that of V cholerae","volume":"361","author":"Makino","year":"2003","journal-title":"Lancet"},{"key":"2022011921220741800_ref41","doi-asserted-by":"crossref","first-page":"D427","DOI":"10.1093\/nar\/gky995","article-title":"The Pfam protein families database in 2019","volume":"47","author":"El-Gebali","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022011921220741800_ref42","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1109\/72.80266","article-title":"The multilayer perceptron as an approximation to a Bayes optimal discriminant function","volume":"1","author":"Ruck","year":"1990","journal-title":"IEEE Trans Neural Netw"},{"key":"2022011921220741800_ref43","doi-asserted-by":"crossref","first-page":"602","DOI":"10.1016\/j.neunet.2005.06.042","article-title":"Framewise phoneme classification with bidirectional LSTM and other neural network architectures","volume":"18","author":"Graves","year":"2005","journal-title":"Neural Netw"},{"key":"2022011921220741800_ref44","first-page":"448","volume-title":"Proceedings of the 32nd International Conference on Machine Learning","author":"Ioffe","year":"2015"},{"key":"2022011921220741800_ref45","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J Mach Learn 
Res"},{"key":"2022011921220741800_ref46","doi-asserted-by":"crossref","first-page":"e1000278","DOI":"10.1371\/journal.ppat.1000278","article-title":"A Legionella pneumophila effector protein encoded in a region of genomic plasticity binds to Dot\/Icm-modified vacuoles","volume":"5","author":"Ninio","year":"2009","journal-title":"PLoS Pathog"},{"key":"2022011921220741800_ref47","doi-asserted-by":"crossref","first-page":"e00175","DOI":"10.1128\/mBio.00175-11","article-title":"Dot\/Icm type IVB secretion system requirements for Coxiella burnetii growth in human macrophages","volume":"2","author":"Beare","year":"2011","journal-title":"mBio"},{"key":"2022011921220741800_ref48","doi-asserted-by":"crossref","first-page":"e1003556","DOI":"10.1371\/journal.ppat.1003556","article-title":"Brucella modulates secretory trafficking via multiple type IV secretion effector proteins","volume":"9","author":"Myeni","year":"2013","journal-title":"PLoS Pathog"},{"key":"2022011921220741800_ref49","doi-asserted-by":"crossref","first-page":"W181","DOI":"10.1093\/nar\/gkn179","article-title":"The CGView server: a comparative genomics tool for circular genomes","volume":"36","author":"Grant","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2022011921220741800_ref50","first-page":"15222","article-title":"BERTology meets biology: interpreting attention in protein language models","volume":"2006","author":"Vig","year":"2020","journal-title":"arXiv preprint arXiv"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab420\/42230896\/bbab420.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab420\/42230896\/bbab420.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,19]],"date-time":"2022-01-19T21:22:37Z","timestamp":1642627357000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab420\/6397152"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,15]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1,17]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab420","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,1]]},"published":{"date-parts":[[2021,10,15]]},"article-number":"bbab420"}}