{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T02:29:08Z","timestamp":1768876148460,"version":"3.49.0"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,4,11]],"date-time":"2024-04-11T00:00:00Z","timestamp":1712793600000},"content-version":"vor","delay-in-days":15,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFD2101204"],"award-info":[{"award-number":["2022YFD2101204"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["22138004"],"award-info":[{"award-number":["22138004"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,27]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Microbial community analysis is an important field to study the composition and function of microbial communities. Microbial species annotation is crucial to revealing microorganisms\u2019 complex ecological functions in environmental, ecological and host interactions. Currently, widely used methods can suffer from issues such as inaccurate species-level annotations and time and memory constraints, and as sequencing technology advances and sequencing costs decline, microbial species annotation methods with higher quality classification effectiveness become critical. Therefore, we processed 16S rRNA gene sequences into k-mers sets and then used a trained DNABERT model to generate word vectors. 
We also design a parallel network structure consisting of deep and shallow modules to extract the semantic and detailed features of 16S rRNA gene sequences. Our method can accurately and rapidly classify bacterial sequences at the SILVA database\u2019s genus and species level. The database is characterized by long sequence length (1500 base pairs), multiple sequences (428,748 reads) and high similarity. The results show that our method has better performance. The technique is nearly 20% more accurate at the species level than the currently popular naive Bayes-dominated QIIME 2 annotation method, and the top-5 results at the species level differ from BLAST methods by &lt;2%. In summary, our approach combines a multi-module deep learning approach that overcomes the limitations of existing methods, providing an efficient and accurate solution for microbial species labeling and more reliable data support for microbiology research and application.<\/jats:p>","DOI":"10.1093\/bib\/bbae157","type":"journal-article","created":{"date-parts":[[2024,4,11]],"date-time":"2024-04-11T03:56:25Z","timestamp":1712807785000},"source":"Crossref","is-referenced-by-count":5,"title":["DSNetax: a deep learning species annotation method based on a deep-shallow parallel framework"],"prefix":"10.1093","volume":"25","author":[{"given":"Hongyuan","family":"Zhao","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence and Computer Science, Jiangnan university , Wuxi, Jiangsu 214122 , China"},{"name":"National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"},{"name":"Jiangnan University , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Suyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Luzhou Laojiao Group Co. Ltd , Luzhou 646000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hui","family":"Qin","sequence":"additional","affiliation":[{"name":"Luzhou Laojiao Group Co. Ltd , Luzhou 646000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaogang","family":"Liu","sequence":"additional","affiliation":[{"name":"Luzhou Laojiao Group Co. Ltd , Luzhou 646000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongna","family":"Ma","sequence":"additional","affiliation":[{"name":"National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"},{"name":"Jiangnan University , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiao","family":"Han","sequence":"additional","affiliation":[{"name":"National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"},{"name":"Jiangnan University , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"},{"name":"Shaoxing Key Laboratory of Traditional Fermentation Food and Human Health, Jiangnan University (Shaoxing) Industrial Technology Research Institute , Shaoxing, Zhejiang 312000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3221-2492","authenticated-orcid":false,"given":"Jian","family":"Mao","sequence":"additional","affiliation":[{"name":"National 
Engineering Research Center of Cereal Fermentation and Food Biomanufacturing , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"},{"name":"Jiangnan University , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"},{"name":"Shaoxing Key Laboratory of Traditional Fermentation Food and Human Health, Jiangnan University (Shaoxing) Industrial Technology Research Institute , Shaoxing, Zhejiang 312000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuangping","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Computer Science, Jiangnan university , Wuxi, Jiangsu 214122 , China"},{"name":"National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"},{"name":"Jiangnan University , State Key Laboratory of Food Science and Technology, School of Food Science and Technology, , Wuxi, Jiangsu 214122 , China"},{"name":"Shaoxing Key Laboratory of Traditional Fermentation Food and Human Health, Jiangnan University (Shaoxing) Industrial Technology Research Institute , Shaoxing, Zhejiang 312000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,4,9]]},"reference":[{"issue":"5","key":"2024041103562155600_ref1","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1038\/nrmicro1151","article-title":"Identifying microorganisms responsible for ecologically significant biogeochemical processes","volume":"3","author":"Madsen","year":"2005","journal-title":"Nat Rev Microbiol"},{"issue":"5","key":"2024041103562155600_ref2","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1111\/j.1574-6976.2000.tb00564.x","article-title":"Ecology and evolution of 
bacterial microdiversity","volume":"24","author":"Schloter","year":"2000","journal-title":"FEMS Microbiol Rev"},{"issue":"3","key":"2024041103562155600_ref3","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1016\/0022-2836(75)90213-2","article-title":"A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase","volume":"94","author":"Sanger","year":"1975","journal-title":"J Mol Biol"},{"issue":"1","key":"2024041103562155600_ref4","doi-asserted-by":"crossref","first-page":"5029","DOI":"10.1038\/s41467-019-13036-1","article-title":"Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis","volume":"10","author":"Johnson","year":"2019","journal-title":"Nat Commun"},{"issue":"4","key":"2024041103562155600_ref5","doi-asserted-by":"crossref","first-page":"bbaa229","DOI":"10.1093\/bib\/bbaa229","article-title":"A survey on deep learning in DNA\/RNA motif mining","volume":"22","author":"He","year":"2021","journal-title":"Brief Bioinform"},{"issue":"1","key":"2024041103562155600_ref6","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1038\/s43705-022-00182-9","article-title":"Machine learning and deep learning applications in microbiome research","volume":"2","author":"Hern\u00e1ndez Medina","year":"2022","journal-title":"ISME Commun"},{"issue":"7","key":"2024041103562155600_ref7","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1093\/bioinformatics\/btv683","article-title":"Large-scale machine learning for metagenomics sequence classification","volume":"32","author":"Vervier","year":"2016","journal-title":"Bioinformatics"},{"key":"2024041103562155600_ref8","doi-asserted-by":"crossref","first-page":"1032","DOI":"10.3389\/fbioe.2020.01032","article-title":"Review on the application of machine learning algorithms in the sequence data mining of DNA","volume":"8","author":"Yang","year":"2020","journal-title":"Front Bioeng 
Biotechnol"},{"key":"2024041103562155600_ref9","first-page":"9","volume-title":"Paper presented at: 2017 International Conference on Computer and Drone Applications (IConDA)","author":"Choong","year":"2017"},{"issue":"10","key":"2024041103562155600_ref10","doi-asserted-by":"crossref","first-page":"R108","DOI":"10.1186\/gb-2009-10-10-r108","article-title":"Genomic DNA k-mer spectra: models and modalities","volume":"10","author":"Chor","year":"2009","journal-title":"Genome Biol"},{"issue":"5","key":"2024041103562155600_ref11","doi-asserted-by":"crossref","first-page":"bbab005","DOI":"10.1093\/bib\/bbab005","article-title":"A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information","volume":"22","author":"Le","year":"2021","journal-title":"Brief Bioinform"},{"issue":"1","key":"2024041103562155600_ref12","doi-asserted-by":"crossref","first-page":"lqaa009","DOI":"10.1093\/nargab\/lqaa009","article-title":"DeepMicrobes: taxonomic classification for metagenomics with deep learning","volume":"2","author":"Liang","year":"2020","journal-title":"NAR Genomics Bioinformatics"},{"issue":"35","key":"2024041103562155600_ref13","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2122636119","article-title":"Taxonomic classification of DNA sequences beyond sequence similarity using deep neural networks","volume":"119","author":"Mock","year":"2022","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"6","key":"2024041103562155600_ref14","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1007\/s11103-021-01204-1","article-title":"Using k-mer embeddings learned from a skip-gram based neural network for building a cross-species DNA N6-methyladenine site prediction model","volume":"107","author":"Nguyen","year":"2021","journal-title":"Plant Mol Biol"},{"key":"2024041103562155600_ref15","volume-title":"Paper presented at: 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); 
19\u201320 June 2022","author":"Zhang","year":"2022"},{"issue":"15","key":"2024041103562155600_ref16","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2024041103562155600_ref17","doi-asserted-by":"crossref","first-page":"842","DOI":"10.1162\/tacl_a_00349","article-title":"A primer in BERTology: what we know about how BERT works","volume":"8","author":"Rogers","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024041103562155600_ref18","doi-asserted-by":"crossref","first-page":"117972","DOI":"10.1016\/j.eswa.2022.117972","article-title":"BERT contextual embeddings for taxonomic classification of bacterial DNA sequences","volume":"208","author":"Marwah","year":"2022","journal-title":"Expert Systems with Applications"},{"key":"2024041103562155600_ref19","doi-asserted-by":"crossref","first-page":"D633","DOI":"10.1093\/nar\/gkt1244","article-title":"Ribosomal database project: data and tools for high throughput rRNA analysis","volume":"42","author":"Cole","year":"2014","journal-title":"Nucleic Acids Res"},{"issue":"7","key":"2024041103562155600_ref20","doi-asserted-by":"crossref","first-page":"5069","DOI":"10.1128\/AEM.03006-05","article-title":"Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB","volume":"72","author":"DeSantis","year":"2006","journal-title":"Appl Environ Microbiol"},{"key":"2024041103562155600_ref21","doi-asserted-by":"crossref","first-page":"D590","DOI":"10.1093\/nar\/gks1219","article-title":"The SILVA ribosomal RNA gene database project: improved data processing and web-based tools","volume":"41","author":"Quast","year":"2013","journal-title":"Nucleic Acids 
Res"},{"issue":"5","key":"2024041103562155600_ref22","doi-asserted-by":"crossref","first-page":"1613","DOI":"10.1099\/ijsem.0.001755","article-title":"Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies","volume":"67","author":"Yoon","year":"2017","journal-title":"Int J Syst Evol Microbiol"},{"key":"2024041103562155600_ref23","volume-title":"Paper presented at: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 27\u201330 June 2016","author":"He","year":"2016"},{"key":"2024041103562155600_ref24","volume-title":"Proceedings of the 37th International Conference on Machine Learning(ICML'20), 13\u201318 July 2020","author":"Ishida","year":"2020"},{"issue":"6","key":"2024041103562155600_ref25","doi-asserted-by":"crossref","first-page":"64","DOI":"10.9734\/bji\/2023\/v27i6707","article-title":"Protecting genetic genealogical databases from identical-by-state probing attacks: a machine learning-based approach","volume":"27","author":"Enow","year":"2023","journal-title":"Biotechnol J Int"},{"issue":"8","key":"2024041103562155600_ref26","doi-asserted-by":"crossref","first-page":"852","DOI":"10.1038\/s41587-019-0209-9","article-title":"Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2","volume":"37","author":"Bolyen","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2024041103562155600_ref27","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Fabian","year":"2011","journal-title":"J Mach Learn Res"},{"issue":"3","key":"2024041103562155600_ref28","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2024041103562155600_ref29","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: 
architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/3\/bbae157\/57189167\/bbae157.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/3\/bbae157\/57189167\/bbae157.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,11]],"date-time":"2024-04-11T03:56:37Z","timestamp":1712807797000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae157\/7642701"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,27]]},"references-count":29,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae157","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,5]]},"published":{"date-parts":[[2024,3,27]]},"article-number":"bbae157"}}