{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T09:58:19Z","timestamp":1769248699685,"version":"3.49.0"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2023,3,1]],"date-time":"2023-03-01T00:00:00Z","timestamp":1677628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Himalayan Centre for High-throughput Computational Biology"},{"name":"DBT, Govt. of India","award":["BT\/PR40122\/BTIS\/137\/30\/2021"],"award-info":[{"award-number":["BT\/PR40122\/BTIS\/137\/30\/2021"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,3,19]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Discovering pre-microRNAs (miRNAs) is the core of miRNA discovery. Using traditional sequence\/structural features, many tools have been published to discover miRNAs. However, in practical applications like genomic annotations, their actual performance has been very low. This becomes more grave in plants where unlike animals pre-miRNAs are much more complex and difficult to identify. A huge gap exists between animals and plants for the available software for miRNA discovery and species-specific miRNA information. Here, we present miWords, a composite deep learning system of transformers and convolutional neural networks which sees genome as a pool of sentences made of words with specific occurrence preferences and contexts, to accurately identify pre-miRNA regions across plant genomes. A comprehensive benchmarking was done involving &gt;10 software representing different genre and many experimentally validated datasets. miWords emerged as the best one while breaching accuracy of 98% and performance lead of ~10%. 
miWords was also evaluated across Arabidopsis genome where also it outperformed the compared tools. As a demonstration, miWords was run across the tea genome, reporting 803 pre-miRNA regions, all validated by small RNA-seq reads from multiple samples, and most of them were functionally supported by the degradome sequencing data. miWords is freely available as stand-alone source codes at https:\/\/scbb.ihbt.res.in\/miWords\/index.php.<\/jats:p>","DOI":"10.1093\/bib\/bbad088","type":"journal-article","created":{"date-parts":[[2023,3,13]],"date-time":"2023-03-13T02:36:07Z","timestamp":1678674967000},"source":"Crossref","is-referenced-by-count":13,"title":["miWords: transformer-based composite deep learning for highly accurate discovery of pre-miRNA regions across plant genomes"],"prefix":"10.1093","volume":"24","author":[{"given":"Sagar","family":"Gupta","sequence":"first","affiliation":[{"name":"Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT) , Palampur, Himachal Pradesh 176061, India"},{"name":"Academy of Scientific and Innovative Research (AcSIR) , Ghaziabad, Uttar Pradesh 201002, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4004-8047","authenticated-orcid":false,"given":"Ravi","family":"Shankar","sequence":"additional","affiliation":[{"name":"Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT) , Palampur, Himachal Pradesh 176061, India"},{"name":"Academy of Scientific and Innovative Research (AcSIR) , Ghaziabad, Uttar Pradesh 201002, 
India"}]}],"member":"286","published-online":{"date-parts":[[2023,3,15]]},"reference":[{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"1368","DOI":"10.1093\/bioinformatics\/btr153","article-title":"PlantMiRNAPred: efficient classification of real and pseudo plant pre-miRNAs","volume":"27","author":"Xuan","year":"2011","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","DOI":"10.1002\/bies.201600113","article-title":"MicroRNA annotation of plant genomes\u2014do it right or not at all","volume":"39","author":"Taylor","year":"2017","journal-title":"Bioessays"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"11511","DOI":"10.1073\/pnas.0404025101","article-title":"Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes","volume":"101","author":"Bonnet","year":"2004","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"787","DOI":"10.1016\/j.molcel.2004.05.027","article-title":"Computational identification of plant microRNAs and their targets, including a stress-induced miRNA","volume":"14","author":"Jones-Rhoades","year":"2004","journal-title":"Mol Cell"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1101\/gr.2908205","article-title":"Computational prediction of miRNAs in Arabidopsis thaliana","volume":"15","author":"Adai","year":"2005","journal-title":"Genome Res"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1186\/1471-2164-6-119","article-title":"Computational evidence for hundreds of non-conserved plant microRNAs","volume":"6","author":"Lindow","year":"2005","journal-title":"BMC Genomics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"766","DOI":"10.1038\/ng1590","article-title":"Identification of hundreds of conserved and 
nonconserved human microRNAs","volume":"37","author":"Bentwich","year":"2005","journal-title":"Nat Genet"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1186\/1471-2105-6-310","article-title":"Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine","volume":"6","author":"Xue","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1093\/bioinformatics\/bti802","article-title":"Identification of plant microRNA homologs","volume":"22","author":"Dezulian","year":"2006","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"e197","DOI":"10.1093\/bioinformatics\/btl257","article-title":"Hairpins in a haystack: recognizing microRNA precursors in comparative genomics data","volume":"22","author":"Hertel","year":"2006","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1093\/bioinformatics\/btm026","article-title":"De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures","volume":"23","author":"Ng","year":"2007","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1093\/bioinformatics\/btp107","article-title":"microPred: effective classification of pre-miRNAs for human miRNA gene prediction","volume":"25","author":"Batuwita","year":"2009","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"W68","DOI":"10.1093\/nar\/gkp347","article-title":"miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments","volume":"37","author":"Hackenberg","year":"2009","journal-title":"Nucleic Acids 
Res"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"2226","DOI":"10.1093\/bioinformatics\/btq329","article-title":"MIReNA: finding microRNAs with high accuracy and no learning at genome scale and from deep sequencing data","volume":"26","author":"Mathelier","year":"2010","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"2614","DOI":"10.1093\/bioinformatics\/btr430","article-title":"miRDeep-P: a computational tool for analyzing the microRNA transcriptome in plants","volume":"27","author":"Yang","year":"2011","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"e45782","DOI":"10.1371\/journal.pone.0045782","article-title":"miR-BAG: bagging based identification of MicroRNA precursors","volume":"7","author":"Jha","year":"2012","journal-title":"PloS One"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1093\/nar\/gkr688","article-title":"miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades","volume":"40","author":"Friedl\u00e4nder","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"727","DOI":"10.1093\/nar\/gks1187","article-title":"miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data","volume":"41","author":"An","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1186\/1471-2105-14-83","article-title":"HuntMi: an efficient and taxon-specific approach in pre-miRNA identification","volume":"14","author":"Gudy\u015b","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1261\/rna.035279.112","article-title":"ShortStack: comprehensive annotation and quantification of small RNA 
genes","volume":"19","author":"Axtell","year":"2013","journal-title":"RNA"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1186\/s12859-014-0423-x","article-title":"Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine","volume":"15","author":"Meng","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"2837","DOI":"10.1093\/bioinformatics\/btu380","article-title":"miR-PREFeR: an accurate, fast and easy-to-use plant miRNA prediction tool using small RNA-Seq data","volume":"30","author":"Lei","year":"2014","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"3124","DOI":"10.1039\/C6MB00295A","article-title":"plantMirP: an efficient computational program for the prediction of plant pre-miRNA by incorporating knowledge-based energy features","volume":"12","author":"Yao","year":"2016","journal-title":"Mol Biosyst"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1109\/BIGCOMP.2017.7881722","volume-title":"2017 IEEE International Conference on Big Data and Smart Computing (BigComp)","author":"Thomas","year":"2017"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"1316","DOI":"10.1109\/TCBB.2016.2576459","article-title":"High class-imbalance in pre-miRNA prediction: a novel approach based on deepSOM","volume":"14","author":"Stegmayer","year":"2017","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023032004562497100_","article-title":"Deep recurrent neural network-based identification of precursor microRNAs","volume":"30","author":"Park","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1093\/bioinformatics\/btx612","article-title":"Genome-wide 
pre-miRNA discovery from few labeled examples","volume":"34","author":"Yones","year":"2018","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1186\/s12859-019-3279-2","article-title":"Fast and accurate microRNA search using CNN","volume":"20","author":"Tang","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"104448","DOI":"10.1016\/j.compbiomed.2021.104448","article-title":"High precision in microRNA prediction: a novel genome-wide approach with convolutional deep residual networks","volume":"134","author":"Yones","year":"2021","journal-title":"Comput Biol Med"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun ACM"},{"key":"2023032004562497100_","first-page":"1","volume-title":"An analysis of convolutional neural networks for sentence classification","author":"Vieira"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","DOI":"10.1002\/047084535X","volume-title":"Recurrent Neural Networks for Prediction: learning algorithms, architectures and stability","author":"Mandic","year":"2001"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1105\/tpc.17.00851","article-title":"Revisiting criteria for plant MicroRNA annotation in the era of big data","volume":"30","author":"Axtell","year":"2018","journal-title":"Plant Cell"},{"key":"2023032004562497100_","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neural 
Inf Process Syst"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"1191","DOI":"10.1093\/bioinformatics\/btab823","article-title":"miRe2e: a full end-to-end deep model based on transformers for prediction of pre-miRNAs","volume":"38","author":"Raad","year":"2022","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"D155","DOI":"10.1093\/nar\/gky1141","article-title":"miRBase: from microRNA sequences to function","volume":"47","author":"Kozomara","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1038\/s41438-021-00480-8","article-title":"sRNAanno\u2014a database repository of uniformly annotated small RNAs in plants","volume":"8","author":"Chen","year":"2021","journal-title":"Hortic Res"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"D1475","DOI":"10.1093\/nar\/gkab811","article-title":"PmiREN2.0: from data annotation to functional exploration of plant microRNAs","volume":"50","author":"Guo","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"D982","DOI":"10.1093\/nar\/gku1162","article-title":"PNRD: a plant non-coding RNA database","volume":"43","author":"Yi","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"bbaa184","DOI":"10.1093\/bib\/bbaa184","article-title":"Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning","volume":"22","author":"Bugnon","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1093\/bioinformatics\/btn604","article-title":"CleaveLand: a pipeline for using degradome data to find cleaved small RNA 
targets","volume":"25","author":"Addo-Quaye","year":"2009","journal-title":"Bioinformatics"},{"key":"2023032004562497100_","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1186\/1471-2164-13-126","article-title":"De novo sequencing and characterization of Picrorhiza kurrooa transcriptome at two temperatures showed major transcriptome adjustments","volume":"13","author":"Gahlan","year":"2012","journal-title":"BMC Genomics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/2\/bbad088\/49560741\/bbad088.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/2\/bbad088\/49560741\/bbad088.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,25]],"date-time":"2023-03-25T13:31:56Z","timestamp":1679751116000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad088\/7076120"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3]]},"references-count":43,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,3,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad088","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,3]]},"published":{"date-parts":[[2023,3]]},"article-number":"bbad088"}}