{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T12:49:15Z","timestamp":1760014155290,"version":"3.37.3"},"reference-count":58,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2018,8,31]],"date-time":"2018-08-31T00:00:00Z","timestamp":1535673600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1149544","CCF-1447235"],"award-info":[{"award-number":["CCF-1149544","CCF-1447235"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"United States Department of Agriculture National Institute of Food and Agriculture Award","award":["2013-68004-20359"],"award-info":[{"award-number":["2013-68004-20359"]}]},{"name":"Bioinformatics and Genomic Systems Engineering"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Non-coding RNAs (ncRNAs) are known to play crucial roles in various biological processes, and there is a pressing need for accurate computational detection methods that could be used to efficiently scan genomes to detect novel ncRNAs. However, unlike coding genes, ncRNAs often lack distinctive sequence features that could be used for recognizing them. Although many ncRNAs are known to have a well-conserved secondary structure, which provides useful cues for computational prediction, it has also been shown that a structure-based approach alone may not be sufficient for detecting ncRNAs in a single sequence. 
Currently, the most effective ncRNA detection methods combine structure-based techniques with a comparative genome analysis approach to improve prediction performance.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this paper, we propose RNAdetect, a computational method incorporating novel features for accurate detection of ncRNAs in combination with comparative genome analysis. Given a sequence alignment, RNAdetect can accurately detect the presence of functional ncRNAs by incorporating novel predictive features based on the concept of generalized ensemble defect (GED), which assesses the degree of structure conservation across multiple related sequences and the conformity of the individual folding structures to a common consensus structure. Furthermore, n-gram models (NGMs) are used to extract features that can effectively capture sequence homology to known ncRNA families. Utilization of NGMs can enhance the detection of ncRNAs that have sparse folding structures with many unpaired bases. 
Extensive performance evaluation based on the Rfam database and bacterial genomes demonstrates that RNAdetect can accurately and reliably detect novel ncRNAs, outperforming the current state-of-the-art methods.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code for RNAdetect and the benchmark data used in this paper can be downloaded at https:\/\/github.com\/bjyoontamu\/RNAdetect.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty765","type":"journal-article","created":{"date-parts":[[2018,8,30]],"date-time":"2018-08-30T12:28:00Z","timestamp":1535632080000},"page":"1133-1141","source":"Crossref","is-referenced-by-count":7,"title":["RNAdetect: efficient computational detection of novel non-coding RNAs"],"prefix":"10.1093","volume":"35","author":[{"given":"Chun-Chi","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA"},{"name":"TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA"}]},{"given":"Xiaoning","family":"Qian","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA"},{"name":"TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA"}]},{"given":"Byung-Jun","family":"Yoon","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA"},{"name":"TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2018,8,31]]},"reference":[{"key":"2023020108345890800_bty765-B1","doi-asserted-by":"crossref","first-page":"1787","DOI":"10.1126\/science.1155472","article-title":"The eukaryotic genome as an RNA machine","volume":"319","author":"Amaral","year":"2008","journal-title":"Science"},{"key":"2023020108345890800_bty765-B2","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1016\/S0960-9822(01)00270-6","article-title":"Novel small RNA-encoding genes in the intergenic regions of Escherichia coli","volume":"11","author":"Argaman","year":"2001","journal-title":"Curr. Biol"},{"key":"2023020108345890800_bty765-B3","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1038\/417141a","article-title":"Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2)","volume":"417","author":"Bentley","year":"2002","journal-title":"Nature"},{"key":"2023020108345890800_bty765-B4","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1093\/bfgp\/elp043","article-title":"From consensus structure prediction to RNA gene finding","volume":"8","author":"Bernhart","year":"2009","journal-title":"Brief. Funct. Genomic. Proteomic"},{"key":"2023020108345890800_bty765-B5","doi-asserted-by":"crossref","first-page":"474.","DOI":"10.1186\/1471-2105-9-474","article-title":"RNAalifold: improved consensus structure prediction for RNA alignments","volume":"9","author":"Bernhart","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020108345890800_bty765-B6","first-page":"1579","article-title":"Fast kernel classifiers with online and active learning","volume":"6","author":"Bordes","year":"2005","journal-title":"J. Mach. Learn. Res"},{"key":"2023020108345890800_bty765-B7","doi-asserted-by":"crossref","first-page":"15423","DOI":"10.3390\/ijms140815423","article-title":"Detecting and comparing non-coding RNAs in the high-throughput era","volume":"14","author":"Bussotti","year":"2013","journal-title":"Int. 
J. Mol. Sci"},{"key":"2023020108345890800_bty765-B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: a library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. Technol"},{"key":"2023020108345890800_bty765-B9","doi-asserted-by":"crossref","first-page":"955","DOI":"10.1002\/prot.20373","article-title":"Protein classification based on text document classification techniques","volume":"58","author":"Cheng","year":"2005","journal-title":"Proteins"},{"key":"2023020108345890800_bty765-B10","doi-asserted-by":"crossref","first-page":"e11147.","DOI":"10.1371\/journal.pone.0011147","article-title":"progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement","volume":"5","author":"Darling","year":"2010","journal-title":"PLoS One"},{"key":"2023020108345890800_bty765-B11","doi-asserted-by":"crossref","DOI":"10.1201\/b14297","volume-title":"Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions","author":"Deng","year":"2012"},{"key":"2023020108345890800_bty765-B12","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1109\/KAM.2009.137","volume-title":"2009 Second International Symposium on Knowledge Acquisition and Modeling, KAM\u201909","author":"Ding","year":"2009"},{"key":"2023020108345890800_bty765-B13","doi-asserted-by":"crossref","first-page":"615.","DOI":"10.1186\/1471-2164-11-615","article-title":"A comparative genome-wide study of ncRNAs in trypanosomatids","volume":"11","author":"Doniger","year":"2010","journal-title":"BMC Genom"},{"volume-title":"Statistical Identification of Language.","year":"1994","author":"Dunning","key":"2023020108345890800_bty765-B14"},{"key":"2023020108345890800_bty765-B15","doi-asserted-by":"crossref","first-page":"919","DOI":"10.1038\/35103511","article-title":"Non-coding RNA genes and the modern RNA 
world","volume":"2","author":"Eddy","year":"2001","journal-title":"Nat. Rev. Genet"},{"key":"2023020108345890800_bty765-B16","doi-asserted-by":"crossref","first-page":"286.","DOI":"10.1186\/1756-0500-7-286","article-title":"Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences","volume":"7","author":"ElGokhy","year":"2014","journal-title":"BMC Res. Notes"},{"key":"2023020108345890800_bty765-B17","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1101\/gr.5890907","article-title":"Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA","volume":"17","author":"Freyhult","year":"2007","journal-title":"Genome Res"},{"key":"2023020108345890800_bty765-B18","doi-asserted-by":"crossref","first-page":"e0130200.","DOI":"10.1371\/journal.pone.0130200","article-title":"Discovery of novel ncRNA sequences in multiple genome alignments on the basis of conserved and stable secondary structures","volume":"10","author":"Fu","year":"2015","journal-title":"PLoS One"},{"key":"2023020108345890800_bty765-B19","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1093\/nar\/gkg006","article-title":"Rfam: an RNA family database","volume":"31","author":"Griffiths-Jones","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023020108345890800_bty765-B20","first-page":"69","article-title":"RNAz 2.0: improved noncoding RNA detection","volume":"15","author":"Gruber","year":"2010","journal-title":"Pac. Symp. Biocomput"},{"key":"2023020108345890800_bty765-B21","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023020108345890800_bty765-B22","doi-asserted-by":"crossref","first-page":"22.","DOI":"10.1186\/1471-2105-13-22","article-title":"Analysis of energy-based algorithms for RNA secondary structure prediction","volume":"13","author":"Hajiaghayi","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020108345890800_bty765-B23","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1007\/BF00818163","article-title":"Fast folding and comparison of RNA secondary structures","volume":"125","author":"Hofacker","year":"1994","journal-title":"Monatsh. Chem"},{"key":"2023020108345890800_bty765-B24","doi-asserted-by":"crossref","first-page":"D574","DOI":"10.1093\/nar\/gkv1209","article-title":"Ensembl Genomes 2016: more genomes, more complexity","volume":"44","author":"Kersey","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023020108345890800_bty765-B25","doi-asserted-by":"crossref","first-page":"R68.","DOI":"10.1186\/gb-2007-8-5-r68","article-title":"ngLOC: an n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes","volume":"8","author":"King","year":"2007","journal-title":"Genome Biol"},{"key":"2023020108345890800_bty765-B26","doi-asserted-by":"crossref","first-page":"2947","DOI":"10.1093\/bioinformatics\/btm404","article-title":"Clustal W and Clustal X version 2.0","volume":"23","author":"Larkin","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020108345890800_bty765-B27","doi-asserted-by":"crossref","first-page":"26.","DOI":"10.1186\/1748-7188-6-26","article-title":"ViennaRNA Package 2.0","volume":"6","author":"Lorenz","year":"2011","journal-title":"Algorithms Mol. 
Biol"},{"key":"2023020108345890800_bty765-B28","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.ymeth.2016.04.004","article-title":"Predicting RNA secondary structures from sequence and probing data","volume":"103","author":"Lorenz","year":"2016","journal-title":"Methods"},{"key":"2023020108345890800_bty765-B29","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1109\/ICCABS.2011.5729865","volume-title":"2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)","author":"Lou","year":"2011"},{"key":"2023020108345890800_bty765-B30","doi-asserted-by":"crossref","first-page":"1805","DOI":"10.1261\/rna.1643609","article-title":"Improved RNA secondary structure prediction by maximizing expected pair accuracy","volume":"15","author":"Lu","year":"2009","journal-title":"RNA"},{"volume-title":"Randomization, Bootstrap and Monte Carlo Methods in Biology","year":"2006","author":"Manly","key":"2023020108345890800_bty765-B31"},{"key":"2023020108345890800_bty765-B32","doi-asserted-by":"crossref","first-page":"1331","DOI":"10.3390\/e16031331","article-title":"Describing the structural diversity within an RNA\u2019s ensemble","volume":"16","author":"Martin","year":"2014","journal-title":"Entropy"},{"key":"2023020108345890800_bty765-B33","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1006\/jmbi.1999.2700","article-title":"Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure","volume":"288","author":"Mathews","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023020108345890800_bty765-B34","doi-asserted-by":"crossref","first-page":"R17","DOI":"10.1093\/hmg\/ddl046","article-title":"Non-coding RNA","volume":"15","author":"Mattick","year":"2006","journal-title":"Hum. Mol. 
Genet"},{"key":"2023020108345890800_bty765-B35","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1002\/bip.360290621","article-title":"The equilibrium partition function and base pair binding probabilities for RNA secondary structure","volume":"29","author":"McCaskill","year":"1990","journal-title":"Biopolymers"},{"key":"2023020108345890800_bty765-B36","doi-asserted-by":"crossref","first-page":"4119","DOI":"10.1093\/nar\/gkg438","article-title":"Computational identification of non-coding RNAs in Saccharomyces cerevisiae by comparative genomics","volume":"31","author":"McCutcheon","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023020108345890800_bty765-B37","doi-asserted-by":"crossref","first-page":"2933","DOI":"10.1093\/bioinformatics\/btt509","article-title":"Infernal 1.1: 100-fold faster RNA homology searches","volume":"29","author":"Nawrocki","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020108345890800_bty765-B38","doi-asserted-by":"crossref","first-page":"1335","DOI":"10.1093\/bioinformatics\/btp157","article-title":"Infernal 1.0: inference of RNA alignments","volume":"25","author":"Nawrocki","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020108345890800_bty765-B39","doi-asserted-by":"crossref","first-page":"D130","DOI":"10.1093\/nar\/gku1063","article-title":"Rfam 12.0: updates to the RNA families database","volume":"43","author":"Nawrocki","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020108345890800_bty765-B40","doi-asserted-by":"crossref","first-page":"e33.","DOI":"10.1371\/journal.pcbi.0020033","article-title":"Identification and classification of conserved RNA secondary structures in the human genome","volume":"2","author":"Pedersen","year":"2006","journal-title":"PLoS Comput. 
Biol"},{"key":"2023020108345890800_bty765-B41","doi-asserted-by":"crossref","first-page":"129.","DOI":"10.1186\/1471-2105-11-129","article-title":"RNAstructure: software for RNA secondary structure prediction and analysis","volume":"11","author":"Reuter","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020108345890800_bty765-B42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/nar\/gkj405","article-title":"Escherichia coli K-12: a cooperatively developed annotation snapshot\u20132005","volume":"34","author":"Riley","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020108345890800_bty765-B43","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1093\/bioinformatics\/16.7.583","article-title":"Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs","volume":"16","author":"Rivas","year":"2000","journal-title":"Bioinformatics"},{"key":"2023020108345890800_bty765-B44","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/1471-2105-2-8","article-title":"Noncoding RNA gene detection using comparative sequence analysis","volume":"2","author":"Rivas","year":"2001","journal-title":"BMC Bioinformatics"},{"key":"2023020108345890800_bty765-B45","doi-asserted-by":"crossref","first-page":"1369","DOI":"10.1016\/S0960-9822(01)00401-8","article-title":"Computational identification of noncoding RNAs in E. coli by comparative genomics","volume":"11","author":"Rivas","year":"2001","journal-title":"Curr. 
Biol"},{"key":"2023020108345890800_bty765-B46","doi-asserted-by":"crossref","first-page":"3263","DOI":"10.1093\/nar\/gki644","article-title":"Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming","volume":"33","author":"S\u00e6trom","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023020108345890800_bty765-B47","doi-asserted-by":"crossref","first-page":"5486","DOI":"10.1021\/bi051972s","article-title":"Unstructured RNA is a substrate for tRNase Z","volume":"45","author":"Shibata","year":"2006","journal-title":"Biochemistry"},{"key":"2023020108345890800_bty765-B48","doi-asserted-by":"crossref","first-page":"1260","DOI":"10.1126\/science.1072249","article-title":"An expanding universe of noncoding RNAs","volume":"296","author":"Storz","year":"2002","journal-title":"Science"},{"key":"2023020108345890800_bty765-B49","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1016\/j.cmpb.2005.11.007","article-title":"N-gram-based classification and unsupervised hierarchical clustering of genome sequences","volume":"81","author":"Tomovi\u0107","year":"2006","journal-title":"Comput. 
Methods Programs Biomed"},{"key":"2023020108345890800_bty765-B50","doi-asserted-by":"crossref","first-page":"468","DOI":"10.4161\/rna.8.3.14421","article-title":"Deep sequencing-based identification of small non-coding RNAs in Streptomyces coelicolor","volume":"8","author":"Vockenhuber","year":"2011","journal-title":"RNA Biol"},{"key":"2023020108345890800_bty765-B51","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1471-2105-13-S5-S1","article-title":"Stable stem enabled Shannon entropies distinguish non-coding RNAs from random backgrounds","volume":"13","author":"Wang","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020108345890800_bty765-B52","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.jmb.2004.07.018","article-title":"Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics","volume":"342","author":"Washietl","year":"2004","journal-title":"J. Mol. Biol"},{"key":"2023020108345890800_bty765-B53","doi-asserted-by":"crossref","first-page":"2454","DOI":"10.1073\/pnas.0409169102","article-title":"Fast and reliable prediction of noncoding RNAs","volume":"102","author":"Washietl","year":"2005","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108345890800_bty765-B54","doi-asserted-by":"crossref","first-page":"1383","DOI":"10.1038\/nbt1144","article-title":"Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome","volume":"23","author":"Washietl","year":"2005","journal-title":"Nat. 
Biotechnol"},{"key":"2023020108345890800_bty765-B55","doi-asserted-by":"crossref","first-page":"1637","DOI":"10.1101\/gad.901001","article-title":"Identification of novel small RNAs using comparative genomics and microarrays","volume":"15","author":"Wassarman","year":"2001","journal-title":"Genes Dev"},{"key":"2023020108345890800_bty765-B56","doi-asserted-by":"crossref","first-page":"4816","DOI":"10.1093\/nar\/27.24.4816","article-title":"No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution","volume":"27","author":"Workman","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023020108345890800_bty765-B57","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1002\/jcc.21633","article-title":"Nucleic acid sequence design via efficient ensemble defect optimization","volume":"32","author":"Zadeh","year":"2011","journal-title":"J. Comput. Chem"},{"key":"2023020108345890800_bty765-B58","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1093\/nar\/9.1.133","article-title":"Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information","volume":"9","author":"Zuker","year":"1981","journal-title":"Nucleic Acids 
Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/7\/1133\/48968198\/bioinformatics_35_7_1133.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/7\/1133\/48968198\/bioinformatics_35_7_1133.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T19:40:57Z","timestamp":1675280457000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/7\/1133\/5088324"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,8,31]]},"references-count":58,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2019,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty765","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,4,1]]},"published":{"date-parts":[[2018,8,31]]}}}