{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T16:00:37Z","timestamp":1776096037455,"version":"3.50.1"},"reference-count":65,"publisher":"Oxford University Press (OUP)","issue":"24","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Long non-coding RNAs (lncRNAs), which are non-coding RNAs of length above 200 nucleotides, play important biological functions such as gene expression regulation. To fully reveal the functions of lncRNAs, a fundamental step is to annotate them in various species. However, as lncRNAs tend to encode one or multiple open reading frames, it is not trivial to distinguish these long non-coding transcripts from protein-coding genes in transcriptomic data.<\/jats:p>\n               <jats:p>Results: In this work, we design a new tool that calculates the coding potential of a transcript using a machine learning model (random forest) based on multiple features including sequence characteristics of putative open reading frames, translation scores based on ribosomal coverage, and conservation against characterized protein families. The experimental results show that our tool competes favorably with existing coding potential computation tools in lncRNA identification.<\/jats:p>\n               <jats:p>Availability and implementation: The scripts and data can be downloaded at https:\/\/github.com\/zhangy72\/LncRNA-ID<\/jats:p>\n               <jats:p>Contact: \u00a0yannisun@msu.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv480","type":"journal-article","created":{"date-parts":[[2015,8,28]],"date-time":"2015-08-28T00:18:54Z","timestamp":1440721134000},"page":"3897-3905","source":"Crossref","is-referenced-by-count":92,"title":["LncRNA-ID: Long non-coding RNA IDentification using balanced random forests"],"prefix":"10.1093","volume":"31","author":[{"given":"Rujira","family":"Achawanantakun","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA"}]},{"given":"Jiao","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA"}]},{"given":"Yanni","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA"}]},{"given":"Yuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA"}]}],"member":"286","published-online":{"date-parts":[[2015,8,26]]},"reference":[{"key":"2023051307185077000_btv480-B1","doi-asserted-by":"crossref","first-page":"3889","DOI":"10.1073\/pnas.0635171100","article-title":"Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae","volume":"100","author":"Arava","year":"2003","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051307185077000_btv480-B2","doi-asserted-by":"crossref","first-page":"173","DOI":"10.4161\/epi.27030","article-title":"A long non-coding RNA promotes full activation of adult gene expression in the chicken globin domain","volume":"9","author":"Arriaga-Canon","year":"2014","journal-title":"Epigenetics"},{"key":"2023051307185077000_btv480-B3","doi-asserted-by":"crossref","first-page":"e43047","DOI":"10.1371\/journal.pone.0043047","article-title":"Computational identification and functional predictions of long noncoding RNA in Zea mays","volume":"7","author":"Boerner","year":"2012","journal-title":"PLoS ONE"},{"key":"2023051307185077000_btv480-B4","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1038\/351325a0","article-title":"Characterization of a murine gene expressed from the inactive X chromosome","volume":"351","author":"Borsani","year":"1991","journal-title":"Nature"},{"key":"2023051307185077000_btv480-B5","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023051307185077000_btv480-B6","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1016\/0092-8674(92)90519-I","article-title":"The product of the mouse Xist gene is a 15\u2009kb inactive X-specific transcript containing no conserved ORF and located in the nucleus","volume":"71","author":"Brockdorff","year":"1992","journal-title":"Cell"},{"key":"2023051307185077000_btv480-B7","doi-asserted-by":"crossref","first-page":"D210","DOI":"10.1093\/nar\/gkr1175","article-title":"NONCODE v3.0: integrative annotation of long noncoding RNAs","volume":"40","author":"Bu","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B8","doi-asserted-by":"crossref","first-page":"1915","DOI":"10.1101\/gad.17446611","article-title":"Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses","volume":"25","author":"Cabili","year":"2011","journal-title":"Genes Dev."},{"key":"2023051307185077000_btv480-B9","article-title":"Using random forest to learn imbalanced data","volume-title":"Technical report","author":"Chen","year":"2004"},{"key":"2023051307185077000_btv480-B10","doi-asserted-by":"crossref","first-page":"D983","DOI":"10.1093\/nar\/gks1099","article-title":"LncRNADisease: a database for long-non-coding RNA-associated diseases","volume":"41","author":"Chen","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B11","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/j.ygeno.2012.04.003","article-title":"Random forests for genomic data analysis","volume":"99","author":"Chen","year":"2012","journal-title":"Genomics"},{"key":"2023051307185077000_btv480-B12","doi-asserted-by":"crossref","first-page":"R72","DOI":"10.1186\/gb-2010-11-7-r72","article-title":"Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes","volume":"11","author":"Chodroff","year":"2010","journal-title":"Genome Biol."},{"key":"2023051307185077000_btv480-B13","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1046\/j.1365-2141.2003.04754.x","article-title":"Beta\u2009+\u200945\u2009G\u2013C: a novel silent beta-thalassaemia mutation, the first in the Kozak sequence","volume":"124","author":"De Angioletti","year":"2004","journal-title":"Br. J. Haematol."},{"key":"2023051307185077000_btv480-B14","doi-asserted-by":"crossref","first-page":"1775","DOI":"10.1101\/gr.132159.111","article-title":"The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression","volume":"22","author":"Derrien","year":"2012","journal-title":"Genome Res."},{"key":"2023051307185077000_btv480-B15","doi-asserted-by":"crossref","first-page":"e1000176","DOI":"10.1371\/journal.pcbi.1000176","article-title":"Differentiating protein-coding and noncoding RNA: challenges and ambiguities","volume":"4","author":"Dinger","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023051307185077000_btv480-B16","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1038\/nature11233","article-title":"Landscape of transcription in human cells","volume":"489","author":"Djebali","year":"2012","journal-title":"Nature"},{"key":"2023051307185077000_btv480-B17","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023051307185077000_btv480-B18","first-page":"205","article-title":"A new generation of homology search tools based on probabilistic inference","volume":"23","author":"Eddy","year":"2009","journal-title":"Genome Inf."},{"key":"2023051307185077000_btv480-B19","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B20","volume-title":"Analyzing Receiver Operating Characteristic Curves With SAS","author":"Gonen","year":"2007"},{"key":"2023051307185077000_btv480-B21","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/nbt.1633","article-title":"Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs","volume":"28","author":"Guttman","year":"2010","journal-title":"Nat. Biotechnol."},{"issue":"1","key":"2023051307185077000_btv480-B22","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1016\/j.cell.2013.06.009","article-title":"Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins","volume":"154","author":"Guttman","year":"2013","journal-title":"Cell"},{"key":"2023051307185077000_btv480-B23","doi-asserted-by":"crossref","DOI":"10.1145\/1656274.1656278","article-title":"The WEKA data mining software: An update","volume":"11","author":"Hall","year":"2009","journal-title":"SIGKDD Explorations"},{"key":"2023051307185077000_btv480-B24","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-79452-3","volume-title":"Machine Learning: Modeling Data Locally and Globally","author":"Huang","year":"2008"},{"key":"2023051307185077000_btv480-B25","doi-asserted-by":"crossref","first-page":"e78915","DOI":"10.1371\/journal.pone.0078915","article-title":"Sequence and expression characteristics of long noncoding RNAs in honey bee caste development\u2014potential novel regulators for transgressive ovary size","volume":"8","author":"Humann","year":"2013","journal-title":"PLoS ONE"},{"key":"2023051307185077000_btv480-B26","doi-asserted-by":"crossref","first-page":"582","DOI":"10.4161\/rna.7.5.13216","article-title":"Long noncoding RNA in genome regulation: prospects and mechanisms","volume":"7","author":"Hung","year":"2010","journal-title":"RNA Biol."},{"key":"2023051307185077000_btv480-B27","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1016\/j.cell.2011.10.002","article-title":"Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes","volume":"147","author":"Ingolia","year":"2011","journal-title":"Cell"},{"key":"2023051307185077000_btv480-B28","doi-asserted-by":"crossref","first-page":"1484","DOI":"10.1126\/science.1138341","article-title":"RNA maps reveal new RNA classes and a possible function for pervasive transcription","volume":"316","author":"Kapranov","year":"2007","journal-title":"Science"},{"key":"2023051307185077000_btv480-B29","doi-asserted-by":"crossref","first-page":"W345","DOI":"10.1093\/nar\/gkm391","article-title":"CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine","volume":"35","author":"Kong","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B30","first-page":"5073","article-title":"Context effects and inefficient initiation at non-aug codons in eucaryotic cell-free translation systems","volume":"9","author":"Kozak","year":"1989","journal-title":"Genome Res."},{"key":"2023051307185077000_btv480-B31","doi-asserted-by":"crossref","first-page":"2482","DOI":"10.1093\/emboj\/16.9.2482","article-title":"Recognition of aug and alternative initiator codons is augmented by g in position +4 but is not generally affected by the nucleotides in positions +5 and +6","volume":"16","author":"Kozak","year":"1997","journal-title":"EMBO J."},{"key":"2023051307185077000_btv480-B32","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/S0378-1119(99)00210-3","article-title":"Initiation of translation in prokaryotes and eukaryotes","volume":"234","author":"Kozak","year":"1999","journal-title":"Gene"},{"key":"2023051307185077000_btv480-B33","doi-asserted-by":"crossref","first-page":"e137","DOI":"10.1093\/nar\/gkt426","article-title":"CoRAL: predicting non-coding RNAs from small RNA-sequencing data","volume":"41","author":"Leung","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B34","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1186\/1471-2105-15-311","article-title":"PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme","volume":"15","author":"Li","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023051307185077000_btv480-B35","doi-asserted-by":"crossref","first-page":"i275","DOI":"10.1093\/bioinformatics\/btr209","article-title":"Phylocsf: a comparative genomics method to distinguish protein coding and non-coding regions","volume":"27","author":"Lin","year":"2011","journal-title":"Bioinformatics"},{"key":"2023051307185077000_btv480-B36","doi-asserted-by":"crossref","first-page":"4333","DOI":"10.1105\/tpc.112.102855","article-title":"Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis","volume":"24","author":"Liu","year":"2012","journal-title":"Plant Cell"},{"key":"2023051307185077000_btv480-B37","doi-asserted-by":"crossref","first-page":"e76387","DOI":"10.1371\/journal.pone.0076387","article-title":"Inheritable and precise large genomic deletions of non-coding RNA genes in zebrafish using TALENs","volume":"8","author":"Liu","year":"2013","journal-title":"PLoS One"},{"key":"2023051307185077000_btv480-B38","doi-asserted-by":"crossref","first-page":"bar009","DOI":"10.1093\/database\/bar009","article-title":"UniProt knowledgebase: a hub of integrated protein data","volume":"2011","author":"Magrane","year":"2011","journal-title":"Database"},{"key":"2023051307185077000_btv480-B39","doi-asserted-by":"crossref","first-page":"W327","DOI":"10.1093\/nar\/gkh454","article-title":"CD-Search: protein domain annotations on the fly","volume":"32","author":"Marchler-Bauer","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B40","doi-asserted-by":"crossref","first-page":"D348","DOI":"10.1093\/nar\/gks1243","article-title":"CDD: conserved domains and protein three-dimensional structure","volume":"41","author":"Marchler-Bauer","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B41","volume-title":"version 7.10.0 (R2010a)","author":"MATLAB","year":"2010"},{"key":"2023051307185077000_btv480-B42","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.1093\/bioinformatics\/btl024","article-title":"Thermodynamics of RNA-RNA binding","volume":"22","author":"Muckstein","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051307185077000_btv480-B43","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nature01266","article-title":"Analysis of the mouse transcriptome based on functional annotation of 60\u2009770 full-length cDNAs","volume":"420","author":"Okazaki","year":"2002","journal-title":"Nature"},{"key":"2023051307185077000_btv480-B44","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1038\/nrg2904","article-title":"Non-coding RNAs as regulators of embryogenesis","volume":"12","author":"Pauli","year":"2011","journal-title":"Nat. Rev. Genet."},{"key":"2023051307185077000_btv480-B45","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1101\/gr.133009.111","article-title":"Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis","volume":"22","author":"Pauli","year":"2012","journal-title":"Genome Res."},{"key":"2023051307185077000_btv480-B46","doi-asserted-by":"crossref","first-page":"1159, 1161","DOI":"10.1126\/science.337.6099.1159","article-title":"Genomics. ENCODE project writes eulogy for junk DNA","volume":"337","author":"Pennisi","year":"2012","journal-title":"Science"},{"key":"2023051307185077000_btv480-B47","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1101\/gad.1484207","article-title":"Eukaryotic regulatory RNAs: an answer to the \u2019genome complexity\u2019 conundrum","volume":"21","author":"Prasanth","year":"2007","journal-title":"Genes Dev."},{"key":"2023051307185077000_btv480-B48","author":"Probost","year":"2000"},{"key":"2023051307185077000_btv480-B49","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1093\/nar\/gkl842","article-title":"NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"35","author":"Pruitt","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B50","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1093\/nar\/gkr1065","article-title":"The Pfam protein families database","volume":"40","author":"Punta","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B51","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1000844","article-title":"The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies","volume":"6","author":"Schloss","year":"2010","journal-title":"PLoS Comput. Biol."},{"key":"2023051307185077000_btv480-B52","first-page":"201","article-title":"Biological applications of support vector machines","volume":"1","author":"Shaw","year":"2008","journal-title":"Nat. Educ."},{"key":"2023051307185077000_btv480-B53","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","article-title":"UniRef: comprehensive and non-redundant UniProt reference clusters","volume":"23","author":"Suzek","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051307185077000_btv480-B54","first-page":"1341","article-title":"Feature selection with ensembles, artificial variables, and redundancy elimination","volume":"10","author":"Tuv","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"2023051307185077000_btv480-B55","doi-asserted-by":"crossref","first-page":"3623","DOI":"10.1093\/nar\/gkt1386","article-title":"Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages","volume":"42","author":"Vasquez","year":"2014","journal-title":"Nucl. Acids Res."},{"key":"2023051307185077000_btv480-B56","doi-asserted-by":"crossref","first-page":"e74","DOI":"10.1093\/nar\/gkt006","article-title":"CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model","volume":"41","author":"Wang","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B57","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1016\/j.tcb.2011.04.001","article-title":"Long noncoding RNAs and human disease","volume":"21","author":"Wapinski","year":"2011","journal-title":"Trends Cell Biol."},{"key":"2023051307185077000_btv480-B58","doi-asserted-by":"crossref","first-page":"1494","DOI":"10.1101\/gad.1800909","article-title":"Long noncoding RNAs: functional surprises from the RNA world","volume":"23","author":"Wilusz","year":"2009","journal-title":"Genes Dev."},{"key":"2023051307185077000_btv480-B59","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1093\/bioinformatics\/btn583","article-title":"Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature","volume":"25","author":"Wu","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051307185077000_btv480-B60","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1093\/nar\/gkn917","article-title":"Identification of protein-coding sequences using the hybridization of 18S rRNA and mRNA during translation","volume":"37","author":"Xing","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023051307185077000_btv480-B61","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1038\/cr.2010.25","article-title":"Length of the ORF, position of the first AUG and the Kozak motif are important factors in potential dual-coding transcripts","volume":"20","author":"Xu","year":"2010","journal-title":"Cell Res."},{"key":"2023051307185077000_btv480-B62","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1186\/1471-2105-12-198","article-title":"HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors","volume":"12","author":"Zhang","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023051307185077000_btv480-B63","article-title":"MetaDomain: a profile HMM-based protein domain classification tool for short sequences","volume-title":"Proceedings of Pacific Symposium on Biocomputing (PSB)","author":"Zhang","year":"2012"},{"key":"2023051307185077000_btv480-B64","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btt357","article-title":"A Sensitive and Accurate protein domain cLassification Tool (SALT) for short reads","volume":"29","author":"Zhang","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051307185077000_btv480-B65","doi-asserted-by":"crossref","first-page":"e1003737","DOI":"10.1371\/journal.pcbi.1003737","article-title":"A scalable and accurate targeted gene assembly tool (SAT-Assembler) for next-generation sequencing data","volume":"10","author":"Zhang","year":"2014","journal-title":"PLoS Comput. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/24\/3897\/50307026\/bioinformatics_31_24_3897.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/24\/3897\/50307026\/bioinformatics_31_24_3897.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,13]],"date-time":"2023-05-13T07:20:50Z","timestamp":1683962450000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/24\/3897\/196877"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,8,26]]},"references-count":65,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2015,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv480","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,12,15]]},"published":{"date-parts":[[2015,8,26]]}}}