{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T12:16:40Z","timestamp":1774527400607,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,12,26]],"date-time":"2022-12-26T00:00:00Z","timestamp":1672012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Recent experimental evidence has shown that some long non-coding RNAs (lncRNAs) contain small open reading frames (sORFs) that are translated into functional micropeptides, suggesting that these lncRNAs are misannotated as non-coding. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (Ribo-Seq) and mass-spectrometry experiments, which are cell-type dependent and expensive.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we propose a computational method to identify possible misannotated lncRNAs from sequence information alone. Our approach first builds deep learning models to discriminate coding and non-coding transcripts and leverages these models\u2019 training dynamics to identify misannotated lncRNAs\u2014i.e. lncRNAs with coding potential. The set of misannotated lncRNAs we identified significantly overlap with experimentally validated ones and closely resemble coding protein sequences as evidenced by significant BLAST hits. Our analysis on a subset of misannotated lncRNA candidates also shows that some ORFs they contain yield high confidence folded structures as predicted by AlphaFold2. This methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Source code is available at https:\/\/github.com\/nabiafshan\/DetectingMisannotatedLncRNAs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac821","type":"journal-article","created":{"date-parts":[[2022,12,24]],"date-time":"2022-12-24T00:38:38Z","timestamp":1671842318000},"source":"Crossref","is-referenced-by-count":12,"title":["Discovering misannotated lncRNAs using deep learning training dynamics"],"prefix":"10.1093","volume":"39","author":[{"given":"Afshan","family":"Nabi","sequence":"first","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, Turkey"}]},{"given":"Berke","family":"Dilekoglu","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, Turkey"}]},{"given":"Ogun","family":"Adebali","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, Turkey"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7058-5372","authenticated-orcid":false,"given":"Oznur","family":"Tastan","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, Turkey"}]}],"member":"286","published-online":{"date-parts":[[2022,12,26]]},"reference":[{"key":"2023010806160337300_btac821-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023010806160337300_btac821-B2","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/j.cell.2015.01.009","article-title":"A micropeptide encoded by a putative long noncoding RNA regulates muscle performance","volume":"160","author":"Anderson","year":"2015","journal-title":"Cell"},{"key":"2023010806160337300_btac821-B3","doi-asserted-by":"crossref","DOI":"10.1084\/jem.20192009","article-title":"When non-coding is not enough","volume":"217","author":"Anfossi","year":"2020","journal-title":"J. Exp. Med"},{"key":"2023010806160337300_btac821-B4","doi-asserted-by":"crossref","first-page":"e03528","DOI":"10.7554\/eLife.03528","article-title":"Extensive translation of small open reading frames revealed by Poly-Ribo-Seq","volume":"3","author":"Aspden","year":"2014","journal-title":"Elife"},{"key":"2023010806160337300_btac821-B5","doi-asserted-by":"crossref","first-page":"3889","DOI":"10.1093\/bioinformatics\/bty418","article-title":"LncRNAnet: long non-coding RNA identification using deep learning","volume":"34","author":"Baek","year":"2018","journal-title":"Bioinformatics"},{"key":"2023010806160337300_btac821-B6","doi-asserted-by":"crossref","first-page":"1298","DOI":"10.1016\/j.cell.2013.02.012","article-title":"Long noncoding RNAs: cellular address codes in development and disease","volume":"152","author":"Batista","year":"2013","journal-title":"Cell"},{"key":"2023010806160337300_btac821-B7","doi-asserted-by":"crossref","first-page":"981","DOI":"10.1002\/embj.201488411","article-title":"Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation","volume":"33","author":"Bazzini","year":"2014","journal-title":"EMBO J."},{"key":"2023010806160337300_btac821-B8","doi-asserted-by":"crossref","first-page":"lqz024","DOI":"10.1093\/nargab\/lqz024","article-title":"RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences","volume":"2","author":"Camargo","year":"2020","journal-title":"NAR Genom. Bioinform"},{"key":"2023010806160337300_btac821-B9","doi-asserted-by":"crossref","first-page":"a032680","DOI":"10.1101\/cshperspect.a032680","article-title":"Roles of long noncoding RNAs and circular RNAs in translation","volume":"11","author":"Chekulaeva","year":"2019","journal-title":"Cold Spring Harb. Perspect. Biol"},{"key":"2023010806160337300_btac821-B10","doi-asserted-by":"crossref","first-page":"1853","DOI":"10.1093\/bib\/bby055","article-title":"The small peptide world in long noncoding RNAs","volume":"20","author":"Choi","year":"2019","journal-title":"Brief. Bioinform"},{"key":"2023010806160337300_btac821-B11","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1038\/nrm.2017.58","article-title":"Classification and function of small open reading frames","volume":"18","author":"Couso","year":"2017","journal-title":"Nat. Rev. Mol. Cell Biol"},{"key":"2023010806160337300_btac821-B12","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput. Biol"},{"key":"2023010806160337300_btac821-B13","doi-asserted-by":"crossref","first-page":"1723","DOI":"10.15252\/embr.201540717","article-title":"Myc coordinates transcription and translation to enhance transformation and suppress invasiveness","volume":"16","author":"Elkon","year":"2015","journal-title":"EMBO Rep"},{"key":"2023010806160337300_btac821-B14","doi-asserted-by":"crossref","first-page":"W516","DOI":"10.1093\/nar\/gkz400","article-title":"CNIT: a fast and accurate web tool for identifying protein-coding and long non-coding transcripts based on intrinsic sequence composition","volume":"47","author":"Guo","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B15","doi-asserted-by":"crossref","DOI":"10.1128\/MCB.00528-19","article-title":"When long noncoding becomes protein coding","volume":"40","author":"Hartford","year":"2020","journal-title":"Mol. Cell. Biol"},{"key":"2023010806160337300_btac821-B16","doi-asserted-by":"crossref","first-page":"8105","DOI":"10.1093\/nar\/gky567","article-title":"A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential","volume":"46","author":"Hill","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B17","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023010806160337300_btac821-B18","doi-asserted-by":"crossref","first-page":"D65","DOI":"10.1093\/nar\/gkaa791","article-title":"cncRNAdb: a manually curated resource of experimentally supported RNAs with both protein-coding and noncoding function","volume":"49","author":"Huang","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B19","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1038\/nrg3645","article-title":"Ribosome profiling: new views of translation, from single codons to genome scale","volume":"15","author":"Ingolia","year":"2014","journal-title":"Nat. Rev. Genet"},{"key":"2023010806160337300_btac821-B20","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1126\/science.1168978","article-title":"Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling","volume":"324","author":"Ingolia","year":"2009","journal-title":"Science"},{"key":"2023010806160337300_btac821-B21","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1016\/j.cell.2011.10.002","article-title":"Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes","volume":"147","author":"Ingolia","year":"2011","journal-title":"Cell"},{"key":"2023010806160337300_btac821-B22","doi-asserted-by":"crossref","first-page":"e08890","DOI":"10.7554\/eLife.08890","article-title":"Many lncRNAs, 5\u2019UTRs, and pseudogenes are translated and some are likely to express functional proteins","volume":"4","author":"Ji","year":"2015","journal-title":"Elife"},{"key":"2023010806160337300_btac821-B23","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023010806160337300_btac821-B24","doi-asserted-by":"crossref","first-page":"W12","DOI":"10.1093\/nar\/gkx428","article-title":"CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features","volume":"45","author":"Kang","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B25","author":"Kingma","year":"2014"},{"key":"2023010806160337300_btac821-B26","doi-asserted-by":"crossref","first-page":"W345","DOI":"10.1093\/nar\/gkm391","article-title":"CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine","volume":"35","author":"Kong","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B27","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput"},{"key":"2023010806160337300_btac821-B28","first-page":"6765","article-title":"Hyperband: a novel bandit-based approach to hyperparameter optimization","volume":"18","author":"Li","year":"2017","journal-title":"J. Machine Learn. Res"},{"key":"2023010806160337300_btac821-B29","doi-asserted-by":"crossref","first-page":"8111","DOI":"10.1093\/nar\/gkz646","article-title":"A hidden human proteome encoded by \u2018non-coding\u2019 genes","volume":"47","author":"Lu","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B30","doi-asserted-by":"crossref","first-page":"3701","DOI":"10.1016\/j.celrep.2018.05.058","article-title":"MOXI is a mitochondrial micropeptide that enhances fatty acid \u03b2-oxidation","volume":"23","author":"Makarewich","year":"2018","journal-title":"Cell Rep"},{"key":"2023010806160337300_btac821-B31","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1038\/nature21034","article-title":"mTORC1 and muscle regeneration are regulated by the LINC00961-encoded SPAR polypeptide","volume":"541","author":"Matsumoto","year":"2017","journal-title":"Nature"},{"key":"2023010806160337300_btac821-B32","doi-asserted-by":"crossref","first-page":"1797","DOI":"10.1101\/gr.6761107","article-title":"28-way vertebrate alignment and conservation track in the UCSC genome browser","volume":"17","author":"Miller","year":"2007","journal-title":"Genome Res"},{"key":"2023010806160337300_btac821-B33","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B34","doi-asserted-by":"crossref","first-page":"vbab043","DOI":"10.1093\/bioadv\/vbab043","article-title":"Folding the unfoldable: using AlphaFold to explore spurious proteins","volume":"2","author":"Monzon","year":"2022","journal-title":"Bioinform. Adv"},{"key":"2023010806160337300_btac821-B35","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1126\/science.aad4076","article-title":"A peptide encoded by a transcript annotated as long noncoding RNA enhances SERCA activity in muscle","volume":"351","author":"Nelson","year":"2016","journal-title":"Science"},{"key":"2023010806160337300_btac821-B36","author":"Ng","year":"2017"},{"key":"2023010806160337300_btac821-B37","doi-asserted-by":"crossref","first-page":"D497","DOI":"10.1093\/nar\/gkx1130","article-title":"An update on sORFs. org: a repository of small orfs identified by ribosome profiling","volume":"46","author":"Olexiouk","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B38","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1146\/annurev-biochem-051410-092902","article-title":"Genome regulation by long noncoding RNAs","volume":"81","author":"Rinn","year":"2012","journal-title":"Annu. Rev. Biochem"},{"key":"2023010806160337300_btac821-B39","doi-asserted-by":"crossref","first-page":"e03523","DOI":"10.7554\/eLife.03523","article-title":"Long non-coding RNAs as a source of new peptides","volume":"3","author":"Ruiz-Orera","year":"2014","journal-title":"Elife"},{"key":"2023010806160337300_btac821-B40","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nchembio.1120","article-title":"Peptidomic discovery of short open reading frame\u2013encoded peptides in human cells","volume":"9","author":"Slavoff","year":"2013","journal-title":"Nat. Chem. Biol"},{"key":"2023010806160337300_btac821-B41","doi-asserted-by":"crossref","first-page":"7002","DOI":"10.1111\/febs.15845","article-title":"The largely unexplored biology of small proteins in pro-and eukaryotes","volume":"288","author":"Steinberg","year":"2021","journal-title":"FEBS J"},{"key":"2023010806160337300_btac821-B42","doi-asserted-by":"crossref","first-page":"41458","DOI":"10.1038\/srep41458","article-title":"Transcriptomic investigation of wound healing and regeneration in the cnidarian calliactis polypus","volume":"7","author":"Stewart","year":"2017","journal-title":"Sci. Rep"},{"key":"2023010806160337300_btac821-B43","author":"Swayamdipta","year":"2020"},{"key":"2023010806160337300_btac821-B44","doi-asserted-by":"crossref","first-page":"e43","DOI":"10.1093\/nar\/gkz087","article-title":"CPPred: coding potential prediction based on the global description of RNA sequence","volume":"47","author":"Tong","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B45","author":"Tong","year":"2020"},{"key":"2023010806160337300_btac821-B46","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.cell.2013.06.020","article-title":"lincRNAs: genomics, evolution, and mechanisms","volume":"154","author":"Ulitsky","year":"2013","journal-title":"Cell"},{"key":"2023010806160337300_btac821-B47","author":"Vaswani","year":"2017"},{"key":"2023010806160337300_btac821-B48","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"J. Machine Learn. Res"},{"key":"2023010806160337300_btac821-B49","doi-asserted-by":"crossref","first-page":"e74","DOI":"10.1093\/nar\/gkt006","article-title":"CPAT: coding-potential assessment tool using an alignment-free logistic regression model","volume":"41","author":"Wang","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023010806160337300_btac821-B50","doi-asserted-by":"crossref","first-page":"e20190950","DOI":"10.1084\/jem.20190950","article-title":"LNCRNA-encoded polypeptide ASRPS inhibits triple-negative breast cancer angiogenesis","volume":"217","author":"Wang","year":"2020","journal-title":"J. Exp. Med"},{"key":"2023010806160337300_btac821-B51","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1186\/s12859-019-3033-9","article-title":"MiPepid: micropeptide identification tool using machine learning","volume":"20","author":"Zhu","year":"2019","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac821\/48416491\/btac821.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac821\/48521782\/btac821.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac821\/48521782\/btac821.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,8]],"date-time":"2023-01-08T06:16:56Z","timestamp":1673158616000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btac821\/6960922"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,12,26]]},"references-count":51,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac821","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,1,1]]},"published":{"date-parts":[[2022,12,26]]},"article-number":"btac821"}}