{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T14:30:46Z","timestamp":1774708246992,"version":"3.50.1"},"reference-count":67,"publisher":"Oxford University Press (OUP)","issue":"16","license":[{"start":{"date-parts":[[2022,6,24]],"date-time":"2022-06-24T00:00:00Z","timestamp":1656028800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Australian Government Research Training Program (RTP) Scholarship"},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01GM076485"],"award-info":[{"award-number":["R01GM076485"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The secondary structure of RNA is of importance to its function. Over the last few years, several papers attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions but seldom address the much more difficult (and practical) inter-family problem.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modelled after structure mapping data that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. 
Using this method, we further demonstrate that intra-family performance is insufficient proof of generalization despite the widespread assumption in the literature and provide strong evidence that many existing learning-based models have not generalized inter-family.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code and data are available at https:\/\/github.com\/marcellszi\/dl-rna.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac415","type":"journal-article","created":{"date-parts":[[2022,6,24]],"date-time":"2022-06-24T09:31:44Z","timestamp":1656063104000},"page":"3892-3899","source":"Crossref","is-referenced-by-count":77,"title":["Deep learning models for RNA secondary structure prediction (probably) do not generalize across families"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0672-8222","authenticated-orcid":false,"given":"Marcell","family":"Szikszai","sequence":"first","affiliation":[{"name":"Department of Computer Science & Software Engineering, The University of Western Australia , Perth, WA 6009, Australia"}]},{"given":"Michael","family":"Wise","sequence":"additional","affiliation":[{"name":"Department of Computer Science & Software Engineering, The University of Western Australia , Perth, WA 6009, Australia"},{"name":"The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia , Perth, WA 6009, Australia"}]},{"given":"Amitava","family":"Datta","sequence":"additional","affiliation":[{"name":"Department of Computer Science & Software Engineering, The University of Western 
Australia , Perth, WA 6009, Australia"}]},{"given":"Max","family":"Ward","sequence":"additional","affiliation":[{"name":"Department of Computer Science & Software Engineering, The University of Western Australia , Perth, WA 6009, Australia"},{"name":"Department of Molecular and Cellular Biology, Harvard University , Cambridge, MA 02138, USA"}]},{"given":"David H","family":"Mathews","sequence":"additional","affiliation":[{"name":"Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester , Rochester, NY 14642, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,6,24]]},"reference":[{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","first-page":"i19","article-title":"Efficient parameter estimation for RNA secondary structure prediction","volume":"23","author":"Andronescu","year":"2007","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"340","DOI":"10.1186\/1471-2105-9-340","article-title":"RNA STRAND: the RNA secondary structure and statistical analysis database","volume":"9","author":"Andronescu","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"2304","DOI":"10.1261\/rna.1950510","article-title":"Computational approaches for RNA energy parameter estimation","volume":"16","author":"Andronescu","year":"2010","journal-title":"RNA"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1007\/978-1-62703-709-9_14","article-title":"RNA structural alignments, part II: non-Sankoff approaches for structural 
alignments","volume":"1097","author":"Asai","year":"2014","journal-title":"Methods Mol. Biol. (Clifton, NJ)"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"1218","DOI":"10.1093\/bioinformatics\/btaa944","article-title":"RNANet: an automatically built dual-source dataset integrating homologous sequences and RNA structures","volume":"37","author":"Becquey","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1093\/nar\/26.1.351","article-title":"The ribonuclease P database","volume":"26","author":"Brown","year":"1998","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/1471-2105-3-2","article-title":"The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs","volume":"3","author":"Cannone","year":"2002","journal-title":"BMC Bioinformatics"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/82816","article-title":"RNA: versatility in form and function","volume":"7","author":"Caprara","year":"2000","journal-title":"Nat. Struct. Biol"},{"key":"2023041408464871100_","volume-title":"International Conference on Learning Representations.","author":"Chen","year":"2019"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"5381","DOI":"10.1093\/nar\/gky285","article-title":"bpRNA: large-scale automated annotation and analysis of RNA secondary structure","volume":"46","author":"Danaee","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1073\/pnas.0806929106","article-title":"Accurate SHAPE-directed RNA structure determination","volume":"106","author":"Deigan","year":"2009","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"e35","DOI":"10.1093\/nar\/gkw1094","article-title":"A high-throughput approach to profile RNA structure","volume":"45","author":"Delli Ponti","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"e90","DOI":"10.1093\/bioinformatics\/btl246","article-title":"CONTRAfold: RNA secondary structure prediction without physics-based models","volume":"22","author":"Do","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1038\/418222a","article-title":"The chemical repertoire of natural ribozymes","volume":"418","author":"Doudna","year":"2002","journal-title":"Nature"},{"key":"2023041408464871100_","author":"Flamm","year":"2021"},{"key":"2023041408464871100_","article-title":"UFold: fast and accurate RNA secondary structure prediction with deep learning","author":"Fu","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"D121","DOI":"10.1093\/nar\/gki081","article-title":"Rfam: annotating non-coding RNAs in complete genomes","volume":"33","author":"Griffiths-Jones","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"5498","DOI":"10.1073\/pnas.1219988110","article-title":"Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots","volume":"110","author":"Hajdin","year":"2013","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1007\/978-1-62703-709-9_13","article-title":"RNA structural alignments, part I: Sankoff-based approaches for structural alignments","volume":"1097","author":"Havgaard","year":"2014","journal-title":"Methods Mol. Biol. 
(Clifton, NJ)"},{"key":"2023041408464871100_","first-page":"770","author":"He","year":"2016"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1007\/978-1-62703-709-9_4","article-title":"Energy-directed RNA structure prediction","volume":"1097","author":"Hofacker","year":"2014","journal-title":"Methods in Molecular Biology (Clifton, NJ)"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"D159","DOI":"10.1093\/nar\/gkn772","article-title":"tRNAdb 2009: compilation of tRNA sequences and tRNA genes","volume":"37","author":"J\u00fchling","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"D192","DOI":"10.1093\/nar\/gkaa1047","article-title":"Rfam 14: expanded coverage of metagenomic, viral and microRNA families","volume":"49","author":"Kalvari","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","author":"Kingma","year":"2017"},{"key":"2023041408464871100_","first-page":"319","volume-title":"Shape, Contour and Grouping in Computer Vision, Lecture Notes in Computer Science","author":"LeCun","year":"1999"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1073\/pnas.1313039111","article-title":"RNA design rules from a massive open laboratory","volume":"111","author":"Lee","year":"2014","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023041408464871100_","first-page":"281","volume-title":"RNA 3D Structure Analysis and Prediction, Nucleic Acids and Molecular Biology","author":"Leontis","year":"2012"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1186\/1748-7188-6-26","article-title":"ViennaRNA package 2.0","volume":"6","author":"Lorenz","year":"2011","journal-title":"Algorithms Mol. Biol"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"919","DOI":"10.1007\/978-3-540-27836-8_77","volume-title":"Automata, Languages and Programming, Lecture Notes in Computer Science","author":"Lyngs\u00f8","year":"2004"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.ymeth.2019.04.003","article-title":"How to benchmark RNA secondary structure prediction accuracy","volume":"162\u2013163","author":"Mathews","year":"2019","journal-title":"Methods (San Diego, CA)"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1006\/jmbi.1999.2700","article-title":"Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure","volume":"288","author":"Mathews","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"7287","DOI":"10.1073\/pnas.0401799101","article-title":"Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure","volume":"101","author":"Mathews","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"11.2.1","DOI":"10.1002\/cpnc.19","article-title":"RNA secondary structure prediction","volume":"67","author":"Mathews","year":"2016","journal-title":"Curr. Protoc. 
Nucleic Acid Chem"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"4223","DOI":"10.1021\/ja043822v","article-title":"RNA structure analysis at single nucleotide resolution by selective 2\u2032-hydroxyl acylation and primer extension (SHAPE)","volume":"127","author":"Merino","year":"2005","journal-title":"J. Am. Chem. Soc"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"982","DOI":"10.1261\/rna.075341.120","article-title":"RNA-Puzzles round IV: 3D structure predictions of four ribozymes and two aptamers","volume":"26","author":"Miao","year":"2020","journal-title":"RNA"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/0378-1119(89)90026-7","article-title":"Comparative and functional anatomy of group II catalytic introns \u2013 a review","volume":"82","author":"Michel","year":"1989","journal-title":"Gene"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"2933","DOI":"10.1093\/bioinformatics\/btt509","article-title":"Infernal 1.1: 100-fold faster RNA homology searches","volume":"29","author":"Nawrocki","year":"2013","journal-title":"Bioinformatics"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1186\/1471-2105-11-129","article-title":"RNAstructure: software for RNA secondary structure prediction and analysis","volume":"11","author":"Reuter","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.4161\/rna.24971","article-title":"The four ingredients of single-sequence RNA secondary structure prediction. 
A unifying perspective","volume":"10","author":"Rivas","year":"2013","journal-title":"RNA Biol"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1261\/rna.030049.111","article-title":"A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more","volume":"18","author":"Rivas","year":"2012","journal-title":"RNA (New York, NY)"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1093\/nar\/gkg107","article-title":"SRPDB: signal recognition particle database","volume":"31","author":"Rosenblad","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1038\/s41467-021-21194-4","article-title":"RNA secondary structure prediction using deep learning with thermodynamic integration","volume":"12","author":"Sato","year":"2021","journal-title":"Nat. Commun"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1109\/78.650093","article-title":"Bidirectional recurrent neural networks","volume":"45","author":"Schuster","year":"1997","journal-title":"IEEE Trans. Signal Process"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1007\/978-1-61779-949-5_8","article-title":"RNA structure prediction: an overview of methods","volume":"905","author":"Seetin","year":"2012","journal-title":"Methods Mol. Biol. 
(Clifton, NJ)"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"776","DOI":"10.1038\/nrg2172","article-title":"Ribozymes, riboswitches and beyond: Regulation of gene expression without proteins","volume":"8","author":"Serganov","year":"2007","journal-title":"Nat. Rev. Genet"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/j.sbi.2007.03.001","article-title":"Bridging the gap in RNA structure prediction","volume":"17","author":"Shapiro","year":"2007","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"5407","DOI":"10.1038\/s41467-019-13395-9","article-title":"RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning","volume":"10","author":"Singh","year":"2019","journal-title":"Nat. Commun"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"1808","DOI":"10.1261\/rna.053694.115","article-title":"Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures","volume":"22","author":"Sloma","year":"2016","journal-title":"RNA (New York, NY)"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1038\/s41576-019-0150-2","article-title":"RNA sequencing: the teenage years","volume":"20","author":"Stark","year":"2019","journal-title":"Nat. Rev. 
Genet"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"2807","DOI":"10.1093\/nar\/gks1283","article-title":"Evaluating the accuracy of SHAPE-directed RNA secondary structure predictions","volume":"41","author":"S\u00fck\u00f6sd","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1093\/nar\/28.1.166","article-title":"5S ribosomal RNA database Y2K","volume":"28","author":"Szymanski","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"11570","DOI":"10.1093\/nar\/gkx815","article-title":"TurboFold II: RNA structural alignment and secondary structure prediction informed by multiple homologs","volume":"45","author":"Tan","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1006\/jmbi.1999.3001","article-title":"How RNA folds","volume":"293","author":"Tinoco","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023041408464871100_","author":"Tompson","year":"2015"},{"key":"2023041408464871100_","author":"Vaswani","year":"2017"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"143","DOI":"10.3389\/fgene.2019.00143","article-title":"DMfold: a novel method to predict RNA secondary structure with pseudoknots based on deep learning and improved base pair maximization principle","volume":"10","author":"Wang","year":"2019","journal-title":"Front. 
Genet"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1186\/s12859-021-04102-x","article-title":"A novel end-to-end method to predict RNA secondary structure profile based on bidirectional LSTM and residual neural network","volume":"22","author":"Wang","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"8541","DOI":"10.1093\/nar\/gkx512","article-title":"Advanced multi-loop algorithms for RNA secondary structure prediction reveal that the simplest model is best","volume":"45","author":"Ward","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023041408464871100_","first-page":"4298","article-title":"Determining parameters for non-linear models of multi-loop free energy change","volume":"35","author":"Ward","year":"2019","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023041408464871100_","author":"Wayment-Steele","year":"2021"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"1610","DOI":"10.1038\/nprot.2006.249","article-title":"Selective 2\u2032-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution","volume":"1","author":"Wilkinson","year":"2006","journal-title":"Nat. Protoc"},{"key":"2023041408464871100_","first-page":"1306","article-title":"Phylogenetic analysis of tmRNA secondary structure","volume":"2","author":"Williams","year":"1996","journal-title":"RNA"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1515\/cmb-2020-0002","article-title":"Improving RNA secondary structure prediction via state inference with deep recurrent neural networks","volume":"8","author":"Willmott","year":"2020","journal-title":"Comput. Math. 
Biophys"},{"key":"2023041408464871100_","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1093\/nar\/gkg019","article-title":"tmRDB (tmRNA database)","volume":"31","author":"Zwieb","year":"2003","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac415\/44395253\/btac415.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/16\/3892\/49889999\/btac415.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/16\/3892\/49889999\/btac415.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T13:14:43Z","timestamp":1700745283000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/16\/3892\/6617348"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,6,24]]},"references-count":67,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2022,8,10]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac415","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.03.21.485135","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,8,15]]},"published":{"date-parts":[[2022,6,24]]}}}