{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T22:03:39Z","timestamp":1765231419181,"version":"3.41.2"},"reference-count":37,"publisher":"Public Library of Science (PLoS)","issue":"7","license":[{"start":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T00:00:00Z","timestamp":1752537600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R35-GM122562"],"award-info":[{"award-number":["R35-GM122562"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DMS-215177 and DMS-2330628"],"award-info":[{"award-number":["DMS-215177 and DMS-2330628"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Philip Morris USA Inc"},{"DOI":"10.13039\/100000893","name":"Simons Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000893","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Identifying novel and functional RNA structures remains a significant challenge in RNA motif design and is crucial for developing RNA-based therapeutics. Here we introduce a computational topology-based approach with unsupervised machine-learning algorithms to estimate the database size and content of RNA-like graph topologies. Specifically, we apply graph theory enumeration to generate all 110,667 possible 2D dual graphs for vertex numbers ranging from 2 to 9. 
Among them, only 0.11% (121 dual graphs) correspond to approximately 200,000 known RNA atomic fragments\/substructures (collected in 2021) using the RNA-as-Graphs (RAG) framework. The remaining 99.89% of the dual graphs may be RNA-like or non-RNA-like. To determine which dual graphs in the 99.89% hypothetical set are more likely to be associated with RNA structures, we apply computational topology descriptors using the Persistent Spectral Graphs (PSG) method to characterize each graph using 19 PSG-based features and use clustering algorithms that partition all possible dual graphs into two clusters. The cluster with the higher percentage of known dual graphs for RNA is defined as the \u201cRNA-like\u201d cluster, while the other is considered as \u201cnon-RNA-like\u201d. The distance between each dual graph and the center of the RNA-like cluster represents the likelihood of it belonging to RNA structures. From validation, our PSG-based RNA-like cluster includes 97.3% of the 121 known RNA dual graphs, suggesting good performance. Furthermore, 46.017% of the hypothetical RNAs are predicted to be RNA-like. Among the top 15 graphs identified as high-likelihood candidates for novel RNA motifs, 4 were confirmed from the RNA dataset collected in 2022. Significantly, we observe that all the top 15 RNA-like dual graphs can be separated into multiple subgraphs, whereas the top 15 non-RNA-like dual graphs tend not to have any subgraphs (subgraphs preserve pseudoknots and junctions). Moreover, a significant topological difference between top RNA-like and non-RNA-like graphs is evident when comparing their topological features (e.g., Betti-0 and Betti-1 numbers). 
These findings provide valuable insights into the size of the RNA motif universe and RNA design strategies, offering a novel framework for predicting RNA graph topologies and guiding the discovery of novel RNA motifs, perhaps anti-viral therapeutics by subgraph assembly.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1013230","type":"journal-article","created":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T17:39:16Z","timestamp":1752601156000},"page":"e1013230","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":2,"title":["How large is the universe of RNA-like motifs? A clustering analysis of RNA graph motifs using topological descriptors"],"prefix":"10.1371","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7402-6372","authenticated-orcid":true,"given":"Rui","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2392-2062","authenticated-orcid":true,"given":"Tamar","family":"Schlick","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2025,7,15]]},"reference":[{"volume-title":"Biochemistry, RNA structure","year":"2023","author":"D Wang","key":"pcbi.1013230.ref001"},{"issue":"10","key":"pcbi.1013230.ref002","doi-asserted-by":"crossref","first-page":"1193","DOI":"10.1038\/s41592-022-01623-y","article-title":"Advances and opportunities in RNA structure experimental determination and computational modeling","volume":"19","author":"J Zhang","year":"2022","journal-title":"Nat Methods"},{"key":"pcbi.1013230.ref003","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1007\/11732990_12","article-title":"RNA secondary structure prediction via energy density minimization.","volume-title":"Research in Computational Molecular Biology: 10th Annual International Conference, RECOMB 2006, Venice, Italy, April 2\u20135, 2006. 
Proceedings 10","author":"C Alkan","year":"2006"},{"key":"pcbi.1013230.ref004","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1186\/1471-2105-11-129","article-title":"RNAstructure: software for RNA secondary structure prediction and analysis","volume":"11","author":"JS Reuter","year":"2010","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"pcbi.1013230.ref005","article-title":"UFold: fast and accurate RNA secondary structure prediction with deep learning","volume":"50","author":"L Fu","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1013230.ref006","doi-asserted-by":"crossref","first-page":"869601","DOI":"10.3389\/fmolb.2022.869601","article-title":"Deep learning in RNA structure studies","volume":"9","author":"H Yu","year":"2022","journal-title":"Front Mol Biosci"},{"issue":"5","key":"pcbi.1013230.ref007","doi-asserted-by":"crossref","first-page":"1129","DOI":"10.1016\/j.jmb.2004.06.054","article-title":"Candidates for novel RNA topologies","volume":"341","author":"N Kim","year":"2004","journal-title":"J Mol Biol"},{"issue":"6","key":"pcbi.1013230.ref008","doi-asserted-by":"crossref","first-page":"129534","DOI":"10.1016\/j.bbagen.2020.129534","article-title":"Identification of novel RNA design candidates by clustering the extended RNA-as-graphs library","volume":"1864","author":"S Jain","year":"2020","journal-title":"Biochim Biophys Acta Gen Subj"},{"issue":"19","key":"pcbi.1013230.ref009","doi-asserted-by":"crossref","first-page":"9474","DOI":"10.1093\/nar\/gkv823","article-title":"RAG-3D: a search tool for RNA 3D substructures","volume":"43","author":"M Zahran","year":"2015","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"pcbi.1013230.ref010","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1021\/acs.jpcb.0c10685","article-title":"A Fiedler vector scoring approach for novel RNA motif selection","volume":"125","author":"Q Zhu","year":"2021","journal-title":"J Phys Chem 
B"},{"issue":"9","key":"pcbi.1013230.ref011","doi-asserted-by":"crossref","DOI":"10.1002\/cnm.3376","article-title":"Persistent spectral graph","volume":"36","author":"R Wang","year":"2020","journal-title":"Int J Numer Method Biomed Eng"},{"issue":"1","key":"pcbi.1013230.ref012","doi-asserted-by":"crossref","first-page":"67","DOI":"10.3934\/fods.2021006","article-title":"Hermes: persistent spectral graph software","volume":"3","author":"R Wang","year":"2021","journal-title":"Found Data Sci"},{"key":"pcbi.1013230.ref013","article-title":"Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants","author":"J Chen","year":"2022","journal-title":"arXiv preprint"},{"issue":"7","key":"pcbi.1013230.ref014","doi-asserted-by":"crossref","first-page":"2405","DOI":"10.1021\/acs.jcim.3c01023","article-title":"PLPCA: Persistent Laplacian-Enhanced PCA for microarray data analysis","volume":"64","author":"S Cottrell","year":"2024","journal-title":"J Chem Inf Model"},{"key":"pcbi.1013230.ref015","doi-asserted-by":"crossref","first-page":"115842","DOI":"10.1016\/j.cam.2024.115842","article-title":"Analyzing single cell RNA sequencing with topological nonnegative matrix factorization","volume":"445","author":"Y Hozumi","year":"2024","journal-title":"J Comput Appl Math"},{"issue":"16","key":"pcbi.1013230.ref016","doi-asserted-by":"crossref","first-page":"9249","DOI":"10.3390\/ijms23169249","article-title":"RNA-as-graphs motif atlas-dual graph library of RNA modules and viral frameshifting-element applications","volume":"23","author":"Q Zhu","year":"2022","journal-title":"Int J Mol Sci"},{"issue":"11","key":"pcbi.1013230.ref017","doi-asserted-by":"crossref","first-page":"2926","DOI":"10.1093\/nar\/gkg365","article-title":"Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design","volume":"31","author":"HH Gan","year":"2003","journal-title":"Nucleic Acids 
Res"},{"issue":"6","key":"pcbi.1013230.ref018","doi-asserted-by":"crossref","first-page":"1040","DOI":"10.1016\/j.bpj.2020.10.012","article-title":"Structure-altering mutations of the SARS-CoV-2 frameshifting RNA element","volume":"120","author":"T Schlick","year":"2021","journal-title":"Biophys J"},{"issue":"30","key":"pcbi.1013230.ref019","doi-asserted-by":"crossref","first-page":"11404","DOI":"10.1021\/jacs.1c03003","article-title":"To knot or not to knot: multiple conformations of the SARS-CoV-2 frameshifting RNA element","volume":"143","author":"T Schlick","year":"2021","journal-title":"J Am Chem Soc"},{"issue":"1","key":"pcbi.1013230.ref020","doi-asserted-by":"crossref","first-page":"4284","DOI":"10.1038\/s41467-022-31353-w","article-title":"Length-dependent motions of SARS-CoV-2 frameshifting RNA pseudoknot and alternative conformations suggest avenues for frameshifting suppression","volume":"13","author":"S Yan","year":"2022","journal-title":"Nat Commun"},{"issue":"20","key":"pcbi.1013230.ref021","article-title":"Evolution of coronavirus frameshifting elements: competing stem networks explain conservation and variability","volume":"120","author":"S Yan","year":"2023","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1013230.ref022","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/j.ymeth.2019.03.022","article-title":"An extended dual graph library and partitioning algorithm applicable to pseudoknotted RNA structures","author":"S Jain","year":"2019","journal-title":"Methods"},{"issue":"3","key":"pcbi.1013230.ref023","doi-asserted-by":"crossref","first-page":"107438","DOI":"10.1016\/j.jsb.2019.107438","article-title":"Inverse folding with RNA-As-Graphs produces a large pool of candidate sequences with target topologies","volume":"209","author":"S Jain","year":"2020","journal-title":"J Struct Biol"},{"issue":"10","key":"pcbi.1013230.ref024","doi-asserted-by":"crossref","first-page":"1295","DOI":"10.1002\/jcc.20057","article-title":"An 
algorithm for computing nucleic acid base-pairing probabilities including pseudoknots","volume":"25","author":"RM Dirks","year":"2004","journal-title":"J Comput Chem"},{"issue":"13","key":"pcbi.1013230.ref025","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btr215","article-title":"IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming","volume":"27","author":"K Sato","year":"2011","journal-title":"Bioinformatics"},{"issue":"5","key":"pcbi.1013230.ref026","doi-asserted-by":"crossref","first-page":"2053","DOI":"10.1006\/jmbi.1998.2436","article-title":"A dynamic programming algorithm for RNA structure prediction including pseudoknots","volume":"285","author":"E Rivas","year":"1999","journal-title":"J Mol Biol"},{"issue":"1","key":"pcbi.1013230.ref027","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1007\/s00357-015-9167-1","article-title":"Feature relevance in Ward\u2019s hierarchical clustering using the Lp norm","volume":"32","author":"RC de Amorim","year":"2015","journal-title":"J Classif"},{"key":"pcbi.1013230.ref028","article-title":"UMAP: Uniform manifold approximation and projection for dimension reduction","author":"L McInnes","year":"2018","journal-title":"arXiv preprint"},{"key":"pcbi.1013230.ref029","first-page":"70","article-title":"The dip test of unimodality","author":"JA Hartigan","year":"1985","journal-title":"The Annals of Statistics"},{"key":"pcbi.1013230.ref030","first-page":"515","article-title":"On the calibration of Silverman\u2019s test for multimodality","author":"P Hall","year":"2001","journal-title":"Statistica Sinica"},{"issue":"9","key":"pcbi.1013230.ref031","article-title":"RNA graph partitioning for the discovery of RNA modularity: a novel application of graph partition algorithm to biology","volume":"9","author":"N Kim","year":"2014","journal-title":"PLoS 
One"},{"issue":"8","key":"pcbi.1013230.ref032","doi-asserted-by":"crossref","first-page":"1166","DOI":"10.1093\/bioinformatics\/bts091","article-title":"Unipro UGENE: a unified bioinformatics toolkit","volume":"28","author":"K Okonechnikov","year":"2012","journal-title":"Bioinformatics"},{"issue":"2","key":"pcbi.1013230.ref033","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1093\/nar\/gkr629","article-title":"Predicting coaxial helical stacking in RNA junctions","volume":"40","author":"C Laing","year":"2012","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"pcbi.1013230.ref034","doi-asserted-by":"crossref","first-page":"4079","DOI":"10.1073\/pnas.1318893111","article-title":"Graph-based sampling for approximating global helical topologies of RNA","volume":"111","author":"N Kim","year":"2014","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1013230.ref035","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1016\/j.jmb.2015.10.009","article-title":"Predicting large RNA-like topologies by a knowledge-based clustering approach","volume":"428","author":"N Baba","year":"2016","journal-title":"J Mol Biol"},{"issue":"4","key":"pcbi.1013230.ref036","article-title":"Heterogeneous and multiple conformational transition pathways between pseudoknots of the SARS-CoV-2 frameshift element","volume":"122","author":"S Yan","year":"2025","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"11","key":"pcbi.1013230.ref037","doi-asserted-by":"crossref","first-page":"1437","DOI":"10.1261\/rna.080035.124","article-title":"Abolished frameshifting for predicted structure-stabilizing SARS-CoV-2 mutants: implications to alternative conformations and their statistical structural analyses","volume":"30","author":"A Dey","year":"2024","journal-title":"RNA"}],"container-title":["PLOS Computational 
Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013230","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T17:39:27Z","timestamp":1752601167000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013230"}},"subtitle":[],"editor":[{"given":"Mingfu","family":"Shao","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,7,15]]},"references-count":37,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7,15]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1013230","relation":{},"ISSN":["1553-7358"],"issn-type":[{"type":"electronic","value":"1553-7358"}],"subject":[],"published":{"date-parts":[[2025,7,15]]}}}