{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T12:06:50Z","timestamp":1768133210950,"version":"3.49.0"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T00:00:00Z","timestamp":1741910400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01GM145986"],"award-info":[{"award-number":["R01GM145986"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>RNA secondary structure is often essential to function. Recent work has led to the development of high-throughput experimental probing methods for structure determination. Although structure is more conserved than primary sequence, much of the bioinformatics pipelines to connect RNA structure to function rely on nucleotide sequence alignments rather than structural similarity. There is a need to develop methods for secondary structure comparisons that are also fast and efficient to navigate the vast amounts of structural data. K-mer based similarity approaches are valued for their computational efficiency and have been applied for protein, DNA, and RNA primary sequences. 
However, these approaches have yet to be implemented for RNA secondary structure.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Our method, bpRNA-CosMoS, fills this gap by using k-mers and length-weighted cosine similarity to compute similarity scores between RNA structures. bpRNA-CosMoS is built upon the bpRNA structure array, which represents the structural category of each nucleotide as a single-character structural code (e.g. hairpin=H, etc.). A structural comparison score is calculated through cosine similarity of the k-mer count vectors, generated from structure arrays. A major challenge with k-mer based methods is that they often ignore the length of the sequences being compared. We have overcome this with a length-weighted penalty that addresses cases of two RNAs of vastly different lengths. In addition, the use of \u201cfuzzy counting\u201d has added some optional flexibility to decrease the negative impact that small structural variations have on the similarity score. 
This results in a robust and efficient way to identify structural comparisons across large datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The code and application guidelines of bpRNA-CosMoS are made available at GitHub (https:\/\/github.com\/BLasher113\/bpRNA-CosMoS) and Zenodo (10.5281\/zenodo.14715285).<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf108","type":"journal-article","created":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T14:17:02Z","timestamp":1741961822000},"source":"Crossref","is-referenced-by-count":2,"title":["bpRNA-CosMoS: a robust and efficient RNA structural comparison method using k-mer based cosine similarity"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2650-4162","authenticated-orcid":false,"given":"Brittany","family":"Lasher","sequence":"first","affiliation":[{"name":"Department of Biochemistry and Biophysics, Oregon State University, 2011 Agricultural and Life Sciences, 2750 SW Campus Way, Corvallis, Oregon 97331,","place":["USA"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7285-1977","authenticated-orcid":false,"given":"David A","family":"Hendrix","sequence":"additional","affiliation":[{"name":"Department of Biochemistry and Biophysics, Oregon State University, 2011 Agricultural and Life Sciences, 2750 SW Campus Way, Corvallis, Oregon 97331,","place":["USA"]},{"name":"School of Electrical Engineering and Computer Science, Oregon State University, Kelley Engineering Center, 1148, 2461 SW Campus Way, Corvallis, Oregon 
97331,","place":["USA"]}]}],"member":"286","published-online":{"date-parts":[[2025,3,14]]},"reference":[{"key":"2025042316111930700_btaf108-B1","first-page":"361","author":"Bastian","year":"2009"},{"key":"2025042316111930700_btaf108-B2","doi-asserted-by":"crossref","first-page":"10717","DOI":"10.1093\/nar\/gkac844","article-title":"rRNA expansion segment 7 in eukaryotes: from signature fold to tentacles","volume":"50","author":"Biesiada","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025042316111930700_btaf108-B3","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat Biotechnol"},{"key":"2025042316111930700_btaf108-B4","doi-asserted-by":"crossref","first-page":"e0258693","DOI":"10.1371\/journal.pone.0258693","article-title":"Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy","volume":"16","author":"Bussi","year":"2021","journal-title":"PLoS One"},{"key":"2025042316111930700_btaf108-B5","doi-asserted-by":"crossref","first-page":"432","DOI":"10.1038\/s41586-020-2249-1","article-title":"RIC-seq for global in situ profiling of RNA\u2013RNA spatial interactions","volume":"582","author":"Cai","year":"2020","journal-title":"Nature"},{"key":"2025042316111930700_btaf108-B6","first-page":"vbad005","article-title":"Snekmer: a scalable pipeline for protein sequence fingerprinting based on amino acid recoding","volume":"3","author":"Chang","year":"2023","journal-title":"Bioinf Adv"},{"key":"2025042316111930700_btaf108-B7","doi-asserted-by":"crossref","first-page":"546","DOI":"10.3390\/biom12040546","article-title":"Developments in algorithms for sequence alignment: a 
review","volume":"12","author":"Chao","year":"2022","journal-title":"Biomolecules"},{"key":"2025042316111930700_btaf108-B8","doi-asserted-by":"crossref","first-page":"5381","DOI":"10.1093\/nar\/gky285","article-title":"bpRNA: large-scale automated annotation and analysis of RNA secondary structure","volume":"46","author":"Danaee","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2025042316111930700_btaf108-B9","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1038\/nature12756","article-title":"In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features","volume":"505","author":"Ding","year":"2014","journal-title":"Nature"},{"key":"2025042316111930700_btaf108-B10","doi-asserted-by":"crossref","first-page":"854","DOI":"10.1016\/j.molcel.2018.05.001","article-title":"Sequence, structure, and context preferences of human RNA binding proteins","volume":"70","author":"Dominguez","year":"2018","journal-title":"Mol Cell"},{"key":"2025042316111930700_btaf108-B12","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1016\/j.molcel.2016.04.028","article-title":"In vivo mapping of eukaryotic RNA interactomes reveals principles of higher-order organization and regulation","volume":"62","author":"Ghut","year":"2016","journal-title":"Mol Cell"},{"key":"2025042316111930700_btaf108-B13","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1093\/bioinformatics\/btw773","article-title":"The super-n-motifs model: a novel alignment-free approach for representing and comparing RNA secondary 
structures","volume":"33","author":"Glouzon","year":"2017","journal-title":"Bioinformatics"},{"key":"2025042316111930700_btaf108-B14","doi-asserted-by":"publisher","author":"He","DOI":"10.1101\/2024.02.24.581671"},{"key":"2025042316111930700_btaf108-B15","first-page":"159","author":"H\u00f6chsmann","year":"2003"},{"key":"2025042316111930700_btaf108-B16","doi-asserted-by":"crossref","first-page":"10010","DOI":"10.1073\/pnas.1017386108","article-title":"Cross-linking, ligation, and sequencing of hybrids reveals RNA-RNA interactions in yeast","volume":"108","author":"Kudla","year":"2011","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025042316111930700_btaf108-B17","doi-asserted-by":"publisher","author":"Lajarte","year":"2024","DOI":"10.1101\/2024.01.24.577093"},{"key":"2025042316111930700_btaf108-B18","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1261\/rna.079211.122","article-title":"bpRNA-align: improved RNA secondary structure global alignment for comparing and clustering RNA structures","volume":"29","author":"Lasher","year":"2023","journal-title":"RNA"},{"key":"2025042316111930700_btaf108-B19","doi-asserted-by":"crossref","first-page":"D183","DOI":"10.1093\/nar\/gkaa880","article-title":"RASP: an atlas of transcriptome-wide RNA secondary structure probing data","volume":"49","author":"Li","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025042316111930700_btaf108-B20","doi-asserted-by":"crossref","first-page":"1267","DOI":"10.1016\/j.cell.2016.04.028","article-title":"RNA duplex map in living cells reveals higher-order transcriptome structure","volume":"165","author":"Lu","year":"2016","journal-title":"Cell"},{"key":"2025042316111930700_btaf108-B21","doi-asserted-by":"crossref","first-page":"11063","DOI":"10.1073\/pnas.1106501108","article-title":"Multiplexed RNA structure characterization with selective 
2\u2032-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq)","volume":"108","author":"Lucks","year":"2011","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025042316111930700_btaf108-B22","doi-asserted-by":"crossref","first-page":"W493","DOI":"10.1093\/nar\/gkv489","article-title":"Web-Beagle: a web server for the alignment of RNA secondary structures","volume":"43","author":"Mattei","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2025042316111930700_btaf108-B23","doi-asserted-by":"crossref","first-page":"715","DOI":"10.1016\/j.molcel.2018.09.012","article-title":"Higher-order organization principles of pre-translational mRNPs","volume":"72","author":"Metkar","year":"2018","journal-title":"Mol Cell"},{"key":"2025042316111930700_btaf108-B24","doi-asserted-by":"crossref","first-page":"2856","DOI":"10.1093\/bioinformatics\/bty1057","article-title":"MMseqs2 desktop and local web server app for fast, interactive sequence searches","volume":"35","author":"Mirdita","year":"2019","journal-title":"Bioinformatics"},{"key":"2025042316111930700_btaf108-B25","doi-asserted-by":"crossref","first-page":"793","DOI":"10.1038\/s41587-019-0166-3","article-title":"RNA proximity sequencing reveals the spatial organization of the transcriptome in the nucleus","volume":"37","author":"Morf","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2025042316111930700_btaf108-B26","doi-asserted-by":"crossref","first-page":"12023","DOI":"10.1038\/ncomms12023","article-title":"Mapping RNA\u2013RNA interactome and RNA structure in vivo by MARIO","volume":"7","author":"Nguyen","year":"2016","journal-title":"Nat Commun"},{"key":"2025042316111930700_btaf108-B27","doi-asserted-by":"crossref","first-page":"e135","DOI":"10.1093\/nar\/gkx533","article-title":"Structure-seq2: sensitive and accurate genome-wide profiling of RNA structure in vivo","volume":"45","author":"Ritchey","year":"2017","journal-title":"Nucleic Acids 
Res"},{"key":"2025042316111930700_btaf108-B28","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1261\/rna.030049.111","article-title":"A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more","volume":"18","author":"Rivas","year":"2012","journal-title":"RNA"},{"key":"2025042316111930700_btaf108-B30","first-page":"618","author":"Sharma","year":"2016"},{"key":"2025042316111930700_btaf108-B31","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1038\/nmeth.3029","article-title":"RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP)","volume":"11","author":"Siegfried","year":"2014","journal-title":"Nat Methods"},{"key":"2025042316111930700_btaf108-B32","doi-asserted-by":"crossref","first-page":"R3","DOI":"10.1186\/gb-2014-15-1-r3","article-title":"RNase-mediated protein footprint sequencing reveals protein-binding sites throughout the human transcriptome","volume":"15","author":"Silverman","year":"2014","journal-title":"Genome Biol"},{"key":"2025042316111930700_btaf108-B33","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1038\/nature14263","article-title":"Structural imprints in vivo decode RNA regulatory mechanisms","volume":"519","author":"Spitale","year":"2015","journal-title":"Nature"},{"key":"2025042316111930700_btaf108-B34","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/nature14280","article-title":"hiCLIP reveals the in vivo atlas of mRNA secondary structures recognized by Staufen 1","volume":"519","author":"Sugimoto","year":"2015","journal-title":"Nature"},{"key":"2025042316111930700_btaf108-B35","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1261\/rna.042218.113","article-title":"Mod-seq: high-throughput sequencing for chemical probing of RNA 
structure","volume":"20","author":"Talkish","year":"2014","journal-title":"RNA"},{"key":"2025042316111930700_btaf108-B36","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.ymeth.2023.02.009","article-title":"Measuring functional similarity of lncRNAs based on variable K-mer profiles of nucleotide sequences","volume":"212","author":"Teng","year":"2023","journal-title":"Methods"},{"key":"2025042316111930700_btaf108-B37","doi-asserted-by":"crossref","first-page":"e71","DOI":"10.1093\/nar\/gkaa404","article-title":"Lead-seq: transcriptome-wide structure probing in vivo using lead(II) ions","volume":"48","author":"Twittenhoff","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025042316111930700_btaf108-B38","doi-asserted-by":"crossref","first-page":"1265","DOI":"10.1007\/s40747-022-00846-y","article-title":"A fast and efficient algorithm for DNA sequence similarity identification","volume":"9","author":"Uddin","year":"2023","journal-title":"Complex Intell Syst"},{"key":"2025042316111930700_btaf108-B39","first-page":"70","volume-title":"Biological Sequence Analysis by Vector-Valued Functions: Revisiting Alignment-Free Methodologies for DNA and Protein Classification","author":"Vinga","year":"2007"},{"key":"2025042316111930700_btaf108-B40","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison-a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2025042316111930700_btaf108-B41","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1016\/j.csbj.2023.01.007","article-title":"RNAsmc: a integrated tool for comparing RNA secondary structure and evaluating allosteric effects","volume":"21","author":"Wang","year":"2023","journal-title":"Comput Struct Biotechnol J"},{"key":"2025042316111930700_btaf108-B42","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1038\/s41589-019-0459-3","article-title":"Keth-seq for 
transcriptome-wide RNA structure mapping","volume":"16","author":"Weng","year":"2020","journal-title":"Nat Chem Biol"},{"key":"2025042316111930700_btaf108-B44","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1093\/nar\/gky1172","article-title":"Assaying RNA structure with LASER-Seq","volume":"47","author":"Zinshteyn","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025042316111930700_btaf108-B45","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1038\/s41592-018-0121-0","article-title":"COMRADES determines in vivo RNA structures and interactions","volume":"15","author":"Ziv","year":"2018","journal-title":"Nat Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf108\/62415041\/btaf108.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf108\/62415041\/btaf108.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf108\/62415041\/btaf108.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T20:11:36Z","timestamp":1745439096000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf108\/8078599"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,3,14]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,3,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf108","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","
type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,4]]},"published":{"date-parts":[[2025,3,14]]},"article-number":"btaf108"}}