{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T08:46:50Z","timestamp":1778057210179,"version":"3.51.4"},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes <jats:italic>all possible<\/jats:italic> sequence alignments and <jats:italic>all possible<\/jats:italic> secondary structures, whereas the existing methods only use <jats:italic>one optimal<\/jats:italic> sequence alignment and <jats:italic>one optimal<\/jats:italic> secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60%.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Our method enables fast and accurate clustering of ncRNAs. The software is available for download at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/bpla-kernel.dna.bio.keio.ac.jp\/clustering\/\" ext-link-type=\"uri\">http:\/\/bpla-kernel.dna.bio.keio.ac.jp\/clustering\/<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-12-s1-s48","type":"journal-article","created":{"date-parts":[[2011,2,18]],"date-time":"2011-02-18T20:08:08Z","timestamp":1298059688000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures"],"prefix":"10.1186","volume":"12","author":[{"given":"Yutaka","family":"Saito","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kengo","family":"Sato","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yasubumi","family":"Sakakibara","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2011,2,15]]},"reference":[{"issue":"12","key":"4406_CR1","doi-asserted-by":"publisher","first-page":"919","DOI":"10.1038\/35103511","volume":"2","author":"SR Eddy","year":"2001","unstructured":"Eddy SR: Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2001, 2(12):919\u2013929. 10.1038\/35103511","journal-title":"Nat Rev Genet"},{"issue":"Database issue","key":"4406_CR2","doi-asserted-by":"publisher","first-page":"D136","DOI":"10.1093\/nar\/gkn766","volume":"37","author":"PP Gardner","year":"2009","unstructured":"Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A: Rfam: updates to the RNA families database. Nucleic Acids Res 2009, 37(Database issue):D136\u201340. 10.1093\/nar\/gkn766","journal-title":"Nucleic Acids Res"},{"issue":"5","key":"4406_CR3","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1038\/nbt.1633","volume":"28","author":"M Guttman","year":"2010","unstructured":"Guttman M, Garber M, Levin JZ, Donaghey J, Robinson J, Adiconis X, Fan L, Koziol MJ, Gnirke A, Nusbaum C, Rinn JL, Lander ES, Regev A: Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 2010, 28(5):503\u2013510. 10.1038\/nbt.1633","journal-title":"Nat Biotechnol"},{"key":"4406_CR4","doi-asserted-by":"crossref","unstructured":"Rederstorff M, Bernhart SH, Tanzer A, Zywicki M, Perfler K, Lukasser M, Hofacker IL, H\u00fcttenhofer A: RNPomics: defining the ncRNA transcriptome by cDNA library generation from ribonucleo-protein particles. Nucleic Acids Res 2010., 38(10):","DOI":"10.1093\/nar\/gkq057"},{"issue":"7244","key":"4406_CR5","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1038\/nature08055","volume":"459","author":"Y Shi","year":"2009","unstructured":"Shi Y, Tyson GW, DeLong EF: Metatranscriptomics reveals unique microbial small RNAs in the ocean\u2019s water column. Nature 2009, 459(7244):266\u2013269. 10.1038\/nature08055","journal-title":"Nature"},{"issue":"7273","key":"4406_CR6","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1038\/nature08586","volume":"462","author":"Z Weinberg","year":"2009","unstructured":"Weinberg Z, Perreault J, Meyer MM, Breaker RR: Exceptional structured noncoding RNAs revealed by bacterial metagenome analysis. Nature 2009, 462(7273):656\u2013659. 10.1038\/nature08586","journal-title":"Nature"},{"issue":"5","key":"4406_CR7","doi-asserted-by":"publisher","first-page":"810","DOI":"10.1137\/0145048","volume":"45","author":"D Sankoff","year":"1985","unstructured":"Sankoff D: Simultaneous solution of the RNA folding, alignment, and proto-sequence problems. SIAM J Appl Math 1985, 45(5):810\u201325. 10.1137\/0145048","journal-title":"SIAM J Appl Math"},{"issue":"4","key":"4406_CR8","doi-asserted-by":"publisher","first-page":"e65","DOI":"10.1371\/journal.pcbi.0030065","volume":"3","author":"S Will","year":"2007","unstructured":"Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R: Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Comput Biol 2007, 3(4):e65. 10.1371\/journal.pcbi.0030065","journal-title":"PLoS Comput Biol"},{"issue":"8","key":"4406_CR9","doi-asserted-by":"publisher","first-page":"926","DOI":"10.1093\/bioinformatics\/btm049","volume":"23","author":"E Torarinsson","year":"2007","unstructured":"Torarinsson E, Havgaard JH, Gorodkin J: Multiple structural alignment and clustering of RNA sequences. Bioinformatics 2007, 23(8):926\u2013932. 10.1093\/bioinformatics\/btm049","journal-title":"Bioinformatics"},{"key":"4406_CR10","doi-asserted-by":"publisher","first-page":"318","DOI":"10.1186\/1471-2105-9-318","volume":"9","author":"K Sato","year":"2008","unstructured":"Sato K, Mituyama T, Asai K, Sakakibara Y: Directed acyclic graph kernels for structural RNA analysis. BMC Bioinformatics 2008, 9: 318. 10.1186\/1471-2105-9-318","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"4406_CR11","doi-asserted-by":"publisher","first-page":"1896","DOI":"10.1371\/journal.pcbi.0030193","volume":"3","author":"JH Havgaard","year":"2007","unstructured":"Havgaard JH, Torarinsson E, Gorodkin J: Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix. PLoS Comput Biol 2007, 3(10):1896\u20131908. 10.1371\/journal.pcbi.0030193","journal-title":"PLoS Comput Biol"},{"issue":"3","key":"4406_CR12","doi-asserted-by":"publisher","first-page":"R31","DOI":"10.1186\/gb-2010-11-3-r31","volume":"11","author":"Z Weinberg","year":"2010","unstructured":"Weinberg Z, Wang JX, Bogue J, Yang J, Corbino K, Moy RH, Breaker RR: Comparative genomics reveals 104 candidate structured RNAs from bacteria, archaea, and their metagenomes. Genome Biol 2010, 11(3):R31. 10.1186\/gb-2010-11-3-r31","journal-title":"Genome Biol"},{"key":"4406_CR13","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1186\/1748-7188-1-19","volume":"1","author":"A Wilm","year":"2006","unstructured":"Wilm A, Mainz I, Steger G: An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms Mol Biol 2006, 1: 19. 10.1186\/1748-7188-1-19","journal-title":"Algorithms Mol Biol"},{"key":"4406_CR14","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","volume":"147","author":"T Smith","year":"1981","unstructured":"Smith T, Waterman M: Identification of common molecular subsequences. J Mol Biol 1981, 147: 195\u20137. 10.1016\/0022-2836(81)90087-5","journal-title":"J Mol Biol"},{"issue":"6-7","key":"4406_CR15","doi-asserted-by":"publisher","first-page":"1105","DOI":"10.1002\/bip.360290621","volume":"29","author":"JS McCaskill","year":"1990","unstructured":"McCaskill JS: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 1990, 29(6\u20137):1105\u201319. 10.1002\/bip.360290621","journal-title":"Biopolymers"},{"issue":"11","key":"4406_CR16","doi-asserted-by":"publisher","first-page":"1682","DOI":"10.1093\/bioinformatics\/bth141","volume":"20","author":"H Saigo","year":"2004","unstructured":"Saigo H, Vert JP, Ueda N, Akutsu T: Protein homology detection using string alignment kernels. Bioinformatics 2004, 20(11):1682\u20139. 10.1093\/bioinformatics\/bth141","journal-title":"Bioinformatics"},{"key":"4406_CR17","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1186\/1471-2105-4-44","volume":"4","author":"RJ Klein","year":"2003","unstructured":"Klein RJ, Eddy SR: RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics 2003, 4: 44. 10.1186\/1471-2105-4-44","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"4406_CR18","doi-asserted-by":"publisher","first-page":"999","DOI":"10.1093\/nar\/gkn1054","volume":"37","author":"K Morita","year":"2009","unstructured":"Morita K, Saito Y, Sato K, Oka K, Hotta K, Sakakibara Y: Genome-wide searching with base-pairing kernel functions for noncoding RNAs: computational and expression analysis of snoRNA families in Caenorhabditis elegans. Nucleic Acids Res 2009, 37(3):999\u20131009. 10.1093\/nar\/gkn1054","journal-title":"Nucleic Acids Res"},{"issue":"13","key":"4406_CR19","doi-asserted-by":"publisher","first-page":"1593","DOI":"10.1093\/bioinformatics\/btl142","volume":"22","author":"D Dalli","year":"2006","unstructured":"Dalli D, Wilm A, Mainz I, Steger G: STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time. Bioinformatics 2006, 22(13):1593\u20139. 10.1093\/bioinformatics\/btl142","journal-title":"Bioinformatics"},{"issue":"22","key":"4406_CR20","doi-asserted-by":"publisher","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","volume":"22","author":"JD Thompson","year":"1994","unstructured":"Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22(22):4673\u201380. 10.1093\/nar\/22.22.4673","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-S1-S48.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T13:44:38Z","timestamp":1630503878000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-S1-S48"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,2,15]]},"references-count":20,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4406"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-s1-s48","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,2,15]]},"assertion":[{"value":"15 February 2011","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S48"}}