{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T22:03:44Z","timestamp":1766268224094},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them are suitable for the compression of RNA sequences with their secondary structures simultaneously. This kind of compression not only facilitates the maintenance of RNA data, but also supplies a novel way to measure the informational complexity of RNA structural data, raising the possibility of studying the relationship between the functional activities of RNA structures and their complexities, as well as various structural properties of RNA based on compression.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>\n              <jats:italic>RNACompress<\/jats:italic> employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The main goals of this algorithm are two fold: (1) present a robust and effective way for RNA structural data compression; (2) design a suitable model to represent RNA secondary structure as well as derive the informational complexity of the structural data based on compression. Our extensive tests have shown that <jats:italic>RNACompress<\/jats:italic> achieves a universally better compression ratio compared with other sequence-specific or common text-specific compression algorithms, such as <jats:italic>Gencompress, winrar<\/jats:italic> and <jats:italic>gzip<\/jats:italic>. Moreover, a test of the activities of distinct GTP-binding RNAs (aptamers) compared with their structural complexity shows that our defined informational complexity can be used to describe how complexity varies with activity. These results lead to an objective means of comparing the functional properties of heteropolymers from the information perspective.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>A universal algorithm for the compression of RNA secondary structure as well as the evaluation of its informational complexity is discussed in this paper. We have developed <jats:italic>RNACompress<\/jats:italic>, as a useful tool for academic users. Extensive tests have shown that <jats:italic>RNACompress<\/jats:italic> is a universally efficient algorithm for the compression of RNA sequences with their secondary structures. <jats:italic>RNACompress<\/jats:italic> also serves as a good measurement of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-176","type":"journal-article","created":{"date-parts":[[2008,4,22]],"date-time":"2008-04-22T19:33:51Z","timestamp":1208892831000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure"],"prefix":"10.1186","volume":"9","author":[{"given":"Qi","family":"Liu","sequence":"first","affiliation":[]},{"given":"Yu","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Chun","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Jiajun","family":"Bu","sequence":"additional","affiliation":[]},{"given":"Yin","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Xiuzi","family":"Ye","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,3,31]]},"reference":[{"issue":"1","key":"2161_CR1","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1038\/35047580","volume":"2","author":"P Avner","year":"2001","unstructured":"Avner P, Heard E: X-chromosome inactivation: counting, choice and initiation. Nat Rev Genet 2001, 2(1):59\u201367. 10.1038\/35047580","journal-title":"Nat Rev Genet"},{"issue":"1","key":"2161_CR2","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1146\/annurev.biochem.67.1.153","volume":"67","author":"DN Frank","year":"1998","unstructured":"Frank DN, Pace NR: RIBONUCLEASE P: Unity and Diversity in a tRNA Processing Ribozyme. Annual Review of Biochemistry 1998, 67(1):153\u2013180. 10.1146\/annurev.biochem.67.1.153","journal-title":"Annual Review of Biochemistry"},{"issue":"14","key":"2161_CR3","doi-asserted-by":"publisher","first-page":"3617","DOI":"10.1093\/emboj\/20.14.3617","volume":"20","author":"T Kiss","year":"2001","unstructured":"Kiss T: Small nucleolar RNA-guided post-transcriptional modification of cellular RNAs. EMBO J 2001, 20(14):3617\u20133622. 10.1093\/emboj\/20.14.3617","journal-title":"EMBO J"},{"issue":"3","key":"2161_CR4","doi-asserted-by":"publisher","first-page":"1764","DOI":"10.1128\/MCB.14.3.1764","volume":"14","author":"S Lankenau","year":"1994","unstructured":"Lankenau S, Corces VG, Lankenau DH: The Drosophila micropia retrotransposon encodes a testis-specific antisense RNA complementary to reverse transcriptase. Molecular and Cellular Biology 1994, 14(3):1764\u20131775.","journal-title":"Molecular and Cellular Biology"},{"issue":"5405","key":"2161_CR5","doi-asserted-by":"publisher","first-page":"1168","DOI":"10.1126\/science.283.5405.1168","volume":"283","author":"TM Lowe","year":"1999","unstructured":"Lowe TM, Eddy SR: A Computational Screen for Methylation Guide snoRNAs in Yeast. Science 1999, 283(5405):1168\u20131171. 10.1126\/science.283.5405.1168","journal-title":"Science"},{"key":"2161_CR6","doi-asserted-by":"publisher","first-page":"2326\u20132343","DOI":"10.1002\/(SICI)1521-3773(19990614)38:12<1798::AID-ANIE1798>3.0.CO;2-0","volume":"38","author":"RT Batey","year":"1999","unstructured":"Batey RT, Rambo RP, Doudna JA: Tertiary motifs in RNA structure and folding. Angew Chem Int Ed 1999, 38: 2326\u20132343.","journal-title":"Angew Chem Int Ed"},{"issue":"3","key":"2161_CR7","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1016\/S0092-8674(01)00547-5","volume":"107","author":"A Nykanen","year":"2001","unstructured":"Nykanen A, Haley B, Zamore PD: ATP Requirements and Small Interfering RNA Structure in the RNA Interference Pathway. Cell 2001, 107(3):309\u2013321. 10.1016\/S0092-8674(01)00547-5","journal-title":"Cell"},{"key":"2161_CR8","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1016\/0076-6879(89)80106-5","volume":"180","author":"M Zuker","year":"1989","unstructured":"Zuker M: Computer prediction of RNA structure. Methods Enzymol 1989, 180: 262\u2013288.","journal-title":"Methods Enzymol"},{"issue":"Database Issue","key":"2161_CR9","doi-asserted-by":"publisher","first-page":"D112","DOI":"10.1093\/nar\/gki041","volume":"33","author":"C Liu","year":"2005","unstructured":"Liu C, Bai B, Skogerb G, Cai L, Deng W, Zhang Y, Bu D, Zhao Y, Chen R: NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Research 2005, 33(Database Issue):D112-D115. 10.1093\/nar\/gki041","journal-title":"Nucleic Acids Research"},{"issue":"1","key":"2161_CR10","doi-asserted-by":"publisher","first-page":"439","DOI":"10.1093\/nar\/gkg006","volume":"31","author":"S Griffiths-Jones","year":"2003","unstructured":"Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR: Rfam: an RNA family database. Nucleic Acids Research 2003, 31(1):439\u2013441. 10.1093\/nar\/gkg006","journal-title":"Nucleic Acids Research"},{"issue":"1","key":"2161_CR11","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1093\/nar\/26.1.351","volume":"26","author":"JW Brown","year":"2005","unstructured":"Brown JW, Journals O: The ribonuclease P database. Nucleic Acids Research 2005, 26(1):351\u2013352. 10.1093\/nar\/26.1.351","journal-title":"Nucleic Acids Research"},{"issue":"Database Issue","key":"2161_CR12","doi-asserted-by":"publisher","first-page":"D125","DOI":"10.1093\/nar\/gki089","volume":"33","author":"KC Pang","year":"2005","unstructured":"Pang KC, Stephen S, Engstrom PG, Tajul-Arifin K, Chen W, Wahlestedt C, Lenhard B, Hayashizaki Y, Mattick JS: RNAdb--a comprehensive mammalian noncoding RNA database. Nucleic Acids Research 2005, 33(Database Issue):D125. 10.1093\/nar\/gki089","journal-title":"Nucleic Acids Research"},{"key":"2161_CR13","volume-title":"Proceedings of RECOMB","author":"X Chen","year":"2000","unstructured":"Chen X, Kwong S, Li M: A compression algorithm for DNA sequences and its applications in genome comparison. Proceedings of RECOMB 2000., 107:"},{"issue":"12","key":"2161_CR14","doi-asserted-by":"publisher","first-page":"1696","DOI":"10.1093\/bioinformatics\/18.12.1696","volume":"18","author":"X Chen","year":"2002","unstructured":"Chen X, Li M, Ma B, Tromp J: DNACompress: fast and effective DNA sequence compression. Bioinformatics 2002, 18(12):1696\u20131698. 10.1093\/bioinformatics\/18.12.1696","journal-title":"Bioinformatics"},{"key":"2161_CR15","first-page":"340","volume-title":"Data Compression Conference, 1993 DCC'93","author":"S Grumbach","year":"1993","unstructured":"Grumbach S, Tahi F, Inria LC: Compression of DNA sequences. Data Compression Conference, 1993 DCC'93 1993, 340\u2013350."},{"key":"2161_CR16","volume-title":"Data Compression Conference, 1996 DCC'96 Proceedings","author":"E Rivals","year":"1996","unstructured":"Rivals E, Delahaye JP, Dauchet M, Delgrange O: A guaranteed compression scheme for repetitive DNA sequences. Data Compression Conference, 1996 DCC'96 Proceedings 1996."},{"issue":"03","key":"2161_CR17","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1017\/S0033583500003620","volume":"33","author":"PG Higgs","year":"2001","unstructured":"Higgs PG: RNA secondary structure: physical and computational aspects. Quarterly Reviews of Biophysics 2001, 33(03):199\u2013253. 10.1017\/S0033583500003620","journal-title":"Quarterly Reviews of Biophysics"},{"issue":"2","key":"2161_CR18","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1093\/bioinformatics\/17.2.149","volume":"17","author":"M Li","year":"2001","unstructured":"Li M, Badger JH, Chen X, Kwong S, Kearney P, Zhang H: An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 2001, 17(2):149\u2013154. 10.1093\/bioinformatics\/17.2.149","journal-title":"Bioinformatics"},{"issue":"4","key":"2161_CR19","doi-asserted-by":"publisher","first-page":"240","DOI":"10.1145\/362991.363001","volume":"11","author":"SH Unger","year":"1968","unstructured":"Unger SH: A global parser for context-free phrase structure grammars. Communications of the ACM 1968, 11(4):240\u2013247. 10.1145\/362991.363001","journal-title":"Communications of the ACM"},{"issue":"2","key":"2161_CR20","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/0196-6774(85)90036-7","volume":"6","author":"DE Knuth","year":"1985","unstructured":"Knuth DE: Dynamic Huffman coding. Journal of Algorithms 1985, 6(2):163\u2013180. 10.1016\/0196-6774(85)90036-7","journal-title":"Journal of Algorithms"},{"issue":"4","key":"2161_CR21","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1093\/bioinformatics\/btk010","volume":"22","author":"P Steffen","year":"2006","unstructured":"Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R: RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics 2006, 22(4):500. 10.1093\/bioinformatics\/btk010","journal-title":"Bioinformatics"},{"key":"2161_CR22","doi-asserted-by":"crossref","unstructured":"Voss B, Giegerich R, Rehmsmeier M: Complete probabilistic analysis of RNA shapes. BMC Biol 2006., 4(5):","DOI":"10.1186\/1741-7007-4-5"},{"issue":"2","key":"2161_CR23","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1016\/0022-0000(82)90051-4","volume":"24","author":"K Hashiguchi","year":"1982","unstructured":"Hashiguchi K: Limitedness Theorem on Finite Automata With Distance Functions. J COMP AND SYS SCI 1982, 24(2):233\u2013244. 10.1016\/0022-0000(82)90051-4","journal-title":"J COMP AND SYS SCI"},{"issue":"1","key":"2161_CR24","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1002\/spe.4380180105","volume":"18","author":"D Grune","year":"1988","unstructured":"Grune D, Jacobs CJH: A programmer-friendly LL (1) parser generator. Software\u2014Practice & Experience 1988, 18(1):29\u201338. 10.1002\/spe.4380180105","journal-title":"Software\u2014Practice & Experience"},{"key":"2161_CR25","doi-asserted-by":"publisher","first-page":"446","DOI":"10.1093\/bioinformatics\/15.6.446","volume":"15","author":"B Knudsen","year":"1999","unstructured":"Knudsen B, Hein J: RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 1999, 15: 446\u2013454. 10.1093\/bioinformatics\/15.6.446","journal-title":"Bioinformatics"},{"issue":"1","key":"2161_CR26","doi-asserted-by":"publisher","first-page":"502","DOI":"10.1093\/nar\/gkg012","volume":"31","author":"VL Murthy","year":"2003","unstructured":"Murthy VL, Rose GD: RNABase: an annotated database of RNA structures. Nucleic Acids Research 2003, 31(1):502\u2013504. 10.1093\/nar\/gkg012","journal-title":"Nucleic Acids Research"},{"key":"2161_CR27","volume-title":"Grammatical Man: Information, Entropy, Language, and Life","author":"J Campbell","year":"1982","unstructured":"Campbell J: Grammatical Man: Information, Entropy, Language, and Life. Simon and Schuster; 1982."},{"key":"2161_CR28","volume-title":"Elements of Information Theory","author":"TJA Cover TM","year":"1990","unstructured":"Cover TM TJA: Elements of Information Theory. Wiley; 1990."},{"issue":"11","key":"2161_CR29","doi-asserted-by":"publisher","first-page":"1917","DOI":"10.1109\/26.61469","volume":"38","author":"A Moffat","year":"1990","unstructured":"Moffat A: Implementing the PPM data compression scheme. Communications, IEEE Transactions on 1990, 38(11):1917\u20131921. 10.1109\/26.61469","journal-title":"Communications, IEEE Transactions on"},{"issue":"57","key":"2161_CR30","first-page":"94","volume":"63","author":"JM Carothers","year":"2001","unstructured":"Carothers JM, Oestreich SC, Davis JH, Szostak JW: Informational Complexity and Functional Activity of RNA Structures. networks 2001, 63(57):94.","journal-title":"networks"},{"issue":"14","key":"2161_CR31","doi-asserted-by":"publisher","first-page":"3946","DOI":"10.1093\/nar\/gkg448","volume":"31","author":"EI Zagryadskaya","year":"2003","unstructured":"Zagryadskaya EI, Doyon FR, Steinberg SV, Journals O: Importance of the reverse Hoogsteen base pair 54\u201358 for tRNA function. Nucleic Acids Research 2003, 31(14):3946\u20133953. 10.1093\/nar\/gkg448","journal-title":"Nucleic Acids Research"},{"key":"2161_CR32","first-page":"660","volume-title":"Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB'04)-Volume 00","author":"O Bergig","year":"2004","unstructured":"Bergig O, Barash D, Kedem K: RNA Motif Search Using the Structure to String (STR2) Method. Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB'04)-Volume 00 2004, 660\u2013661."},{"issue":"4","key":"2161_CR33","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1093\/bioinformatics\/btk008","volume":"22","author":"Z Yao","year":"2006","unstructured":"Yao Z, Weinberg Z, Ruzzo WL: CMfinder--a covariance model based RNA motif finding algorithm. Bioinformatics 2006, 22(4):445. 10.1093\/bioinformatics\/btk008","journal-title":"Bioinformatics"},{"issue":"1","key":"2161_CR34","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1093\/nar\/30.1.176","volume":"30","author":"M Szymanski","year":"2002","unstructured":"Szymanski M, Barciszewska MZ, Erdmann VA, Barciszewski J, Journals O: 5S Ribosomal RNA Database. Nucleic Acids Research 2002, 30(1):176\u2013178. 10.1093\/nar\/30.1.176","journal-title":"Nucleic Acids Research"},{"issue":"Database Issue","key":"2161_CR35","doi-asserted-by":"publisher","first-page":"D140","DOI":"10.1093\/nar\/gkj112","volume":"34","author":"S Griffiths-Jones","year":"2006","unstructured":"Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ, Journals O: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Research 2006, 34(Database Issue):D140-D144. 10.1093\/nar\/gkj112","journal-title":"Nucleic Acids Research"},{"issue":"8","key":"2161_CR36","doi-asserted-by":"publisher","first-page":"926","DOI":"10.1093\/bioinformatics\/btm049","volume":"23","author":"E Torarinsson","year":"2007","unstructured":"Torarinsson E, Havgaard JH, Gorodkin J: Multiple structural alignment and clustering of RNA sequences. Bioinformatics 2007, 23(8):926. 10.1093\/bioinformatics\/btm049","journal-title":"Bioinformatics"},{"issue":"4","key":"2161_CR37","doi-asserted-by":"publisher","first-page":"e47","DOI":"10.1371\/journal.pgen.0020047","volume":"2","author":"PG Engstrom","year":"2006","unstructured":"Engstrom PG, Suzuki H, Ninomiya N, Akalin A, Sessa L, Lavorgna G, Brozzi A, Luzi L, Tan SL, Yang L: Complex loci in human and mouse genomes. PLoS Genet 2006, 2(4):e47. 10.1371\/journal.pgen.0020047","journal-title":"PLoS Genet"},{"issue":"Database issue","key":"2161_CR38","doi-asserted-by":"publisher","first-page":"D158","DOI":"10.1093\/nar\/gkj002","volume":"34","author":"L Lestrade","year":"2006","unstructured":"Lestrade L, Weber MJ, Journals O: snoRNA-LBME-db, a comprehensive database of human H\/ACA and C\/D box snoRNAs. Nucleic Acids Research 2006, 34(Database issue):D158-D162. 10.1093\/nar\/gkj002","journal-title":"Nucleic Acids Research"},{"issue":"14","key":"2161_CR39","doi-asserted-by":"publisher","first-page":"e90","DOI":"10.1093\/bioinformatics\/btl246","volume":"22","author":"CB Do","year":"2006","unstructured":"Do CB, Woods DA, Batzoglou S: CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 2006, 22(14):e90. 10.1093\/bioinformatics\/btl246","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-176.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T03:23:19Z","timestamp":1630466599000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-176"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,3,31]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2161"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-176","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,3,31]]},"assertion":[{"value":"18 November 2007","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 March 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 March 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"176"}}