{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,22]],"date-time":"2026-03-22T04:42:22Z","timestamp":1774154542202,"version":"3.50.1"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2024,3,1]],"date-time":"2024-03-01T00:00:00Z","timestamp":1709251200000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,5,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by incorporating the non-coding RNA (ncRNA) sequences from RNAcentral, the transcriptome assembly and metagenome assembly from metagenomics RAST (MG-RAST), the genomic sequences from Genome Warehouse (GWH), and the genomic sequences from MGnify, in addition to the nucleotide (nt) database and its subsets in National Center of Biotechnology Information (NCBI). The resulting Master database of All possible RNA sequences (MARS) is 20-fold larger than NCBI\u2019s nt database or 60-fold larger than RNAcentral. The new dataset along with a new split\u2013search strategy allows a substantial improvement in homology search over existing state-of-the-art techniques. It also yields more accurate and more sensitive multiple sequence alignments (MSAs) than manually curated MSAs from Rfam for the majority of structured RNAs mapped to Rfam. 
The results indicate that MARS coupled with the fully automatic homology search tool RNAcmap will be useful for improved structural and functional inference of ncRNAs and RNA language models based on MSAs. MARS is accessible at https:\/\/ngdc.cncb.ac.cn\/omix\/release\/OMIX003037, and RNAcmap3 is accessible at http:\/\/zhouyq-lab.szbl.ac.cn\/download\/.<\/jats:p>","DOI":"10.1093\/gpbjnl\/qzae018","type":"journal-article","created":{"date-parts":[[2024,3,2]],"date-time":"2024-03-02T07:29:25Z","timestamp":1709364565000},"source":"Crossref","is-referenced-by-count":18,"title":["MARS and RNAcmap3: The Master Database of All Possible RNA Sequences Integrated with RNAcmap for RNA Homology Search"],"prefix":"10.1093","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0433-5580","authenticated-orcid":false,"given":"Ke","family":"Chen","sequence":"first","affiliation":[{"name":"Institute of Systems and Physical Biology , Shenzhen Bay Laboratory, Shenzhen 518055,","place":["China"]},{"name":"Peking University Shenzhen Graduate School , Shenzhen 518055,","place":["China"]},{"name":"University of Science and Technology of China , Hefei 230026,","place":["China"]},{"name":"Suzhou Institute for Advanced Research, University of Science and Technology of China , Suzhou 215123,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4863-3865","authenticated-orcid":false,"given":"Thomas","family":"Litfin","sequence":"additional","affiliation":[{"name":"Institute for Glycomics, Griffith University , Southport, QLD 4222,","place":["Australia"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0478-5533","authenticated-orcid":false,"given":"Jaswinder","family":"Singh","sequence":"additional","affiliation":[{"name":"Institute of Systems and Physical Biology , Shenzhen Bay Laboratory, Shenzhen 
518055,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0856-2385","authenticated-orcid":false,"given":"Jian","family":"Zhan","sequence":"additional","affiliation":[{"name":"Institute of Systems and Physical Biology , Shenzhen Bay Laboratory, Shenzhen 518055,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9958-5699","authenticated-orcid":false,"given":"Yaoqi","family":"Zhou","sequence":"additional","affiliation":[{"name":"Institute of Systems and Physical Biology , Shenzhen Bay Laboratory, Shenzhen 518055,","place":["China"]},{"name":"Peking University Shenzhen Graduate School , Shenzhen 518055,","place":["China"]},{"name":"Institute for Glycomics, Griffith University , Southport, QLD 4222,","place":["Australia"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,3,1]]},"reference":[{"key":"2024092609160755300_qzae018-B1","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/S0021-9258(19)77302-5","article-title":"A soluble ribonucleic acid intermediate in protein synthesis","volume":"231","author":"Hoagland","year":"1958","journal-title":"J Biol Chem"},{"key":"2024092609160755300_qzae018-B2","doi-asserted-by":"crossref","first-page":"1377","DOI":"10.1101\/gr.247239.118","article-title":"Decrypting noncoding RNA interactions, structures, and functional networks","volume":"29","author":"Fabbri","year":"2019","journal-title":"Genome Res"},{"key":"2024092609160755300_qzae018-B3","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1146\/annurev.cellbio.23.090506.123406","article-title":"microRNA functions","volume":"23","author":"Bushati","year":"2007","journal-title":"Annu Rev Cell Dev Biol"},{"key":"2024092609160755300_qzae018-B4","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1038\/75546","article-title":"The imprinted antisense RNA at the Igf2r locus 
overlaps but does not imprint Mas1","volume":"25","author":"Lyle","year":"2000","journal-title":"Nat Genet"},{"key":"2024092609160755300_qzae018-B5","doi-asserted-by":"crossref","first-page":"984","DOI":"10.1002\/cbic.200300664","article-title":"On secondary structure rearrangements and equilibria of small RNAs","volume":"4","author":"Micura","year":"2003","journal-title":"Chembiochem"},{"key":"2024092609160755300_qzae018-B6","doi-asserted-by":"crossref","first-page":"100555","DOI":"10.1016\/j.jbc.2021.100555","article-title":"An RNA-centric historical narrative around the Protein Data Bank","volume":"296","author":"Westhof","year":"2021","journal-title":"J Biol Chem"},{"key":"2024092609160755300_qzae018-B7","doi-asserted-by":"crossref","first-page":"1555","DOI":"10.1080\/15476286.2019.1644590","article-title":"Predicting functional long non-coding RNAs validated by low throughput experiments","volume":"16","author":"Zhou","year":"2019","journal-title":"RNA Biol"},{"key":"2024092609160755300_qzae018-B8","doi-asserted-by":"crossref","first-page":"2242","DOI":"10.1126\/science.1103388","article-title":"Global identification of human transcribed sequences with genome tiling arrays","volume":"306","author":"Bertone","year":"2004","journal-title":"Science"},{"key":"2024092609160755300_qzae018-B9","doi-asserted-by":"crossref","first-page":"D86","DOI":"10.1093\/nar\/gkaa1076","article-title":"EVLncRNAs 2.0: an updated database of manually curated functional long non-coding RNAs validated by low-throughput experiments","volume":"49","author":"Zhou","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024092609160755300_qzae018-B10","doi-asserted-by":"crossref","first-page":"D212","DOI":"10.1093\/nar\/gkaa921","article-title":"RNAcentral 2021: secondary structure integration, improved sequence search and new member databases","volume":"49","author":"RNAcentral Consortium","year":"2021","journal-title":"Nucleic Acids 
Res"},{"key":"2024092609160755300_qzae018-B11","doi-asserted-by":"crossref","first-page":"D10","DOI":"10.1093\/nar\/gkaa892","article-title":"Database resources of the National Center for Biotechnology Information","volume":"49","author":"Sayers","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024092609160755300_qzae018-B12","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1016\/j.gpb.2021.04.001","article-title":"Genome Warehouse: a public repository housing genome-scale data","volume":"19","author":"Chen","year":"2021","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2024092609160755300_qzae018-B13","doi-asserted-by":"crossref","first-page":"D27","DOI":"10.1093\/nar\/gkab951","article-title":"Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2022","volume":"50","author":"CNCB-NGDC Members and Partners","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024092609160755300_qzae018-B14","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1186\/1471-2105-9-386","article-title":"The metagenomics RAST server\u2014a public resource for the automatic phylogenetic and functional analysis of metagenomes","volume":"9","author":"Meyer","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2024092609160755300_qzae018-B15","doi-asserted-by":"crossref","first-page":"e1004008","DOI":"10.1371\/journal.pcbi.1004008","article-title":"A RESTful API for accessing microbial community data for MG-RAST","volume":"11","author":"Wilke","year":"2015","journal-title":"PLoS Comput Biol"},{"key":"2024092609160755300_qzae018-B16","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with 
AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2024092609160755300_qzae018-B17","doi-asserted-by":"crossref","first-page":"3494","DOI":"10.1093\/bioinformatics\/btab391","article-title":"RNAcmap: a fully automatic pipeline for predicting contact maps of RNAs by evolutionary coupling analysis","volume":"37","author":"Zhang","year":"2021","journal-title":"Bioinformatics"},{"key":"2024092609160755300_qzae018-B18","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2024092609160755300_qzae018-B19","doi-asserted-by":"crossref","first-page":"2933","DOI":"10.1093\/bioinformatics\/btt509","article-title":"Infernal 1.1: 100-fold faster RNA homology searches","volume":"29","author":"Nawrocki","year":"2013","journal-title":"Bioinformatics"},{"key":"2024092609160755300_qzae018-B20","doi-asserted-by":"crossref","first-page":"E1293","DOI":"10.1073\/pnas.1111471108","article-title":"Direct-coupling analysis of residue coevolution captures native contacts across many protein families","volume":"108","author":"Morcos","year":"2011","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024092609160755300_qzae018-B21","doi-asserted-by":"crossref","first-page":"2589","DOI":"10.1093\/bioinformatics\/btab165","article-title":"Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning","volume":"37","author":"Singh","year":"2021","journal-title":"Bioinformatics"},{"key":"2024092609160755300_qzae018-B22","doi-asserted-by":"crossref","first-page":"3900","DOI":"10.1093\/bioinformatics\/btac421","article-title":"Predicting RNA distance-based contact maps by integrated deep learning on physics-inferred secondary structure and 
evolutionary-derived mutational coupling","volume":"38","author":"Singh","year":"2022","journal-title":"Bioinformatics"},{"key":"2024092609160755300_qzae018-B23","article-title":"Improved RNA homology detection and alignment by automatic iterative search in an expanded database","author":"Singh","year":"2022","journal-title":"bioRxiv"},{"key":"2024092609160755300_qzae018-B24","doi-asserted-by":"crossref","first-page":"167904","DOI":"10.1016\/j.jmb.2022.167904","article-title":"rMSA: a sequence search and alignment algorithm to improve RNA structure modeling","volume":"435","author":"Zhang","year":"2023","journal-title":"J Mol Biol"},{"key":"2024092609160755300_qzae018-B25","article-title":"De novo RNA tertiary structure prediction at atomic resolution using geometric potentials from deep learning","author":"Pearce","year":"2022","journal-title":"bioRxiv"},{"key":"2024092609160755300_qzae018-B26","doi-asserted-by":"crossref","first-page":"D192","DOI":"10.1093\/nar\/gkaa1047","article-title":"Rfam 14: expanded coverage of metagenomic, viral and microRNA families","volume":"49","author":"Kalvari","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024092609160755300_qzae018-B27","doi-asserted-by":"crossref","first-page":"2487","DOI":"10.1093\/bioinformatics\/btt403","article-title":"nhmmer: DNA homology search with profile HMMs","volume":"29","author":"Wheeler","year":"2013","journal-title":"Bioinformatics"},{"key":"2024092609160755300_qzae018-B28","first-page":"D570","article-title":"MGnify: the microbiome analysis resource in 2020","volume":"48","author":"Mitchell","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2024092609160755300_qzae018-B29","doi-asserted-by":"crossref","article-title":"Multiple sequence alignment-based RNA language model and its application to structural 
inference","author":"Zhang","DOI":"10.1093\/nar\/gkad1031"},{"key":"2024092609160755300_qzae018-B30","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2024092609160755300_qzae018-B31","doi-asserted-by":"crossref","first-page":"e0163962","DOI":"10.1371\/journal.pone.0163962","article-title":"SeqKit: a cross-platform and ultrafast toolkit for FASTA\/Q file manipulation","volume":"11","author":"Shen","year":"2016","journal-title":"PLoS One"},{"key":"2024092609160755300_qzae018-B32","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2024092609160755300_qzae018-B33","doi-asserted-by":"crossref","first-page":"D437","DOI":"10.1093\/nar\/gkaa1038","article-title":"RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences","volume":"49","author":"Burley","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024092609160755300_qzae018-B34","doi-asserted-by":"crossref","first-page":"15674","DOI":"10.1073\/pnas.1314045110","article-title":"Assessing the utility of coevolution-based residue\u2013residue contact predictions in a sequence- and structure-rich era","volume":"110","author":"Kamisetty","year":"2013","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024092609160755300_qzae018-B35","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1002\/prot.22934","article-title":"Learning generative models for protein fold 
families","volume":"79","author":"Balakrishnan","year":"2011","journal-title":"Proteins"},{"key":"2024092609160755300_qzae018-B36","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1038\/nbt.3769","article-title":"Mutation effects predicted from sequence co-variation","volume":"35","author":"Hopf","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2024092609160755300_qzae018-B37","doi-asserted-by":"crossref","first-page":"012707","DOI":"10.1103\/PhysRevE.87.012707","article-title":"Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models","volume":"87","author":"Ekeberg","year":"2013","journal-title":"Phys Rev E Stat Nonlin Soft Matter Phys"},{"key":"2024092609160755300_qzae018-B38","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1186\/1471-2105-11-129","article-title":"RNAstructure: software for RNA secondary structure prediction and analysis","volume":"11","author":"Reuter","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2024092609160755300_qzae018-B39","doi-asserted-by":"crossref","first-page":"5407","DOI":"10.1038\/s41467-019-13395-9","article-title":"RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning","volume":"10","author":"Singh","year":"2019","journal-title":"Nat Commun"},{"key":"2024092609160755300_qzae018-B40","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1038\/s41467-021-21194-4","article-title":"RNA secondary structure prediction using deep learning with thermodynamic integration","volume":"12","author":"Sato","year":"2021","journal-title":"Nat Commun"},{"key":"2024092609160755300_qzae018-B41","doi-asserted-by":"crossref","first-page":"e14","DOI":"10.1093\/nar\/gkab1074","article-title":"UFold: fast and accurate RNA secondary structure prediction with deep learning","volume":"50","author":"Fu","year":"2022","journal-title":"Nucleic Acids 
Res"},{"key":"2024092609160755300_qzae018-B42","doi-asserted-by":"crossref","first-page":"3892","DOI":"10.1093\/bioinformatics\/btac415","article-title":"Deep learning models for RNA secondary structure prediction (probably) do not generalize across families","volume":"38","author":"Szikszai","year":"2022","journal-title":"Bioinformatics"},{"key":"2024092609160755300_qzae018-B43","doi-asserted-by":"crossref","first-page":"5169","DOI":"10.1093\/bioinformatics\/btaa652","article-title":"Single-sequence and profile-based prediction of RNA solvent accessibility using dilated convolutional neural network","volume":"36","author":"Hanumanthappa","year":"2021","journal-title":"Bioinformatics"}],"container-title":["Genomics, Proteomics & Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/gpb\/advance-article-pdf\/doi\/10.1093\/gpbjnl\/qzae018\/56819152\/qzae018.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/gpb\/article-pdf\/22\/1\/qzae018\/59350512\/qzae018.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/gpb\/article-pdf\/22\/1\/qzae018\/59350512\/qzae018.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,26]],"date-time":"2024-09-26T09:17:03Z","timestamp":1727342223000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/gpb\/article\/doi\/10.1093\/gpbjnl\/qzae018\/7617691"}},"subtitle":[],"editor":[{"given":"Jianhua","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2024,2]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,5,9]]}},"URL":"https:\/\/doi.org\/10.1093\/gpbjnl\/qzae018","relation":{},"ISSN":[
"1672-0229","2210-3244"],"issn-type":[{"value":"1672-0229","type":"print"},{"value":"2210-3244","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,2]]},"published":{"date-parts":[[2024,2]]},"article-number":"qzae018"}}