{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:59Z","timestamp":1772138039198,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2020,5,28]],"date-time":"2020-05-28T00:00:00Z","timestamp":1590624000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2018YFC0910500"],"award-info":[{"award-number":["2018YFC0910500"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1611261"],"award-info":[{"award-number":["U1611261"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61772566"],"award-info":[{"award-number":["61772566"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["81801132"],"award-info":[{"award-number":["81801132"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Frontier & Key Tech Innovation Program","award":["2018B010109006"],"award-info":[{"award-number":["2018B010109006"]}]},{"name":"Guangdong Frontier & Key Tech Innovation Program","award":["2019B020228001"],"award-info":[{"award-number":["2019B020228001"]}]},{"name":"Introducing Innovative and Entrepreneurial 
Teams","award":["2016ZT06D211"],"award-info":[{"award-number":["2016ZT06D211"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>RNA secondary structure plays a vital role in fundamental cellular processes, and identification of RNA secondary structure is a key step to understand RNA functions. Recently, a few experimental methods were developed to profile genome-wide RNA secondary structure, i.e. the pairing probability of each nucleotide, through high-throughput sequencing techniques. However, these high-throughput methods have low precision and cannot cover all nucleotides due to limited sequencing coverage.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we have developed a new method for the prediction of genome-wide RNA secondary structure profile from RNA sequence based on the extreme gradient boosting technique. The method achieves predictions with areas under the receiver operating characteristic curve (AUC) &gt;0.9 on three different datasets, and AUC of 0.888 by another independent test on the recently released Zika virus data. These AUCs are consistently &gt;5% greater than those by the CROSS method recently developed based on a shallow neural network. Further analysis on the 1000 Genome Project data showed that our predicted unpaired probabilities are highly correlated (&gt;0.8) with the minor allele frequencies at synonymous, non-synonymous mutations, and mutations in untranslated regions, which were higher than those generated by RNAplfold. 
Moreover, the prediction over all human mRNA indicated a consistent result with previous observation that there is a periodic distribution of unpaired probability on codons. The accurate predictions by our method indicate that such model trained on genome-wide experimental data might be an alternative for analytical methods.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The GRASP is available for academic use at https:\/\/github.com\/sysu-yanglab\/GRASP.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa534","type":"journal-article","created":{"date-parts":[[2020,5,23]],"date-time":"2020-05-23T07:14:30Z","timestamp":1590218070000},"page":"4576-4582","source":"Crossref","is-referenced-by-count":10,"title":["Accurate prediction of genome-wide RNA secondary structure profile based on extreme gradient boosting"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9575-5745","authenticated-orcid":false,"given":"Yaobin","family":"Ke","sequence":"first","affiliation":[{"name":"School of Data and Computer Science , Guangzhou 510000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6840-8198","authenticated-orcid":false,"given":"Jiahua","family":"Rao","sequence":"additional","affiliation":[{"name":"School of Data and Computer Science , Guangzhou 510000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huiying","family":"Zhao","sequence":"additional","affiliation":[{"name":"Sun Yat-sen Memorial Hospital , Guangzhou 510000, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yutong","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Data and Computer Science , Guangzhou 510000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nong","family":"Xiao","sequence":"additional","affiliation":[{"name":"School of Data and Computer Science , Guangzhou 510000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6782-2813","authenticated-orcid":false,"given":"Yuedong","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Data and Computer Science , Guangzhou 510000, China"},{"name":"Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of Ministry of Education , Guangzhou 510000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,5,28]]},"reference":[{"key":"2023062213570754600_btaa534-B51484401","doi-asserted-by":"crossref","first-page":"340","DOI":"10.1186\/1471-2105-9-340","article-title":"RNA STRAND: The RNA Secondary Structure and Statistical Analysis Database","volume":"9","author":"Andronescu","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023062213570754600_btaa534-B1","doi-asserted-by":"crossref","first-page":"1850014","DOI":"10.1142\/S0219720018500142","article-title":"Training host-pathogen protein-protein interaction predictors","volume":"16","author":"Basit","year":"2018","journal-title":"J. Bioinform. Comput. 
Biol"},{"key":"2023062213570754600_btaa534-B2","doi-asserted-by":"crossref","first-page":"614","DOI":"10.1093\/bioinformatics\/btk014","article-title":"Local RNA base pairing probabilities in large sequences","volume":"22","author":"Bernhart","year":"2006","journal-title":"Bioinformatics"},{"key":"2023062213570754600_btaa534-B3","doi-asserted-by":"crossref","first-page":"1235","DOI":"10.1002\/humu.23785","article-title":"Predicting the change of exon splicing caused by genetic variant using support vector regression","volume":"40","author":"Chen","year":"2019","journal-title":"Hum. Mutat"},{"key":"2023062213570754600_btaa534-B4","doi-asserted-by":"crossref","DOI":"10.1186\/s13321-019-0373-4","article-title":"DLIGAND2: an improved knowledge-based energy function for protein-ligand interactions using the distance-scaled, finite, ideal-gas reference state","author":"Chen","year":"2019"},{"key":"2023062213570754600_btaa534-B5","volume-title":":","author":"Chen","year":"2016"},{"key":"2023062213570754600_btaa534-B6","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1038\/s41419-017-0003-x","article-title":"EGBMMDA: extreme gradient boosting machine for MiRNA-disease association prediction","volume":"9","author":"Chen","year":"2018","journal-title":"Cell Death Dis"},{"key":"2023062213570754600_btaa534-B7","doi-asserted-by":"crossref","first-page":"149","DOI":"10.3390\/info9070149","article-title":"Effective intrusion detection system using XGBoost","volume":"9","author":"Dhaliwal","year":"2018","journal-title":"Information"},{"key":"2023062213570754600_btaa534-B8","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1038\/nature12756","article-title":"In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory 
features","volume":"505","author":"Ding","year":"2014","journal-title":"Nature"},{"key":"2023062213570754600_btaa534-B9","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1016\/0300-9084(94)90120-1","article-title":"Potential secondary structure at the translational start domain of eukaryotic and prokaryotic mRNAs","volume":"76","author":"Ganoza","year":"1994","journal-title":"Biochimie"},{"key":"2023062213570754600_btaa534-B10","doi-asserted-by":"crossref","first-page":"1977","DOI":"10.1016\/j.febslet.2008.03.004","article-title":"RNA-binding proteins and post-transcriptional gene regulation","volume":"582","author":"Glisovic","year":"2008","journal-title":"FEBS Lett"},{"key":"2023062213570754600_btaa534-B70990541","doi-asserted-by":"crossref","first-page":"e1001074","DOI":"10.1371\/journal.pgen.1001074","article-title":"Disease-Associated Mutations That Alter the RNA Structural Ensemble","volume":"6","author":"Halvorsen","year":"2010","journal-title":"PLoS Genetics"},{"key":"2023062213570754600_btaa534-B11","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic (ROC) curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"2023062213570754600_btaa534-B12","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1007\/978-1-62703-709-9_4","article-title":"Energy-directed RNA structure prediction","volume":"1097","author":"Hofacker","year":"2014","journal-title":"Methods Mol. 
Biol"},{"key":"2023062213570754600_btaa534-B13","doi-asserted-by":"crossref","first-page":"R9","DOI":"10.1186\/gb-2012-13-2-r9","article-title":"Predicting the effects of frameshifting indels","volume":"13","author":"Hu","year":"2012","journal-title":"Genome Biol"},{"key":"2023062213570754600_btaa534-B14","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1038\/ejhg.2012.3","article-title":"1000 Genomes-based imputation identifies novel and refined associations for the Wellcome Trust Case Control Consortium phase 1 Data","volume":"20","author":"Huang","year":"2012","journal-title":"Eur. J. Hum. Genet"},{"key":"2023062213570754600_btaa534-B15","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1109\/ICCSNT.2012.6526067","article-title":"Application of BP neural network based on GA in function fitting","author":"Jin-Yue","year":"2012","journal-title":"Proceedings of 2012 2nd International Conference on Computer Science and Network Technology"},{"key":"2023062213570754600_btaa534-B16","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1038\/nature09322","article-title":"Genome-wide measurement of RNA secondary structure in yeast","volume":"467","author":"Kertesz","year":"2010","journal-title":"Nature"},{"key":"2023062213570754600_btaa534-B17","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1016\/j.chom.2018.10.011","article-title":"Integrative analysis of Zika virus genome RNA structure reveals critical determinants of viral infectivity","volume":"24","author":"Li","year":"2018","journal-title":"Cell Host Microbe"},{"key":"2023062213570754600_btaa534-B18","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023062213570754600_btaa534-B19","article-title":"ViennaRNA Package 2.0","volume":"6, 
26","author":"Lorenz","year":"2011","journal-title":"Algorithm Mol. Biol"},{"key":"2023062213570754600_btaa534-B20","author":"Lowry"},{"key":"2023062213570754600_btaa534-B21","doi-asserted-by":"crossref","first-page":"11063","DOI":"10.1073\/pnas.1106501108","article-title":"Multiplexed RNA structure characterization with selective 2\u2019-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq)","volume":"108","author":"Lucks","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062213570754600_btaa534-B22","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1089\/106652700750050862","article-title":"RNA pseudoknot prediction in energy-based models","volume":"7","author":"Lyngso","year":"2000","journal-title":"J. Comput. Biol"},{"key":"2023062213570754600_btaa534-B23","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1006\/jmbi.1999.2700","article-title":"Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure","volume":"288","author":"Mathews","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023062213570754600_btaa534-B24","doi-asserted-by":"crossref","first-page":"D495","DOI":"10.1093\/nar\/gky1044","article-title":"Translocatome: a novel resource for the analysis of protein translocation between cellular organelles","volume":"47","author":"Mendik","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023062213570754600_btaa534-B25","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1038\/nrg3681","article-title":"Insights into RNA structure and function from genome-wide studies","volume":"15","author":"Mortimer","year":"2014","journal-title":"Nat. Rev. 
Genet"},{"key":"2023062213570754600_btaa534-B26","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1101\/gr.138545.112","article-title":"SeqFold: genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data","volume":"23","author":"Ouyang","year":"2013","journal-title":"Genome Res"},{"key":"2023062213570754600_btaa534-B27","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn Res"},{"key":"2023062213570754600_btaa534-B28","article-title":"A high-throughput approach to profile RNA structure","volume":"45","author":"Ponti","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023062213570754600_btaa534-B29","first-page":"1212","author":"Roberts","year":"2003"},{"key":"2023062213570754600_btaa534-B30","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1038\/nature12894","article-title":"Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo","volume":"505","author":"Rouskin","year":"2014","journal-title":"Nature"},{"key":"2023062213570754600_btaa534-B31","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1007\/978-1-61779-949-5_8","article-title":"RNA structure prediction: an overview of methods","volume":"905","author":"Seetin","year":"2012","journal-title":"Methods Mol. Biol"},{"key":"2023062213570754600_btaa534-B32","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1186\/1471-2105-7-65","article-title":"Computational models with thermodynamic and composition features improve siRNA design","volume":"7","author":"Shabalina","year":"2006","journal-title":"BMC Bioinform"},{"key":"2023062213570754600_btaa534-B33","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1038\/nmeth.1529","article-title":"FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing","volume":"7","author":"Underwood","year":"2010","journal-title":"Nat. 
Methods"},{"key":"2023062213570754600_btaa534-B34","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1038\/nature12946","article-title":"Landscape and variation of RNA secondary structure across the human transcriptome","volume":"505","author":"Wan","year":"2014","journal-title":"Nature"},{"key":"2023062213570754600_btaa534-B0083451","doi-asserted-by":"crossref","first-page":"e164","DOI":"10.1093\/nar\/gkq603","article-title":"ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data","volume":"38","author":"Wang","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023062213570754600_btaa534-B35","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"RNA-Seq: a revolutionary tool for transcriptomics","volume":"10","author":"Wang","year":"2009","journal-title":"Nat. Rev. Genet"},{"key":"2023062213570754600_btaa534-B36","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1261\/rna.057364.116","article-title":"Genome-scale characterization of RNA tertiary structures and their functional impact by RNA solvent accessibility prediction","volume":"23","author":"Yang","year":"2017","journal-title":"RNA"},{"key":"2023062213570754600_btaa534-B37","doi-asserted-by":"crossref","first-page":"1157","DOI":"10.1261\/rna.2500605","article-title":"RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble","volume":"11","author":"Ye","year":"2005","journal-title":"RNA"},{"key":"2023062213570754600_btaa534-B38","doi-asserted-by":"crossref","first-page":"R23","DOI":"10.1186\/gb-2013-14-3-r23","article-title":"DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels","volume":"14","author":"Zhao","year":"2013","journal-title":"Genome Biol"},{"key":"2023062213570754600_btaa534-B39","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1186\/s12864-018-4766-y","article-title":"BoostMe accurately predicts DNA methylation values in whole-genome 
bisulfite sequencing of multiple human tissues","volume":"19","author":"Zou","year":"2018","journal-title":"BMC Genomics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa534\/33620331\/btaa534.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/17\/4576\/50677763\/btaa534.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/17\/4576\/50677763\/btaa534.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,23]],"date-time":"2023-06-23T16:33:02Z","timestamp":1687537982000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/17\/4576\/5848407"}},"subtitle":[],"editor":[{"given":"Jan","family":"Gorodkin","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,5,28]]},"references-count":42,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2020,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa534","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/610782","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,9,1]]},"published":{"date-parts":[[2020,5,28]]}}}