{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T12:23:24Z","timestamp":1772713404430,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T00:00:00Z","timestamp":1648512000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100004826","name":"Beijing Natural Science Foundation","doi-asserted-by":"publisher","award":["5214023"],"award-info":[{"award-number":["5214023"]}],"id":[{"id":"10.13039\/501100004826","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31400669"],"award-info":[{"award-number":["31400669"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31830054"],"award-info":[{"award-number":["31830054"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005150","name":"CAMS","doi-asserted-by":"publisher","award":["2017PT310004"],"award-info":[{"award-number":["2017PT310004"]}],"id":[{"id":"10.13039\/501100005150","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,5,13]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The parallel measurement of transcriptome and proteome revealed unmatched profiles. Since proteomic analysis is more expensive and challenging than transcriptomic analysis, the question of how to use messenger RNA (mRNA) expression data to predict protein level is extremely important. Here, we comprehensively evaluated 13 machine learning models on inferring protein expression levels using RNA expression profile. A total of 20 proteogenomic datasets from three mainstream proteomic platforms with &gt;2500 samples of 13 human tissues were collected for model evaluation. Our results highlighted that the appropriate feature selection methods combined with classical machine learning models could achieve excellent predictive performance. The voting ensemble model outperformed other candidate models across datasets. Adding the mRNA proxy model to the regression model further improved the prediction performance. The dataset and gene characteristics could affect the prediction performance. Finally, we applied the model to the brain transcriptome of cerebral cortex regions to infer the protein profile for better understanding the functional characteristics of the brain regions. This benchmarking work not only provides useful hints on the inherent correlation between transcriptome and proteome, but also has practical value of the transcriptome-based prediction of protein expression levels.<\/jats:p>","DOI":"10.1093\/bib\/bbac091","type":"journal-article","created":{"date-parts":[[2022,2,25]],"date-time":"2022-02-25T20:10:27Z","timestamp":1645819827000},"source":"Crossref","is-referenced-by-count":4,"title":["Evaluation of machine learning models on protein level inference from prioritized RNA features"],"prefix":"10.1093","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4304-3479","authenticated-orcid":false,"given":"Wenjian","family":"Xu","sequence":"first","affiliation":[{"name":"Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children\u2019s Hospital, Capital Medical University, National Center for Children\u2019s Health, Beijing 100045, China"}]},{"given":"Haochen","family":"He","sequence":"additional","affiliation":[{"name":"Department of Radiation Protection and Health Physics, Beijing Institute of Radiation Medicine, Beijing 100850, China"}]},{"given":"Zhengguang","family":"Guo","sequence":"additional","affiliation":[{"name":"Core Facility of Instruments, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, 5 Dong Dan San Tiao, Beijing 100005, China"}]},{"given":"Wei","family":"Li","sequence":"additional","affiliation":[{"name":"Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children\u2019s Hospital, Capital Medical University, National Center for Children\u2019s Health, Beijing 100045, China"}]}],"member":"286","published-online":{"date-parts":[[2022,3,29]]},"reference":[{"key":"2022051813444290900_ref1","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1016\/j.cell.2016.03.014","article-title":"On the dependency of cellular protein levels on mRNA abundance","volume":"165","author":"Liu","year":"2016","journal-title":"Cell"},{"key":"2022051813444290900_ref2","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1016\/j.cels.2017.03.003","article-title":"Absolute quantification of protein and mRNA abundances demonstrate variability in gene-specific translation efficiency in yeast","volume":"4","author":"Lahtvee","year":"2017","journal-title":"Cell Syst"},{"key":"2022051813444290900_ref3","doi-asserted-by":"crossref","first-page":"E19","DOI":"10.1038\/nature22293","article-title":"Can we predict protein from mRNA levels?","volume":"547","author":"Fortelny","year":"2017","journal-title":"Nature"},{"key":"2022051813444290900_ref4","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1038\/nrg3185","article-title":"Insights into the regulation of protein abundance from proteomic and transcriptomic analyses","volume":"13","author":"Vogel","year":"2012","journal-title":"Nat Rev Genet"},{"key":"2022051813444290900_ref5","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1038\/nature13438","article-title":"Proteogenomic characterization of human colon and rectal cancer","volume":"513","author":"Zhang","year":"2014","journal-title":"Nature"},{"key":"2022051813444290900_ref6","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1016\/j.ccell.2019.02.005","article-title":"The Proteogenomic landscape of curable prostate cancer","volume":"35","author":"Sinha","year":"2019","journal-title":"Cancer Cell"},{"key":"2022051813444290900_ref7","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.cell.2020.05.043","article-title":"Integrative proteomic characterization of human lung adenocarcinoma","volume":"182","author":"Xu","year":"2020","journal-title":"Cell"},{"key":"2022051813444290900_ref8","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1038\/s41586-019-0987-8","article-title":"Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma","volume":"567","author":"Jiang","year":"2019","journal-title":"Nature"},{"key":"2022051813444290900_ref9","doi-asserted-by":"crossref","first-page":"1787","DOI":"10.1038\/s41593-017-0011-2","article-title":"A multiregional proteomic survey of the postnatal human brain","volume":"20","author":"Carlyle","year":"2017","journal-title":"Nat Neurosci"},{"key":"2022051813444290900_ref10","doi-asserted-by":"crossref","first-page":"1240","DOI":"10.1016\/j.cell.2019.10.038","article-title":"Integrated proteogenomic characterization of HBV-related hepatocellular carcinoma","volume":"179","author":"Gao","year":"2019","journal-title":"Cell"},{"key":"2022051813444290900_ref11","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1016\/j.cell.2020.01.026","article-title":"Proteogenomic characterization of endometrial carcinoma","volume":"180","author":"Dou","year":"2020","journal-title":"Cell"},{"key":"2022051813444290900_ref12","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1016\/j.cell.2020.06.013","article-title":"Proteogenomic characterization reveals therapeutic vulnerabilities in lung adenocarcinoma","volume":"182","author":"Gillette","year":"2020","journal-title":"Cell"},{"key":"2022051813444290900_ref13","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1016\/j.cell.2020.06.012","article-title":"Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression","volume":"182","author":"Chen","year":"2020","journal-title":"Cell"},{"key":"2022051813444290900_ref14","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.1016\/j.cell.2019.03.030","article-title":"Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities","volume":"177","author":"Vasaikar","year":"2019","journal-title":"Cell"},{"key":"2022051813444290900_ref15","doi-asserted-by":"crossref","first-page":"964","DOI":"10.1016\/j.cell.2019.10.007","article-title":"Integrated proteogenomic characterization of clear cell renal cell carcinoma","volume":"179","author":"Clark","year":"2019","journal-title":"Cell"},{"key":"2022051813444290900_ref16","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1016\/j.ccell.2021.01.006","article-title":"Proteogenomic and metabolomic characterization of human glioblastoma","volume":"39","author":"Wang","year":"2021","journal-title":"Cancer Cell"},{"key":"2022051813444290900_ref17","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1016\/j.ccell.2020.12.007","article-title":"Proteogenomic insights into the biology and treatment of HPV-negative head and neck squamous cell carcinoma","volume":"39","author":"Huang","year":"2021","journal-title":"Cancer Cell"},{"key":"2022051813444290900_ref18","doi-asserted-by":"crossref","first-page":"1962","DOI":"10.1016\/j.cell.2020.10.044","article-title":"Integrated proteogenomic characterization across major histological types of Pediatric brain cancer","volume":"183","author":"Petralia","year":"2020","journal-title":"Cell"},{"key":"2022051813444290900_ref19","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1038\/nature18003","article-title":"Proteogenomics connects somatic mutations to signalling in breast cancer","volume":"534","author":"Mertins","year":"2016","journal-title":"Nature"},{"key":"2022051813444290900_ref20","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1016\/j.cell.2016.05.069","article-title":"Integrated proteogenomic characterization of human high-grade serous ovarian cancer","volume":"166","author":"Zhang","year":"2016","journal-title":"Cell"},{"key":"2022051813444290900_ref21","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1016\/j.ccell.2018.12.003","article-title":"Proteogenomic characterization of human early-onset gastric cancer","volume":"35","author":"Mun","year":"2019","journal-title":"Cancer Cell"},{"key":"2022051813444290900_ref22","doi-asserted-by":"crossref","first-page":"4348","DOI":"10.1016\/j.cell.2021.07.016","article-title":"A proteogenomic portrait of lung squamous cell carcinoma","volume":"184","author":"Satpathy","year":"2021","journal-title":"Cell"},{"key":"2022051813444290900_ref23","doi-asserted-by":"crossref","first-page":"5031","DOI":"10.1016\/j.cell.2021.08.023","article-title":"Proteogenomic characterization of pancreatic ductal adenocarcinoma","volume":"184","author":"Cao","year":"2021","journal-title":"Cell"},{"key":"2022051813444290900_ref24","doi-asserted-by":"crossref","first-page":"1436","DOI":"10.1016\/j.cell.2020.10.036","article-title":"Proteogenomic landscape of breast cancer tumorigenesis and targeted therapy","volume":"183","author":"Krug","year":"2020","journal-title":"Cell"},{"key":"2022051813444290900_ref25","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1016\/j.cels.2020.06.013","article-title":"Community assessment of the predictability of cancer protein and phosphoprotein levels from genomics and transcriptomics","volume":"11","author":"Yang","year":"2020","journal-title":"Cell Syst"},{"key":"2022051813444290900_ref26","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1186\/s12915-019-0730-9","article-title":"Joint learning improves protein abundance prediction in cancers","volume":"17","author":"Li","year":"2019","journal-title":"BMC Biol"},{"key":"2022051813444290900_ref27","doi-asserted-by":"crossref","first-page":"3788","DOI":"10.1093\/bioinformatics\/btaa239","article-title":"Blood-based multi-tissue gene expression inference with Bayesian ridge regression","volume":"36","author":"Xu","year":"2020","journal-title":"Bioinformatics"},{"key":"2022051813444290900_ref28","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1186\/s13059-019-1795-z","article-title":"A comparison of automatic cell identification methods for single-cell RNA sequencing data","volume":"20","author":"Abdelaal","year":"2019","journal-title":"Genome Biol"},{"key":"2022051813444290900_ref29","doi-asserted-by":"crossref","first-page":"D1038","DOI":"10.1093\/nar\/gky1151","article-title":"OMIM.Org: leveraging knowledge across phenotype-gene relationships","volume":"47","author":"Amberger","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022051813444290900_ref30","doi-asserted-by":"crossref","first-page":"1260419","DOI":"10.1126\/science.1260419","article-title":"Proteomics. Tissue-based map of the human proteome","volume":"347","author":"Uhl\u00e9n","year":"2015","journal-title":"Science"},{"key":"2022051813444290900_ref31","doi-asserted-by":"crossref","first-page":"D542","DOI":"10.1093\/nar\/gkx1104","article-title":"iPTMnet: an integrated resource for protein post-translational modification network discovery","volume":"46","author":"Huang","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2022051813444290900_ref32","doi-asserted-by":"crossref","first-page":"974","DOI":"10.1074\/mcp.RA118.000583","article-title":"Peptide level turnover measurements enable the study of Proteoform dynamics","volume":"17","author":"Zecha","year":"2018","journal-title":"Mol Cell Proteomics"},{"key":"2022051813444290900_ref33","doi-asserted-by":"crossref","first-page":"D559","DOI":"10.1093\/nar\/gky973","article-title":"CORUM: the comprehensive resource of mammalian protein complexes-2019","volume":"47","author":"Giurgiu","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022051813444290900_ref34","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1038\/nrg.2017.75","article-title":"Human gene essentiality","volume":"19","author":"Bartha","year":"2018","journal-title":"Nat Rev Genet"},{"key":"2022051813444290900_ref35","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: fundamental algorithms for scientific computing in python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat Methods"},{"key":"2022051813444290900_ref36","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2022051813444290900_ref37","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/MCSE.2007.55","article-title":"Matplotlib: a 2D graphics environment","volume":"9","author":"Hunter","year":"2007","journal-title":"Comput Sci Eng"},{"key":"2022051813444290900_ref38","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-24277-4","volume-title":"ggplot2: Elegant Graphics for Data Analysis","author":"Wickham","year":"2016"},{"key":"2022051813444290900_ref39","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1038\/nature11405","article-title":"An anatomically comprehensive atlas of the adult human brain transcriptome","volume":"489","author":"Hawrylycz","year":"2012","journal-title":"Nature"},{"key":"2022051813444290900_ref40","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1016\/j.tins.2012.09.005","article-title":"The Allen human brain atlas: comprehensive gene expression mapping of the human brain","volume":"35","author":"Shen","year":"2012","journal-title":"Trends Neurosci"},{"key":"2022051813444290900_ref41","doi-asserted-by":"crossref","first-page":"1599","DOI":"10.1152\/physrev.00025.2019","article-title":"SUMO: from bench to bedside","volume":"100","author":"Chang","year":"2020","journal-title":"Physiol Rev"},{"key":"2022051813444290900_ref42","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1016\/j.cels.2017.08.013","article-title":"Widespread post-transcriptional attenuation of genomic copy-number variation in cancer","volume":"5","author":"Gon\u00e7alves","year":"2017","journal-title":"Cell Syst"},{"key":"2022051813444290900_ref43","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1093\/bib\/bbz072","article-title":"New insights on human essential genes based on integrated analysis and the construction of the HEGIAP web-based platform","volume":"21","author":"Chen","year":"2020","journal-title":"Brief Bioinform"},{"issue":"21","key":"2022051813444290900_ref44","first-page":"00225","article-title":"A global multiregional proteomic map of the human cerebral cortex","volume":"S1672\u20130229","author":"Guo","year":"2021","journal-title":"Genom Proteom Bioinformat"},{"key":"2022051813444290900_ref45","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1038\/s41467-020-14391-0","article-title":"Surface protein imputation from single cell transcriptomes by deep neural networks","volume":"11","author":"Zhou","year":"2020","journal-title":"Nat Commun"},{"key":"2022051813444290900_ref46","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.ymeth.2020.10.001","article-title":"Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data","volume":"189","author":"Xu","year":"2021","journal-title":"Methods"},{"key":"2022051813444290900_ref47","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1186\/s12859-021-04022-w","article-title":"PIKE-R2P: protein-protein interaction network-based knowledge embedding with graph neural network for single-cell RNA to protein prediction","volume":"22","author":"Dai","year":"2021","journal-title":"BMC Bioinformat"},{"key":"2022051813444290900_ref48","doi-asserted-by":"crossref","first-page":"D1266","DOI":"10.1093\/nar\/gkx965","article-title":"The BioStudies database-one stop shop for all data supporting a life sciences study","volume":"46","author":"Sarkans","year":"2018","journal-title":"Nucleic Acids Res"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/3\/bbac091\/43745119\/bbac091.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/3\/bbac091\/43745119\/bbac091.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,18]],"date-time":"2022-05-18T13:46:38Z","timestamp":1652881598000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac091\/6555405"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,29]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,5,13]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac091","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,5]]},"published":{"date-parts":[[2022,3,29]]},"article-number":"bbac091"}}