{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T22:14:22Z","timestamp":1772748862264,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Gene expression profiling technologies can generally produce mRNA abundance data for all genes in a genome. A dearth of proteomic data persists because identification range and sensitivity of proteomic measurements lag behind those of transcriptomic measurements. Using partial proteomic data, it is likely that integrative transcriptomic and proteomic analysis may introduce significant bias. Developing methodologies to accurately estimate missing proteomic data will allow better integration of transcriptomic and proteomic datasets and provide deeper insight into metabolic mechanisms underlying complex biological systems.<\/jats:p>\n               <jats:p>Results: In this study, we present a non-linear data-driven model to predict abundance for undetected proteins using two independent datasets of cognate transcriptomic and proteomic data collected from Desulfovibrio vulgaris. We use stochastic gradient boosted trees (GBT) to uncover possible non-linear relationships between transcriptomic and proteomic data, and to predict protein abundance for the proteins not experimentally detected based on relevant predictors such as mRNA abundance, cellular role, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. Initially, we constructed a GBT model using all possible variables to assess their relative importance and characterize the behavior of the predictive model. 
This model showed a strong plateau effect in regions of high mRNA values and sparse data. Hence, we removed genes in those areas based on thresholds estimated from the partial dependency plots where this behavior was captured. At this stage, only the strongest predictors of protein abundance were retained to reduce the complexity of the GBT model. After removing genes in the plateau region, mRNA abundance, main cellular functional categories and a few triple codon counts emerged as the top-ranked predictors of protein abundance. We then created a new tuned GBT model using the five most significant predictors. Our non-linear model consists of a series of regression tree models with implicit strength in variable selection. The model provides relative variable importance measures using mean squared error as the criterion. The results showed that coefficients of determination for our non-linear models ranged from 0.393 to 0.582 across both datasets, better than the linear regression used in the past. 
We evaluated the validity of this non-linear model using biological information of operons, regulons and pathways, and the results demonstrated that the coefficients of variation of estimated protein abundance values within operons, regulons or pathways are indeed smaller than those for random groups of proteins.<\/jats:p>\n               <jats:p>Contact: \u00a0weiwen.zhang@asu.edu; george.runger@asu.edu<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp325","type":"journal-article","created":{"date-parts":[[2009,5,16]],"date-time":"2009-05-16T00:14:09Z","timestamp":1242432849000},"page":"1905-1914","source":"Crossref","is-referenced-by-count":29,"title":["Integrative analysis of transcriptomic and proteomic data of <i>Desulfovibrio vulgaris<\/i>: a non-linear model to predict abundance of undetected proteins"],"prefix":"10.1093","volume":"25","author":[{"given":"Wandaliz","family":"Torres-Garc\u00eda","sequence":"first","affiliation":[{"name":"1 Department of Industrial, Systems and Operations Engineering, Tempe AZ, 85287-5906 and 2 Center for Ecogenomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501"},{"name":"1 Department of Industrial, Systems and Operations Engineering, Tempe AZ, 85287-5906 and 2 Center for Ecogenomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weiwen","family":"Zhang","sequence":"additional","affiliation":[{"name":"1 Department of Industrial, Systems and Operations Engineering, Tempe AZ, 85287-5906 and 2 Center for Ecogenomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George C.","family":"Runger","sequence":"additional","affiliation":[{"name":"1 Department of Industrial, Systems and Operations Engineering, 
Tempe AZ, 85287-5906 and 2 Center for Ecogenomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Roger H.","family":"Johnson","sequence":"additional","affiliation":[{"name":"1 Department of Industrial, Systems and Operations Engineering, Tempe AZ, 85287-5906 and 2 Center for Ecogenomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Deirdre R.","family":"Meldrum","sequence":"additional","affiliation":[{"name":"1 Department of Industrial, Systems and Operations Engineering, Tempe AZ, 85287-5906 and 2 Center for Ecogenomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2009,5,15]]},"reference":[{"key":"2023013112050523300_B1","doi-asserted-by":"crossref","first-page":"1015","DOI":"10.1101\/gr.3844805","article-title":"The MicrobesOnline web site for comparative genomics","volume":"15","author":"Alm","year":"2005","journal-title":"Genome Res."},{"key":"2023013112050523300_B2","doi-asserted-by":"crossref","first-page":"16577","DOI":"10.1073\/pnas.0406767101","article-title":"Integrative analysis of genome-scale data by using pseudoinverse projection predicts novel correlation between DNA replication and RNA transcription","volume":"101","author":"Alter","year":"2004","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112050523300_B3","doi-asserted-by":"crossref","first-page":"41921","DOI":"10.1074\/jbc.M304470200","article-title":"Osteopontin regulation by inorganic phosphate is ERK1\/2-, protein kinase C-, and proteasome-dependent","volume":"278","author":"Beck","year":"2003","journal-title":"J. Biol. 
Chem."},{"key":"2023013112050523300_B4","doi-asserted-by":"crossref","first-page":"1083","DOI":"10.1074\/mcp.M400099-MCP200","article-title":"Posttranscriptional expression regulation in the yeast Saccharomyces cerevisiae on a genomic scale","volume":"3","author":"Beyer","year":"2004","journal-title":"Mol. Cell Proteomics"},{"key":"2023013112050523300_B5","doi-asserted-by":"crossref","first-page":"1284","DOI":"10.1074\/mcp.M500082-MCP200","article-title":"A combined proteome and microarray investigation of inorganic phosphate-induced pre-osteoblast cells","volume":"4","author":"Conrads","year":"2005","journal-title":"Mol. Cell Proteomics"},{"key":"2023013112050523300_B6","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1890\/0012-9658(2007)88[243:BTFEMA]2.0.CO;2","article-title":"Boosted trees for ecological modeling and prediction","volume":"88","author":"De'ath","year":"2007","journal-title":"Ecology"},{"key":"2023013112050523300_B7","doi-asserted-by":"crossref","first-page":"802","DOI":"10.1111\/j.1365-2656.2008.01390.x","article-title":"A working guide to boosted regression trees","volume":"77","author":"Elith","year":"2008","journal-title":"J. Anim. Ecol."},{"key":"2023013112050523300_B8","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"2023013112050523300_B9","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1016\/S0167-9473(01)00065-2","article-title":"Stochastic gradient boosting","volume":"38","author":"Friedman","year":"2002","journal-title":"Comput. Stat. 
Data Anal."},{"key":"2023013112050523300_B10","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1093\/bioinformatics\/18.4.585","article-title":"Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts","volume":"18","author":"Greenbaum","year":"2002","journal-title":"Bioinformatics"},{"key":"2023013112050523300_B11","doi-asserted-by":"crossref","first-page":"1720","DOI":"10.1128\/MCB.19.3.1720","article-title":"Correlation between protein and mRNA abundance in yeast","volume":"19","author":"Gygi","year":"1999","journal-title":"Mol. Cell Biol."},{"key":"2023013112050523300_B12","volume-title":"The Elements of Statistical Learning-Data Mining, Inference, Prediction.","author":"Hastie","year":"2001"},{"key":"2023013112050523300_B13","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1016\/j.copbio.2003.10.006","article-title":"Interplay of transcriptomics and proteomics","volume":"14","author":"Hegde","year":"2003","journal-title":"Curr. Opin. Biotechnol."},{"key":"2023013112050523300_B14","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1038\/nbt959","article-title":"The genome sequence of the anaerobic, sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough","volume":"22","author":"Heidelberg","year":"2004","journal-title":"Nat. Biotechnol."},{"key":"2023013112050523300_B15","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1089\/153623104773547480","article-title":"Genomic insights into gene regulation of Desulfovibrio vulgaris Hildenborough","volume":"8","author":"Hemme","year":"2004","journal-title":"OMICS"},{"key":"2023013112050523300_B16","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1097\/00001622-200301000-00006","article-title":"Serial analysis of gene expression and cancer","volume":"15","author":"Hermeking","year":"2003","journal-title":"Curr. Opin. 
Oncol."},{"key":"2023013112050523300_B17","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1007\/s10142-002-0065-3","article-title":"Global analysis of gene expression in yeast","volume":"2","author":"Horak","year":"2002","journal-title":"Funct. Integr. Genomics"},{"key":"2023013112050523300_B18","doi-asserted-by":"crossref","first-page":"929","DOI":"10.1126\/science.292.5518.929","article-title":"Integrated genomic and proteomic analyses of a systematically perturbed metabolic network","volume":"292","author":"Ideker","year":"2001","journal-title":"Science"},{"key":"2023013112050523300_B19","volume-title":"Miller And Freund's Probability and Statistics for Engineers.","author":"Johnson","year":"2005"},{"key":"2023013112050523300_B20","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1016\/S0092-8674(03)00926-7","article-title":"Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria","volume":"115","author":"Mootha","year":"2003","journal-title":"Cell"},{"key":"2023013112050523300_B21","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1073\/pnas.242716699","article-title":"Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics","volume":"100","author":"Mootha","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112050523300_B22","doi-asserted-by":"crossref","first-page":"4068","DOI":"10.1128\/JB.01921-05","article-title":"Salt stress in Desulfovibrio vulgaris Hildenborough: an integrated genomics approach","volume":"188","author":"Mukhopadhyay","year":"2006","journal-title":"J. 
Bacteriol."},{"key":"2023013112050523300_B23","doi-asserted-by":"crossref","first-page":"1641","DOI":"10.1093\/bioinformatics\/btl134","article-title":"Integrated analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: zero-inflated Poisson regression models to predict abundance of undetected proteins","volume":"22","author":"Nie","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112050523300_B24","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1016\/j.bbrc.2005.11.055","article-title":"Correlation between mRNA and protein abundance in Desulfovibrio vulgaris: a multiple regression to identify sources of variations","volume":"339","author":"Nie","year":"2006","journal-title":"Biochem. Biophys Res. Commun."},{"key":"2023013112050523300_B25","doi-asserted-by":"crossref","first-page":"2229","DOI":"10.1534\/genetics.106.065862","article-title":"Correlation of mRNA expression and protein abundance affected by multiple sequence features related to translational efficiency in Desulfovibrio vulgaris: a quantitative analysis","volume":"174","author":"Nie","year":"2006","journal-title":"Genetics"},{"key":"2023013112050523300_B26","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1080\/07388550701334212","article-title":"Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications","volume":"27","author":"Nie","year":"2007","journal-title":"Crit. Rev. 
Biotechnol."},{"key":"2023013112050523300_B27","doi-asserted-by":"crossref","first-page":"1749","DOI":"10.1101\/gr.362402","article-title":"Gene expression analysis using oligonucleotide arrays produced by maskless photolithography","volume":"12","author":"Nuwaysir","year":"2002","journal-title":"Genome Res."},{"key":"2023013112050523300_B28","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/1471-2105-7-19","article-title":"OpWise: operons aid the identification of differentially expressed genes in bacterial microarray experiments","volume":"7","author":"Price","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023013112050523300_B29","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1021\/pr0498638","article-title":"Probability-based evaluation of peptide and protein identifications from tandem mass spectrometry and SEQUEST analysis: the human proteome","volume":"4","author":"Qian","year":"2005","journal-title":"J. Proteome Res."},{"key":"2023013112050523300_B30","author":"Ridgeway","year":"2007","journal-title":"Generalized boosted models: a guide to the gbm package."},{"key":"2023013112050523300_B31","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.mimet.2004.09.017","article-title":"Correlation of proteomic and transcriptomic profiles of Staphylococcus aureus during the post-exponential phase of growth","volume":"60","author":"Scherl","year":"2006","journal-title":"J. Microbiol. 
Methods"},{"key":"2023013112050523300_B32","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1186\/1471-2164-7-296","article-title":"Exploring glycopeptide-resistance in Staphylococcus aureus: a combined proteomics and transcriptomics approach for the identification of resistance-related markers","volume":"7","author":"Scherl","year":"2006","journal-title":"BMC Genomics"},{"key":"2023013112050523300_B33","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1089\/15362310252780843","article-title":"The use of accurate mass tags for high-throughput microbial proteomics","volume":"6","author":"Smith","year":"2002","journal-title":"OMICS"},{"key":"2023013112050523300_B34","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1093\/bioinformatics\/btk019","article-title":"Improving missing value estimation in microarray data with gene ontology","volume":"22","author":"Tuikkala","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112050523300_B35","doi-asserted-by":"crossref","first-page":"3107","DOI":"10.1073\/pnas.0634629100","article-title":"Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae","volume":"100","author":"Washburn","year":"2003","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023013112050523300_B36","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1002\/pmic.200500856","article-title":"Guidelines for the next 10 years of proteomics","volume":"6","author":"Wilkins","year":"2006","journal-title":"Proteomics"},{"key":"2023013112050523300_B37","doi-asserted-by":"crossref","first-page":"4286","DOI":"10.1002\/pmic.200500930","article-title":"A proteomic view of Desulfovibrio vulgaris metabolism as determined by liquid chromatography coupled with tandem mass spectrometry","volume":"6","author":"Zhang","year":"2006","journal-title":"Proteomics"},{"key":"2023013112050523300_B38","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1007\/s10482-005-9024-z","article-title":"Global transcriptomic analysis of Desulfovibrio vulgaris on different electron donors","volume":"89","author":"Zhang","year":"2006","journal-title":"Antonie Van Leeuwenhoek"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/15\/1905\/48993536\/bioinformatics_25_15_1905.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/15\/1905\/48993536\/bioinformatics_25_15_1905.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:20:11Z","timestamp":1675200011000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/15\/1905\/211925"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,5,15]]},"references-count":38,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2009,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp325","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"pr
int"}],"subject":[],"published-other":{"date-parts":[[2009,8,1]]},"published":{"date-parts":[[2009,5,15]]}}}