{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T19:28:43Z","timestamp":1780601323503,"version":"3.54.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2019,6,14]],"date-time":"2019-06-14T00:00:00Z","timestamp":1560470400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Environmental Determinants of Diabetes in the Young"},{"name":"TEDDY"},{"name":"National Institute of Diabetes"},{"name":"Digestive and Kidney Diseases"},{"DOI":"10.13039\/100000062","name":"NIDDK","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000062","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Anna-Valentine Cancer Fund Focused Interactive Group"},{"name":"FIG"},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000071","name":"National Institute of Child Health and Human Development","doi-asserted-by":"publisher","award":["U54 HD090258"],"award-info":[{"award-number":["U54 HD090258"]}],"id":[{"id":"10.13039\/100000071","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Biostatistics and Bioinformatics Shared Resource"},{"name":"Proteomics and Metabolomics Core"},{"DOI":"10.13039\/100000054","name":"National Cancer Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Moffitt\u2019s Cancer Center Support","award":["P30-CA076292"],"award-info":[{"award-number":["P30-CA076292"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Missingness in label-free mass spectrometry is inherent to the technology. A computational approach to recover missing values in metabolomics and proteomics datasets is important. Most existing methods are designed under a particular assumption, either missing at random or under the detection limit. If the missing pattern deviates from the assumption, it may lead to biased results. Hence, we investigate the missing patterns in free mass spectrometry data and develop an omnibus approach GMSimpute, to allow effective imputation accommodating different missing patterns.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Three proteomics datasets and one metabolomics dataset indicate missing values could be a mixture of abundance-dependent and abundance-independent missingness. We assess the performance of GMSimpute using simulated data (with a wide range of 80 missing patterns) and metabolomics data from the Cancer Genome Atlas breast cancer and clear cell renal cell carcinoma studies. Using Pearson correlation and normalized root mean square errors between the true and imputed abundance, we compare its performance to K-nearest neighbors\u2019 type approaches, Random Forest, GSimp, a model-based method implemented in DanteR and minimum values. The results indicate GMSimpute provides higher accuracy in imputation and exhibits stable performance across different missing patterns. In addition, GMSimpute is able to identify the features in downstream differential expression analysis with high accuracy when applied to the Cancer Genome Atlas datasets.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>GMSimpute is on CRAN: https:\/\/cran.r-project.org\/web\/packages\/GMSimpute\/index.html.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz488","type":"journal-article","created":{"date-parts":[[2019,6,10]],"date-time":"2019-06-10T19:14:53Z","timestamp":1560194093000},"page":"257-263","source":"Crossref","is-referenced-by-count":25,"title":["GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9874-3555","authenticated-orcid":false,"given":"Qian","family":"Li","sequence":"first","affiliation":[{"name":"Health Informatics Institute, University of South Florida , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kate","family":"Fisher","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Bioinformatics, Moffitt Cancer Center , Tampa, FL, USA"},{"name":"Department of Biostatistics, IDDI Inc. , Raleigh, NC, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wenjun","family":"Meng","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Bioinformatics, Moffitt Cancer Center , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bin","family":"Fang","sequence":"additional","affiliation":[{"name":"Proteomics and Metabolomics Core Facility, Moffitt Cancer Center , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eric","family":"Welsh","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Bioinformatics, Moffitt Cancer Center , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eric B","family":"Haura","sequence":"additional","affiliation":[{"name":"Department of Thoracic Oncology, Moffitt Cancer Center , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"John M","family":"Koomen","sequence":"additional","affiliation":[{"name":"Department of Molecular Oncology, Moffitt Cancer Center , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Steven A","family":"Eschrich","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Bioinformatics, Moffitt Cancer Center , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Brooke L","family":"Fridley","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Bioinformatics, Moffitt Cancer Center , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Y Ann","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Bioinformatics, Moffitt Cancer Center , Tampa, FL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2019,6,14]]},"reference":[{"key":"2023013109502952400_btz488-B1","doi-asserted-by":"crossref","first-page":"7217","DOI":"10.1158\/0008-5472.CAN-14-0505","article-title":"Adaptive responses to dasatinib-treated lung squamous cell cancer cells harboring DDR2 mutations","volume":"74","author":"Bai","year":"2014","journal-title":"Cancer Res"},{"key":"2023013109502952400_btz488-B2","first-page":"203","article-title":"Support vector regression","volume":"11","author":"Basak","year":"2007","journal-title":"Neural Information Processing-Letters and Reviews"},{"key":"2023013109502952400_btz488-B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. Series B Methodol"},{"key":"2023013109502952400_btz488-B4","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023013109502952400_btz488-B5","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1038\/nature11252","article-title":"Comprehensive molecular characterization of human colon and rectal cancer","volume":"487","year":"2012","journal-title":"Nature"},{"key":"2023013109502952400_btz488-B6","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.molonc.2015.07.004","article-title":"4-protein signature predicting tamoxifen treatment outcome in recurrent breast cancer","volume":"10","author":"De Marchi","year":"2016","journal-title":"Mol. Oncol"},{"key":"2023013109502952400_btz488-B7","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.ymeth.2015.03.006","article-title":"Evaluating kinase ATP uptake and tyrosine phosphorylation using multiplexed quantification of chemically labeled and post-translationally modified peptides","volume":"81","author":"Fang","year":"2015","journal-title":"Methods"},{"key":"2023013109502952400_btz488-B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J. Stat. Softw"},{"key":"2023013109502952400_btz488-B9","doi-asserted-by":"crossref","first-page":"3050","DOI":"10.1002\/elps.201500352","article-title":"Missing value imputation strategies for metabolomics data","volume":"36","author":"Grace","year":"2015","journal-title":"Electrophoresis"},{"key":"2023013109502952400_btz488-B10","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1016\/j.ccell.2015.12.004","article-title":"An integrated metabolic atlas of clear cell renal cell carcinoma","volume":"29","author":"Hakimi","year":"2016","journal-title":"Cancer Cell"},{"key":"2023013109502952400_btz488-B11","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/5254.708428","article-title":"Support vector machines","volume":"13","author":"Hearst","year":"1998","journal-title":"IEEE Intell. Syst"},{"key":"2023013109502952400_btz488-B12","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Hui","year":"2005","journal-title":"J. R. Stat. Soc. Series B Stat. Methodol"},{"key":"2023013109502952400_btz488-B13","doi-asserted-by":"crossref","first-page":"1555","DOI":"10.1093\/bioinformatics\/btx816","article-title":"Missing value imputation for LC-MS metabolomics data by incorporating metabolic network and adduct ion relations","volume":"34","author":"Jin","year":"2017","journal-title":"Bioinformatics"},{"key":"2023013109502952400_btz488-B14","doi-asserted-by":"crossref","first-page":"2028","DOI":"10.1093\/bioinformatics\/btp362","article-title":"A statistical framework for protein quantitation in bottom-up MS-based proteomics","volume":"25","author":"Karpievitch","year":"2009","journal-title":"Bioinformatics"},{"key":"2023013109502952400_btz488-B15","doi-asserted-by":"crossref","first-page":"140012","DOI":"10.1038\/sdata.2014.12","article-title":"Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control","volume":"1","author":"Kirwan","year":"2014","journal-title":"Sci. Data"},{"key":"2023013109502952400_btz488-B16","doi-asserted-by":"crossref","first-page":"966","DOI":"10.1093\/bioinformatics\/btq054","article-title":"Skyline: an open source document editor for creating and analyzing targeted proteomics experiments","volume":"26","author":"MacLean","year":"2010","journal-title":"Bioinformatics"},{"key":"2023013109502952400_btz488-B17","doi-asserted-by":"crossref","first-page":"8689","DOI":"10.1021\/acs.analchem.7b01069","article-title":"Detailed investigation and comparison of the XCMS and MZmine 2 chromatogram construction and chromatographic peak detection methods for preprocessing mass spectrometry metabolomics data","volume":"89","author":"Myers","year":"2017","journal-title":"Anal. Chem"},{"key":"2023013109502952400_btz488-B18","doi-asserted-by":"crossref","first-page":"3268","DOI":"10.2337\/db13-0159","article-title":"Cord serum lipidome in prediction of islet autoimmunity and type 1 diabetes","volume":"62","author":"Ore\u0161i\u010d","year":"2013","journal-title":"Diabetes"},{"key":"2023013109502952400_btz488-B19","doi-asserted-by":"crossref","first-page":"2740","DOI":"10.2337\/db10-1652","article-title":"Age- and islet autoimmunity\u2013associated differences in amino acid and lipid metabolites in children at risk for type 1 diabetes","volume":"60","author":"Pflueger","year":"2011","journal-title":"Diabetes"},{"key":"2023013109502952400_btz488-B20","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1186\/s12859-017-1547-6","article-title":"Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies","volume":"18","author":"Shah","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023013109502952400_btz488-B21","doi-asserted-by":"crossref","first-page":"6729","DOI":"10.1021\/ac051080y","article-title":"Fusion of mass spectrometry-based metabolomics data","volume":"77","author":"Smilde","year":"2005","journal-title":"Anal. Chem"},{"key":"2023013109502952400_btz488-B22","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1021\/ac051437y","article-title":"XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification","volume":"78","author":"Smith","year":"2006","journal-title":"Anal. Chem"},{"key":"2023013109502952400_btz488-B23","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1007\/0-387-29362-0_23","volume-title":"Bioinformatics and Computational Biology Solutions Using R and Bioconductor","author":"Smyth","year":"2005"},{"key":"2023013109502952400_btz488-B24","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1186\/s13058-014-0415-9","article-title":"A joint analysis of metabolomics and genetics of breast cancer","volume":"16","author":"Tang","year":"2014","journal-title":"Breast Cancer Res"},{"key":"2023013109502952400_btz488-B25","doi-asserted-by":"crossref","first-page":"2404","DOI":"10.1093\/bioinformatics\/bts449","article-title":"DanteR: an extensible R-based tool for quantitative analysis of -omics data","volume":"28","author":"Taverner","year":"2012","journal-title":"Bioinformatics"},{"key":"2023013109502952400_btz488-B26","doi-asserted-by":"crossref","first-page":"1998","DOI":"10.1093\/bioinformatics\/bts306","article-title":"Application of survival analysis methodology to the quantitative analysis of LC-MS proteomics data","volume":"28","author":"Tekwe","year":"2012","journal-title":"Bioinformatics"},{"key":"2023013109502952400_btz488-B27","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing value estimation methods for DNA microarrays","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"2023013109502952400_btz488-B28","doi-asserted-by":"crossref","first-page":"2301.","DOI":"10.1038\/nprot.2016.136","article-title":"The MaxQuant computational platform for mass spectrometry-based shotgun proteomics","volume":"11","author":"Tyanova","year":"2016","journal-title":"Nat. Protoc"},{"key":"2023013109502952400_btz488-B29","doi-asserted-by":"crossref","first-page":"e1005973","DOI":"10.1371\/journal.pcbi.1005973","article-title":"GSimp: a Gibbs sampler based left-censored missing value imputation approach for metabolomics studies","volume":"14","author":"Wei","year":"2018","journal-title":"PLoS Comput. Biol"},{"key":"2023013109502952400_btz488-B30","doi-asserted-by":"crossref","first-page":"663","DOI":"10.1038\/s41598-017-19120-0","article-title":"Missing value imputation approach for mass spectrometry-based metabolomics data","volume":"8","author":"Wei","year":"2018","journal-title":"Sci. Rep"},{"key":"2023013109502952400_btz488-B31","doi-asserted-by":"crossref","first-page":"2250","DOI":"10.1021\/acs.jproteome.7b00111","article-title":"Metabolomics\u2013proteomics combined approach identifies differential metabolism-associated molecular events between senescence and apoptosis","volume":"16","author":"Wu","year":"2017","journal-title":"J. Proteome Res"},{"key":"2023013109502952400_btz488-B32","doi-asserted-by":"crossref","first-page":"2941","DOI":"10.1093\/bioinformatics\/btu430","article-title":"Improving peak detection in high-resolution LC\/MS metabolomics data using preexisting knowledge and machine learning approach","volume":"30","author":"Yu","year":"2014","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz488\/28914108\/btz488.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/1\/257\/48981528\/bioinformatics_36_1_257.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/1\/257\/48981528\/bioinformatics_36_1_257.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T13:34:51Z","timestamp":1721396091000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/1\/257\/5519114"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2019,6,14]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz488","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,1,1]]},"published":{"date-parts":[[2019,6,14]]}}}