{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T08:58:25Z","timestamp":1768035505472,"version":"3.49.0"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,7,9]],"date-time":"2021-07-09T00:00:00Z","timestamp":1625788800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["CA214845"],"award-info":[{"award-number":["CA214845"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["CA008748"],"award-info":[{"award-number":["CA008748"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>One pivotal feature of transcriptomics data is the unwanted variations caused by disparate experimental handling, known as handling effects. Various data normalization methods were developed to alleviate the adverse impact of handling effects in the setting of differential expression analysis. However, little research has been done to evaluate their performance in the setting of survival outcome prediction, an important analysis goal for transcriptomics data in biomedical research. Leveraging a unique pair of datasets for the same set of tumor samples\u2014one with handling effects and the other without, we developed a benchmarking tool for conducting such an evaluation in microRNA microarrays. We applied this tool to evaluate the performance of three popular normalization methods\u2014quantile normalization, median normalization and variance stabilizing normalization\u2014in survival prediction using various approaches for model building and designs for sample assignment. We showed that handling effects can have a strong impact on survival prediction and that quantile normalization, a most popular method in current practice, tends to underperform median normalization and variance stabilizing normalization. We demonstrated with a small example the reason for quantile normalization\u2019s poor performance in this setting. Our finding highlights the importance of putting normalization evaluation in the context of the downstream analysis setting and the potential of improving the development of survival predictors by applying median normalization. We make available our benchmarking tool for performing such evaluation on additional normalization methods in connection with prediction modeling approaches.<\/jats:p>","DOI":"10.1093\/bib\/bbab257","type":"journal-article","created":{"date-parts":[[2021,6,25]],"date-time":"2021-06-25T11:08:36Z","timestamp":1624619316000},"source":"Crossref","is-referenced-by-count":11,"title":["Performance evaluation of transcriptomics data normalization for survival risk prediction"],"prefix":"10.1093","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3002-3556","authenticated-orcid":false,"given":"Ai","family":"Ni","sequence":"first","affiliation":[{"name":"Ohio State University, New York, NY 10017 USA"}]},{"given":"Li-Xuan","family":"Qin","sequence":"additional","affiliation":[{"name":"Memorial Sloan Kettering Cancer Center, New York, NY 10017 USA"}]}],"member":"286","published-online":{"date-parts":[[2021,7,9]]},"reference":[{"issue":"9472","key":"2021110815082478400_ref1","doi-asserted-by":"crossref","first-page":"1685","DOI":"10.1016\/S0140-6736(05)66541-5","article-title":"Prediction of cancer outcome with microarrays","volume":"365","author":"Lee","year":"2005","journal-title":"Lancet"},{"issue":"1","key":"2021110815082478400_ref2","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1038\/ncomms1033","article-title":"Identification of high-quality cancer prognostic markers and metastasis network modules","volume":"1","author":"Li","year":"2010","journal-title":"Nat Commun"},{"issue":"13","key":"2021110815082478400_ref3","doi-asserted-by":"crossref","first-page":"E2970","DOI":"10.1073\/pnas.1717139115","article-title":"Predicting cancer outcomes from histology and genomics using convolutional networks","volume":"115","author":"Mobadersany","year":"2018","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"7187","key":"2021110815082478400_ref4","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1038\/nature06915","article-title":"Enabling personalized cancer medicine through analysis of gene-expression patterns","volume":"452","author":"van't Veer","year":"2008","journal-title":"Nature"},{"issue":"9","key":"2021110815082478400_ref5","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.1038\/nbt1239","article-title":"The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements","volume":"24","author":"Consortium","year":"2006","journal-title":"Nat Biotechnol"},{"issue":"9","key":"2021110815082478400_ref6","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1038\/nbt.2957","article-title":"A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium","volume":"32","author":"Consortium SM-I","year":"2014","journal-title":"Nat Biotechnol"},{"issue":"8","key":"2021110815082478400_ref7","doi-asserted-by":"crossref","first-page":"809","DOI":"10.1038\/nmeth.3014","article-title":"Evaluation of quantitative miRNA expression platforms in the microRNA quality control (miRQC) study","volume":"11","author":"Mestdagh","year":"2014","journal-title":"Nat Methods"},{"issue":"10","key":"2021110815082478400_ref8","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nrg2825","article-title":"Tackling the widespread and critical impact of batch effects in high-throughput data","volume":"11","author":"Leek","year":"2010","journal-title":"Nat Rev Genet"},{"issue":"9460","key":"2021110815082478400_ref9","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1016\/S0140-6736(05)17947-1","article-title":"Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer","volume":"365","author":"Wang","year":"2005","journal-title":"Lancet"},{"issue":"2","key":"2021110815082478400_ref10","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/S1470-2045(09)70343-2","article-title":"Relation between microRNA expression and progression and prognosis of gastric cancer: a microRNA expression analysis","volume":"11","author":"Ueda","year":"2010","journal-title":"Lancet Oncol"},{"issue":"22","key":"2021110815082478400_ref11","doi-asserted-by":"crossref","first-page":"2059","DOI":"10.1056\/NEJMoa1301689","article-title":"Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia","volume":"368","author":"Cancer Genome Atlas Research N","year":"2013","journal-title":"N Engl J Med"},{"issue":"7","key":"2021110815082478400_ref12","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1016\/S1470-2045(17)30243-7","article-title":"Novel molecular subgroups for clinical classification and outcome prediction in childhood medulloblastoma: a cohort study","volume":"18","author":"Schwalbe","year":"2017","journal-title":"Lancet Oncol"},{"issue":"2","key":"2021110815082478400_ref13","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1093\/bioinformatics\/19.2.185","article-title":"A comparison of normalization methods for high density oligonucleotide array data based on variance and bias","volume":"19","author":"Bolstad","year":"2003","journal-title":"Bioinformatics"},{"issue":"1","key":"2021110815082478400_ref14","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1186\/1471-2105-11-94","article-title":"Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments","volume":"11","author":"Bullard","year":"2010","journal-title":"BMC Bioinformatics"},{"issue":"6","key":"2021110815082478400_ref15","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1093\/bib\/bbs046","article-title":"A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis","volume":"14","author":"Dillies","year":"2013","journal-title":"Brief Bioinform"},{"issue":"6","key":"2021110815082478400_ref16","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1261\/rna.037895.112","article-title":"miRNA-Seq normalization comparisons need improvement","volume":"19","author":"Zhou","year":"2013","journal-title":"RNA"},{"issue":"6","key":"2021110815082478400_ref17","doi-asserted-by":"crossref","first-page":"e98879","DOI":"10.1371\/journal.pone.0098879","article-title":"MicroRNA array normalization: an evaluation using a randomized dataset as the benchmark","volume":"9","author":"Qin","year":"2014","journal-title":"PLoS One"},{"key":"2021110815082478400_ref18","doi-asserted-by":"crossref","first-page":"e4584","DOI":"10.7717\/peerj.4584","article-title":"Empirical evaluation of data normalization methods for molecular classification","volume":"6","author":"Huang","year":"2018","journal-title":"PeerJ"},{"issue":"7006","key":"2021110815082478400_ref19","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/nature02871","article-title":"The functions of animal microRNAs","volume":"431","author":"Ambros","year":"2004","journal-title":"Nature"},{"issue":"2","key":"2021110815082478400_ref20","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/S0092-8674(04)00045-5","article-title":"MicroRNAs: genomics, biogenesis, mechanism, and function","volume":"116","author":"Bartel","year":"2004","journal-title":"Cell"},{"issue":"13","key":"2021110815082478400_ref21","doi-asserted-by":"crossref","first-page":"3371","DOI":"10.1158\/1078-0432.CCR-13-3155","article-title":"Blocking and randomization to improve molecular biomarker discovery","volume":"20","author":"Qin","year":"2014","journal-title":"Clin Cancer Res"},{"issue":"1","key":"2021110815082478400_ref22","doi-asserted-by":"crossref","first-page":"180084","DOI":"10.1038\/sdata.2018.84","article-title":"A pair of datasets for microRNA expression profiling to examine the use of careful study design for assigning arrays to samples","volume":"5","author":"Qin","year":"2018","journal-title":"Sci Data"},{"issue":"32","key":"2021110815082478400_ref23","doi-asserted-by":"crossref","first-page":"3931","DOI":"10.1200\/JCO.2016.68.1031","article-title":"Cautionary note on using cross-validation for molecular classification","volume":"34","author":"Qin","year":"2016","journal-title":"J Clin Oncol"},{"issue":"15","key":"2021110815082478400_ref24","doi-asserted-by":"crossref","first-page":"2543","DOI":"10.1002\/sim.2268","article-title":"Power calculations for preclinical studies using a K-sample rank test and the Lehmann alternative hypothesis","volume":"25","author":"Heller","year":"2006","journal-title":"Stat Med"},{"issue":"Suppl 1","key":"2021110815082478400_ref25","doi-asserted-by":"crossref","first-page":"S96","DOI":"10.1093\/bioinformatics\/18.suppl_1.S96","article-title":"Variance stabilization applied to microarray data calibration and to the quantification of differential expression","volume":"18","author":"Huber","year":"2002","journal-title":"Bioinformatics"},{"key":"2021110815082478400_ref26","doi-asserted-by":"crossref","first-page":"10","DOI":"10.2202\/1544-6115.1339","article-title":"Normalization method for transcriptional studies of heterogeneous samples--simultaneous array normalization and identification of equivalent expression","volume":"8","author":"Qin","year":"2009","journal-title":"Stat Appl Genet Mol Biol"},{"issue":"6","key":"2021110815082478400_ref27","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1089\/10665270050514954","article-title":"Analysis of variance for gene expression microarray data","volume":"7","author":"Kerr","year":"2000","journal-title":"J Comput Biol"},{"issue":"1","key":"2021110815082478400_ref28","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.ygyno.2011.12.423","article-title":"Nomogram for predicting 5-year disease-specific mortality after primary surgery for epithelial ovarian cancer","volume":"125","author":"Barlin","year":"2012","journal-title":"Gynecol Oncol"},{"issue":"1","key":"2021110815082478400_ref29","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1186\/s12920-016-0187-4","article-title":"Study design and data analysis considerations for the discovery of prognostic molecular biomarkers: a case study of progression free survival in advanced serous ovarian cancer","volume":"9","author":"Qin","year":"2016","journal-title":"BMC Med Genomics"},{"issue":"2","key":"2021110815082478400_ref30","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1093\/biostatistics\/kxp059","article-title":"Frozen robust multiarray analysis (fRMA)","volume":"11","author":"McCall","year":"2010","journal-title":"Biostatistics"},{"issue":"Suppl 4","key":"2021110815082478400_ref31","first-page":"105","article-title":"Preprocessing steps for Agilent MicroRNA arrays: does the order matter?","volume":"13","author":"Qin","year":"2014","journal-title":"Cancer Inform"},{"issue":"4","key":"2021110815082478400_ref32","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1002\/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3","article-title":"The lasso method for variable selection in the cox model","volume":"16","author":"Tibshirani","year":"1997","journal-title":"Stat Med"},{"issue":"476","key":"2021110815082478400_ref33","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1198\/016214506000000735","article-title":"The adaptive lasso and its Oracle properties","volume":"101","author":"Zou","year":"2006","journal-title":"J Am Stat Assoc"},{"issue":"5","key":"2021110815082478400_ref34","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1111\/j.1467-9868.2008.00674.x","article-title":"Sure independence screening for ultrahigh dimensional feature space","volume":"70","author":"Fan","year":"2008","journal-title":"J R Stat Soc Series B Stat Methodology"},{"key":"2021110815082478400_ref35","first-page":"2013","article-title":"Ultrahigh dimensional feature selection: beyond the linear model","volume":"10","author":"Fan","year":"2009","journal-title":"J Mach Learn Res"},{"issue":"1","key":"2021110815082478400_ref36","doi-asserted-by":"crossref","first-page":"e1005357","DOI":"10.1371\/journal.pcbi.1005357","article-title":"Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies","volume":"13","author":"Tamba","year":"2017","journal-title":"PLoS Comput Biol"},{"issue":"4","key":"2021110815082478400_ref37","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1002\/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4","article-title":"Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors","volume":"15","author":"Harrell","year":"1996","journal-title":"Stat Med"},{"key":"2021110815082478400_ref38","doi-asserted-by":"crossref","DOI":"10.1201\/9781420010138","volume-title":"Measurement Error in Nonlinear Models","author":"Carroll","year":"2006","edition":"2nd"},{"key":"2021110815082478400_ref39","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-19425-7","volume-title":"Regression Modeling Strategies With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis","author":"Harrell","year":"2015"},{"issue":"1","key":"2021110815082478400_ref40","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab257\/41089431\/bbab257.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab257\/41089431\/bbab257.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T15:23:30Z","timestamp":1636385010000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab257\/6317608"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,9]]},"references-count":40,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab257","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,7,9]]},"article-number":"bbab257"}}