{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T19:54:03Z","timestamp":1777492443799,"version":"3.51.4"},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"14","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Motivation: Prior to applying genomic predictors to clinical samples, the genomic data must be properly normalized to ensure that the test set data are comparable to the data upon which the predictor was trained. The most effective normalization methods depend on data from multiple patients. From a biomedical perspective, this implies that predictions for a single patient may change depending on which other patient samples they are normalized with. This test set bias will occur when any cross-sample normalization is used before clinical prediction.<\/jats:p>\n                  <jats:p>Results: We demonstrate that results from existing gene signatures which rely on normalizing test data may be irreproducible when the patient population changes composition or size using a set of curated, publicly available breast cancer microarray experiments. 
As an alternative, we examine the use of gene signatures that rely on ranks from the data and show why signatures using rank-based features can avoid test set bias while maintaining highly accurate classification, even across platforms.<\/jats:p>\n                  <jats:p>Availability and implementation: The code, data and instructions necessary to reproduce our entire analysis are available at https:\/\/github.com\/prpatil\/testsetbias.<\/jats:p>\n                  <jats:p>Contact: jtleek@gmail.com or bhaibeka@uhnresearch.ca<\/jats:p>\n                  <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv157","type":"journal-article","created":{"date-parts":[[2015,3,18]],"date-time":"2015-03-18T21:13:52Z","timestamp":1426713232000},"page":"2318-2323","source":"Crossref","is-referenced-by-count":97,"title":["Test set bias affects reproducibility of gene signatures"],"prefix":"10.1093","volume":"31","author":[{"given":"Prasad","family":"Patil","sequence":"first","affiliation":[{"name":"1 Department of Biostatistics, Johns Hopkins School of Public Health, Baltimore, MD, USA,"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pierre-Olivier","family":"Bachant-Winner","sequence":"additional","affiliation":[{"name":"2 Institut de Recherches Cliniques de Montr\u00e9al, Montreal, Quebec H2W 1R7, Canada,"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Benjamin","family":"Haibe-Kains","sequence":"additional","affiliation":[{"name":"3 Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 1L7, Canada and"},{"name":"4 Department of Medical Biophysics, University of Toronto, Toronto, Ontario M5G 1L7, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeffrey T.","family":"Leek","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Johns Hopkins School of Public Health, Baltimore, MD, 
USA,"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,3,18]]},"reference":[{"key":"2023020202151513900_btv157-B1","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1038\/nrg1749","article-title":"Microarray data analysis: from disarray to consolidation and consensus","volume":"7","author":"Allison","year":"2006","journal-title":"Nat. Rev. Genet."},{"key":"2023020202151513900_btv157-B2","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1093\/jnci\/dji008","article-title":"Signal in noise: evaluating reported reproducibility of serum proteomic tests for ovarian cancer","volume":"97","author":"Baggerly","year":"2005","journal-title":"J. Natl. Cancer Inst."},{"issue":"Database issue","key":"2023020202151513900_btv157-B3","first-page":"D991","article-title":"NCBI GEO: archive for functional genomics data sets\u2014update","volume":"41","author":"Barrett","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023020202151513900_btv157-B4","volume-title":"aroma.affymetrix: a generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory. Technical report 745","author":"Bengtsson","year":"2008"},{"key":"2023020202151513900_btv157-B5","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1186\/1471-2164-7-278","article-title":"Converting a breast cancer microarray signature into a high-throughput diagnostic test","volume":"7","author":"Glas","year":"2006","journal-title":"BMC Genomics"},{"key":"2023020202151513900_btv157-B6","author":"Haibe-Kains","year":"2014","journal-title":"genefu: Relevant Functions for Gene Expression Analysis, Especially in Breast Cancer"},{"key":"2023020202151513900_btv157-B7","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1093\/jnci\/djr545","article-title":"A three-gene model to robustly identify breast cancer molecular subtypes","volume":"104","author":"Haibe-Kains","year":"2012","journal-title":"J. Natl. 
Cancer Inst."},{"key":"2023020202151513900_btv157-B8","author":"Hastie","year":"2014"},{"key":"2023020202151513900_btv157-B9","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1093\/bioinformatics\/bts034","article-title":"The sva package for removing batch effects and other unwanted variation in high-throughput experiments","volume":"28","author":"Leek","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020202151513900_btv157-B10","author":"Letter","year":"2011"},{"key":"2023020202151513900_btv157-B11","doi-asserted-by":"crossref","first-page":"1715","DOI":"10.1093\/jnci\/djm216","article-title":"Challenges in projecting clustering results across gene expression profiling datasets","volume":"99","author":"Lusa","year":"2007","journal-title":"J. Natl Cancer Inst."},{"key":"2023020202151513900_btv157-B12","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1038\/nm.2311","article-title":"Taming the dragon: genomic biomarkers to individualize the treatment of cancer","volume":"17","author":"Majewski","year":"2011","journal-title":"Nat. 
Med."},{"key":"2023020202151513900_btv157-B13","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1093\/biostatistics\/kxp059","article-title":"Frozen robust multiarray analysis (frma)","volume":"11","author":"McCall","year":"2010","journal-title":"Biostatistics"},{"key":"2023020202151513900_btv157-B14","doi-asserted-by":"crossref","first-page":"488","DOI":"10.1016\/S0140-6736(05)17866-0","article-title":"Prediction of cancer outcome with microarrays: a multiple random validation strategy","volume":"365","author":"Michiels","year":"2005","journal-title":"Lancet"},{"key":"2023020202151513900_btv157-B15","first-page":"e561","article-title":"Removing batch effects for prediction problems with frozen surrogate variable analysis","volume-title":"PeerJ","author":"Parker","year":"2014"},{"key":"2023020202151513900_btv157-B16","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1200\/JCO.2008.18.1370","article-title":"Supervised risk predictor of breast cancer based on intrinsic subtypes","volume":"27","author":"Parker","year":"2009","journal-title":"J. Clin. 
Oncol."},{"issue":"Database issue","key":"2023020202151513900_btv157-B17","doi-asserted-by":"crossref","first-page":"D747","DOI":"10.1093\/nar\/gkl995","article-title":"ArrayExpress\u2013a public database of microarray experiments and gene expression profiles","volume":"35","author":"Parkinson","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023020202151513900_btv157-B18","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1016\/S0140-6736(02)07746-2","article-title":"Use of proteomic patterns in serum to identify ovarian cancer","volume":"359","author":"Petricoin","year":"2002","journal-title":"Lancet"},{"key":"2023020202151513900_btv157-B19","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/j.ygeno.2012.08.003","article-title":"A single-sample microarray normalization method to facilitate personalized-medicine workflows","volume":"100","author":"Piccolo","year":"2012","journal-title":"Genomics"},{"key":"2023020202151513900_btv157-B20","article-title":"Genetic signatures of exceptional longevity in humans","volume":"2010","author":"Sebastiani","year":"2010","journal-title":"Science"},{"key":"2023020202151513900_btv157-B21","doi-asserted-by":"crossref","first-page":"6567","DOI":"10.1073\/pnas.082099299","article-title":"Diagnosis of multiple cancer types by shrunken centroids of gene expression","volume":"99","author":"Tibshirani","year":"2002","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023020202151513900_btv157-B22","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1038\/415530a","article-title":"Gene expression profiling predicts clinical outcome of breast cancer","volume":"415","author":"van\u2019t Veer","year":"2002","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/14\/2318\/49034474\/bioinformatics_31_14_2318.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/14\/2318\/49034474\/bioinformatics_31_14_2318.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:41:45Z","timestamp":1675291305000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/14\/2318\/256029"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,3,18]]},"references-count":22,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2015,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv157","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/005983","asserted-by":"object"}]},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,7,15]]},"published":{"date-parts":[[2015,3,18]]}}}