{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T07:13:18Z","timestamp":1769843598323,"version":"3.49.0"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2018,1,17]],"date-time":"2018-01-17T00:00:00Z","timestamp":1516147200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["P30 AR061271, P50 AR060780"],"award-info":[{"award-number":["P30 AR061271, P50 AR060780"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100001845","name":"Scleroderma Research Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100001845","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Molecular subtypes of cancers and autoimmune disease, defined by transcriptomic profiling, have provided insight into disease pathogenesis, molecular heterogeneity and therapeutic responses. However, technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from different studies. Currently, there is a lack of effective methods designed to eliminate platform-based bias. 
We present a method to normalize and classify RNA-seq data using machine learning classifiers trained on DNA microarray data and molecular subtypes in two datasets: breast invasive carcinoma (BRCA) and colorectal cancer (CRC).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Multiple analyses show that feature specific quantile normalization (FSQN) successfully removes platform-based bias from RNA-seq data, regardless of feature scaling or machine learning algorithm. We achieve up to 98% accuracy for BRCA data and 97% accuracy for CRC data in assigning molecular subtypes to RNA-seq data normalized using FSQN and a support vector machine trained exclusively on DNA microarray data. We find that maximum accuracy was achieved when normalizing RNA-seq datasets that contain at least 25 samples. FSQN allows comparison of RNA-seq data to existing DNA microarray datasets. Using these techniques, we can successfully leverage information from existing gene expression data in new analyses despite different platforms used for gene expression profiling.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>FSQN has been submitted as an R package to CRAN. 
All code used for this study is available on Github (https:\/\/github.com\/jenniferfranks\/FSQN).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty026","type":"journal-article","created":{"date-parts":[[2018,1,16]],"date-time":"2018-01-16T20:11:11Z","timestamp":1516133471000},"page":"1868-1874","source":"Crossref","is-referenced-by-count":73,"title":["Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data"],"prefix":"10.1093","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2400-5431","authenticated-orcid":false,"given":"Jennifer M","family":"Franks","sequence":"first","affiliation":[{"name":"Department of Molecular and Systems Biology, University of South Carolina, Columbia, SC, USA"}]},{"given":"Guoshuai","family":"Cai","sequence":"additional","affiliation":[{"name":"Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA"}]},{"given":"Michael L","family":"Whitfield","sequence":"additional","affiliation":[{"name":"Department of Molecular and Systems Biology, University of South Carolina, Columbia, SC, USA"},{"name":"Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,1,17]]},"reference":[{"key":"2023012713565328600_bty026-B1","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1186\/1471-2164-13-472","article-title":"Transcriptome classification reveals molecular subtypes in psoriasis","volume":"13","author":"Ainali","year":"2012","journal-title":"BMC 
Genomics"},{"key":"2023012713565328600_bty026-B2","author":"Bolstad","year":"2016"},{"key":"2023012713565328600_bty026-B3","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1093\/bioinformatics\/19.2.185","article-title":"A comparison of normalization methods for high density oligonucleotide array data based on variance and bias","volume":"19","author":"Bolstad","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012713565328600_bty026-B4","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular portraits of human breast tumours","volume":"490","author":"Cancer Genome Atlas","year":"2012","journal-title":"Nature"},{"key":"2023012713565328600_bty026-B5","doi-asserted-by":"crossref","first-page":"2492","DOI":"10.1001\/jama.295.21.2492","article-title":"Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study","volume":"295","author":"Carey","year":"2006","journal-title":"JAMA"},{"key":"2023012713565328600_bty026-B6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J. Stat. Softw"},{"key":"2023012713565328600_bty026-B7","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.1038\/nm.3967","article-title":"The consensus molecular subtypes of colorectal cancer","volume":"21","author":"Guinney","year":"2015","journal-title":"Nat. 
Med"},{"key":"2023012713565328600_bty026-B8","doi-asserted-by":"crossref","first-page":"e71462","DOI":"10.1371\/journal.pone.0071462","article-title":"Large scale comparison of gene expression levels by microarrays and RNAseq using TCGA data","volume":"8","author":"Guo","year":"2013","journal-title":"PLoS One"},{"key":"2023012713565328600_bty026-B9","volume-title":"Algorithms for Clustering Data","author":"Jain","year":"1988"},{"key":"2023012713565328600_bty026-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v028.i05","article-title":"Building predictive models in R using the caret package","volume":"28","author":"Kuhn","year":"2008","journal-title":"J. Stat. Softw"},{"key":"2023012713565328600_bty026-B11","first-page":"8","article-title":"Plotrix: a package in the red light district of R","volume":"6","author":"Lemon","year":"2006","journal-title":"R-News"},{"key":"2023012713565328600_bty026-B12","doi-asserted-by":"crossref","first-page":"323.","DOI":"10.1186\/1471-2105-12-323","article-title":"RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012713565328600_bty026-B13","first-page":"18","article-title":"Classification and regression by random forest","volume":"2","author":"Liaw","year":"2002","journal-title":"R. News"},{"key":"2023012713565328600_bty026-B14","first-page":"2295","article-title":"The non-paranormal: semiparametric estimation of high dimensional undirected graphs","volume":"10","author":"Liu","year":"2009","journal-title":"J. Mach. 
Learn Res"},{"key":"2023012713565328600_bty026-B15","doi-asserted-by":"crossref","first-page":"e2696","DOI":"10.1371\/journal.pone.0002696","article-title":"Molecular subsets in the gene expression signatures of scleroderma skin","volume":"3","author":"Milano","year":"2008","journal-title":"PLoS One"},{"key":"2023012713565328600_bty026-B16","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1007\/BF01897163","article-title":"A study of standardization of variables in cluster-analysis","volume":"5","author":"Milligan","year":"1988","journal-title":"J. Classif"},{"key":"2023012713565328600_bty026-B17","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1200\/JCO.2008.18.1370","article-title":"Supervised risk predictor of breast cancer based on intrinsic subtypes","volume":"27","author":"Parker","year":"2009","journal-title":"J. Clin. Oncol"},{"key":"2023012713565328600_bty026-B18","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1038\/35021093","article-title":"Molecular portraits of human breast tumours","volume":"406","author":"Perou","year":"2000","journal-title":"Nature"},{"key":"2023012713565328600_bty026-B19","doi-asserted-by":"crossref","first-page":"2877","DOI":"10.1093\/bioinformatics\/btt480","article-title":"A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis","volume":"29","author":"Reese","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012713565328600_bty026-B20","doi-asserted-by":"crossref","first-page":"R25.","DOI":"10.1186\/gb-2010-11-3-r25","article-title":"A scaling normalization method for differential expression analysis of RNA-seq data","volume":"11","author":"Robinson","year":"2010","journal-title":"Genome Biol"},{"key":"2023012713565328600_bty026-B21","doi-asserted-by":"crossref","first-page":"10869","DOI":"10.1073\/pnas.191367098","article-title":"Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical 
implications","volume":"98","author":"Sorlie","year":"2001","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713565328600_bty026-B22","doi-asserted-by":"crossref","first-page":"e1621","DOI":"10.7717\/peerj.1621","article-title":"Cross-platform normalization of microarray and RNA-seq data for machine learning applications","volume":"4","author":"Thompson","year":"2016","journal-title":"Peer J"},{"key":"2023012713565328600_bty026-B23","doi-asserted-by":"crossref","first-page":"6567","DOI":"10.1073\/pnas.082099299","article-title":"Diagnosis of multiple cancer types by shrunken centroids of gene expression","volume":"99","author":"Tibshirani","year":"2002","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713565328600_bty026-B24","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1214\/ss\/1056397488","article-title":"Class prediction by nearest shrunken centroids, with applications to DNA microarrays","volume":"18","author":"Tibshirani","year":"2003","journal-title":"Stat. 
Sci"},{"key":"2023012713565328600_bty026-B25","doi-asserted-by":"crossref","first-page":"e0126545","DOI":"10.1371\/journal.pone.0126545","article-title":"Probe region expression estimation for RNA-Seq data for improved microarray comparability","volume":"10","author":"Uziela","year":"2015","journal-title":"PLoS One"},{"key":"2023012713565328600_bty026-B26","article-title":"KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995)","volume-title":"R package version 2.23-15","author":"Wand","year":"2015"},{"key":"2023012713565328600_bty026-B27","doi-asserted-by":"crossref","first-page":"e78644","DOI":"10.1371\/journal.pone.0078644","article-title":"Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells","volume":"9","author":"Zhao","year":"2014","journal-title":"PLoS One"},{"key":"2023012713565328600_bty026-B28","first-page":"1059","article-title":"The huge package for high-dimensional undirected graph estimation in R","volume":"13","author":"Zhao","year":"2012","journal-title":"J. Mach. 
Learn Res"},{"key":"2023012713565328600_bty026-B29","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1186\/s13058-015-0520-4","article-title":"Molecular subtyping for clinically defined breast cancer subgroups","volume":"17","author":"Zhao","year":"2015","journal-title":"Breast Cancer Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/11\/1868\/48937839\/bioinformatics_34_11_1868.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/11\/1868\/48937839\/bioinformatics_34_11_1868.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T14:29:26Z","timestamp":1674829766000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/11\/1868\/4816109"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,1,17]]},"references-count":29,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2018,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty026","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,6,1]]},"published":{"date-parts":[[2018,1,17]]}}}