{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T11:52:26Z","timestamp":1775476346408,"version":"3.50.1"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2021,11,12]],"date-time":"2021-11-12T00:00:00Z","timestamp":1636675200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002794","name":"Swedish Cancer Society","doi-asserted-by":"publisher","award":["CAN 2020\/709"],"award-info":[{"award-number":["CAN 2020\/709"]}],"id":[{"id":"10.13039\/501100002794","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002794","name":"Swedish Cancer Society","doi-asserted-by":"publisher","award":["2019\/51"],"award-info":[{"award-number":["2019\/51"]}],"id":[{"id":"10.13039\/501100002794","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Lund Medical Faculty"},{"name":"Sk\u00e5ne University Hospital Research Funds"},{"name":"Sk\u00e5ne County Council\u2019s Research and Development Foundation","award":["REGSKANE-821461"],"award-info":[{"award-number":["REGSKANE-821461"]}]},{"name":"Cancer Research Fund at Malm\u00f6 General Hospital and Mrs. Berta Kamprad's Cancer Foundation","award":["FBKS-2019-35"],"award-info":[{"award-number":["FBKS-2019-35"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,27]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Gene expression-based multiclass prediction, such as tumor subtyping, is a non-trivial bioinformatic problem. Most classifier methods operate by comparing expression levels relative to other samples. 
Methods that base predictions on the expression pattern within a sample have been proposed as an alternative. As these methods are invariant to the cohort composition and can be applied to a sample in isolation, they can collectively be termed single sample predictors (SSP). Such predictors could potentially be used for preprocessing-free classification of new samples and be built to function across different expression platforms where proper batch and dataset normalization is challenging. Here, we evaluate the behavior of several multiclass SSPs based on binary gene-pair rules (k-Top Scoring Pairs, Absolute Intrinsic Molecular Subtyping and a new Random Forest approach) and compare them to centroids built with centered or raw expression values, with the criteria that an optimal predictor should have high accuracy, overcome differences in tumor purity, be robust across expression platforms and provide an informative prediction output score.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We found that gene-pair-based SSPs showed excellent performance on many expression-based classification tasks. The three methods differed in prediction score output, handling of tied scores and behavior in low purity samples. The k-Top Scoring Pairs and Random Forest approach both achieved high classification accuracy while providing an informative prediction score. 
Although gene-pair-based SSPs have been touted as being cross-platform compatible (through training on mixed platform data), out-of-the-box compatibility with a new dataset remains a potential issue that warrants cohort-to-cohort verification.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Our R package \u2018multiclassPairs\u2019 (https:\/\/cran.r-project.org\/package=multiclassPairs) (https:\/\/doi.org\/10.1093\/bioinformatics\/btab088) is freely available and enables easy training, prediction, and visualization using the gene-pair rule-based Random Forest SSP method and provides additional multiclass functionalities to the switchBox k-Top-Scoring Pairs package.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab763","type":"journal-article","created":{"date-parts":[[2021,11,9]],"date-time":"2021-11-09T20:18:03Z","timestamp":1636489083000},"page":"1022-1029","source":"Crossref","is-referenced-by-count":25,"title":["A comparison of rule-based and centroid single-sample multiclass predictors for transcriptomic classification"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3317-865X","authenticated-orcid":false,"given":"Pontus","family":"Eriksson","sequence":"first","affiliation":[{"name":"Division of Oncology, Department of Clinical Sciences, Lund University , Lund, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9029-3702","authenticated-orcid":false,"given":"Nour-al-dain","family":"Marzouka","sequence":"additional","affiliation":[{"name":"Division of Oncology, Department of Clinical Sciences, Lund University , Lund, 
Sweden"}]},{"given":"Gottfrid","family":"Sj\u00f6dahl","sequence":"additional","affiliation":[{"name":"Urology - urothelial cancer, Department of Translational Medicine, Lund University, Sk\u00e5ne University Hospital , Malm\u00f6, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1667-6510","authenticated-orcid":false,"given":"Carina","family":"Bernardo","sequence":"additional","affiliation":[{"name":"Division of Oncology, Department of Clinical Sciences, Lund University , Lund, Sweden"}]},{"given":"Fredrik","family":"Liedberg","sequence":"additional","affiliation":[{"name":"Urology - urothelial cancer, Department of Translational Medicine, Lund University, Sk\u00e5ne University Hospital , Malm\u00f6, Sweden"}]},{"given":"Mattias","family":"H\u00f6glund","sequence":"additional","affiliation":[{"name":"Division of Oncology, Department of Clinical Sciences, Lund University , Lund, Sweden"}]}],"member":"286","published-online":{"date-parts":[[2021,11,12]]},"reference":[{"key":"2023020108525756300_btab763-B1","doi-asserted-by":"crossref","first-page":"1469","DOI":"10.1214\/14-AOAS738","article-title":"Rank discriminants for predicting phenotypes from RNA expression","volume":"8","author":"Afsari","year":"2014","journal-title":"Ann. Appl. Stat"},{"key":"2023020108525756300_btab763-B2","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1093\/bioinformatics\/btu622","article-title":"switchBox: an R package for k-Top Scoring Pairs classifier development","volume":"31","author":"Afsari","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020108525756300_btab763-B3","doi-asserted-by":"crossref","first-page":"8971","DOI":"10.1038\/ncomms9971","article-title":"Systematic pan-cancer analysis of tumour purity","volume":"6","author":"Aran","year":"2015","journal-title":"Nat. 
Commun"},{"key":"2023020108525756300_btab763-B4","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1093\/toxsci\/kft249","article-title":"Comparison of microarrays and RNA-seq for gene expression analyses of dose-response experiments","volume":"137","author":"Black","year":"2014","journal-title":"Toxicol. Sci"},{"key":"2023020108525756300_btab763-B5","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023020108525756300_btab763-B3939091","article-title":"Clinical Value of RNA Sequencing-Based Classifiers for Prediction of the Five Conventional Breast Cancer Biomarkers: A Report From the Population-Based Multicenter Sweden Cancerome Analysis Network-Breast Initiative","volume":"2","author":"Brueffer","year":"2018","journal-title":"JCO Precis. Oncol."},{"key":"2023020108525756300_btab763-B6","doi-asserted-by":"crossref","first-page":"14071","DOI":"10.1038\/s41598-020-70832-2","article-title":"Machine learning for RNA sequencing-based intrinsic subtyping of breast cancer","volume":"10","author":"Cascianelli","year":"2020","journal-title":"Sci. Rep"},{"key":"2023020108525756300_btab763-B7","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1093\/bib\/bbz008","article-title":"Performance of gene expression-based single sample predictors for assessment of clinicopathological subgroups and molecular subtypes in cancers: a case comparison study in non-small cell lung cancer","volume":"21","author":"Cirenajwis","year":"2020","journal-title":"Brief. Bioinform"},{"key":"2023020108525756300_btab763-B8","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1038\/nbt.3838","article-title":"Reproducible RNA-seq analysis using recount2","volume":"35","author":"Collado-Torres","year":"2017","journal-title":"Nat. 
Biotechnol"},{"key":"2023020108525756300_btab763-B9","doi-asserted-by":"crossref","first-page":"4148","DOI":"10.1093\/bioinformatics\/bti681","article-title":"Classification of microarrays to nearest centroids","volume":"21","author":"Dabney","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020108525756300_btab763-B10","doi-asserted-by":"crossref","first-page":"Article19","DOI":"10.2202\/1544-6115.1071","article-title":"Classifying gene expression profiles from pairwise mRNA comparisons","volume":"3","author":"Geman","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023020108525756300_btab763-B11","author":"Gibbs","year":"2020"},{"key":"2023020108525756300_btab763-B12","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1186\/1471-2164-7-96","article-title":"The molecular portraits of breast tumors are conserved across microarray platforms","volume":"7","author":"Hu","year":"2006","journal-title":"BMC Genomics"},{"key":"2023020108525756300_btab763-B13","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1016\/j.eururo.2019.09.006","article-title":"A consensus molecular classification of muscle-invasive bladder cancer","volume":"77","author":"Kamoun","year":"2020","journal-title":"Eur. Urol"},{"key":"2023020108525756300_btab763-B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v036.i11","article-title":"Feature selection with the Boruta package","volume":"36","author":"Kursa","year":"2010","journal-title":"J. Stat. Softw"},{"key":"2023020108525756300_btab763-B15","doi-asserted-by":"crossref","first-page":"2301","DOI":"10.1038\/s41467-021-22465-w","article-title":"An integrated multi-omics analysis identifies prognostic molecular subtypes of non-muscle-invasive bladder cancer","volume":"12","author":"Lindskrog","year":"2021","journal-title":"Nat. 
Commun"},{"key":"2023020108525756300_btab763-B16","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1016\/j.cell.2018.02.052","article-title":"An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics","volume":"173","author":"Liu","year":"2018","journal-title":"Cell"},{"key":"2023020108525756300_btab763-B7924630","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btab088","article-title":"multiclassPairs: An R package to train multiclass pair-based classifier","author":"Marzouka","year":"2021","journal-title":"Bioinformatics"},{"key":"2023020108525756300_btab763-B17","doi-asserted-by":"crossref","first-page":"3737","DOI":"10.1038\/s41598-018-22126-x","article-title":"A validation and extended description of the Lund taxonomy for urothelial carcinoma using the TCGA cohort","volume":"8","author":"Marzouka","year":"2018","journal-title":"Sci. Rep"},{"key":"2023020108525756300_btab763-B18","doi-asserted-by":"crossref","first-page":"1482","DOI":"10.1038\/s41587-019-0336-3","article-title":"Visualizing structure and transitions in high-dimensional biological data","volume":"37","author":"Moon","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023020108525756300_btab763-B19","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1186\/s13058-017-0824-7","article-title":"Detecting gene signature activation in breast cancer in an absolute, single-patient manner","volume":"19","author":"Paquet","year":"2017","journal-title":"Breast Cancer Res"},{"key":"2023020108525756300_btab763-B20","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1093\/jnci\/dju357","article-title":"Absolute assignment of breast cancer intrinsic molecular subtype","volume":"107","author":"Paquet","year":"2015","journal-title":"J. Natl. 
Cancer Inst"},{"key":"2023020108525756300_btab763-B21","doi-asserted-by":"crossref","first-page":"1729","DOI":"10.1093\/bioinformatics\/btr233","article-title":"Rgtsp: a generalized top scoring pairs package for class prediction","volume":"27","author":"Popovici","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020108525756300_btab763-B22","doi-asserted-by":"crossref","first-page":"953","DOI":"10.1007\/s12094-019-02203-x","article-title":"Standardized versus research-based PAM50 intrinsic subtyping of breast cancer","volume":"22","author":"Prat","year":"2020","journal-title":"Clin. Transl. Oncol"},{"key":"2023020108525756300_btab763-B23","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1158\/2326-6066.CIR-17-0201","article-title":"Impact of tumor purity on immune gene expression and clustering analyses across multiple cancer types","volume":"6","author":"Rhee","year":"2018","journal-title":"Cancer Immunol. Res"},{"key":"2023020108525756300_btab763-B24","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1016\/j.cell.2017.09.007","article-title":"Comprehensive molecular characterization of muscle-invasive bladder cancer","volume":"171","author":"Robertson","year":"2017","journal-title":"Cell"},{"key":"2023020108525756300_btab763-B25","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1002\/path.4886","article-title":"Molecular classification of urothelial carcinoma: global mRNA classification versus tumour-cell phenotype classification","volume":"242","author":"Sj\u00f6dahl","year":"2017","journal-title":"J. 
Pathol"},{"key":"2023020108525756300_btab763-B26","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1186\/s12859-018-2246-7","article-title":"Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons","volume":"19","author":"Smid","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023020108525756300_btab763-B27","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1186\/s12920-016-0185-6","article-title":"Breast cancer subtype predictors revisited: from consensus to concordance?","volume":"9","author":"Sontrop","year":"2016","journal-title":"BMC Med. Genomics"},{"key":"2023020108525756300_btab763-B28","doi-asserted-by":"crossref","first-page":"8418","DOI":"10.1073\/pnas.0932692100","article-title":"Repeated observation of breast tumor subtypes in independent gene expression data sets","volume":"100","author":"Sorlie","year":"2003","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108525756300_btab763-B29","doi-asserted-by":"crossref","first-page":"3896","DOI":"10.1093\/bioinformatics\/bti631","article-title":"Simple decision rules for classifying human cancers from gene expression profiles","volume":"21","author":"Tan","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020108525756300_btab763-B30","doi-asserted-by":"crossref","first-page":"6567","DOI":"10.1073\/pnas.082099299","article-title":"Diagnosis of multiple cancer types by shrunken centroids of gene expression","volume":"99","author":"Tibshirani","year":"2002","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108525756300_btab763-B31","doi-asserted-by":"crossref","first-page":"17","DOI":"10.18637\/jss.v077.i01","article-title":"ranger: a fast implementation of random forests for high dimensional data in C++ and R","volume":"77","author":"Wright","year":"2017","journal-title":"J. Stat. 
Softw"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab763\/41355379\/btab763.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/4\/1022\/49008610\/btab763.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/4\/1022\/49008610\/btab763.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T20:14:10Z","timestamp":1675282450000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/4\/1022\/6426072"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,11,12]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,1,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab763","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,2,15]]},"published":{"date-parts":[[2021,11,12]]}}}