{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T17:51:14Z","timestamp":1763747474175},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is an example that applies a simple rank-based algorithm to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation of single expression profile, the performance usually greatly reduces in cross-study validation (i.e. the prediction model is established in the training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical values of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies.<\/jats:p><jats:p>Results: We proposed two frameworks, by averaging TSP scores or by combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods in simulated data sets and three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The result showed superior performance of cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases robustness and accuracy of the classification model that will ultimately improve disease understanding and clinical treatment decisions to benefit patients.<\/jats:p><jats:p>Availability and Implementation: An R package MetaKTSP is available online. (http:\/\/tsenglab.biostat.pitt.edu\/software.htm).<\/jats:p><jats:p>Contact: ctseng@pitt.edu<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw115","type":"journal-article","created":{"date-parts":[[2016,3,4]],"date-time":"2016-03-04T02:13:45Z","timestamp":1457057625000},"page":"1966-1973","source":"Crossref","is-referenced-by-count":36,"title":["MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis"],"prefix":"10.1093","volume":"32","author":[{"given":"SungHwan","family":"Kim","sequence":"first","affiliation":[{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA"},{"name":"2 Department of Statistics, Korea University, Seoul, South Korea"}]},{"given":"Chien-Wei","family":"Lin","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA"}]},{"given":"George. C.","family":"Tseng","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA"},{"name":"3 Department of Computational and Systems Biology"},{"name":"4 Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"}]}],"member":"286","published-online":{"date-parts":[[2016,3,2]]},"reference":[{"key":"2023020112332728200_btw115-B1","doi-asserted-by":"crossref","first-page":"1469","DOI":"10.1214\/14-AOAS738","article-title":"Rank discriminants for predicting phenotypes from RNA expression","volume":"8","author":"Afsari","year":"2014","journal-title":"Ann. Appl. Stat"},{"key":"2023020112332728200_btw115-B2","doi-asserted-by":"crossref","first-page":"i105","DOI":"10.1093\/bioinformatics\/btg385","article-title":"Adjustment of systematic microarray data biases","volume":"20","author":"Benito","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020112332728200_btw115-B3","doi-asserted-by":"crossref","first-page":"i105","DOI":"10.1093\/bioinformatics\/btu279","article-title":"Cross-study validation for the assessment of prediction algorithms","volume":"30","author":"Bernau","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020112332728200_btw115-B4","doi-asserted-by":"crossref","first-page":"1655","DOI":"10.1093\/bioinformatics\/btp292","article-title":"Ratio adjustment and calibration scheme for gene-wise normalization to enhance microarray inter-study prediction","volume":"25","author":"Cheng","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020112332728200_btw115-B5","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1177\/1066896908328577","article-title":"Microarray-based gene expression profiling as a clinical tool for breast cancer management: are we there yet?","volume":"17","author":"Correa","year":"2009","journal-title":"Int. J. Surg. Pathol"},{"key":"2023020112332728200_btw115-B8","doi-asserted-by":"crossref","first-page":"e26023.","DOI":"10.1371\/journal.pone.0026023","article-title":"Hormone receptor and ERBB2 status in gene expression profiles of human breast tumor samples","volume":"6","author":"Dvorkin-Gheva","year":"2011","journal-title":"Plos One"},{"key":"2023020112332728200_btw115-B100","article-title":"Statistical Methods for Research Workers","author":"Fisher","year":"1925"},{"key":"2023020112332728200_btw115-B9","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1080\/00031305.1948.10483405","article-title":"Questions and answers #14","volume":"2","author":"Fisher","year":"1948","journal-title":"Am. Stat"},{"key":"2023020112332728200_btw115-B10","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1093\/biostatistics\/kxm033","article-title":"Cross-study validation and combined analysis of gene expression microarray data","volume":"9","author":"Garrett-Mayer","year":"2008","journal-title":"Biostatistics"},{"key":"2023020112332728200_btw115-B11","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1071","article-title":"Expression Profiles from Pairwise mRNA Comparisons","volume":"3","author":"Geman","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023020112332728200_btw115-B14","doi-asserted-by":"crossref","first-page":"6097","DOI":"10.1158\/0008-5472.CAN-12-3232","article-title":"Why your new cancer biomarker may never work: recurrent patterns and remarkable diversity in biomarker failures","volume":"72","author":"Kern","year":"2012","journal-title":"Cancer Res"},{"key":"2023020112332728200_btw115-B15","doi-asserted-by":"crossref","first-page":"e15.","DOI":"10.1093\/nar\/gkr1071","article-title":"MetaQC: objective quality control and inclusion\/exclusion criteria for genomic meta-analysis","volume":"40","author":"Kang","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020112332728200_btw115-B16","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1038\/nbt1217","article-title":"A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies","volume":"24","author":"Kuo","year":"2006","journal-title":"Nat. Biotechnol"},{"key":"2023020112332728200_btw115-B18","doi-asserted-by":"crossref","first-page":"e110840","DOI":"10.1371\/journal.pone.0110840","article-title":"Measuring the effect of inter-study variability on estimating prediction error","volume":"9","author":"Ma","year":"2014","journal-title":"PLoS One"},{"key":"2023020112332728200_btw115-B19","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1016\/j.ccr.2004.05.015","article-title":"A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen","volume":"5","author":"Ma","year":"2004","journal-title":"Cancer Cell"},{"key":"2023020112332728200_btw115-B22","doi-asserted-by":"crossref","first-page":"336.","DOI":"10.1186\/1471-2164-14-336","article-title":"A simple and reproducible breast cancer prognostic test","volume":"17","author":"Marchionni","year":"2013","journal-title":"BMC Genomics"},{"key":"2023020112332728200_btw115-B201","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.1038\/nbt1239","article-title":"The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements","volume":"24","author":"MAQC","year":"2006","journal-title":"Nat. Biotechnol."},{"key":"2023020112332728200_btw115-B25","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1177\/1740774513499458","article-title":"Development of omics-based clinical tests for prognosis and therapy selection: the challenge of achieving statistical robustness and clinical utility","volume":"10","author":"McShane","journal-title":"Clin. Trials"},{"key":"2023020112332728200_btw115-B26","doi-asserted-by":"crossref","first-page":"2586","DOI":"10.1093\/bioinformatics\/btq472","article-title":"Module-based prediction approach for robust inter-study predictions in microarray data","volume":"26","author":"Mi","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020112332728200_btw115-B27","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1186\/1471-2164-5-71","article-title":"Inter-platform comparability of microarrays in acute lymphoblastic leukemia","volume":"5","author":"Mitchell","year":"2004","journal-title":"BMC Genomics"},{"key":"2023020112332728200_btw115-B28","doi-asserted-by":"crossref","first-page":"1390","DOI":"10.1038\/onc.2010.525","article-title":"Genome-wide methylation analysis identifies epigenetically inactivated candidate tumour suppressor genes in renal cell carcinoma","volume":"30","author":"Morris","year":"2011","journal-title":"Oncogene"},{"key":"2023020112332728200_btw115-B29","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1613\/jair.614","article-title":"Popular ensemble methods: an empirical study","volume":"11","author":"Opitz","year":"1999","journal-title":"J. Artif. Intell. Res"},{"key":"2023020112332728200_btw115-B30","doi-asserted-by":"crossref","first-page":"3867","DOI":"10.1214\/09-AOS697","article-title":"Karl Pearson\u2019s meta-analysis revisited","volume":"37","author":"Owen","year":"2009","journal-title":"Ann. Stat"},{"key":"2023020112332728200_btw115-B31","doi-asserted-by":"crossref","first-page":"2817","DOI":"10.1056\/NEJMoa041588","article-title":"A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer","volume":"351","author":"Paik","year":"2004","journal-title":"N. Engl. J. Med"},{"key":"2023020112332728200_btw115-B32","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1200\/JCO.2008.18.1370","article-title":"Supervised risk predictor of breast cancer based on intrinsic subtypes","volume":"27","author":"Parker","year":"2009","journal-title":"J. Clin. Oncol"},{"key":"2023020112332728200_btw115-B202","doi-asserted-by":"crossref","first-page":"3860","DOI":"10.1158\/1078-0432.CCR-10-0889","article-title":"Clinical implications of gene dosage and gene expression patterns in diploid breast carcinoma","volume":"16","author":"Parris","year":"2010","journal-title":"Clin. Cancer Res."},{"key":"2023020112332728200_btw115-B33","doi-asserted-by":"crossref","first-page":"e84428","DOI":"10.1371\/journal.pone.0084428","article-title":"DACH1: its role as a classifier of long term good prognosis in luminal breast cancer","volume":"9","author":"Powe","year":"2014","journal-title":"PLoS One"},{"key":"2023020112332728200_btw115-B34","doi-asserted-by":"crossref","first-page":"3414","DOI":"10.1073\/pnas.0611373104","article-title":"Highly accurate two-gene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas","volume":"104","author":"Price","year":"2007","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020112332728200_btw115-B35","doi-asserted-by":"crossref","first-page":"15149","DOI":"10.1073\/pnas.211566398","article-title":"Multiclass cancer diagnosis using tumor gene expression signatures","author":"Ramaswamy","year":"2001","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020112332728200_btw115-B36","doi-asserted-by":"crossref","first-page":"2589","DOI":"10.1182\/blood-2007-09-112730","article-title":"A 2-gene classifier for predicting response to the farnesyltransferase inhibitor tipifarnib in acute myeloid leukemia","volume":"5","author":"Raponi","year":"2008","journal-title":"Blood"},{"key":"2023020112332728200_btw115-B203","doi-asserted-by":"crossref","first-page":"3870","DOI":"10.1158\/0008-5472.CAN-09-4120","article-title":"FOXC1 is a potential prognostic biomarker with functional significance in basal-like breast cancer","volume":"70","author":"Ray","year":"2010","journal-title":"Cancer Res."},{"key":"2023020112332728200_btw115-B37","doi-asserted-by":"crossref","first-page":"927","DOI":"10.1093\/jnci\/dji153","article-title":"Limits of predictive models using microarray data for breast cancer clinical treatment outcome","volume":"97","author":"Reid","year":"2005","journal-title":"J. Natl. Cancer Inst"},{"key":"2023020112332728200_btw115-B38","doi-asserted-by":"crossref","first-page":"e5540","DOI":"10.1371\/journal.pone.0005540","article-title":"Intra-platform repeatability and inter-platform comparability of microRNA microarray technology","volume":"4","author":"Sato","year":"2009","journal-title":"PLoS One"},{"key":"2023020112332728200_btw115-B39","doi-asserted-by":"crossref","first-page":"1154","DOI":"10.1093\/bioinformatics\/btn083","article-title":"Merging two gene-expression studies via cross-platform normalization","volume":"24","author":"Shabalin","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020112332728200_btw115-B42","volume-title":"The American Soldier: Adjustment during Army Life, Vol. 1","author":"Stouffer","year":"1949"},{"key":"2023020112332728200_btw115-B43","doi-asserted-by":"crossref","first-page":"439.","DOI":"10.1186\/1471-2105-9-439","article-title":"CMA: a comprehensive Bioconductor package for supervised classification with high dimensional data","volume":"9","author":"Slawski","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020112332728200_btw115-B44","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1186\/1471-2105-9-63","article-title":"Meta-analysis of breast cancer microarray studies in conjunction with conserved cis-elements suggest patterns for coordinate regulation","volume":"28","author":"Smith","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020112332728200_btw115-B45","doi-asserted-by":"crossref","first-page":"4111","DOI":"10.1200\/JCO.2010.28.4273","article-title":"Genomic index of sensitivity to endocrine therapy for breast cancer","volume":"28","author":"Symmans","year":"2010","journal-title":"J. Clin. Oncol"},{"key":"2023020112332728200_btw115-B46","doi-asserted-by":"crossref","first-page":"3896","DOI":"10.1093\/bioinformatics\/bti631","article-title":"Simple decision rules for classifying human cancers from gene expression profiles","volume":"21","author":"Tan","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020112332728200_btw115-B47","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1198\/jasa.2009.0037","article-title":"A statistical framework to infer functional gene associations from multiple biologically interrelated microarray experiments","volume":"104","author":"Teng","year":"2007","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020112332728200_btw115-B48","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4137\/BIC.S3793","article-title":"Identification of gene expression signature in estrogen receptor positive breast carcinoma","volume":"2","author":"Thakkar","year":"2010","journal-title":"Biomark. Cancer"},{"key":"2023020112332728200_btw115-B50","doi-asserted-by":"crossref","first-page":"R37","DOI":"10.1186\/bcr2088","article-title":"Evaluation of biological pathways involved in chemotherapy response in breast cancer","volume":"10","author":"Tordai","year":"2008","journal-title":"Breast Cancer Res"},{"key":"2023020112332728200_btw115-B51","doi-asserted-by":"crossref","first-page":"3785","DOI":"10.1093\/nar\/gkr1265","article-title":"Comprehensive literature review and statistical considerations for microarray meta-analysis","volume":"40","author":"Tseng","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020112332728200_btw115-B52","doi-asserted-by":"crossref","first-page":"7669","DOI":"10.1038\/sj.onc.1207966","article-title":"Mutation of GATA3 in human breast tumors","volume":"23","author":"Usary","year":"2004","journal-title":"Oncogene"},{"key":"2023020112332728200_btw115-B53","doi-asserted-by":"crossref","first-page":"1999","DOI":"10.1056\/NEJMoa021967","article-title":"A gene-expression signature as a predictor of survival in breast cancer","volume":"347","author":"van de Vijver","year":"2002","journal-title":"N. Engl. J. Med"},{"key":"2023020112332728200_btw115-B54","doi-asserted-by":"crossref","first-page":"1648","DOI":"10.1172\/JCI74440","article-title":"Tumor cell migration screen identifies SRPK1 as breast cancer metastasis determinant","volume":"125","author":"van Roosmalen","year":"2015","journal-title":"J. Clin. Invest"},{"key":"2023020112332728200_btw115-B55","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1038\/415530a","article-title":"Gene expression profiling predicts clinical outcome of breast cancer","volume":"415","author":"van\u2019t Veer","year":"2002","journal-title":"Nature"},{"key":"2023020112332728200_btw115-B56","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/1471-2105-13-S3-S13","article-title":"Detecting disease-associated genes with confounding variable adjustment and the impact on genomic meta-analysis: With application to major depressive disorder","volume":"13","author":"Wang","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020112332728200_btw115-B58","doi-asserted-by":"crossref","first-page":"3905","DOI":"10.1093\/bioinformatics\/bti647","article-title":"Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data","volume":"20","author":"Xu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020112332728200_btw115-B59","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1186\/1471-2105-9-125","article-title":"Merging microarray data from separate breast cancer studies provides a robust prognostic test","volume":"9","author":"Xu","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020112332728200_btw115-B60","doi-asserted-by":"crossref","first-page":"4657","DOI":"10.1038\/onc.2008.101","article-title":"PCDH8, the human homolog of PAPC, is a candidate tumor suppressor of breast cancer","volume":"27","author":"Yu","year":"2008","journal-title":"Oncogene"},{"key":"2023020112332728200_btw115-B61","doi-asserted-by":"crossref","first-page":"1785","DOI":"10.3892\/or.2012.1997","article-title":"Frequent silencing of protocadherin 8 by promoter methylation, a candidate tumor suppressor for human gastric cancer","volume":"28","author":"Zhang","year":"2012","journal-title":"Oncol. Rep"},{"key":"2023020112332728200_btw115-B62","doi-asserted-by":"crossref","first-page":"4196","DOI":"10.1158\/1078-0432.CCR-13-0804","article-title":"Breast cancer index identifies early-stage estrogen receptor-positive breast cancer patients at risk for early- and late-distant recurrence","volume":"19","author":"Zhang","year":"2013","journal-title":"Clin. Cancer Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/13\/1966\/49019785\/bioinformatics_32_13_1966.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/13\/1966\/49019785\/bioinformatics_32_13_1966.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,14]],"date-time":"2024-06-14T19:29:30Z","timestamp":1718393370000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/13\/1966\/1743805"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,3,2]]},"references-count":53,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2016,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw115","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,7,1]]},"published":{"date-parts":[[2016,3,2]]}}}