{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T21:43:28Z","timestamp":1766180608174},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1074,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Within medical research there is an increasing trend toward deriving multiple types of data from the same individual. The most effective prognostic prediction methods should use all available data, as this maximizes the amount of information used. In this article, we consider a variety of learning strategies to boost prediction performance based on the use of all available data.<\/jats:p>\n               <jats:p>Implementation: We consider data integration via the use of multiple kernel learning supervised learning methods. We propose a scheme in which feature selection by statistical score is performed separately per data type and by pathway membership. We further consider the introduction of a confidence measure for the class assignment, both to remove some ambiguously labeled datapoints from the training data and to implement a cautious classifier that only makes predictions when the associated confidence is high.<\/jats:p>\n               <jats:p>Results: We use the METABRIC dataset for breast cancer, with prediction of survival at 2000 days from diagnosis. Predictive accuracy is improved by using kernels that exclusively use those genes, as features, which are known members of particular pathways. We show that yet further improvements can be made by using a range of additional kernels based on clinical covariates such as Estrogen Receptor (ER) status. Using this range of measures to improve prediction performance, we show that the test accuracy on new instances is nearly 80%, though predictions are only made on 69.2% of the patient cohort.<\/jats:p>\n               <jats:p>Availability: \u00a0https:\/\/github.com\/jseoane\/FSMKL<\/jats:p>\n               <jats:p>Contact: \u00a0J.Seoane@bristol.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt610","type":"journal-article","created":{"date-parts":[[2013,10,26]],"date-time":"2013-10-26T00:23:57Z","timestamp":1382747037000},"page":"838-845","source":"Crossref","is-referenced-by-count":68,"title":["A pathway-based data integration framework for prediction of disease progression"],"prefix":"10.1093","volume":"30","author":[{"given":"Jos\u00e9 A.","family":"Seoane","sequence":"first","affiliation":[{"name":"1 MRC Centre for Causal Analyses in Translational Epidemiology, 2MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Clifton BS8 2BN, UK and 3Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1UB, UK"}]},{"given":"Ian N. M.","family":"Day","sequence":"additional","affiliation":[{"name":"1 MRC Centre for Causal Analyses in Translational Epidemiology, 2MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Clifton BS8 2BN, UK and 3Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1UB, UK"}]},{"given":"Tom R.","family":"Gaunt","sequence":"additional","affiliation":[{"name":"1 MRC Centre for Causal Analyses in Translational Epidemiology, 2MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Clifton BS8 2BN, UK and 3Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1UB, UK"},{"name":"1 MRC Centre for Causal Analyses in Translational Epidemiology, 2MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Clifton BS8 2BN, UK and 3Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1UB, UK"}]},{"given":"Colin","family":"Campbell","sequence":"additional","affiliation":[{"name":"1 MRC Centre for Causal Analyses in Translational Epidemiology, 2MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Clifton BS8 2BN, UK and 3Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1UB, UK"}]}],"member":"286","published-online":{"date-parts":[[2013,10,24]]},"reference":[{"key":"2023012710444133600_btt610-B1","doi-asserted-by":"crossref","first-page":"Article 27","DOI":"10.2202\/1544-6115.1441","article-title":"Bayesian unsupervised learning with multiple data types","volume":"8","author":"Agius","year":"2009","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012710444133600_btt610-B2","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1023\/B:JOMG.0000023589.00994.5e","article-title":"p27 deregulation in breast cancer: prognostic significance and implications for therapy","volume":"9","author":"Alkarain","year":"2004","journal-title":"J. Mammary Gland Biol. Neoplasia"},{"key":"2023012710444133600_btt610-B3","volume-title":"SEER Cancer Statistics Review, 1975-2007","author":"Altekruse","year":"2010"},{"key":"2023012710444133600_btt610-B4","article-title":"Multiple gaussian process models","author":"Archambeau","year":"2011"},{"key":"2023012710444133600_btt610-B5","doi-asserted-by":"crossref","DOI":"10.1145\/1015330.1015424","article-title":"Multiple kernel learning, conic duality and the SMO algorithm","volume-title":"Proceedings of the Twenty-first International Conference on Machine Learning (ICML 2004)","author":"Bach","year":"2004"},{"key":"2023012710444133600_btt610-B6","doi-asserted-by":"crossref","first-page":"e1003047","DOI":"10.1371\/journal.pcbi.1003047","article-title":"Improving breast cancer survival analysis through competition-based multidimensional modeling","volume":"9","author":"Bilal","year":"2013","journal-title":"PLoS Comput. Biol."},{"key":"2023012710444133600_btt610-B7","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023012710444133600_btt610-B8","doi-asserted-by":"crossref","first-page":"1183","DOI":"10.1093\/jnci\/djj329","article-title":"Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer","volume":"98","author":"Buyse","year":"2006","journal-title":"J. Natl Cancer Inst."},{"key":"2023012710444133600_btt610-B9","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-01552-6","volume-title":"Learning with Support Vector Machines","author":"Campbell","year":"2011"},{"key":"2023012710444133600_btt610-B10","first-page":"18","article-title":"Ensemble selection from libraries of models","volume-title":"Proceedings of the 21st International Conference on Machine Learning","author":"Caruana","year":"2004"},{"key":"2023012710444133600_btt610-B11","doi-asserted-by":"crossref","DOI":"10.1038\/msb.2009.69","article-title":"Harnessing gene expression to identify the genetic basis of drug resistance","volume":"5","author":"Chen","year":"2009","journal-title":"Mol. Syst. Biol."},{"key":"2023012710444133600_btt610-B12","doi-asserted-by":"crossref","first-page":"e1002956","DOI":"10.1371\/journal.pcbi.1002956","article-title":"Integrative analysis using module-guided random forests reveals correlated genetic factors related to mouse weight","volume":"9","author":"Chen","year":"2013","journal-title":"PLoS Comput. Biol."},{"key":"2023012710444133600_btt610-B13","doi-asserted-by":"crossref","first-page":"181ra50","DOI":"10.1126\/scitranslmed.3005974","article-title":"Development of a prognostic model for breast cancer survival in an open challenge environment","volume":"5","author":"Cheng","year":"2013","journal-title":"Sci. Transl. Med."},{"key":"2023012710444133600_btt610-B14","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1038\/nature10983","article-title":"The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups","volume":"486","author":"Curtis","year":"2012","journal-title":"Nature"},{"key":"2023012710444133600_btt610-B16","doi-asserted-by":"crossref","DOI":"10.1109\/ICMLA.2008.124","article-title":"Inferring sparse kernel combinations and relevance vectors: an application to subcellular localization of proteins","volume-title":"Proceedings of the Seventh International Conference on Machine Learning and Applications (ICML\u201908)","author":"Damoulas","year":"2008"},{"key":"2023012710444133600_btt610-B17","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/S0304-4165(99)00167-1","article-title":"Glycoprotein glycosylation and cancer progression","volume":"1473","author":"Dennis","year":"1999","journal-title":"Biochim. Biophys. Acta"},{"key":"2023012710444133600_btt610-B18","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"2023012710444133600_btt610-B19","article-title":"Bayesian efficient multiple kernel learning","volume-title":"Proceedings of the 29th International Conference on Machine Learning (ICML 2012)","author":"G\u00f6nen","year":"2012"},{"key":"2023012710444133600_btt610-B20","first-page":"2211","article-title":"Multiple kernel learning algorithms","volume":"12","author":"G\u00f6nen","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"2023012710444133600_btt610-B21","doi-asserted-by":"crossref","first-page":"i391","DOI":"10.1093\/bioinformatics\/btq174","article-title":"Multivariate multi-way analysis of multi-source data","volume":"26","author":"Huopaniemi","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012710444133600_btt610-B22","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1093\/jnci\/djq524","article-title":"Gene pathways associated with prognosis and chemotherapy sensitivity in molecular subtypes of breast cancer","volume":"103","author":"Iwamoto","year":"2011","journal-title":"J. Natl Cancer Inst."},{"key":"2023012710444133600_btt610-B23","doi-asserted-by":"crossref","first-page":"2044","DOI":"10.1002\/ijc.27884","article-title":"Germline variation in TP53 regulatory network genes associates with breast cancer survival and treatment outcome","volume":"132","author":"Jamshidi","year":"2013","journal-title":"Int. J. Cancer"},{"key":"2023012710444133600_btt610-B24","first-page":"953","article-title":"Lp-norm multiple kernel learning","volume":"12","author":"Kloft","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"2023012710444133600_btt610-B25","first-page":"27","article-title":"A statistical framework for genomic data fusion","volume":"5","author":"Lanckriet","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"2023012710444133600_btt610-B26","doi-asserted-by":"crossref","first-page":"1192","DOI":"10.1093\/bioinformatics\/btq107","article-title":"Integrative mixture of experts to combine clinical factors and gene markers","volume":"26","author":"L\u00ea Cao","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012710444133600_btt610-B27","doi-asserted-by":"crossref","first-page":"e30880","DOI":"10.1371\/journal.pone.0030880","article-title":"Gene-expression signature predicts postoperative recurrence in stage I non-small cell lung cancer patients","volume":"7","author":"Lu","year":"2012","journal-title":"PLoS One"},{"key":"2023012710444133600_btt610-B28","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1097\/01.pai.0000213130.63417.b3","article-title":"COX-2 expression in invasive breast cancer: correlation with prognostic parameters and outcome","volume":"15","author":"Nassar","year":"2007","journal-title":"Appl. Immunohistochem. Mol. Morphol."},{"key":"2023012710444133600_btt610-B29","first-page":"61","article-title":"Probabilistic outputs for support vector machines and comparison to regularised likelihood methods","volume-title":"Advances in Large Margin Classifiers","author":"Platt","year":"1999"},{"key":"2023012710444133600_btt610-B30","first-page":"2491","article-title":"SimpleMKL","volume":"9","author":"Rakotomamonjy","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"2023012710444133600_btt610-B31","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1007\/s10994-009-5155-1","article-title":"Infinite factorization of multiple non-parametric views","volume":"79","author":"Rogers","year":"2010","journal-title":"Mach. Learn."},{"key":"2023012710444133600_btt610-B33","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1186\/1471-2407-10-178","article-title":"Higher expression levels of SOCS 1,3,4,7 are associated with earlier tumour stage and better clinical outcome in human breast cancer","volume":"10","author":"Sasi","year":"2010","journal-title":"BMC Cancer"},{"key":"2023012710444133600_btt610-B34","doi-asserted-by":"crossref","first-page":"i158","DOI":"10.1093\/bioinformatics\/btq210","article-title":"Discovering transcriptional modules by Bayesian data integration","volume":"26","author":"Savage","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012710444133600_btt610-B35","volume-title":"Learning with Kernels","author":"Scholkopf","year":"2002"},{"key":"2023012710444133600_btt610-B36","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809682","volume-title":"Kernel Methods for Pattern Analysis","author":"Shawe-Taylor","year":"2004"},{"key":"2023012710444133600_btt610-B39","article-title":"Controlling the sensitivity of support vector machines","author":"Veropoulos","year":"1999"},{"key":"2023012710444133600_btt610-B41","doi-asserted-by":"crossref","first-page":"R1","DOI":"10.1186\/bcr2464","article-title":"PREDICT: a new UK prognostic model that predicts survival following surgery for invasive breast cancer","volume":"12","author":"Wishart","year":"2010","journal-title":"Breast Cancer Res."},{"key":"2023012710444133600_btt610-B42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2202\/1544-6115.1470","article-title":"Extensions of sparse canonical correlation analysis with applications to genomic data","volume":"8","author":"Witten","year":"2009","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012710444133600_btt610-B43","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1186\/1471-2105-10-267","article-title":"Enhanced protein fold recognition through a novel data integration approach","volume":"10","author":"Ying","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012710444133600_btt610-B44","first-page":"427","article-title":"Class prediction from disparate biological data sources using an iterative multi-kernel algorithm","volume":"5780","author":"Ying","year":"2009","journal-title":"Lect. Notes Bioinform."},{"key":"2023012710444133600_btt610-B46","doi-asserted-by":"crossref","first-page":"e1002227","DOI":"10.1371\/journal.pcbi.1002227","article-title":"Patient-specific data fusion defines prognostic cancer subtypes","volume":"7","author":"Yuan","year":"2011","journal-title":"PLoS Comput. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/6\/838\/48919619\/bioinformatics_30_6_838.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/6\/838\/48919619\/bioinformatics_30_6_838.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T11:10:08Z","timestamp":1674817808000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/6\/838\/285849"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,10,24]]},"references-count":40,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2014,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt610","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,3,15]]},"published":{"date-parts":[[2013,10,24]]}}}