{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T22:22:24Z","timestamp":1761862944742,"version":"3.37.3"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Variable selection is a typical approach used for molecular-signature and biomarker discovery; however, its application to survival data is often complicated by censored samples. We propose a new algorithm for variable selection suitable for the analysis of high-dimensional, right-censored data called Survival Max\u2013Min Parents and Children (SMMPC). The algorithm is conceptually simple, scalable, based on the theory of Bayesian networks (BNs) and the Markov blanket and extends the corresponding algorithm (MMPC) for classification tasks. The selected variables have a structural interpretation: if T is the survival time (in general the time-to-event), SMMPC returns the variables adjacent to T in the BN representing the data distribution. The selected variables also have a causal interpretation that we discuss.<\/jats:p><jats:p>Results: We conduct an extensive empirical analysis of prototypical and state-of-the-art variable selection algorithms for survival data that are applicable to high-dimensional biological data. 
SMMPC selects on average the smallest variable subsets (less than a dozen per dataset), while statistically significantly outperforming all of the methods in the study returning a manageable number of genes that could be inspected by a human expert.<\/jats:p><jats:p>Availability: Matlab and R code are freely available from http:\/\/www.mensxmachina.org<\/jats:p><jats:p>Contact: \u00a0vlagani@ics.forth.gr<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq261","type":"journal-article","created":{"date-parts":[[2010,6,3]],"date-time":"2010-06-03T01:10:05Z","timestamp":1275527405000},"page":"1887-1894","source":"Crossref","is-referenced-by-count":23,"title":["Structure-based variable selection for survival data"],"prefix":"10.1093","volume":"26","author":[{"given":"Vincenzo","family":"Lagani","sequence":"first","affiliation":[{"name":"1 Institute of Computer Science, Foundation for Research and Technology \u2013 Hellas (FORTH) and 2Computer Science Department, University of Crete, Heraklion, Greece"}]},{"given":"Ioannis","family":"Tsamardinos","sequence":"additional","affiliation":[{"name":"1 Institute of Computer Science, Foundation for Research and Technology \u2013 Hellas (FORTH) and 2Computer Science Department, University of Crete, Heraklion, Greece"}]}],"member":"286","published-online":{"date-parts":[[2010,6,2]]},"reference":[{"key":"2023012507580150500_B1","first-page":"21","article-title":"HITON, a novel Markov blanket algorithm for optimal variable selection","author":"Aliferis","year":"2003","journal-title":"Proceedings of the American Medical Informatics Association"},{"key":"2023012507580150500_B2","first-page":"171","article-title":"Local causal and Markov blanket induction algorithms for causal discovery and feature selection for classification part i: algorithms and empirical 
evaluation","volume":"11","author":"Aliferis","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"2023012507580150500_B3","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1371\/journal.pbio.0020108","article-title":"Semi-supervised methods to predict patient survival from gene expression data","volume":"2","author":"Bair","year":"2004","journal-title":"PLoS Biol."},{"key":"2023012507580150500_B4","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/nm733","article-title":"Gene-expression profiles predict survival of patients with lung adenocarcinoma","volume":"8","author":"Beer","year":"2002","journal-title":"Nat. Med."},{"key":"2023012507580150500_B5","doi-asserted-by":"crossref","first-page":"2080","DOI":"10.1093\/bioinformatics\/btm305","article-title":"Predicting survival from microarray data a comparative study","volume":"23","author":"Bovelstad","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012507580150500_B6","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023012507580150500_B7","article-title":"Markov blanket-based variable selection in feature space","volume-title":"Technical Report DSL TR-08-01.","author":"Brown","year":"2008"},{"key":"2023012507580150500_B8","doi-asserted-by":"crossref","first-page":"1605","DOI":"10.1056\/NEJMoa031046","article-title":"Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia","volume":"350","author":"Bullinger","year":"2004","journal-title":"N. Engl. J. Med."},{"key":"2023012507580150500_B9","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.2517-6161.1972.tb00899.x","article-title":"Regression models and life-tables","volume":"34","author":"Cox","year":"1972","journal-title":"J. R. Stat. 
Soc."},{"key":"2023012507580150500_B10","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/j.stamet.2005.02.003","article-title":"Asymptotics of cross-validated risk estimation in estimator selection and performance assessment","volume":"2","author":"Dudoit","year":"2005","journal-title":"Stat. Methodol."},{"key":"2023012507580150500_B11","doi-asserted-by":"crossref","first-page":"1475","DOI":"10.2307\/2533672","article-title":"Bayesian variable selection method for censored survival data","volume":"54","author":"Faraggi","year":"1998","journal-title":"Biometrics"},{"key":"2023012507580150500_B12","doi-asserted-by":"crossref","first-page":"2529","DOI":"10.1002\/(SICI)1097-0258(19990915\/30)18:17\/18<2529::AID-SIM274>3.0.CO;2-5","article-title":"Assessment and comparison of prognostic classification schemes for survival data","volume":"18","author":"Graf","year":"1999","journal-title":"Stat. Med."},{"key":"2023012507580150500_B13","first-page":"545","article-title":"Result analysis of the NIPS 2003 feature selection challenge","volume-title":"Advances in Neural Information Processing Systems 17.","author":"Guyon","year":"2004"},{"key":"2023012507580150500_B14","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-3462-1","volume-title":"Regression Modeling Strategies, With Applications to Linear Models, Logistic Regression, and Survival Analysis.","author":"Harrel","year":"2001"},{"key":"2023012507580150500_B15","doi-asserted-by":"crossref","DOI":"10.1186\/gb-2001-2-1-research0003","article-title":"Supervised harvesting of expression trees","volume":"2","author":"Hastie","year":"2001","journal-title":"Genome Biol."},{"key":"2023012507580150500_B16","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1093\/biomet\/57.1.97","article-title":"Monte carlo sampling methods using Markov chains and their 
applications","volume":"57","author":"Hastings","year":"1970","journal-title":"Biometrika"},{"key":"2023012507580150500_B17","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1111\/j.0006-341X.2000.00337.x","article-title":"Time-dependent ROC curves for censored survival data and a diagnostic marker","volume":"56","author":"Heagerty","year":"2000","journal-title":"Biometrics"},{"key":"2023012507580150500_B18","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1080\/00401706.2000.10485983","article-title":"Ridge Regression: biased estimation for nonorthogonal problems","volume":"42","author":"Hoerl","year":"2000","journal-title":"Technometrics"},{"key":"2023012507580150500_B19","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1002\/sim.1593","article-title":"Bagging survival trees","volume":"23","author":"Hothorn","year":"2004","journal-title":"Stat. Med."},{"key":"2023012507580150500_B20","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1023\/A:1007631014630","article-title":"Multiple comparisons in induction algorithms","volume":"38","author":"Jensen","year":"2000","journal-title":"Mach. Learn."},{"key":"2023012507580150500_B21","doi-asserted-by":"crossref","DOI":"10.1007\/b97377","volume-title":"Survival Analysis: Techniques for Censored and Truncated Data.","author":"Klein","year":"2003"},{"key":"2023012507580150500_B22","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1016\/S0004-3702(97)00043-X","article-title":"Wrappers for feature subset selection","volume":"97","author":"Kohavi","year":"1997","journal-title":"Artif. 
Intell."},{"key":"2023012507580150500_B23","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1093\/bioinformatics\/bth900","article-title":"Partial Cox regression analysis for high-dimensional microarray gene expression data","volume":"20","author":"Li","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012507580150500_B24","doi-asserted-by":"crossref","first-page":"1625","DOI":"10.1093\/bioinformatics\/18.12.1625","article-title":"Partial least squares proportional hazard regression for application to DNA microarray survival data","volume":"18","author":"Nguyen","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012507580150500_B25","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1395","article-title":"Dimension reduction of microarray data in the presence of a censored survival response: a simulation study","volume":"8","author":"Nguyen","year":"2009","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"volume-title":"Causality, Models, Reasoning, and Inference.","year":"2000","author":"Pearl","key":"2023012507580150500_B26"},{"key":"2023012507580150500_B27","first-page":"237","article-title":"Neural networks as statistical methods in survival analysis","volume-title":"Artificial Neural Networks: Prospects for Medicine.","author":"Ripley","year":"1998"},{"key":"2023012507580150500_B28","doi-asserted-by":"crossref","first-page":"1937","DOI":"10.1056\/NEJMoa012914","article-title":"The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma","volume":"346","author":"Rosenwald","year":"2002","journal-title":"N. Engl. J. 
Med."},{"key":"2023012507580150500_B29","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/S1535-6108(03)00028-X","article-title":"The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma","volume":"3","author":"Rosenwald","year":"2003","journal-title":"Cancer Cell"},{"key":"2023012507580150500_B30","doi-asserted-by":"crossref","first-page":"2262","DOI":"10.1093\/bioinformatics\/btl362","article-title":"Bayesian variable selection for the analysis of microarray data with censored outcomes","volume":"22","author":"Sha","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012507580150500_B31","first-page":"655","article-title":"A support vector approach to censored targets","volume-title":"ICDM '07: Proceedings of the 2007 Seventh IEEE International Conference on Data Mining.","author":"Shivaswamy","year":"2007"},{"key":"2023012507580150500_B32","doi-asserted-by":"crossref","first-page":"1775","DOI":"10.1093\/bioinformatics\/btp322","article-title":"Gradient lasso for Cox proportional hazards model","volume":"25","author":"Sohn","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012507580150500_B33","volume-title":"Causation, Prediction, and Search.","author":"Spirtes","year":"2000","edition":"2"},{"key":"2023012507580150500_B34","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1016\/j.ijmedinf.2005.05.002","article-title":"GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data","volume":"74","author":"Statnikov","year":"2005","journal-title":"Int. J. Med. Inform."},{"key":"2023012507580150500_B35","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1002\/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3","article-title":"The lasso method for variable selection in the Cox model","volume":"16","author":"Tibshirani","year":"1997","journal-title":"Stat. 
Med."},{"key":"2023012507580150500_B36","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1002\/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3","article-title":"The lasso method for variable selection in the Cox model","volume":"16","author":"Tibshirani","year":"1997","journal-title":"Stat. Med."},{"key":"2023012507580150500_B37","article-title":"Towards principled feature selection: relevancy, filters and wrappers","volume-title":"Ninth International Workshop on Artificial Intelligence and Statistics 2003.","author":"Tsamardinos","year":"2003"},{"key":"2023012507580150500_B38","first-page":"1100","article-title":"Bounding the false discovery rate in local bayesian network learning","volume-title":"AAAI'08: Proceedings of the 23rd National Conference on Artificial Intelligence.","author":"Tsamardinos","year":"2008"},{"key":"2023012507580150500_B39","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1145\/956750.956838","article-title":"Time and sample efficient discovery of Markov blankets and direct causal relations","author":"Tsamardinos","year":"2003","journal-title":"The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining"},{"key":"2023012507580150500_B40","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1007\/s10994-006-6889-7","article-title":"The Max\u2013Min Hill-Climbing Bayesian network structure learning algorithm","volume":"65","author":"Tsamardinos","year":"2006","journal-title":"Mach. 
Learn."},{"key":"2023012507580150500_B41","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1038\/415530a","article-title":"Gene expression profiling predicts clinical outcome of breast cancer","volume":"415","author":"van't Veer","year":"2002","journal-title":"Nature"},{"key":"2023012507580150500_B42","doi-asserted-by":"crossref","first-page":"1999","DOI":"10.1056\/NEJMoa021967","article-title":"A gene-expression signature as a predictor of survival in breast cancer","volume":"347","author":"van de Vijver","year":"2002","journal-title":"N. Engl. J. Med."},{"key":"2023012507580150500_B43","doi-asserted-by":"crossref","first-page":"1590","DOI":"10.1016\/j.csda.2008.05.021","article-title":"Survival prediction using gene expression data: a review and comparison","volume":"53","author":"van Wieringen","year":"2009","journal-title":"Comput. Stat. Data Anal."},{"key":"2023012507580150500_B44","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1177\/0962280209105024","article-title":"Survival analysis with high-dimensional covariates","volume":"19","author":"Witten","year":"2009","journal-title":"Stat. Methods Med. 
Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/15\/1887\/48852398\/bioinformatics_26_15_1887.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/15\/1887\/48852398\/bioinformatics_26_15_1887.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T15:55:05Z","timestamp":1740153305000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/15\/1887\/188146"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,6,2]]},"references-count":44,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2010,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq261","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2010,8,1]]},"published":{"date-parts":[[2010,6,2]]}}}