{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T08:56:46Z","timestamp":1772009806084,"version":"3.50.1"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2022,8,22]],"date-time":"2022-08-22T00:00:00Z","timestamp":1661126400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,8,22]],"date-time":"2022-08-22T00:00:00Z","timestamp":1661126400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100008730","name":"Kreftforeningen","doi-asserted-by":"publisher","award":["182672-2016"],"award-info":[{"award-number":["182672-2016"]}],"id":[{"id":"10.13039\/100008730","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008119","name":"Norwegian University of Life Sciences","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100008119","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2022,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Feature selection reduces the complexity of high-dimensional datasets and helps to gain insights into systematic variation in the data. These aspects are essential in domains that rely on model interpretability, such as life sciences. We propose a (U)ser-Guided (Bay)esian Framework for (F)eature (S)election, UBayFS, an ensemble feature selection technique embedded in a Bayesian statistical framework. Our generic approach considers two sources of information: data and domain knowledge. From data, we build an ensemble of feature selectors, described by a multinomial likelihood model. Using domain knowledge, the user guides UBayFS by weighting features and penalizing feature blocks or combinations, implemented via a Dirichlet-type prior distribution. Hence, the framework combines three main aspects: ensemble feature selection, expert knowledge, and side constraints. Our experiments demonstrate that UBayFS (a) allows for a balanced trade-off between user knowledge and data observations and (b) achieves accurate and robust results.<\/jats:p>","DOI":"10.1007\/s10994-022-06221-9","type":"journal-article","created":{"date-parts":[[2022,8,22]],"date-time":"2022-08-22T19:02:49Z","timestamp":1661194969000},"page":"3897-3923","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["A user-guided Bayesian framework for ensemble feature selection in life science applications (UBayFS)"],"prefix":"10.1007","volume":"111","author":[{"given":"Anna","family":"Jenul","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1327-4855","authenticated-orcid":false,"given":"Stefan","family":"Schrunner","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J\u00fcrgen","family":"Pilz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Oliver","family":"Tomic","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,8,22]]},"reference":[{"key":"6221_CR1","doi-asserted-by":"crossref","unstructured":"Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press.","DOI":"10.1201\/9781420050646.ptb6"},{"key":"6221_CR2","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.671","volume":"7","author":"S Bose","year":"2021","unstructured":"Bose, S., Das, C., Banerjee, A., Ghosh, K., Chattopadhyay, M., Chattopadhyay, S., & Barik, A. (2021). An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples. Peer J Computer Science, 7, e671.","journal-title":"Peer J Computer Science"},{"key":"6221_CR3","doi-asserted-by":"crossref","unstructured":"Brahim, A. B., & Limam, M. (2014). New prior knowledge based extensions for stable feature selection. In 2014 6th international conference of soft computing and pattern recognition (SoCPaR) (pp. 306\u2013311).","DOI":"10.1109\/SOCPAR.2014.7008024"},{"issue":"1","key":"6221_CR4","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5\u201332.","journal-title":"Machine Learning"},{"key":"6221_CR5","unstructured":"Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. Taylor & Francis."},{"key":"6221_CR6","doi-asserted-by":"crossref","unstructured":"Cheng, T.-H., Wei, C.-P. & Tseng, V.S. (2006). Feature selection for medical data mining: Comparisons of expert judgment and automatic approaches. In 19th IEEE symposium on computer-based medical systems (CBMS\u201906) (p. 165-170).","DOI":"10.1109\/CBMS.2006.87"},{"key":"6221_CR7","unstructured":"Chung, D., Chun, H. & Keles, S. (2019). spls: sparse partial least squares (SPLS) regression and classification [Computer software manual]. R package version 2.2-3."},{"key":"6221_CR8","doi-asserted-by":"crossref","unstructured":"Dalton, L. A. (2013). Optimal Bayesian feature selection. In 2013 IEEE global conference on signal and information processing (p. 65-68).","DOI":"10.1109\/GlobalSIP.2013.6736814"},{"issue":"2","key":"6221_CR9","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1109\/TCBB.2006.22","volume":"3","author":"S Danziger","year":"2006","unstructured":"Danziger, S., Swamidass, S., Zeng, J., Dearth, L., Lu, Q., Chen, J., et al. (2006). Functional census of mutation sequence spaces: The example of p53 cancer rescue mutants. IEEE\/ACM Transactions on Computational Biology and Bioinformatics, 3(2), 114\u2013124.","journal-title":"IEEE\/ACM Transactions on Computational Biology and Bioinformatics"},{"key":"6221_CR10","doi-asserted-by":"crossref","unstructured":"DeGroot, M. H. (2005). Optimal statistical decisions. Wiley.","DOI":"10.1002\/0471729000"},{"issue":"5","key":"6221_CR11","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1016\/0002-9149(89)90524-9","volume":"64","author":"R Detrano","year":"1989","unstructured":"Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J.-J., Sandhu, S., et al. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64(5), 304\u2013310.","journal-title":"American Journal of Cardiology"},{"issue":"02","key":"6221_CR12","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1142\/S0219720005001004","volume":"3","author":"C Ding","year":"2005","unstructured":"Ding, C., & Peng, H. (2005). Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology, 3(02), 185\u2013205.","journal-title":"Journal of Bioinformatics and Computational Biology"},{"issue":"1","key":"6221_CR13","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1007\/s10994-013-5337-8","volume":"98","author":"H Elghazel","year":"2015","unstructured":"Elghazel, H., & Aussem, A. (2015). Unsupervised feature selection with ensemble learning. Machine Learning, 98(1), 157\u2013180.","journal-title":"Machine Learning"},{"key":"6221_CR14","doi-asserted-by":"crossref","unstructured":"Givens, G. H., & Hoeting, J. A. (2012). Computational statistics (Vol. 703). John Wiley & Sons.","DOI":"10.1002\/9781118555552"},{"issue":"3","key":"6221_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3383685","volume":"1","author":"O Goldstein","year":"2020","unstructured":"Goldstein, O., Kachuee, M., Karkkainen, K., & Sarrafzadeh, M. (2020). Target-focused feature selection using uncertainty measurements in healthcare data. ACM Transactions on Computing for Healthcare, 1(3), 1\u201317.","journal-title":"ACM Transactions on Computing for Healthcare"},{"issue":"5439","key":"6221_CR16","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1126\/science.286.5439.531","volume":"286","author":"TR Golub","year":"1999","unstructured":"Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286(5439), 531\u2013537.","journal-title":"Science"},{"issue":"17","key":"6221_CR17","first-page":"4963","volume":"62","author":"GJ Gordon","year":"2002","unstructured":"Gordon, G. J., Jensen, R. V., Hsiao, L.-L., Gullans, S. R., Blumenstock, J. E., Ramaswamy, S., et al. (2002). Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Research, 62(17), 4963\u20134967.","journal-title":"Cancer Research"},{"issue":"1","key":"6221_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1756-9966-28-103","volume":"28","author":"P Guan","year":"2009","unstructured":"Guan, P., Huang, D., He, M., & Zhou, B. (2009). Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method. Journal of Experimental & Clinical Cancer Research., 28(1), 1\u20137.","journal-title":"Journal of Experimental & Clinical Cancer Research."},{"issue":"1","key":"6221_CR19","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","volume":"46","author":"I Guyon","year":"2002","unstructured":"Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1), 389\u2013422.","journal-title":"Machine Learning"},{"issue":"11","key":"6221_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v033.i11","volume":"33","author":"RKS Hankin","year":"2010","unstructured":"Hankin, R. K. S. (2010). A generalization of the Dirichlet distribution. Journal of Statistical Software, 33(11), 1\u201318.","journal-title":"Journal of Statistical Software"},{"key":"6221_CR21","doi-asserted-by":"crossref","unstructured":"Hankin, R.K.S. (2017). Partial rank data with the hyper2 package: Likelihood functions for generalized Bradley-Terry models. The R Journal, 9.","DOI":"10.32614\/RJ-2017-061"},{"issue":"6","key":"6221_CR22","doi-asserted-by":"publisher","first-page":"e0129126","DOI":"10.1371\/journal.pone.0129126","volume":"10","author":"C Higuera","year":"2015","unstructured":"Higuera, C., Gardiner, K. J., & Cios, K. J. (2015). Self-organizing feature maps identify proteins critical to learning in a mouse model of down syndrome. PloS one, 10(6), e0129126.","journal-title":"PloS one"},{"key":"6221_CR23","unstructured":"Ida, Y., Fujiwara, Y. & Kashima, H. (2019). Fast sparse group lasso. Advances in neural information processing systems (Vol. 32). Curran Associates, Inc."},{"key":"6221_CR24","doi-asserted-by":"crossref","unstructured":"Jenul, A., Schrunner, S., Liland, K.H., Indahl, U.G., Futs\u00e6ther, C.M. & Tomic, O. (2021). RENT\u2014repeated elastic net technique for feature selection. IEEE Access, 9, 152333-152346.","DOI":"10.1109\/ACCESS.2021.3126429"},{"issue":"1","key":"6221_CR25","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1109\/TCYB.2015.2401733","volume":"46","author":"M Liu","year":"2015","unstructured":"Liu, M., & Zhang, D. (2015). Pairwise constraint-guided sparse learning for feature selection. IEEE Transactions on Cybernetics, 46(1), 298\u2013310.","journal-title":"IEEE Transactions on Cybernetics"},{"key":"6221_CR26","first-page":"10396","volume":"33","author":"C Lyle","year":"2020","unstructured":"Lyle, C., Schut, L., Ru, R., Gal, Y., & van der Wilk, M. (2020). A Bayesian perspective on training speed and model selection. Advances in neural information processing systems, 33, 10396\u201310408.","journal-title":"Advances in neural information processing systems"},{"key":"6221_CR27","unstructured":"Mahmoud, O., Harrison, A., Perperoglou, A., Gul, A., Khan, Z. & Lausen, B. (2014). propOverlap: feature (gene) selection based on the proportional overlapping scores [Computer software manual]. R package version 1.0"},{"key":"6221_CR28","unstructured":"Nakajima, S., Sato, I., Sugiyama, M., Watanabe, K. & Kobayashi, H. (2014). Analysis of variational Bayesian latent Dirichlet allocation: Weaker sparsity than MAP. Advances in neural information processing systems (Vol. 27). Curran Associates, Inc."},{"issue":"174","key":"6221_CR29","first-page":"1","volume":"18","author":"S Nogueira","year":"2018","unstructured":"Nogueira, S., Sechidis, K., & Brown, G. (2018). On the stability of feature selection algorithms. Journal of Machine Learning Research, 18(174), 1\u201354.","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"6221_CR30","first-page":"85","volume":"4","author":"RB O\u2019Hara","year":"2009","unstructured":"O\u2019Hara, R. B., & Sillanp\u00e4\u00e4, M. J. (2009). A review of Bayesian variable selection methods: What, how and which. Bayesian Analysis, 4(1), 85\u2013117.","journal-title":"Bayesian Analysis"},{"key":"6221_CR31","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12, 2825\u20132830.","journal-title":"Journal of Machine Learning Research"},{"issue":"11","key":"6221_CR32","doi-asserted-by":"publisher","first-page":"2141","DOI":"10.1007\/s10994-020-05908-1","volume":"109","author":"M Petkovi\u0107","year":"2020","unstructured":"Petkovi\u0107, M., D\u017eeroski, S., & Kocev, D. (2020). Multi-label feature ranking with ensemble methods. Machine Learning, 109(11), 2141\u20132159.","journal-title":"Machine Learning"},{"key":"6221_CR33","doi-asserted-by":"publisher","first-page":"101928","DOI":"10.1016\/j.artmed.2020.101928","volume":"108","author":"S Pozzoli","year":"2020","unstructured":"Pozzoli, S., Soliman, A., Bahri, L., Branca, R. M., Girdzijauskas, S., & Brambilla, M. (2020). Domain expertise-agnostic feature selection for the analysis of breast cancer data. Artificial Intelligence in Medicine, 108, 101928.","journal-title":"Artificial Intelligence in Medicine"},{"key":"6221_CR34","unstructured":"R Core Team. (2020). R: A language and environment for statistical computing [Computer software manual]. Austria."},{"key":"6221_CR35","first-page":"800","volume":"13","author":"G Saon","year":"2001","unstructured":"Saon, G., & Padmanabhan, M. (2001). Minimum Bayes error feature selection for continuous speech recognition. Advances in Neural Information Processing Systems, 13, 800\u2013806.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"4","key":"6221_CR36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v053.i04","volume":"53","author":"L Scrucca","year":"2013","unstructured":"Scrucca, L. (2013). GA: A package for genetic algorithms in R. Journal of Statistical Software, 53(4), 1\u201337.","journal-title":"Journal of Statistical Software"},{"issue":"2","key":"6221_CR37","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1007\/s10994-017-5648-2","volume":"107","author":"K Sechidis","year":"2018","unstructured":"Sechidis, K., & Brown, G. (2018). Simple strategies for semi-supervised feature selection. Machine Learning, 107(2), 357\u2013395.","journal-title":"Machine Learning"},{"key":"6221_CR38","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1016\/j.knosys.2016.11.017","volume":"118","author":"B Seijo-Pardo","year":"2017","unstructured":"Seijo-Pardo, B., Porto-D\u00edaz, I., Bol\u00f3n-Canedo, V., & Alonso-Betanzos, A. (2017). Ensemble feature selection: Homogeneous and heterogeneous approaches. Knowledge-Based Systems, 118, 124\u2013139.","journal-title":"Knowledge-Based Systems"},{"issue":"2","key":"6221_CR39","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1016\/S1535-6108(02)00030-2","volume":"1","author":"D Singh","year":"2002","unstructured":"Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), 203\u2013209.","journal-title":"Cancer Cell"},{"issue":"3","key":"6221_CR40","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1111\/j.1467-9868.2011.00771.x","volume":"73","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 73(3), 273\u2013282.","journal-title":"Journal of the Royal Statistical Society: Series B (Methodological)"},{"issue":"1","key":"6221_CR41","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1109\/TNSRE.2013.2293575","volume":"22","author":"A Tsanas","year":"2013","unstructured":"Tsanas, A., Little, M. A., Fox, C., & Ramig, L. O. (2013). Objective automatic assessment of rehabilitative speech treatment in Parkinson\u2019s disease. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(1), 181\u2013190.","journal-title":"IEEE Transactions on Neural Systems and Rehabilitation Engineering"},{"issue":"23","key":"6221_CR42","doi-asserted-by":"publisher","first-page":"9193","DOI":"10.1073\/pnas.87.23.9193","volume":"87","author":"WH Wolberg","year":"1990","unstructured":"Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 87(23), 9193\u20139196.","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"2","key":"6221_CR43","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1016\/S0096-3003(97)10140-0","volume":"97","author":"T-T Wong","year":"1998","unstructured":"Wong, T.-T. (1998). Generalized Dirichlet distribution in Bayesian analysis. Applied Mathematics and Computation, 97(2), 165\u2013181.","journal-title":"Applied Mathematics and Computation"},{"issue":"1","key":"6221_CR44","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1162\/NECO_a_00537","volume":"26","author":"M Yamada","year":"2014","unstructured":"Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P., & Sugiyama, M. (2014). High-dimensional feature selection by feature-wise kernelized lasso. Neural Computation, 26(1), 185\u2013207.","journal-title":"Neural Computation"},{"issue":"6","key":"6221_CR45","doi-asserted-by":"publisher","first-page":"1129","DOI":"10.1007\/s11222-014-9498-5","volume":"25","author":"Y Yang","year":"2015","unstructured":"Yang, Y., & Zou, H. (2015). A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing, 25(6), 1129\u20131141.","journal-title":"Statistics and Computing"},{"issue":"1","key":"6221_CR46","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","volume":"68","author":"M Yuan","year":"2006","unstructured":"Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology)., 68(1), 49\u201367.","journal-title":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)."},{"key":"6221_CR47","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Wang, L., Liu, H. (2010). Efficient spectral feature selection with minimum redundancy. In Proceedings of the AAAI conference on artificial intelligence (Vol. 24, pp. 673\u2013678).","DOI":"10.1609\/aaai.v24i1.7671"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-022-06221-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-022-06221-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-022-06221-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T17:13:50Z","timestamp":1665162830000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-022-06221-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,22]]},"references-count":47,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2022,10]]}},"alternative-id":["6221"],"URL":"https:\/\/doi.org\/10.1007\/s10994-022-06221-9","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,22]]},"assertion":[{"value":"15 December 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 May 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 July 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 August 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"All authors consented to the submission of the manuscript.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"All real-world datasets are obtained from publicly available platforms under open licenses. All figures in this manuscript are created by the authors.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}}]}}