{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:20:54Z","timestamp":1774628454686,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,12,28]],"date-time":"2024-12-28T00:00:00Z","timestamp":1735344000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,28]],"date-time":"2024-12-28T00:00:00Z","timestamp":1735344000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100009193","name":"Marsden Fund","doi-asserted-by":"publisher","award":["VUW1913"],"award-info":[{"award-number":["VUW1913"]}],"id":[{"id":"10.13039\/501100009193","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009193","name":"Marsden Fund","doi-asserted-by":"publisher","award":["VUW1914"],"award-info":[{"award-number":["VUW1914"]}],"id":[{"id":"10.13039\/501100009193","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009193","name":"Marsden Fund","doi-asserted-by":"publisher","award":["VUW2016"],"award-info":[{"award-number":["VUW2016"]}],"id":[{"id":"10.13039\/501100009193","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Sci. Eng."],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Symbolic Regression (SR) on high-dimensional datasets often encounters significant challenges, resulting in models with poor generalization capabilities. While feature selection has the potential to enhance the generalization and learning performance in general, its application in Genetic Programming (GP) for high-dimensional SR remains a complex problem. Originating from game theory, the Shapley value is applied to additive feature attribution approaches where it distributes the difference between a model output and a baseline average across input variables. By providing an accurate assessment of each feature importance, the Shapley value offers a robust approach to select features. In this paper, we propose a novel feature selection method leveraging the Shapley value to identify and select important features in GP for high-dimensional SR. Through a series of experiments conducted on ten high-dimensional regression datasets, the results indicate that our algorithm surpasses standard GP and other GP-based feature selection methods in terms of learning and generalization performance on most datasets. Further analysis reveals that our algorithm generates more compact models, focusing on the inclusion of important features.<\/jats:p>","DOI":"10.1007\/s41019-024-00270-x","type":"journal-article","created":{"date-parts":[[2024,12,28]],"date-time":"2024-12-28T05:31:43Z","timestamp":1735363903000},"page":"196-211","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Improving Generalization of Genetic Programming for High-Dimensional Symbolic Regression with Shapley Value Based Feature Selection"],"prefix":"10.1007","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1688-2944","authenticated-orcid":false,"given":"Chunyu","family":"Wang","sequence":"first","affiliation":[]},{"given":"Qi","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Bing","family":"Xue","sequence":"additional","affiliation":[]},{"given":"Mengjie","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,12,28]]},"reference":[{"issue":"5","key":"270_CR1","doi-asserted-by":"publisher","first-page":"3473","DOI":"10.1007\/s10462-020-09928-0","volume":"54","author":"P Ray","year":"2021","unstructured":"Ray P, Reddy SS, Banerjee T (2021) Various dimension reduction techniques for high dimensional data analysis: a review. Artif Intell Rev 54(5):3473\u20133515","journal-title":"Artif Intell Rev"},{"key":"270_CR2","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1007\/BF00175355","volume":"4","author":"JR Koza","year":"1994","unstructured":"Koza JR (1994) Genetic programming as a means for programming computers by natural selection. Stat Comput 4:87\u2013112","journal-title":"Stat Comput"},{"key":"270_CR3","doi-asserted-by":"crossref","unstructured":"Wang C, Chen Q, Xue B, Zhang M (2023) Shapley value based feature selection to improve generalization of genetic programming for high-dimensional symbolic regression. In: Australasian conference on data science and machine learning, pp. 163\u2013176","DOI":"10.1007\/978-981-99-8696-5_12"},{"issue":"5","key":"270_CR4","doi-asserted-by":"publisher","first-page":"792","DOI":"10.1109\/TEVC.2017.2683489","volume":"21","author":"Q Chen","year":"2017","unstructured":"Chen Q, Zhang M, Xue B (2017) Feature selection to improve generalization of genetic programming for high-dimensional symbolic regression. IEEE Trans Evolut Comput 21(5):792\u2013806","journal-title":"IEEE Trans Evolut Comput"},{"key":"270_CR5","unstructured":"Molnar C (2022) Interpretable machine learning, 2nd edn. https:\/\/christophm.github.io\/interpretable-ml-book"},{"issue":"16","key":"270_CR6","doi-asserted-by":"publisher","first-page":"2631","DOI":"10.1126\/sciadv.aay2631","volume":"6","author":"S-M Udrescu","year":"2020","unstructured":"Udrescu S-M (2020) Tegmark, M: Ai feynman: a physics-inspired method for symbolic regression. Sci Adv 6(16):2631","journal-title":"Sci Adv"},{"key":"270_CR7","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2023.3318638","author":"H Zhang","year":"2023","unstructured":"Zhang H, Chen Q, Xue B, Banzhaf W, Zhang M (2023) Modular multi-tree genetic programming for evolutionary feature construction for regression. IEEE Trans Evol Comput. https:\/\/doi.org\/10.1109\/TEVC.2023.3318638","journal-title":"IEEE Trans Evol Comput"},{"issue":"4","key":"270_CR8","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1007\/s10710-019-09368-y","volume":"21","author":"L Mu\u00f1oz","year":"2020","unstructured":"Mu\u00f1oz L, Trujillo L, Silva S (2020) Transfer learning in constructive induction with genetic programming. Genet Program Evol Mach 21(4):529\u2013569","journal-title":"Genet Program Evol Mach"},{"issue":"4","key":"270_CR9","doi-asserted-by":"publisher","first-page":"606","DOI":"10.1109\/TEVC.2015.2504420","volume":"20","author":"B Xue","year":"2015","unstructured":"Xue B, Zhang M, Browne WN, Yao X (2015) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evolut Comput 20(4):606\u2013626","journal-title":"IEEE Trans Evolut Comput"},{"key":"270_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2019.106839","volume":"143","author":"A Bommert","year":"2020","unstructured":"Bommert A, Sun X, Bischl B, Rahnenf\u00fchrer J, Lang M (2020) Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal 143:106839","journal-title":"Comput Stat Data Anal"},{"key":"270_CR11","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106092","volume":"89","author":"R Agrawal","year":"2020","unstructured":"Agrawal R, Kaur B, Sharma S (2020) Quantum based whale optimization algorithm for wrapper feature selection. Appl Soft Comput 89:106092","journal-title":"Appl Soft Comput"},{"issue":"11","key":"270_CR12","doi-asserted-by":"publisher","first-page":"4814","DOI":"10.1109\/TNNLS.2020.3015505","volume":"32","author":"J Jin","year":"2020","unstructured":"Jin J, Xiao R, Daly I, Miao Y, Wang X, Cichocki A (2020) Internal feature selection method of csp based on l1-norm and dempster-shafer theory. IEEE Trans Neural Netw Learn Syst 32(11):4814\u20134825","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"270_CR13","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2024.3373802","author":"R Jiao","year":"2024","unstructured":"Jiao R, Xue B, Zhang M (2024) Learning to preselection: A filter-based performance predictor for multiobjective feature selection in classification. IEEE Trans Evolut Comput. https:\/\/doi.org\/10.1109\/TEVC.2024.3373802","journal-title":"IEEE Trans Evolut Comput"},{"key":"270_CR14","doi-asserted-by":"crossref","unstructured":"Chen Q, Xue B, Niu B, Zhang M (2016) Improving generalisation of genetic programming for high-dimensional symbolic regression with feature selection. In: 2016 IEEE congress on evolutionary computation, pp. 3793\u20133800","DOI":"10.1109\/CEC.2016.7744270"},{"key":"270_CR15","doi-asserted-by":"crossref","unstructured":"Sandin I, Andrade G, Viegas F, Madeira D, Rocha L, Salles T, Gon\u00e7alves M (2012) Aggressive and effective feature selection using genetic programming. In: 2012 IEEE congress on evolutionary computation, pp. 1\u20138","DOI":"10.1109\/CEC.2012.6252878"},{"key":"270_CR16","doi-asserted-by":"crossref","unstructured":"Dick G (2017) Sensitivity-like analysis for feature selection in genetic programming. In: proceedings of the genetic and evolutionary computation conference, pp. 401\u2013408","DOI":"10.1145\/3071178.3071338"},{"issue":"4","key":"270_CR17","doi-asserted-by":"publisher","first-page":"2382","DOI":"10.1109\/TCYB.2020.3004361","volume":"52","author":"Q Chen","year":"2020","unstructured":"Chen Q, Xue B, Zhang M (2020) Rademacher complexity for enhancing the generalization of genetic programming for symbolic regression. IEEE Trans Cybernet 52(4):2382\u20132395","journal-title":"IEEE Trans Cybernet"},{"key":"270_CR18","unstructured":"Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 30"},{"key":"270_CR19","doi-asserted-by":"publisher","first-page":"647","DOI":"10.1007\/s10115-013-0679-x","volume":"41","author":"E \u0160trumbelj","year":"2014","unstructured":"\u0160trumbelj E, Kononenko I (2014) Explaining prediction models and individual predictions with feature contributions. Knowl Inf Syst 41:647\u2013665","journal-title":"Knowl Inf Syst"},{"key":"270_CR20","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1016\/j.knosys.2016.11.017","volume":"118","author":"B Seijo-Pardo","year":"2017","unstructured":"Seijo-Pardo B, Porto-D\u00edaz I, Bol\u00f3n-Canedo V, Alonso-Betanzos A (2017) Ensemble feature selection: homogeneous and heterogeneous approaches. Knowl-Based Syst 118:124\u2013139","journal-title":"Knowl-Based Syst"},{"key":"270_CR21","first-page":"2171","volume":"13","author":"F-A Fortin","year":"2012","unstructured":"Fortin F-A, De Rainville F-M, Gardner M-A (2012) Parizeau, M, Gagn\u00e9, C: DEAP: evolutionary algorithms made easy. J Mach Learn Res 13:2171\u20132175","journal-title":"J Mach Learn Res"},{"key":"270_CR22","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"270_CR23","doi-asserted-by":"crossref","unstructured":"Keijzer M (2003) Improving symbolic regression with interval arithmetic and linear scaling. In: European conference on genetic programming, pp. 70\u201382","DOI":"10.1007\/3-540-36599-0_7"},{"key":"270_CR24","doi-asserted-by":"crossref","unstructured":"Romano JD, Le TT, La\u00a0Cava W, Gregg JT, Goldberg DJ, Chakraborty P, Ray NL, Himmelstein D, Fu W, Moore JH (2021) Pmlb v1.0: An open source dataset collection for benchmarking machine learning methods. arXiv preprint arXiv:2012.00058v2","DOI":"10.1093\/bioinformatics\/btab727"},{"key":"270_CR25","unstructured":"Redmond M (2011) Communities and crime unnormalized. UCI machine learning repository. https:\/\/doi.org\/10.24432\/C5PC8X"},{"key":"270_CR26","unstructured":"Redmond M (2009) Communities and crime. UCI machine learning repository. https:\/\/doi.org\/10.24432\/C53W3X"},{"issue":"2","key":"270_CR27","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1145\/2641190.2641198","volume":"15","author":"J Vanschoren","year":"2013","unstructured":"Vanschoren J, Rijn JN, Bischl B, Torgo L (2013) Openml: networked science in machine learning. SIGKDD Explor 15(2):49\u201360","journal-title":"SIGKDD Explor"},{"key":"270_CR28","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1007\/s12293-018-0274-5","volume":"11","author":"L Mu\u00f1oz","year":"2019","unstructured":"Mu\u00f1oz L, Trujillo L, Silva S, Castelli M, Vanneschi L (2019) Evolving multidimensional transformations for symbolic regression with m3gp. Memet Comput 11:111\u2013126","journal-title":"Memet Comput"}],"container-title":["Data Science and Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41019-024-00270-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41019-024-00270-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41019-024-00270-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,6]],"date-time":"2025-06-06T08:57:49Z","timestamp":1749200269000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41019-024-00270-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,28]]},"references-count":28,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["270"],"URL":"https:\/\/doi.org\/10.1007\/s41019-024-00270-x","relation":{},"ISSN":["2364-1185","2364-1541"],"issn-type":[{"value":"2364-1185","type":"print"},{"value":"2364-1541","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,28]]},"assertion":[{"value":"20 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 October 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 November 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 December 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}