{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T20:19:34Z","timestamp":1770149974087,"version":"3.49.0"},"publisher-location":"Cham","reference-count":20,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783031263866","type":"print"},{"value":"9783031263873","type":"electronic"}],"license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,17]],"date-time":"2023-03-17T00:00:00Z","timestamp":1679011200000},"content-version":"vor","delay-in-days":75,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Feature selection is a crucial step in developing robust and powerful machine learning models. Feature selection techniques can be divided into two categories: filter and wrapper methods. While wrapper methods commonly result in strong predictive performances, they suffer from a large computational complexity and therefore take a significant amount of time to complete, especially when dealing with high-dimensional feature sets. Alternatively, filter methods are considerably faster, but suffer from several other disadvantages, such as (i) requiring a threshold value, (ii) many filter methods not taking into account intercorrelation between features, and (iii) ignoring feature interactions with the model. To this end, we present <jats:italic>powershap<\/jats:italic>, a novel wrapper feature selection method, which leverages statistical hypothesis testing and power calculations in combination with Shapley values for quick and intuitive feature selection. 
<jats:italic>Powershap<\/jats:italic> is built on the core assumption that an informative feature will have a larger impact on the prediction compared to a known random feature. Benchmarks and simulations show that <jats:italic>powershap<\/jats:italic> outperforms other filter methods with predictive performance on par with wrapper methods while being significantly faster, often requiring only half or a third of the execution time. As such, <jats:italic>powershap<\/jats:italic> provides a competitive and quick algorithm that can be used by various models in different domains. Furthermore, <jats:italic>powershap<\/jats:italic> is implemented as a plug-and-play and open-source <jats:italic>sklearn<\/jats:italic> component, enabling easy integration in conventional data science pipelines. User experience is further enhanced by an automatic mode that tunes the hyper-parameters of the <jats:italic>powershap<\/jats:italic> algorithm, allowing the algorithm to be used without any configuration.<\/jats:p>","DOI":"10.1007\/978-3-031-26387-3_5","type":"book-chapter","created":{"date-parts":[[2023,3,16]],"date-time":"2023-03-16T15:03:10Z","timestamp":1678978990000},"page":"71-87","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Powershap: A Power-Full Shapley Feature Selection Method"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3322-150X","authenticated-orcid":false,"given":"Jarne","family":"Verhaeghe","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9620-888X","authenticated-orcid":false,"given":"Jeroen","family":"Van Der 
Donckt","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2529-5477","authenticated-orcid":false,"given":"Femke","family":"Ongenae","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7865-6793","authenticated-orcid":false,"given":"Sofie","family":"Van Hoecke","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,3,17]]},"reference":[{"issue":"10","key":"5_CR1","doi-asserted-by":"publisher","first-page":"1340","DOI":"10.1093\/bioinformatics\/btq134","volume":"26","author":"A Altmann","year":"2010","unstructured":"Altmann, A., Tolo\u015fi, L., Sander, O., Lengauer, T.: Permutation importance: a corrected feature importance measure. Bioinformatics 26(10), 1340\u20131347 (2010)","journal-title":"Bioinformatics"},{"issue":"1","key":"5_CR2","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L.: Random Forests. Mach. Learn. 45(1), 5\u201332 (2001)","journal-title":"Mach. Learn."},{"key":"5_CR3","unstructured":"Calzolari, M.: manuel-calzolari\/shapicant (2022)"},{"key":"5_CR4","series-title":"Advances in Intelligent Systems and Computing","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1007\/978-981-13-6001-5_11","volume-title":"Emerging Research in Computing, Information, Communication and Applications","author":"S Colaco","year":"2019","unstructured":"Colaco, S., Kumar, S., Tamang, A., Biju, V.G.: A review on feature selection algorithms. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds.) Emerging Research in Computing, Information, Communication and Applications. AISC, vol. 906, pp. 133\u2013153. Springer, Singapore (2019). 
https:\/\/doi.org\/10.1007\/978-981-13-6001-5_11"},{"key":"5_CR5","unstructured":"Dua, D., Graff, C.: UCI machine learning repository (2017)"},{"key":"5_CR6","doi-asserted-by":"crossref","unstructured":"Jovi\u0107, A., Brki\u0107, K., Bogunovi\u0107, N.: A review of feature selection methods with applications. In: 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1200\u20131205 (2015)","DOI":"10.1109\/MIPRO.2015.7160458"},{"key":"5_CR7","unstructured":"Keany, E.: Borutashap : A wrapper feature selection method which combines the boruta feature selection algorithm with shapley values (2020)"},{"key":"5_CR8","first-page":"6","volume":"2","author":"B Kumari","year":"2011","unstructured":"Kumari, B., Swarnkar, T.: Filter versus wrapper feature subset selection in large dimensionality micro array: a review. Int. J. Comput. Sci. Inf. Technol. 2, 6 (2011)","journal-title":"Int. J. Comput. Sci. Inf. Technol."},{"issue":"11","key":"5_CR9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v036.i11","volume":"36","author":"MB Kursa","year":"2010","unstructured":"Kursa, M.B., Rudnicki, W.R.: Feature selection with the Boruta package. J. Stat. Softw. 36(11), 1\u201313 (2010)","journal-title":"J. Stat. Softw."},{"key":"5_CR10","doi-asserted-by":"crossref","unstructured":"Li, J., Cheng, K., Wang, S., et al.: Feature selection: a data perspective. ACM Comput. Surv. 50(6), 1\u201345 (2017)","DOI":"10.1145\/3136625"},{"issue":"1","key":"5_CR11","doi-asserted-by":"publisher","first-page":"18","DOI":"10.3390\/e23010018","volume":"23","author":"P Linardatos","year":"2020","unstructured":"Linardatos, P., Papastefanopoulos, V., Kotsiantis, S.: Explainable AI: a review of machine learning interpretability methods. Entropy 23(1), 18 (2020)","journal-title":"Entropy"},{"key":"5_CR12","unstructured":"Lomax, R.G.: An introduction to statistical concepts. In: Mahwah, N.J. 
(eds.): Lawrence Erlbaum Associates Publishers (2007)"},{"key":"5_CR13","unstructured":"Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30, pp. 4765\u20134774. Curran Associates, Inc. (2017)"},{"issue":"2","key":"5_CR14","doi-asserted-by":"publisher","first-page":"439","DOI":"10.1086\/341527","volume":"71","author":"BV North","year":"2002","unstructured":"North, B.V., Curtis, D., Sham, P.C.: A note on the calculation of empirical p values from monte Carlo procedures. Am. J. Hum. Genet. 71(2), 439\u2013441 (2002)","journal-title":"Am. J. Hum. Genet."},{"key":"5_CR15","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F., Varoquaux, G., Gramfort, A., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825\u20132830 (2011)","journal-title":"J. Mach. Learn. Res."},{"key":"5_CR16","unstructured":"Prokhorenkova, L., Gusev, G., et al.: CatBoost: unbiased boosting with categorical features. arXiv:1706.09516 (2019)"},{"key":"5_CR17","doi-asserted-by":"crossref","unstructured":"Seabold, S., Perktold, J.: statsmodels: Econometric and statistical modeling with python. In: 9th Python in Science Conference (2010)","DOI":"10.25080\/Majora-92bf1922-011"},{"key":"5_CR18","unstructured":"Vanschoren, J.: OpenML: gina_priori. https:\/\/www.openml.org\/d\/1042"},{"key":"5_CR19","unstructured":"Vanschoren, J.: OpenML: madelon. https:\/\/www.openml.org\/d\/1485"},{"key":"5_CR20","unstructured":"Vanschoren, J.: OpenML: scene. 
https:\/\/www.openml.org\/d\/312"}],"container-title":["Lecture Notes in Computer Science","Machine Learning and Knowledge Discovery in Databases"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-26387-3_5","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,16]],"date-time":"2023-03-16T15:14:32Z","timestamp":1678979672000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-26387-3_5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"ISBN":["9783031263866","9783031263873"],"references-count":20,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-26387-3_5","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023]]},"assertion":[{"value":"17 March 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ECML PKDD","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Joint European Conference on Machine Learning and Knowledge Discovery in Databases","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Grenoble","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"France","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2022","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"19 September 
2022","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"23 September 2022","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"22","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"ecml2022","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/2022.ecmlpkdd.org\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"CMT","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"1060","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"236","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"22% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of 
Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3-4","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3-4","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"17 demo track papers have been accepted from 28 submissions","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}