{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T20:18:22Z","timestamp":1768421902684,"version":"3.49.0"},"reference-count":12,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T00:00:00Z","timestamp":1764028800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T00:00:00Z","timestamp":1768348800000},"content-version":"vor","delay-in-days":50,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>As data complexity and volume increase rapidly, efficient statistical methods for identifying significant variables become crucial. Variable selection plays a vital role in establishing relationships between predictors and response variables. The challenge lies in achieving this goal while controlling the False Discovery Rate (FDR) and maintaining statistical power. The knockoff filter, a recent approach, generates inexpensive knockoff variables that mimic the correlation structure of the original variables, serving as negative controls for inference. In this study, we extend the use of knockoffs to Light Gradient Boosting Machine (LightGBM), a fast and accurate machine learning technique. Shapley Additive Explanations (SHAP) values are employed to interpret the black-box nature of machine learning. Through extensive experimentation, our proposed method outperforms traditional approaches, accurately identifying important variables for each class. It offers improved speed and efficiency across multiple datasets. To validate our approach, an extensive simulation study is conducted. 
The integration of knockoffs into LightGBM enhances performance and interpretability, contributing to the advancement of variable selection methods. Our research addresses the challenges of variable selection in the era of big data, providing a valuable tool for identifying relevant variables in statistical modeling and machine learning applications.<\/jats:p>","DOI":"10.1186\/s12859-025-06215-z","type":"journal-article","created":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T05:04:30Z","timestamp":1764047070000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Gradient boosting with knockoff filters: a biostatistical approach to variable selection"],"prefix":"10.1186","volume":"27","author":[{"given":"Amr","family":"Mohamed","sequence":"first","affiliation":[]},{"given":"Kevin H.","family":"Lee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,11,25]]},"reference":[{"key":"6215_CR1","doi-asserted-by":"publisher","unstructured":"Barber RF, Cand\u00e8s EJ. Controlling the false discovery rate via knockoffs. https:\/\/doi.org\/10.1214\/15-AOS1337","DOI":"10.1214\/15-AOS1337"},{"issue":"3","key":"6215_CR2","doi-asserted-by":"publisher","first-page":"551","DOI":"10.1111\/rssb.12265","volume":"80","author":"E Candes","year":"2018","unstructured":"Candes E, et al. Panning for Gold:\u2019model-X\u2019 knockoffs for high dimensional controlled variable selection. J Royal Stat Soc Series B (Stat Methodol). 2018;80(3):551\u201377.","journal-title":"J Royal Stat Soc Series B (Stat Methodol)"},{"key":"6215_CR3","unstructured":"Chakraborty P et al. Exploratory data analysis for large-scale multiple testing problems and its application in gene expression studies. 2019. 
Available from: https:\/\/arxiv.org\/abs\/1912.06030"},{"key":"6215_CR4","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1016\/j.chemolab.2019.06.003","volume":"191","author":"C Chen","year":"2019","unstructured":"Chen C, et al. LightGBM-PPI: predicting protein-protein interactions through LightGBM with multi-information fusion. Chemometrics and Intell Lab Syst. 2019;191:54\u201364.","journal-title":"Chemometrics and Intell Lab Syst"},{"key":"6215_CR5","unstructured":"Frazier PI. A tutorial on Bayesian optimization. 2018. Available from: https:\/\/arxiv.org\/abs\/1807.02811"},{"key":"6215_CR6","doi-asserted-by":"crossref","unstructured":"Jiang T, Li Y, Motsinger-Reif AA. Knockoff boosted tree for model-free variable selection. Bioinformatics. 2021; 37(7): 976-83.","DOI":"10.1093\/bioinformatics\/btaa770"},{"key":"6215_CR7","first-page":"30","volume":"2017","author":"G Ke","year":"2017","unstructured":"Ke G, et al. Lightgbm: a highly efficient gradient boosting decision tree\u2019. Adv Neural Inf Process Syst. 2017;2017:30.","journal-title":"Adv Neural Inf Process Syst"},{"key":"6215_CR8","first-page":"30","volume":"2017","author":"SM Lundberg","year":"2017","unstructured":"Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;2017:30.","journal-title":"Adv Neural Inf Process Syst"},{"key":"6215_CR9","unstructured":"R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2021. https:\/\/www.R-project.org\/"},{"issue":"1","key":"6215_CR10","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1109\/JPROC.2015.2494218","volume":"104","author":"B Shahriari","year":"2015","unstructured":"Shahriari B, et al. Taking the human out of the loop: a review of Bayesian optimization. Proceed IEEE. 
2015;104(1):148\u201375.","journal-title":"Proceed IEEE"},{"issue":"2","key":"6215_CR11","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1016\/S1535-6108(02)00030-2","volume":"1","author":"D Singh","year":"2002","unstructured":"Singh D, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer cell. 2002;1(2):203\u20139.","journal-title":"Cancer cell"},{"issue":"1","key":"6215_CR12","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B (Methodol). 1996;58(1):267\u201388.","journal-title":"J R Stat Soc Series B (Methodol)"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-025-06215-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-025-06215-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-025-06215-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T11:21:19Z","timestamp":1768389679000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1186\/s12859-025-06215-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,25]]},"references-count":12,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["6215"],"URL":"https:\/\/doi.org\/10.1186\/s12859-025-06215-z","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,25]]},"assertion":[{"value":"12 
March 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 July 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 November 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no conflict of interest.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"13"}}