{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T20:34:44Z","timestamp":1774125284853,"version":"3.50.1"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001381","name":"National Research Foundation, Singapore","doi-asserted-by":"crossref","award":["AISG2-TC-2021-002"],"award-info":[{"award-number":["AISG2-TC-2021-002"]}],"id":[{"id":"10.13039\/501100001381","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,6,13]]},"abstract":"<jats:p>As machine learning (ML) has been widely developed in real-world applications, the privacy of ML models draws an increasing concern. In this paper, we study how to forget specific data records from ML models to preserve the privacy of these data. Although some studies propose efficient unlearning algorithms on random forests and extremely randomized trees, Gradient Boosting Decision Trees (GBDT), which are widely used in practice, have not been explored. The efficient unlearning of GBDT faces two major challenges: 1) the training of each tree is deterministic and non-robust; 2) the training of a tree depends on all the previous trees. To solve the first challenge, we propose a robust GBDT-like ML model DeltaBoost that enables efficient and accurate deletion according to our theoretical analysis. For the second challenge, we design a training algorithm for DeltaBoost that minimizes the dependency among trees. 
Our experiments on five datasets demonstrate that DeltaBoost can remove data records from the trained model efficiently and effectively. Our unlearning approach achieves up to two orders of magnitude speedup compared to retraining GBDT. Besides, DeltaBoost produces competitive performance to existing decision-tree-based ML models.<\/jats:p>","DOI":"10.1145\/3589313","type":"journal-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T20:26:45Z","timestamp":1687292805000},"page":"1-26","source":"Crossref","is-referenced-by-count":11,"title":["DeltaBoost: Gradient Boosting Decision Trees with Efficient Machine Unlearning"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6463-0031","authenticated-orcid":false,"given":"Zhaomin","family":"Wu","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-3894-7488","authenticated-orcid":false,"given":"Junhui","family":"Zhu","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6539-6443","authenticated-orcid":false,"given":"Qinbin","family":"Li","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8618-4581","authenticated-orcid":false,"given":"Bingsheng","family":"He","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2023,6,20]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"Cadata dataset. https:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvmtools\/datasets\/regression.html#cadata. Accessed: 2022--10-02."},{"key":"e_1_2_2_2_1","unstructured":"Codrna dataset. https:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvmtools\/datasets\/binary.html#cod-rna. 
Accessed: 2022--10-02."},{"key":"e_1_2_2_3_1","unstructured":"Covertype data set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/covertype. Accessed: 2022--10-02."},{"key":"e_1_2_2_4_1","unstructured":"Gisette data set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Gisette. Accessed: 2022--10-02."},{"key":"e_1_2_2_5_1","unstructured":"Approximations for mean and variance of a ratio 2022. Accessed: 2022--10-01."},{"key":"e_1_2_2_6_1","volume-title":"Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011)","author":"Bertin-Mahieux Thierry","year":"2011","unstructured":"Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011."},{"key":"e_1_2_2_7_1","series-title":"Proceedings of Machine Learning Research","first-page":"1092","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Brophy Jonathan","year":"2021","unstructured":"Jonathan Brophy and Daniel Lowd. Machine unlearning for random forests. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 1092--1104. PMLR, 18--24 Jul 2021. URL https:\/\/proceedings.mlr.press\/v139\/brophy21a.html."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2015.35"},{"key":"e_1_2_2_9_1","volume-title":"Incremental and decremental support vector machine learning. Advances in neural information processing systems, 13","author":"Cauwenberghs Gert","year":"2000","unstructured":"Gert Cauwenberghs and Tomaso Poggio. Incremental and decremental support vector machine learning. 
Advances in neural information processing systems, 13, 2000."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009932"},{"key":"e_1_2_2_12_1","volume-title":"Making ai forget you: Data deletion in machine learning. Advances in neural information processing systems, 32","author":"Ginart Antonio","year":"2019","unstructured":"Antonio Ginart, Melody Guan, Gregory Valiant, and James Y Zou. Making ai forget you: Data deletion in machine learning. Advances in neural information processing systems, 32, 2019."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00932"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58526-6_23"},{"key":"e_1_2_2_15_1","volume-title":"Santa Clara Univ. Legal Studies Research Paper","author":"Goldman Eric","year":"2020","unstructured":"Eric Goldman. An introduction to the california consumer privacy act (ccpa). Santa Clara Univ. Legal Studies Research Paper, 2020."},{"key":"e_1_2_2_16_1","volume-title":"Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030","author":"Guo Chuan","year":"2019","unstructured":"Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens Van Der Maaten. Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030, 2019."},{"key":"e_1_2_2_17_1","volume-title":"Catboost for big data: an interdisciplinary review. Journal of big data, 7 (1):1--45","author":"Hancock John T","year":"2020","unstructured":"John T Hancock and Taghi M Khoshgoftaar. Catboost for big data: an interdisciplinary review. Journal of big data, 7 (1):1--45, 2020."},{"key":"e_1_2_2_18_1","volume-title":"Statistics: Theory and methods","author":"Harper William V","year":"1991","unstructured":"William V Harper. 
Statistics: Theory and methods, 1991."},{"key":"e_1_2_2_19_1","volume-title":"Neue begr\u00fcndung der theorie quadratischer formen von unendlichvielen ver\u00e4nderlichen. Journal f\u00fcr die reine und angewandte Mathematik","author":"Hellinger Ernst","year":"1909","unstructured":"Ernst Hellinger. Neue begr\u00fcndung der theorie quadratischer formen von unendlichvielen ver\u00e4nderlichen. Journal f\u00fcr die reine und angewandte Mathematik, 1909(136):210--271, 1909."},{"key":"e_1_2_2_20_1","first-page":"2008","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Izzo Zachary","year":"2021","unstructured":"Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, and James Zou. Approximate data deletion from machine learning models. In International Conference on Artificial Intelligence and Statistics, pages 2008--2016. PMLR, 2021."},{"key":"e_1_2_2_21_1","volume-title":"Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, 30","author":"Ke Guolin","year":"2017","unstructured":"Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, 30, 2017."},{"key":"e_1_2_2_22_1","volume-title":"Tianyuan Fu, and Bingsheng He. Fedtree: A fast, effective, and secure tree-based federated learning system. https:\/\/github.com\/Xtra-Computing\/FedTree\/blob\/main\/FedTree_draft_paper.pdf","author":"Li Qinbin","year":"2022","unstructured":"Qinbin Li, Yanzheng Cai, Yuxuan Han, Ching Man Yung, Tianyuan Fu, and Bingsheng He. Fedtree: A fast, effective, and secure tree-based federated learning system. 
https:\/\/github.com\/Xtra-Computing\/FedTree\/blob\/main\/FedTree_draft_paper.pdf, 2022."},{"key":"e_1_2_2_23_1","first-page":"931","volume-title":"Algorithmic Learning Theory","author":"Neel Seth","year":"2021","unstructured":"Seth Neel, Aaron Roth, and Saeed Sharifi-Malvajerdi. Descent-to-delete: Gradient-based methods for machine unlearning. In Algorithmic Learning Theory, pages 931--962. PMLR, 2021."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/77298"},{"key":"e_1_2_2_25_1","volume-title":"Forgetting personal data and revoking consent under the gdpr: Challenges and proposed solutions. Journal of cybersecurity, 4(1):tyy001","author":"Politou Eugenia","year":"2018","unstructured":"Eugenia Politou, Efthimios Alepis, and Constantinos Patsakis. Forgetting personal data and revoking consent under the gdpr: Challenges and proposed solutions. Journal of cybersecurity, 4(1):tyy001, 2018."},{"key":"e_1_2_2_26_1","volume-title":"CIDR","author":"Schelter Sebastian","year":"2020","unstructured":"Sebastian Schelter. \"amnesia\" - a selection of machine learning models that can forget user data very fast. CIDR, 2020."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457239"},{"key":"e_1_2_2_28_1","volume-title":"Quantized training of gradient boosting decision trees. Advances in neural information processing systems","author":"Shi Yu","year":"2022","unstructured":"Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, and Tie-Yan Liu. Quantized training of gradient boosting decision trees. Advances in neural information processing systems, 2022."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623661"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2920131"},{"key":"e_1_2_2_31_1","first-page":"10355","volume-title":"International Conference on Machine Learning","author":"Wu Yinjun","year":"2020","unstructured":"Yinjun Wu, Edgar Dobriban, and Susan Davidson. 
Deltagrad: Rapid retraining of machine learning models. In International Conference on Machine Learning, pages 10355--10366. PMLR, 2020."},{"key":"e_1_2_2_32_1","volume-title":"Technical report of deltaboost: Gradient boosting decision trees with efficient machine unlearning. https:\/\/github.com\/Xtra-Computing\/DeltaBoost\/blob\/main\/DeltaBoost_Technical_Report.pdf","author":"Wu Zhaomin","year":"2022","unstructured":"Zhaomin Wu, Junhui Zhu, Qinbin Li, and Bingsheng He. Technical report of deltaboost: Gradient boosting decision trees with efficient machine unlearning. https:\/\/github.com\/Xtra-Computing\/DeltaBoost\/blob\/main\/DeltaBoost_Technical_Report.pdf, 2022."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589313","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589313","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:14Z","timestamp":1750178774000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589313"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,13]]},"references-count":32,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,13]]}},"alternative-id":["10.1145\/3589313"],"URL":"https:\/\/doi.org\/10.1145\/3589313","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,13]]}}}