{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,19]],"date-time":"2026-04-19T01:06:18Z","timestamp":1776560778955,"version":"3.51.2"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,10]]},"abstract":"<jats:p>Vertical Federated Learning (FL) is a new paradigm that enables users with non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent works show that it's still not sufficient to prevent privacy leakage from the training process or the trained model. This paper focuses on studying the privacy-preserving tree boosting algorithms under the vertical FL. The existing solutions based on cryptography involve heavy computation and communication overhead and are vulnerable to inference attacks. Although the solution based on Local Differential Privacy (LDP) addresses the above problems, it leads to the low accuracy of the trained model.<\/jats:p>\n          <jats:p>This paper explores to improve the accuracy of the widely deployed tree boosting algorithms satisfying differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost. Three order-preserving desensitization algorithms satisfying a variant of LDP called distance-based LDP (dLDP) are designed to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a trade-off between the privacy of pairs with large distance and the utility of desensitized values. Comprehensive evaluations show that OpBoost has a better performance on prediction accuracy of trained models compared with existing LDP approaches on reasonable settings. Our code is open source.<\/jats:p>","DOI":"10.14778\/3565816.3565823","type":"journal-article","created":{"date-parts":[[2022,11,24]],"date-time":"2022-11-24T00:35:16Z","timestamp":1669250116000},"page":"202-215","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":29,"title":["OpBoost"],"prefix":"10.14778","volume":"16","author":[{"given":"Xiaochen","family":"Li","sequence":"first","affiliation":[{"name":"Zhejiang University"}]},{"given":"Yuke","family":"Hu","sequence":"additional","affiliation":[{"name":"Zhejiang University"}]},{"given":"Weiran","family":"Liu","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Hanwen","family":"Feng","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Li","family":"Peng","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Yuan","family":"Hong","sequence":"additional","affiliation":[{"name":"University of Connecticut"}]},{"given":"Kui","family":"Ren","sequence":"additional","affiliation":[{"name":"Zhejiang University"}]},{"given":"Zhan","family":"Qin","sequence":"additional","affiliation":[{"name":"Zhejiang University"}]}],"member":"320","published-online":{"date-parts":[[2022,11,23]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"1996. Adult Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Adult.  1996. Adult Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Adult."},{"key":"e_1_2_1_2_1","unstructured":"1998. Pen-Based Recognition of Handwritten Digits Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Pen-Based+Recognition+of+Handwritten+Digits.  1998. Pen-Based Recognition of Handwritten Digits Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Pen-Based+Recognition+of+Handwritten+Digits."},{"key":"e_1_2_1_3_1","unstructured":"2013. Combined Cycle Power Plant Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Physicochemical+Properties+of+Protein+Tertiary+Structure.  2013. Combined Cycle Power Plant Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Physicochemical+Properties+of+Protein+Tertiary+Structure."},{"key":"e_1_2_1_4_1","unstructured":"2014. Combined Cycle Power Plant Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/combined+cycle+power+plant.  2014. Combined Cycle Power Plant Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/combined+cycle+power+plant."},{"key":"e_1_2_1_5_1","unstructured":"2018. Xgboost-Predictor-JAVA. https:\/\/github.com\/h2oai\/xgboost-predictor.  2018. Xgboost-Predictor-JAVA. https:\/\/github.com\/h2oai\/xgboost-predictor."},{"key":"e_1_2_1_6_1","unstructured":"2019. SF Salaries Data Set. https:\/\/www.kaggle.com\/datasets\/kaggle\/sf-salaries.  2019. SF Salaries Data Set. https:\/\/www.kaggle.com\/datasets\/kaggle\/sf-salaries."},{"key":"e_1_2_1_7_1","volume-title":"GBRT or","unstructured":"2021. Scalable , Portable and Distributed Gradient Boosting (GBDT , GBRT or GBM) Library , for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow). https:\/\/github.com\/dmlc\/xgboost. 2021. Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow). https:\/\/github.com\/dmlc\/xgboost."},{"key":"e_1_2_1_8_1","unstructured":"2021. Smile (Statistical Machine Intelligence and Learning Engine). https:\/\/github.com\/haifengl\/smile.  2021. Smile (Statistical Machine Intelligence and Learning Engine). https:\/\/github.com\/haifengl\/smile."},{"key":"e_1_2_1_9_1","unstructured":"2022. OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization (full version). https:\/\/arxiv.org\/abs\/2210.01318.  2022. OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization (full version). https:\/\/arxiv.org\/abs\/2210.01318."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.2478\/popets-2021-0010"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007632"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSF.2018.00026"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508859.2516735"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236217"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-01001-9_13"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2018.00057"},{"key":"e_1_2_1_17_1","first-page":"23","article-title":"From ranknet to lambdarank to lambdamart: An overview","volume":"11","author":"Burges Christopher JC","year":"2010","unstructured":"Christopher JC Burges . 2010 . From ranknet to lambdarank to lambdamart: An overview . Learning 11 , 23 -- 581 (2010), 81. Christopher JC Burges. 2010. From ranknet to lambdarank to lambdamart: An overview. Learning 11, 23--581 (2010), 81.","journal-title":"Learning"},{"key":"e_1_2_1_18_1","volume-title":"The discrete gaussian for differential privacy. arXiv preprint arXiv:2004.00010","author":"Canonne Cl\u00e9ment","year":"2020","unstructured":"Cl\u00e9ment Canonne , Gautam Kamath , and Thomas Steinke . 2020. The discrete gaussian for differential privacy. arXiv preprint arXiv:2004.00010 ( 2020 ). Cl\u00e9ment Canonne, Gautam Kamath, and Thomas Steinke. 2020. The discrete gaussian for differential privacy. arXiv preprint arXiv:2004.00010 (2020)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-39077-7_5"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1515\/popets-2017-0051"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_1_22_1","volume-title":"Secureboost: A lossless federated learning framework","author":"Cheng Kewei","year":"2021","unstructured":"Kewei Cheng , Tao Fan , Yilun Jin , Yang Liu , Tianjian Chen , Dimitrios Papadopoulos , and Qiang Yang . 2021 . Secureboost: A lossless federated learning framework . IEEE Intelligent Systems ( 2021). Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, Dimitrios Papadopoulos, and Qiang Yang. 2021. Secureboost: A lossless federated learning framework. IEEE Intelligent Systems (2021)."},{"key":"e_1_2_1_23_1","volume-title":"Intertwining Order Preserving Encryption and Differential Privacy. arXiv preprint arXiv:2009.05679","author":"Chowdhury Amrita Roy","year":"2020","unstructured":"Amrita Roy Chowdhury , Bolin Ding , Somesh Jha , Weiran Liu , and Jingren Zhou . 2020. Intertwining Order Preserving Encryption and Differential Privacy. arXiv preprint arXiv:2009.05679 ( 2020 ). Amrita Roy Chowdhury, Bolin Ding, Somesh Jha, Weiran Liu, and Jingren Zhou. 2020. Intertwining Order Preserving Encryption and Differential Privacy. arXiv preprint arXiv:2009.05679 (2020)."},{"key":"e_1_2_1_24_1","volume-title":"Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy. In International Conference on Information and Knowledge Management (CIKM).","author":"Couchot Jean-Fran\u00e7ois","year":"2021","unstructured":"Jean-Fran\u00e7ois Couchot , H\u00e9ber Hwang Arcolezi , Bechara Al Bouna , and Xiaokui Xiao . 2021 . Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy. In International Conference on Information and Knowledge Management (CIKM). Jean-Fran\u00e7ois Couchot, H\u00e9ber Hwang Arcolezi, Bechara Al Bouna, and Xiaokui Xiao. 2021. Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy. In International Conference on Information and Knowledge Management (CIKM)."},{"key":"e_1_2_1_25_1","volume-title":"CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363","author":"Dorogush Anna Veronika","year":"2018","unstructured":"Anna Veronika Dorogush , Vasily Ershov , and Andrey Gulin . 2018. CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363 ( 2018 ). Anna Veronika Dorogush, Vasily Ershov, and Andrey Gulin. 2018. CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363 (2018)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460120.3485668"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2017.1389735"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2976749.2978379"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-79228-4_1"},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Cynthia Dwork Aaron Roth etal 2014. The algorithmic foundations of differential privacy. Foundations and Trends\u00ae in Theoretical Computer Science (TCS) 9 3--4 (2014) 211--407.  Cynthia Dwork Aaron Roth et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends \u00ae in Theoretical Computer Science (TCS) 9 3--4 (2014) 211--407.","DOI":"10.1561\/0400000042"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2660267.2660348"},{"key":"e_1_2_1_32_1","volume-title":"Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The annals of statistics (ANN STAT) 28, 2","author":"Friedman Jerome","year":"2000","unstructured":"Jerome Friedman , Trevor Hastie , and Robert Tibshirani . 2000. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The annals of statistics (ANN STAT) 28, 2 ( 2000 ), 337--407. Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2000. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The annals of statistics (ANN STAT) 28, 2 (2000), 337--407."},{"key":"e_1_2_1_33_1","volume-title":"Greedy function approximation: a gradient boosting machine. Annals of statistics (ANN STAT)","author":"Friedman Jerome H","year":"2001","unstructured":"Jerome H Friedman . 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics (ANN STAT) ( 2001 ), 1189--1232. Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics (ANN STAT) (2001), 1189--1232."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457241"},{"key":"e_1_2_1_35_1","volume-title":"Revisiting deep learning models for tabular data. Advances in Neural Information Processing Systems (NIPS) 34","author":"Gorishniy Yury","year":"2021","unstructured":"Yury Gorishniy , Ivan Rubachev , Valentin Khrulkov , and Artem Babenko . 2021. Revisiting deep learning models for tabular data. Advances in Neural Information Processing Systems (NIPS) 34 ( 2021 ). Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, and Artem Babenko. 2021. Revisiting deep learning models for tabular data. Advances in Neural Information Processing Systems (NIPS) 34 (2021)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2019.2949041"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588581"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372297.3417269"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1137\/090756090"},{"key":"e_1_2_1_40_1","volume-title":"Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems (NIPS) 30","author":"Ke Guolin","year":"2017","unstructured":"Guolin Ke , Qi Meng , Thomas Finley , Taifeng Wang , Wei Chen , Weidong Ma , Qiwei Ye , and Tie-Yan Liu . 2017 . Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems (NIPS) 30 (2017), 3146--3154. Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems (NIPS) 30 (2017), 3146--3154."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2976749.2978386"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2810103.2813629"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2660267.2660277"},{"key":"e_1_2_1_44_1","volume-title":"Ananda Theertha Suresh, and Dave Bacon","author":"Kone\u010dn\u1ef3 Jakub","year":"2016","unstructured":"Jakub Kone\u010dn\u1ef3 , H Brendan McMahan , Felix X Yu , Peter Richt\u00e1rik , Ananda Theertha Suresh, and Dave Bacon . 2016 . Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016). Jakub Kone\u010dn\u1ef3, H Brendan McMahan, Felix X Yu, Peter Richt\u00e1rik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016)."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5895"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5422"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2382196.2382264"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2810103.2813651"},{"key":"e_1_2_1_49_1","volume-title":"Hyejin Shin, and Junbum Shin.","author":"Nguy\u00ean Th\u00f4ng T","year":"2016","unstructured":"Th\u00f4ng T Nguy\u00ean , Xiaokui Xiao , Yin Yang , Siu Cheung Hui , Hyejin Shin, and Junbum Shin. 2016 . Collecting and analyzing data from smart device users with local differential privacy. arXiv preprint arXiv:1606.05053 (2016). Th\u00f4ng T Nguy\u00ean, Xiaokui Xiao, Yin Yang, Siu Cheung Hui, Hyejin Shin, and Junbum Shin. 2016. Collecting and analyzing data from smart device users with local differential privacy. arXiv preprint arXiv:1606.05053 (2016)."},{"key":"e_1_2_1_50_1","volume-title":"Privacy games: Optimal user-centric data obfuscation. arXiv preprint arXiv:1402.3426","author":"Shokri Reza","year":"2014","unstructured":"Reza Shokri . 2014. Privacy games: Optimal user-centric data obfuscation. arXiv preprint arXiv:1402.3426 ( 2014 ). Reza Shokri. 2014. Privacy games: Optimal user-centric data obfuscation. arXiv preprint arXiv:1402.3426 (2014)."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/SPW.2019.00021"},{"key":"e_1_2_1_52_1","volume-title":"Federboost: Private federated learning for gbdt. arXiv preprint arXiv:2011.02796","author":"Tian Zhihua","year":"2020","unstructured":"Zhihua Tian , Rui Zhang , Xiaoyang Hou , Jian Liu , and Kui Ren . 2020 . Federboost: Private federated learning for gbdt. arXiv preprint arXiv:2011.02796 (2020). Zhihua Tian, Rui Zhang, Xiaoyang Hou, Jian Liu, and Kui Ren. 2020. Federboost: Private federated learning for gbdt. arXiv preprint arXiv:2011.02796 (2020)."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2736277.2741088"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00063"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2017.8056977"},{"key":"e_1_2_1_56_1","volume-title":"26th USENIX Security Symposium (USENIX Security 17)","author":"Wang Tianhao","year":"2017","unstructured":"Tianhao Wang , Jeremiah Blocki , Ninghui Li , and Somesh Jha . 2017 . Locally differentially private protocols for frequency estimation . In 26th USENIX Security Symposium (USENIX Security 17) . 729--745. Tianhao Wang, Jeremiah Blocki, Ninghui Li, and Somesh Jha. 2017. Locally differentially private protocols for frequency estimation. In 26th USENIX Security Symposium (USENIX Security 17). 729--745."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.09.073"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.2478\/popets-2020-0025"},{"key":"e_1_2_1_59_1","volume-title":"Privacy preserving vertical federated learning for tree-based models. arXiv preprint arXiv:2008.06170","author":"Wu Yuncheng","year":"2020","unstructured":"Yuncheng Wu , Shaofeng Cai , Xiaokui Xiao , Gang Chen , and Beng Chin Ooi . 2020. Privacy preserving vertical federated learning for tree-based models. arXiv preprint arXiv:2008.06170 ( 2020 ). Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen, and Beng Chin Ooi. 2020. Privacy preserving vertical federated learning for tree-based models. arXiv preprint arXiv:2008.06170 (2020)."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISIT44484.2020.9173952"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/2810103.2813640"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3298981"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3565816.3565823","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:32:39Z","timestamp":1672219959000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3565816.3565823"}},"subtitle":["a vertical federated tree boosting framework based on order-preserving desensitization"],"short-title":[],"issued":{"date-parts":[[2022,10]]},"references-count":62,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,10]]}},"alternative-id":["10.14778\/3565816.3565823"],"URL":"https:\/\/doi.org\/10.14778\/3565816.3565823","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,10]]},"assertion":[{"value":"2022-11-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}