{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T03:00:18Z","timestamp":1763348418931},"reference-count":34,"publisher":"Privacy Enhancing Technologies Symposium Advisory Board","issue":"2","license":[{"start":{"date-parts":[[2022,3,3]],"date-time":"2022-03-03T00:00:00Z","timestamp":1646265600000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Most existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical. In real-life applications, features are often numerical. The standard \u201cin the clear\u201d algorithm to grow decision trees on data with continuous values requires sorting of training examples for each feature in the quest for an optimal cut-point in the range of feature values in each node. Sorting is an expensive operation in MPC, hence finding secure protocols that avoid such an expensive step is a relevant problem in privacy-preserving machine learning. In this paper we propose three more efficient alternatives for secure training of decision tree based models on data with continuous features, namely: (1) secure discretization of the data, followed by secure training of a decision tree over the discretized data; (2) secure discretization of the data, followed by secure training of a random forest over the discretized data; and (3) secure training of extremely randomized trees (\u201cextra-trees\u201d) on the original data. Approaches (2) and (3) both involve randomizing feature choices. In addition, in approach (3) cut-points are chosen randomly as well, thereby alleviating the need to sort or to discretize the data up front. We implemented all proposed solutions in the semi-honest setting with additive secret sharing based MPC. In addition to mathematically proving that all proposed approaches are correct and secure, we experimentally evaluated and compared them in terms of classification accuracy and runtime. We privately train tree ensembles over data sets with thousands of instances or features in a few minutes, with accuracies that are at par with those obtained in the clear. This makes our solution more efficient than the existing approaches, which are based on oblivious sorting.<\/jats:p>","DOI":"10.2478\/popets-2022-0042","type":"journal-article","created":{"date-parts":[[2022,3,5]],"date-time":"2022-03-05T04:37:08Z","timestamp":1646455028000},"page":"205-226","source":"Crossref","is-referenced-by-count":13,"title":["Privacy-preserving training of tree ensembles over continuous data"],"prefix":"10.56553","volume":"2022","author":[{"given":"Samuel","family":"Adams","sequence":"first","affiliation":[{"name":"University of Washington Tacoma"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chaitali","family":"Choudhary","sequence":"additional","affiliation":[{"name":"University of Washington Tacoma"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martine","family":"de Cock","sequence":"additional","affiliation":[{"name":"University of Washington Tacoma , E-mail: mdecock@uw.edu; Ghent University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rafael","family":"Dowsley","sequence":"additional","affiliation":[{"name":"Monash University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Melanson","sequence":"additional","affiliation":[{"name":"University of Washington Tacoma"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anderson","family":"Nascimento","sequence":"additional","affiliation":[{"name":"University of Washington , Tacoma"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Davis","family":"Railsback","sequence":"additional","affiliation":[{"name":"University of Washington Tacoma"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianwei","family":"Shen","sequence":"additional","affiliation":[{"name":"University of Arizona"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"35752","published-online":{"date-parts":[[2022,3,3]]},"reference":[{"key":"2022060207222536560_j_popets-2022-0042_ref_001","doi-asserted-by":"crossref","unstructured":"[1] R. Cramer, I. Damg\u00e5rd, and J. B. Nielsen, Secure Multiparty Computation and Secret Sharing. Cambridge University Press, 2015.10.1017\/CBO9781107337756","DOI":"10.1017\/CBO9781107337756"},{"key":"2022060207222536560_j_popets-2022-0042_ref_002","doi-asserted-by":"crossref","unstructured":"[2] P. Mohassel and Y. Zhang, \u201cSecureML: A system for scalable privacy-preserving machine learning,\u201d in IEEE Symposium on Security and Privacy (SP), pp. 19\u201338, 2017.10.1109\/SP.2017.12","DOI":"10.1109\/SP.2017.12"},{"key":"2022060207222536560_j_popets-2022-0042_ref_003","doi-asserted-by":"crossref","unstructured":"[3] S. Wagh, D. Gupta, and N. Chandran, \u201cSecureNN: 3-party secure computation for neural network training,\u201d Proc. on Privacy Enhancing Technologies, no. 3, pp. 26\u201349, 2019.10.2478\/popets-2019-0035","DOI":"10.2478\/popets-2019-0035"},{"key":"2022060207222536560_j_popets-2022-0042_ref_004","doi-asserted-by":"crossref","unstructured":"[4] M. De Cock, R. Dowsley, A. C. A. Nascimento, D. Railsback, J. Shen, and A. Todoki, \u201cHigh performance logistic regression for privacy-preserving genome analysis,\u201d BMC Medical Genomics, vol. 14(23), 2021.10.1186\/s12920-020-00869-9781857733472626","DOI":"10.1186\/s12920-020-00869-9"},{"key":"2022060207222536560_j_popets-2022-0042_ref_005","doi-asserted-by":"crossref","unstructured":"[5] C. Guo, A. Hannun, B. Knott, L. van der Maaten, M. Tygert, and R. Zhu, \u201cSecure multiparty computations in floating-point arithmetic,\u201d arXiv:2001.03192, 2020.","DOI":"10.1093\/imaiai\/iaaa038"},{"key":"2022060207222536560_j_popets-2022-0042_ref_006","doi-asserted-by":"crossref","unstructured":"[6] T. G. Dietterich, \u201cEnsemble methods in machine learning,\u201d in International Workshop on Multiple Classifier Systems, vol. 1857 of LNCS, pp. 1\u201315, Springer, 2000.","DOI":"10.1007\/3-540-45014-9_1"},{"key":"2022060207222536560_j_popets-2022-0042_ref_007","doi-asserted-by":"crossref","unstructured":"[7] D. J. Wu, T. Feng, M. Naehrig, and K. E. Lauter, \u201cPrivately evaluating decision trees and random forests.,\u201d Proc. on Privacy Enhancing Technologies, no. 4, pp. 335\u2013355, 2016.10.1515\/popets-2016-0043","DOI":"10.1515\/popets-2016-0043"},{"key":"2022060207222536560_j_popets-2022-0042_ref_008","doi-asserted-by":"crossref","unstructured":"[8] K. Fritchman, K. Saminathan, R. Dowsley, T. Hughes, M. De Cock, A. Nascimento, and A. Teredesai, \u201cPrivacy-preserving scoring of tree ensembles: A novel framework for AI in healthcare,\u201d in IEEE Big Data, pp. 2413\u20132422, 2018.","DOI":"10.1109\/BigData.2018.8622627"},{"key":"2022060207222536560_j_popets-2022-0042_ref_009","doi-asserted-by":"crossref","unstructured":"[9] \u00c1. Kiss, M. Naderpour, J. Liu, N. Asokan, and T. Schneider, \u201cSok: Modular and efficient private decision tree evaluation,\u201d Proc. on Privacy Enhancing Technologies, no. 2, p. 187\u2013208, 2019.10.2478\/popets-2019-0026","DOI":"10.2478\/popets-2019-0026"},{"key":"2022060207222536560_j_popets-2022-0042_ref_010","doi-asserted-by":"crossref","unstructured":"[10] J. R. Quinlan, \u201cInduction of decision trees,\u201d Machine learning, vol. 1, no. 1, pp. 81\u2013106, 1986.10.1007\/BF00116251","DOI":"10.1007\/BF00116251"},{"key":"2022060207222536560_j_popets-2022-0042_ref_011","doi-asserted-by":"crossref","unstructured":"[11] Y. Lindell and B. Pinkas, \u201cPrivacy preserving data mining,\u201d in Annual International Cryptology Conf., pp. 36\u201354, 2000.10.1007\/3-540-44598-6_3","DOI":"10.1007\/3-540-44598-6_3"},{"key":"2022060207222536560_j_popets-2022-0042_ref_012","doi-asserted-by":"crossref","unstructured":"[12] J. Vaidya and C. Clifton, \u201cPrivacy-preserving decision trees over vertically partitioned data,\u201d in IFIP Annual Conf. on Data and Appl. Security and Privacy, pp. 139\u2013152, 2005.10.1007\/11535706_11","DOI":"10.1007\/11535706_11"},{"key":"2022060207222536560_j_popets-2022-0042_ref_013","doi-asserted-by":"crossref","unstructured":"[13] S. Samet and A. Miri, \u201cPrivacy preserving ID3 using Gini index over horizontally partitioned data,\u201d in 2008 IEEE\/ACS Intern. Conf. on Comp. Syst. and Appl., pp. 645\u2013651, 2008.10.1109\/AICCSA.2008.4493598","DOI":"10.1109\/AICCSA.2008.4493598"},{"key":"2022060207222536560_j_popets-2022-0042_ref_014","doi-asserted-by":"crossref","unstructured":"[14] S. de Hoogh, B. Schoenmakers, P. Chen, and H. op den Akker, \u201cPractical secure decision tree learning in a teletreatment application,\u201d in Intern. Conf. on Financial Cryptography and Data Security, pp. 179\u2013194, Springer, 2014.10.1007\/978-3-662-45472-5_12","DOI":"10.1007\/978-3-662-45472-5_12"},{"key":"2022060207222536560_j_popets-2022-0042_ref_015","doi-asserted-by":"crossref","unstructured":"[15] M.-J. Xiao, K. Han, L.-S. Huang, and J.-Y. Li, \u201cPrivacy preserving C4.5 algorithm over horizontally partitioned data,\u201d in Fifth International Conference on Grid and Cooperative Computing (GCC\u201906), pp. 78\u201385, IEEE, 2006.10.1109\/GCC.2006.73","DOI":"10.1109\/GCC.2006.73"},{"key":"2022060207222536560_j_popets-2022-0042_ref_016","doi-asserted-by":"crossref","unstructured":"[16] Y. Shen, H. Shao, and L. Yang, \u201cPrivacy preserving C4.5 algorithm over vertically distributed datasets,\u201d in Intern. Conf. on Networks Security, Wireless Communications and Trusted Computing, vol. 2, pp. 446\u2013448, IEEE, 2009.10.1109\/NSWCTC.2009.253","DOI":"10.1109\/NSWCTC.2009.253"},{"key":"2022060207222536560_j_popets-2022-0042_ref_017","doi-asserted-by":"crossref","unstructured":"[17] G. Behera, \u201cPrivacy preserving C4.5 using Gini index,\u201d in 2nd National Conference on Emerging Trends and Applications in Computer Science, pp. 1\u20134, 2011.10.1109\/NCETACS.2011.5751385","DOI":"10.1109\/NCETACS.2011.5751385"},{"key":"2022060207222536560_j_popets-2022-0042_ref_018","doi-asserted-by":"crossref","unstructured":"[18] M. Abspoel, D. Escudero, and N. Volgushev, \u201cSecure training of decision trees with continuous attributes,\u201d in Proc. on Privacy Enhancing Technologies, no. 1, pp. 167\u2013187, 2021.10.2478\/popets-2021-0010","DOI":"10.2478\/popets-2021-0010"},{"key":"2022060207222536560_j_popets-2022-0042_ref_019","unstructured":"[19] J. R. Quinlan, C4.5: programs for machine learning. Elsevier, 2014."},{"key":"2022060207222536560_j_popets-2022-0042_ref_020","doi-asserted-by":"crossref","unstructured":"[20] P. Geurts, D. Ernst, and L. Wehenkel, \u201cExtremely randomized trees,\u201d Machine Learning, vol. 63, no. 1, pp. 3\u201342, 2006.10.1007\/s10994-006-6226-1","DOI":"10.1007\/s10994-006-6226-1"},{"key":"2022060207222536560_j_popets-2022-0042_ref_021","unstructured":"[21] K. Deforth, M. Desgroseilliers, N. Gama, M. Georgieva, D. Jetchev, and M. Vuille, \u201cXORBoost: Tree boosting in the multiparty computation setting.\u201d Cryptology ePrint Archive, Report 2021\/432, 2021. https:\/\/eprint.iacr.org\/2021\/432."},{"key":"2022060207222536560_j_popets-2022-0042_ref_022","doi-asserted-by":"crossref","unstructured":"[22] M. De Cock, R. Dowsley, C. Horst, R. Katti, A. Nascimento, W.-S. Poon, and S. Truex, \u201cEfficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation,\u201d IEEE Transactions on Dependable and Secure Computing, vol. 16, no. 2, pp. 217\u2013230, 2019.10.1109\/TDSC.2017.2679189","DOI":"10.1109\/TDSC.2017.2679189"},{"key":"2022060207222536560_j_popets-2022-0042_ref_023","doi-asserted-by":"crossref","unstructured":"[23] D. Beaver, \u201cCommodity-based cryptography,\u201d in STOC, vol. 97, pp. 446\u2013455, 1997.10.1145\/258533.258637","DOI":"10.1145\/258533.258637"},{"key":"2022060207222536560_j_popets-2022-0042_ref_024","doi-asserted-by":"crossref","unstructured":"[24] B. David, R. Dowsley, R. Katti, and A. C. Nascimento, \u201cEfficient unconditionally secure comparison and privacy preserving machine learning classification protocols,\u201d in International Conference on Provable Security, pp. 354\u2013367, Springer, 2015.10.1007\/978-3-319-26059-4_20","DOI":"10.1007\/978-3-319-26059-4_20"},{"key":"2022060207222536560_j_popets-2022-0042_ref_025","doi-asserted-by":"crossref","unstructured":"[25] M. De Cock, R. Dowsley, A. C. A. Nascimento, and S. C. Newman, \u201cFast, privacy preserving linear regression over distributed datasets based on pre-distributed data,\u201d in 8th ACM Workshop on Artificial Intelligence and Security (AISec), pp. 3\u201314, 2015.10.1145\/2808769.2808774","DOI":"10.1145\/2808769.2808774"},{"key":"2022060207222536560_j_popets-2022-0042_ref_026","doi-asserted-by":"crossref","unstructured":"[26] R. Canetti, \u201cUniversally composable security: A new paradigm for cryptographic protocols,\u201d in 42nd Annual Symposium on Foundations of Computer Science, pp. 136\u2013145, IEEE Computer Society, 2001.10.1109\/SFCS.2001.959888","DOI":"10.1109\/SFCS.2001.959888"},{"key":"2022060207222536560_j_popets-2022-0042_ref_027","unstructured":"[27] D. Reich, A. Todoki, R. Dowsley, M. De Cock, and A. Nascimento, \u201cPrivacy-preserving classification of personal text messages with secure multi-party computation,\u201d in Advances in Neural Information Processing Systems (NeurIPS), pp. 3752\u20133764, 2019."},{"key":"2022060207222536560_j_popets-2022-0042_ref_028","doi-asserted-by":"crossref","unstructured":"[28] N. Attrapadung, G. Hanaoka, S. Kiyomoto, T. Mimoto, and J. C. N. Schuldt, \u201cA taxonomy of secure two-party comparison protocols and efficient constructions,\u201d IEICE Trans. Fundam. Electron. Commun. Comput. Sci., vol. 102-A, no. 9, pp. 1048\u20131060, 2019.","DOI":"10.1587\/transfun.E102.A.1048"},{"key":"2022060207222536560_j_popets-2022-0042_ref_029","doi-asserted-by":"crossref","unstructured":"[29] D. Bogdanov, S. Laur, and J. Willemson, \u201cSharemind: A framework for fast privacy-preserving computations,\u201d in European Symposium on Research in Computer Security, pp. 192\u2013206, Springer, 2008.10.1007\/978-3-540-88313-5_13","DOI":"10.1007\/978-3-540-88313-5_13"},{"key":"2022060207222536560_j_popets-2022-0042_ref_030","doi-asserted-by":"crossref","unstructured":"[30] T. Nishide and K. Ohta, \u201cMultiparty computation for interval, equality, and comparison without bit-decomposition protocol,\u201d in International Workshop on Public Key Cryptography, pp. 343\u2013360, Springer, 2007.10.1007\/978-3-540-71677-8_23","DOI":"10.1007\/978-3-540-71677-8_23"},{"key":"2022060207222536560_j_popets-2022-0042_ref_031","doi-asserted-by":"crossref","unstructured":"[31] L. Breiman, \u201cRandom forests,\u201d Machine learning, vol. 45, no. 1, pp. 5\u201332, 2001.10.1023\/A:1010933404324","DOI":"10.1023\/A:1010933404324"},{"key":"2022060207222536560_j_popets-2022-0042_ref_032","unstructured":"[32] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, \u201cScikit-learn: Machine learning in Python,\u201d Journal of Machine Learning Research, vol. 12, pp. 2825\u20132830, 2011."},{"key":"2022060207222536560_j_popets-2022-0042_ref_033","doi-asserted-by":"crossref","unstructured":"[33] S. Garcia, J. Luengo, J. A. S\u00e1ez, V. Lopez, and F. Herrera, \u201cA survey of discretization techniques: Taxonomy and empirical analysis in supervised learning,\u201d IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, pp. 734\u2013750, 2012.10.1109\/TKDE.2012.35","DOI":"10.1109\/TKDE.2012.35"},{"key":"2022060207222536560_j_popets-2022-0042_ref_034","unstructured":"[34] R. Dowsley, Cryptography Based on Correlated Data: Foundations and Practice. PhD thesis, Karlsruhe Institute of Technology, Germany, 2016."}],"container-title":["Proceedings on Privacy Enhancing Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.sciendo.com\/pdf\/10.2478\/popets-2022-0042","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,20]],"date-time":"2022-07-20T16:32:03Z","timestamp":1658334723000},"score":1,"resource":{"primary":{"URL":"https:\/\/petsymposium.org\/popets\/2022\/popets-2022-0042.php"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,3]]},"references-count":34,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,3,3]]},"published-print":{"date-parts":[[2022,4,1]]}},"alternative-id":["10.2478\/popets-2022-0042"],"URL":"https:\/\/doi.org\/10.2478\/popets-2022-0042","relation":{},"ISSN":["2299-0984"],"issn-type":[{"value":"2299-0984","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,3]]}}}