{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T19:57:04Z","timestamp":1768161424511,"version":"3.49.0"},"reference-count":51,"publisher":"Privacy Enhancing Technologies Symposium Advisory Board","issue":"1","license":[{"start":{"date-parts":[[2020,11,9]],"date-time":"2020-11-09T00:00:00Z","timestamp":1604880000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Recent work on Renyi Differential Privacy has shown the feasibility of applying differential privacy to deep learning tasks. Despite their promise, however, differentially private deep networks often lag far behind their non-private counterparts in accuracy, showing the need for more research in model architectures, optimizers, etc. One of the barriers to this expanded research is the training time \u2014 often orders of magnitude larger than training non-private networks. The reason for this slowdown is a crucial privacy-related step called \u201cper-example gradient clipping\u201d whose naive implementation undoes the benefits of batch training with GPUs. By analyzing the back-propagation equations we derive new methods for per-example gradient clipping that are compatible with auto-differentiation (e.g., in PyTorch and TensorFlow) and provide better GPU utilization. Our implementation in PyTorch showed significant training speed-ups (by factors of 54x - 94x for training various models with batch sizes of 128). 
These techniques work for a variety of architectural choices including convolutional layers, recurrent networks, attention, residual blocks, etc.<\/jats:p>","DOI":"10.2478\/popets-2021-0008","type":"journal-article","created":{"date-parts":[[2020,12,22]],"date-time":"2020-12-22T11:47:04Z","timestamp":1608637624000},"page":"128-144","source":"Crossref","is-referenced-by-count":18,"title":["Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping"],"prefix":"10.56553","volume":"2021","author":[{"given":"Jaewoo","family":"Lee","sequence":"first","affiliation":[{"name":"University of Georgia"}]},{"given":"Daniel","family":"Kifer","sequence":"additional","affiliation":[{"name":"Penn State University"}]}],"member":"35752","published-online":{"date-parts":[[2020,11,9]]},"reference":[{"key":"2022051413251897261_j_popets-2021-0008_ref_001_w2aab3b7c16b1b6b1ab1ab1Aa","unstructured":"[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Man\u00e9, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Vi\u00e9gas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org."},{"key":"2022051413251897261_j_popets-2021-0008_ref_002_w2aab3b7c16b1b6b1ab1ab2Aa","doi-asserted-by":"crossref","unstructured":"[2] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308\u2013318. 
ACM, 2016.10.1145\/2976749.2978318","DOI":"10.1145\/2976749.2978318"},{"key":"2022051413251897261_j_popets-2021-0008_ref_003_w2aab3b7c16b1b6b1ab1ab3Aa","doi-asserted-by":"crossref","unstructured":"[3] N. C. Abay, Y. Zhou, M. Kantarcioglu, B. M. Thuraisingham, and L. Sweeney. Privacy preserving synthetic data release using deep learning. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Dublin, Ireland, September 10-14, 2018, Proceedings, Part I, pages 510\u2013526, 2018.10.1007\/978-3-030-10925-7_31","DOI":"10.1007\/978-3-030-10925-7_31"},{"key":"2022051413251897261_j_popets-2021-0008_ref_004_w2aab3b7c16b1b6b1ab1ab4Aa","doi-asserted-by":"crossref","unstructured":"[4] G. Acs, L. Melis, C. Castelluccia, and E. D. Cristofaro. Differentially private mixture of generative neural networks. In ICDM, 2017.10.1109\/ICDM.2017.81","DOI":"10.1109\/ICDM.2017.81"},{"key":"2022051413251897261_j_popets-2021-0008_ref_005_w2aab3b7c16b1b6b1ab1ab5Aa","unstructured":"[5] L. J. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. CoRR, abs\/1607.06450, 2016."},{"key":"2022051413251897261_j_popets-2021-0008_ref_006_w2aab3b7c16b1b6b1ab1ab6Aa","unstructured":"[6] E. Bagdasaryan and V. Shmatikov. Differential privacy has disparate impact on model accuracy. CoRR, abs\/1905.12101, 2019."},{"key":"2022051413251897261_j_popets-2021-0008_ref_007_w2aab3b7c16b1b6b1ab1ab7Aa","doi-asserted-by":"crossref","unstructured":"[7] R. Bassily, A. Smith, and A. Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, FOCS \u201914, pages 464\u2013473, Washington, DC, USA, 2014. IEEE Computer Society.10.1109\/FOCS.2014.56","DOI":"10.1109\/FOCS.2014.56"},{"key":"2022051413251897261_j_popets-2021-0008_ref_008_w2aab3b7c16b1b6b1ab1ab8Aa","doi-asserted-by":"crossref","unstructured":"[8] B. K. Beaulieu-Jones, Z. S. Wu, C. Williams, R. Lee, S. 
P. Bhavnani, J. B. Byrd, and C. S. Greene. Privacy-preserving generative deep neural networks support clinical data sharing. bioRxiv, 2018.10.1101\/159756","DOI":"10.1101\/159756"},{"key":"2022051413251897261_j_popets-2021-0008_ref_009_w2aab3b7c16b1b6b1ab1ab9Aa","doi-asserted-by":"crossref","unstructured":"[9] L. Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT\u20192010, pages 177\u2013186. Springer, 2010.10.1007\/978-3-7908-2604-3_16","DOI":"10.1007\/978-3-7908-2604-3_16"},{"key":"2022051413251897261_j_popets-2021-0008_ref_010_w2aab3b7c16b1b6b1ab1ac10Aa","unstructured":"[10] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069\u20131109, 2011."},{"key":"2022051413251897261_j_popets-2021-0008_ref_011_w2aab3b7c16b1b6b1ab1ac11Aa","unstructured":"[11] K. Chellapilla, S. Puri, and P. Simard. High performance convolutional neural networks for document processing. In Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft, 2006."},{"key":"2022051413251897261_j_popets-2021-0008_ref_012_w2aab3b7c16b1b6b1ab1ac12Aa","unstructured":"[12] C. Chen, J. Lee, and D. Kifer. Renyi differentially private erm for smooth objectives. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2037\u20132046, 2019."},{"key":"2022051413251897261_j_popets-2021-0008_ref_013_w2aab3b7c16b1b6b1ab1ac13Aa","unstructured":"[13] Q. Chen, C. Xiang, M. Xue, B. Li, N. Borisov, D. Kaafar, and H. Zhu. Differentially private data generative models. https:\/\/arxiv.org\/pdf\/1812.02274.pdf, 2018."},{"key":"2022051413251897261_j_popets-2021-0008_ref_014_w2aab3b7c16b1b6b1ab1ac14Aa","unstructured":"[14] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171\u20134186, 2019."},{"key":"2022051413251897261_j_popets-2021-0008_ref_015_w2aab3b7c16b1b6b1ab1ac15Aa","doi-asserted-by":"crossref","unstructured":"[15] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 486\u2013503. Springer, 2006.10.1007\/11761679_29","DOI":"10.1007\/11761679_29"},{"key":"2022051413251897261_j_popets-2021-0008_ref_016_w2aab3b7c16b1b6b1ab1ac16Aa","doi-asserted-by":"crossref","unstructured":"[16] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265\u2013284. Springer, 2006.10.1007\/11681878_14","DOI":"10.1007\/11681878_14"},{"key":"2022051413251897261_j_popets-2021-0008_ref_017_w2aab3b7c16b1b6b1ab1ac17Aa","unstructured":"[17] I. Goodfellow. Efficient per-example gradient computations. arXiv preprint arXiv:1510.01799, 2015."},{"key":"2022051413251897261_j_popets-2021-0008_ref_018_w2aab3b7c16b1b6b1ab1ac18Aa","unstructured":"[18] I. J. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, Cambridge, MA, USA, 2016. http:\/\/www.deeplearningbook.org."},{"key":"2022051413251897261_j_popets-2021-0008_ref_019_w2aab3b7c16b1b6b1ab1ac19Aa","doi-asserted-by":"crossref","unstructured":"[19] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.10.1109\/CVPR.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"2022051413251897261_j_popets-2021-0008_ref_020_w2aab3b7c16b1b6b1ab1ac20Aa","unstructured":"[20] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
CoRR, abs\/1502.03167, 2015."},{"key":"2022051413251897261_j_popets-2021-0008_ref_021_w2aab3b7c16b1b6b1ab1ac21Aa","unstructured":"[21] R. Iyengar, J. P. Near, D. Song, O. Thakkar, A. Thakurta, and L. Wang. Towards practical differentially private convex optimization. In Towards Practical Differentially Private Convex Optimization, page 0. IEEE."},{"key":"2022051413251897261_j_popets-2021-0008_ref_022_w2aab3b7c16b1b6b1ab1ac22Aa","unstructured":"[22] Y. Jia. Learning semantic image representations at a large scale. PhD thesis, UC Berkeley, 2014."},{"key":"2022051413251897261_j_popets-2021-0008_ref_023_w2aab3b7c16b1b6b1ab1ac23Aa","unstructured":"[23] J. Jordon, J. Yoon, and M. van der Schaar. Pate-gan: Generating synthetic data with differential privacy guarantees. In ICLR, 2019."},{"key":"2022051413251897261_j_popets-2021-0008_ref_024_w2aab3b7c16b1b6b1ab1ac24Aa","unstructured":"[24] D. Kifer, A. Smith, and A. Thakurta. Private convex empirical risk minimization and high-dimensional regression. In Conference on Learning Theory, pages 25\u20131, 2012."},{"key":"2022051413251897261_j_popets-2021-0008_ref_025_w2aab3b7c16b1b6b1ab1ac25Aa","unstructured":"[25] D. Kingma and J. Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 12 2014."},{"key":"2022051413251897261_j_popets-2021-0008_ref_026_w2aab3b7c16b1b6b1ab1ac26Aa","doi-asserted-by":"crossref","unstructured":"[26] J. Lee and D. Kifer. Concentrated differentially private gradient descent with adaptive per-iteration privacy budget. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.10.1145\/3219819.3220076","DOI":"10.1145\/3219819.3220076"},{"key":"2022051413251897261_j_popets-2021-0008_ref_027_w2aab3b7c16b1b6b1ab1ac27Aa","unstructured":"[27] H. B. McMahan, G. Andrew, U. Erlingsson, S. Chien, I. Mironov, N. Papernot, and P. Kairouz. A general approach to adding differential privacy to iterative training procedures. 
arXiv preprint arXiv:1812.06210, 2018."},{"key":"2022051413251897261_j_popets-2021-0008_ref_028_w2aab3b7c16b1b6b1ab1ac28Aa","unstructured":"[28] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang. Learning differentially private recurrent language models. In International Conference on Learning Representations, 2018."},{"key":"2022051413251897261_j_popets-2021-0008_ref_029_w2aab3b7c16b1b6b1ab1ac29Aa","doi-asserted-by":"crossref","unstructured":"[29] I. Mironov. Renyi differential privacy. In Computer Security Foundations Symposium (CSF), 2017 IEEE 30th, pages 263\u2013275. IEEE, 2017.10.1109\/CSF.2017.11","DOI":"10.1109\/CSF.2017.11"},{"key":"2022051413251897261_j_popets-2021-0008_ref_030_w2aab3b7c16b1b6b1ab1ac30Aa","unstructured":"[30] N. Papernot, M. Abadi, \u00dalfar Erlingsson, I. Goodfellow, and K. Talwar. Semi-supervised knowledge transfer for deep learning from private training data. In Proceedings of the International Conference on Learning Representations, 2017."},{"key":"2022051413251897261_j_popets-2021-0008_ref_031_w2aab3b7c16b1b6b1ab1ac31Aa","unstructured":"[31] N. Papernot, S. Chien, C. C. Choo, G. M. Andrew, and I. Mironov. TensorFlow Privacy."},{"key":"2022051413251897261_j_popets-2021-0008_ref_032_w2aab3b7c16b1b6b1ab1ac32Aa","unstructured":"[32] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and \u00dalfar Erlingsson. Scalable private learning with pate. In International Conference on Learning Representations (ICLR), 2018."},{"key":"2022051413251897261_j_popets-2021-0008_ref_033_w2aab3b7c16b1b6b1ab1ac33Aa","unstructured":"[33] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. In NeurIPS Autodiff Workshop, 2017."},{"key":"2022051413251897261_j_popets-2021-0008_ref_034_w2aab3b7c16b1b6b1ab1ac34Aa","doi-asserted-by":"crossref","unstructured":"[34] N. Phan, Y. Wang, X. Wu, and D. Dou. 
Differential privacy preservation for deep auto-encoders: an application of human behavior prediction. In AAAI, 2016.","DOI":"10.1609\/aaai.v30i1.10165"},{"key":"2022051413251897261_j_popets-2021-0008_ref_035_w2aab3b7c16b1b6b1ab1ac35Aa","unstructured":"[35] M. Reimherr and J. Awan. KNG: the k-norm gradient mechanism. In NeurIPS, 2019."},{"key":"2022051413251897261_j_popets-2021-0008_ref_036_w2aab3b7c16b1b6b1ab1ac36Aa","doi-asserted-by":"crossref","unstructured":"[36] H. Robbins and S. Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400\u2013407, 1951.10.1214\/aoms\/1177729586","DOI":"10.1214\/aoms\/1177729586"},{"key":"2022051413251897261_j_popets-2021-0008_ref_037_w2aab3b7c16b1b6b1ab1ac37Aa","unstructured":"[37] G. Rochette, A. Manoel, and E. W. Tramel. Efficient per-example gradient computations in convolutional neural networks. ArXiv, abs\/1912.06015, 2019."},{"key":"2022051413251897261_j_popets-2021-0008_ref_038_w2aab3b7c16b1b6b1ab1ac38Aa","unstructured":"[38] S. Ruder. An overview of gradient descent optimization algorithms. CoRR, abs\/1609.04747, 2016."},{"key":"2022051413251897261_j_popets-2021-0008_ref_039_w2aab3b7c16b1b6b1ab1ac39Aa","doi-asserted-by":"crossref","unstructured":"[39] R. Shokri and V. Shmatikov. Privacy-preserving deep learning. In Proceedings of the 22Nd ACM SIGSAC Conference on Computer and Communications Security, 2015.10.1145\/2810103.2813687","DOI":"10.1145\/2810103.2813687"},{"key":"2022051413251897261_j_popets-2021-0008_ref_040_w2aab3b7c16b1b6b1ab1ac40Aa","doi-asserted-by":"crossref","unstructured":"[40] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (SP), 2017.10.1109\/SP.2017.41","DOI":"10.1109\/SP.2017.41"},{"key":"2022051413251897261_j_popets-2021-0008_ref_041_w2aab3b7c16b1b6b1ab1ac41Aa","unstructured":"[41] K. Simonyan and A. Zisserman. 
Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015."},{"key":"2022051413251897261_j_popets-2021-0008_ref_042_w2aab3b7c16b1b6b1ab1ac42Aa","unstructured":"[42] O. Thakkar, G. Andrew, and H. B. McMahan. Differentially private learning with adaptive clipping. CoRR, abs\/1905.03871, 2019."},{"key":"2022051413251897261_j_popets-2021-0008_ref_043_w2aab3b7c16b1b6b1ab1ac43Aa","unstructured":"[43] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky. Instance normalization: The missing ingredient for fast stylization. CoRR, abs\/1607.08022, 2016."},{"key":"2022051413251897261_j_popets-2021-0008_ref_044_w2aab3b7c16b1b6b1ab1ac44Aa","unstructured":"[44] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, \u0141. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998\u20136008, 2017."},{"key":"2022051413251897261_j_popets-2021-0008_ref_045_w2aab3b7c16b1b6b1ab1ac45Aa","unstructured":"[45] D. Wang, M. Ye, and J. Xu. Differentially private empirical risk minimization revisited: Faster and more general. In Advances in Neural Information Processing Systems 30, pages 2719\u20132728. Curran Associates, Inc., 2017."},{"key":"2022051413251897261_j_popets-2021-0008_ref_046_w2aab3b7c16b1b6b1ab1ac46Aa","doi-asserted-by":"crossref","unstructured":"[46] Y. Wu and K. He. Group normalization. In ECCV, 2018.10.1007\/978-3-030-01261-8_1","DOI":"10.1007\/978-3-030-01261-8_1"},{"key":"2022051413251897261_j_popets-2021-0008_ref_047_w2aab3b7c16b1b6b1ab1ac47Aa","unstructured":"[47] L. Xie, K. Lin, S. Wang, F. Wang, and J. Zhou. Differentially private generative adversarial network, 2018."},{"key":"2022051413251897261_j_popets-2021-0008_ref_048_w2aab3b7c16b1b6b1ab1ac48Aa","unstructured":"[48] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le. Xlnet: Generalized autoregressive pretraining for language understanding. 
arXiv preprint arXiv:1906.08237, 2019."},{"key":"2022051413251897261_j_popets-2021-0008_ref_049_w2aab3b7c16b1b6b1ab1ac49Aa","unstructured":"[49] F. Yu, Y. Zhang, S. Song, A. Seff, and J. Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. ArXiv, abs\/1506.03365, 2015."},{"key":"2022051413251897261_j_popets-2021-0008_ref_050_w2aab3b7c16b1b6b1ab1ac50Aa","doi-asserted-by":"crossref","unstructured":"[50] L. Yu, L. Liu, C. Pu, M. E. Gursoy, and S. Truex. Differentially private model publishing for deep learning. 2019 IEEE Symposium on Security and Privacy (SP), pages 332\u2013349, 2019.","DOI":"10.1109\/SP.2019.00019"},{"key":"2022051413251897261_j_popets-2021-0008_ref_051_w2aab3b7c16b1b6b1ab1ac51Aa","doi-asserted-by":"crossref","unstructured":"[51] J. Zhang, K. Zheng, W. Mou, and L. Wang. Efficient private erm for smooth objectives. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 3922\u20133928. AAAI Press, 2017.10.24963\/ijcai.2017\/548","DOI":"10.24963\/ijcai.2017\/548"}],"container-title":["Proceedings on Privacy Enhancing 
Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/content.sciendo.com\/view\/journals\/popets\/2021\/1\/article-p128.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.sciendo.com\/pdf\/10.2478\/popets-2021-0008","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,8]],"date-time":"2022-12-08T14:02:40Z","timestamp":1670508160000},"score":1,"resource":{"primary":{"URL":"https:\/\/petsymposium.org\/popets\/2021\/popets-2021-0008.php"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,9]]},"references-count":51,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,11,9]]},"published-print":{"date-parts":[[2021,1,1]]}},"alternative-id":["10.2478\/popets-2021-0008"],"URL":"https:\/\/doi.org\/10.2478\/popets-2021-0008","relation":{},"ISSN":["2299-0984"],"issn-type":[{"value":"2299-0984","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,9]]}}}