{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T22:52:22Z","timestamp":1768258342550,"version":"3.49.0"},"reference-count":23,"publisher":"Privacy Enhancing Technologies Symposium Advisory Board","issue":"3","license":[{"start":{"date-parts":[[2019,7,1]],"date-time":"2019-07-01T00:00:00Z","timestamp":1561939200000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Hypothesis testing is one of the most common types of data analysis and forms the backbone of scientific research in many disciplines. Analysis of variance (ANOVA) in particular is used to detect dependence between a categorical and a numerical variable. Here we show how one can carry out this hypothesis test under the restrictions of differential privacy. We show that the <jats:italic>F<\/jats:italic>-statistic, the optimal test statistic in the public setting, is no longer optimal in the private setting, and we develop a new test statistic <jats:italic>F<\/jats:italic>\n                  <jats:sub>1<\/jats:sub> with much higher statistical power. We show how to rigorously compute a reference distribution for the <jats:italic>F<\/jats:italic>\n                  <jats:sub>1<\/jats:sub> statistic and give an algorithm that outputs accurate <jats:italic>p<\/jats:italic>-values. We implement our test and experimentally optimize several parameters. We then compare our test to the only previous work on private ANOVA testing, using the same effect size as that work. We see an order of magnitude improvement, with our test requiring only 7% as much data to detect the effect.<\/jats:p>","DOI":"10.2478\/popets-2019-0049","type":"journal-article","created":{"date-parts":[[2019,7,20]],"date-time":"2019-07-20T09:31:41Z","timestamp":1563615101000},"page":"310-330","source":"Crossref","is-referenced-by-count":8,"title":["Improved Differentially Private Analysis of Variance"],"prefix":"10.56553","volume":"2019","author":[{"given":"Marika","family":"Swanberg","sequence":"first","affiliation":[{"name":"Mathematics Department, Reed College"}]},{"given":"Ira","family":"Globus-Harris","sequence":"additional","affiliation":[{"name":"Mathematics Department, Reed College"}]},{"given":"Iris","family":"Griffith","sequence":"additional","affiliation":[{"name":"Mathematics Department, Reed College"}]},{"given":"Anna","family":"Ritz","sequence":"additional","affiliation":[{"name":"Biology Department, Reed College"}]},{"given":"Adam","family":"Groce","sequence":"additional","affiliation":[{"name":"Mathematics Department, Reed College"}]},{"given":"Andrew","family":"Bray","sequence":"additional","affiliation":[{"name":"Mathematics Department, Reed College"}]}],"member":"35752","published-online":{"date-parts":[[2019,7,12]]},"reference":[{"key":"2022061011200048215_j_popets-2019-0049_ref_001_w2aab3b7c17b1b6b1ab1ab1Aa","doi-asserted-by":"crossref","unstructured":"[1] David J Balding. A tutorial on statistical methods for population association studies. Nature Reviews Genetics, 7(10):781, 2006.10.1038\/nrg191616983374","DOI":"10.1038\/nrg1916"},{"key":"2022061011200048215_j_popets-2019-0049_ref_002_w2aab3b7c17b1b6b1ab1ab2Aa","unstructured":"[2] Andr\u00e9s F Barrientos, Jerome P Reiter, Ashwin Machanavajjhala, and Yan Chen. Differentially private significance tests for regression coefficients. arXiv preprint arXiv:1705.09561, 2017."},{"key":"2022061011200048215_j_popets-2019-0049_ref_003_w2aab3b7c17b1b6b1ab1ab3Aa","doi-asserted-by":"crossref","unstructured":"[3] Zachary Campbell, Andrew Bray, Anna M. Ritz, and Adam Groce. Differentially private ANOVA testing. 1st International Conference on Data Intelligence and Security (ICDIS), pages 281\u2013285, 2018.10.1109\/ICDIS.2018.00052","DOI":"10.1109\/ICDIS.2018.00052"},{"key":"2022061011200048215_j_popets-2019-0049_ref_004_w2aab3b7c17b1b6b1ab1ab4Aa","unstructured":"[4] George Casella and Roger Berger. Statistical Inference. Brooks\/Cole, Belmont, CA, 2 edition, 2002."},{"key":"2022061011200048215_j_popets-2019-0049_ref_005_w2aab3b7c17b1b6b1ab1ab5Aa","unstructured":"[5] D. R. Cox. Theoretical statistics. Chapman and Hall, London, 1974.10.1007\/978-1-4899-2887-0"},{"key":"2022061011200048215_j_popets-2019-0049_ref_006_w2aab3b7c17b1b6b1ab1ab6Aa","unstructured":"[6] Bolin Ding, Harsha Nori, Paul Li, and Joshua Allen. Comparing population means under local differential privacy: with significance and power. arXiv preprint arXiv:1803.09027, 2018."},{"key":"2022061011200048215_j_popets-2019-0049_ref_007_w2aab3b7c17b1b6b1ab1ab7Aa","doi-asserted-by":"crossref","unstructured":"[7] Vito D\u2019Orazio, James Honaker, and Gary King. Differential privacy for social science inference. 2015.10.2139\/ssrn.2676160","DOI":"10.2139\/ssrn.2676160"},{"key":"2022061011200048215_j_popets-2019-0049_ref_008_w2aab3b7c17b1b6b1ab1ab8Aa","doi-asserted-by":"crossref","unstructured":"[8] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, volume 3876, pages 265\u2013284. Springer, 2006.10.1007\/11681878_14","DOI":"10.1007\/11681878_14"},{"key":"2022061011200048215_j_popets-2019-0049_ref_009_w2aab3b7c17b1b6b1ab1ab9Aa","doi-asserted-by":"crossref","unstructured":"[9] Stephen E Fienberg, Aleksandra Slavkovic, and Caroline Uhler. Privacy preserving GWAS data sharing. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pages 628\u2013635. IEEE, 2011.10.1109\/ICDMW.2011.140","DOI":"10.1109\/ICDMW.2011.140"},{"key":"2022061011200048215_j_popets-2019-0049_ref_010_w2aab3b7c17b1b6b1ab1ac10Aa","unstructured":"[10] Marco Gaboardi, Hyun-Woo Lim, Ryan M Rogers, and Salil P Vadhan. Differentially private chi-squared hypothesis testing: Goodness of fit and independence testing. In ICML, pages 2111\u20132120, 2016."},{"key":"2022061011200048215_j_popets-2019-0049_ref_011_w2aab3b7c17b1b6b1ab1ac11Aa","doi-asserted-by":"crossref","unstructured":"[11] Aaron Johnson and Vitaly Shmatikov. Privacy-preserving data exploration in genome-wide association studies. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1079\u20131087. ACM, 2013.10.1145\/2487575.2487687468152826691928","DOI":"10.1145\/2487575.2487687"},{"key":"2022061011200048215_j_popets-2019-0049_ref_012_w2aab3b7c17b1b6b1ab1ac12Aa","unstructured":"[12] Jerome Myers and Arnold Well. Research Design and Statistical Analysis. Lawrence Erlbaum Associates, London, 2 edition, 2003."},{"key":"2022061011200048215_j_popets-2019-0049_ref_013_w2aab3b7c17b1b6b1ab1ac13Aa","doi-asserted-by":"crossref","unstructured":"[13] Th\u00f4ng T Nguy\u00ean and Siu Cheung Hui. Differentially private regression for discrete-time survival analysis. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1199\u20131208. ACM, 2017.10.1145\/3132847.3132928","DOI":"10.1145\/3132847.3132928"},{"key":"2022061011200048215_j_popets-2019-0049_ref_014_w2aab3b7c17b1b6b1ab1ac14Aa","unstructured":"[14] Ryan Rogers and Daniel Kifer. A New Class of Private Chi-Square Hypothesis Tests. In Aarti Singh and Jerry Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 991\u20131000, Fort Lauderdale, FL, USA, 20\u201322 Apr 2017. PMLR."},{"key":"2022061011200048215_j_popets-2019-0049_ref_015_w2aab3b7c17b1b6b1ab1ac15Aa","doi-asserted-by":"crossref","unstructured":"[15] Emanuel Schmider, Matthias Ziegler, Erik Danay, Luzi Beyer, and Markus B\u00fchner. Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology, pages 147\u2013151, 2010.10.1027\/1614-2241\/a000016","DOI":"10.1027\/1614-2241\/a000016"},{"key":"2022061011200048215_j_popets-2019-0049_ref_016_w2aab3b7c17b1b6b1ab1ac16Aa","unstructured":"[16] Or Sheffet. Differentially private ordinary least squares. arXiv preprint arXiv:1507.02482, 2015."},{"key":"2022061011200048215_j_popets-2019-0049_ref_017_w2aab3b7c17b1b6b1ab1ac17Aa","unstructured":"[17] Adam Smith. Efficient, differentially private point estimators. arXiv preprint arXiv:0809.4794, 2008."},{"key":"2022061011200048215_j_popets-2019-0049_ref_018_w2aab3b7c17b1b6b1ab1ac18Aa","doi-asserted-by":"crossref","unstructured":"[18] Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 813\u2013822. ACM, 2011.10.1145\/1993636.1993743","DOI":"10.1145\/1993636.1993743"},{"key":"2022061011200048215_j_popets-2019-0049_ref_019_w2aab3b7c17b1b6b1ab1ac19Aa","unstructured":"[19] Eftychia Solea. Differentially private hypothesis testing for normal random variables. 2014."},{"key":"2022061011200048215_j_popets-2019-0049_ref_020_w2aab3b7c17b1b6b1ab1ac20Aa","doi-asserted-by":"crossref","unstructured":"[20] Caroline Uhler, Aleksandra Slavkovi\u0107, and Stephen E Fienberg. Privacy-preserving data sharing for genome-wide association studies. The Journal of Privacy and Confidentiality, 5(1):137, 2013.10.29012\/jpc.v5i1.629","DOI":"10.29012\/jpc.v5i1.629"},{"key":"2022061011200048215_j_popets-2019-0049_ref_021_w2aab3b7c17b1b6b1ab1ac21Aa","doi-asserted-by":"crossref","unstructured":"[21] Duy Vu and Aleksandra Slavkovic. Differential privacy for clinical trial data: Preliminary evaluations. In Data Mining Workshops, 2009. ICDMW\u201909. IEEE International Conference on, pages 138\u2013143. IEEE, 2009.10.1109\/ICDMW.2009.52","DOI":"10.1109\/ICDMW.2009.52"},{"key":"2022061011200048215_j_popets-2019-0049_ref_022_w2aab3b7c17b1b6b1ab1ac22Aa","unstructured":"[22] Yue Wang, Jaewoo Lee, and Daniel Kifer. Revisiting differentially private hypothesis tests for categorical data. arXiv preprint arXiv:1511.03376, 2015."},{"key":"2022061011200048215_j_popets-2019-0049_ref_023_w2aab3b7c17b1b6b1ab1ac23Aa","doi-asserted-by":"crossref","unstructured":"[23] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375\u2013389, 2010.10.1198\/jasa.2009.tm08651","DOI":"10.1198\/jasa.2009.tm08651"}],"container-title":["Proceedings on Privacy Enhancing Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/content.sciendo.com\/view\/journals\/popets\/2019\/3\/article-p310.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.sciendo.com\/pdf\/10.2478\/popets-2019-0049","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,20]],"date-time":"2022-07-20T16:30:33Z","timestamp":1658334633000},"score":1,"resource":{"primary":{"URL":"https:\/\/petsymposium.org\/popets\/2019\/popets-2019-0049.php"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,1]]},"references-count":23,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,7,12]]},"published-print":{"date-parts":[[2019,7,1]]}},"alternative-id":["10.2478\/popets-2019-0049"],"URL":"https:\/\/doi.org\/10.2478\/popets-2019-0049","relation":{},"ISSN":["2299-0984"],"issn-type":[{"value":"2299-0984","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7,1]]}}}