{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T16:26:25Z","timestamp":1764174385787,"version":"3.37.3"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"18","license":[{"start":{"date-parts":[[2017,5,4]],"date-time":"2017-05-04T00:00:00Z","timestamp":1493856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000025","name":"NIMH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000025","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01 MH098099","K01 MH096175"],"award-info":[{"award-number":["R01 MH098099","K01 MH096175"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000025","name":"NIMH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000025","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01 MH098099"],"award-info":[{"award-number":["R01 MH098099"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p\u226bn, these differential privacy methods are susceptible to overfitting.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Methods<\/jats:title>\n                  <jats:p>We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Code available at http:\/\/insilico.utulsa.edu\/software\/privateEC.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx298","type":"journal-article","created":{"date-parts":[[2017,5,2]],"date-time":"2017-05-02T19:09:21Z","timestamp":1493752161000},"page":"2906-2913","source":"Crossref","is-referenced-by-count":33,"title":["Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests"],"prefix":"10.1093","volume":"33","author":[{"given":"Trang T","family":"Le","sequence":"first","affiliation":[{"name":"Department of Mathematics, University of Tulsa, Tulsa, OK, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"W Kyle","family":"Simmons","sequence":"additional","affiliation":[{"name":"Laureate Institute for Brain Research, Tulsa, OK, USA"},{"name":"Faculty of Community Medicine, University of Tulsa, Tulsa, OK, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Masaya","family":"Misaki","sequence":"additional","affiliation":[{"name":"Laureate Institute for Brain Research, Tulsa, OK, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jerzy","family":"Bodurka","sequence":"additional","affiliation":[{"name":"Laureate Institute for Brain Research, Tulsa, OK, USA"},{"name":"Stephenson School of Biomedical Engineering, University of Oklahoma, OK, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bill C","family":"White","sequence":"additional","affiliation":[{"name":"Tandy School of Computer Science, University of Tulsa, OK, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan","family":"Savitz","sequence":"additional","affiliation":[{"name":"Laureate Institute for Brain Research, Tulsa, OK, USA"},{"name":"Faculty of Community Medicine, University of Tulsa, Tulsa, OK, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brett A","family":"McKinney","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Tulsa, Tulsa, OK, USA"},{"name":"Tandy School of Computer Science, University of Tulsa, OK, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2017,5,4]]},"reference":[{"key":"2023020206421798700_btx298-B1","doi-asserted-by":"crossref","first-page":"2010","DOI":"10.1093\/bioinformatics\/btn356","article-title":"Enriched random forests","volume":"24","author":"Amaratunga","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020206421798700_btx298-B2","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/S0166-4328(01)00297-2","article-title":"Controlling the false discovery rate in behavior genetics research","volume":"125","author":"Benjamini","year":"2001","journal-title":"Behav. Brain Res"},{"key":"2023020206421798700_btx298-B3","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"Breiman","year":"2001","journal-title":"Random forests. Machine Learn"},{"key":"2023020206421798700_btx298-B4","doi-asserted-by":"crossref","DOI":"10.1561\/9781601982773","article-title":"Privacy-Preserving Data Publishing","volume-title":"Foundations and Trends in Database","author":"Chen","year":"2009"},{"key":"2023020206421798700_btx298-B5","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1006\/cbmr.1996.0014","article-title":"AFNI: software for analysis and visualization of functional magnetic resonance neuroimages","volume":"29","author":"Cox","year":"1996","journal-title":"Comput. Biomed. Res. Int. J"},{"year":"2003","author":"Draper","key":"2023020206421798700_btx298-B6"},{"key":"2023020206421798700_btx298-B7","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.pscychresns.2014.10.003","article-title":"Resting state networks in major depressive disorder","volume":"224","author":"Dutta","year":"2014","journal-title":"Psychiatr. Res"},{"first-page":"1","year":"2006","author":"Dwork","key":"2023020206421798700_btx298-B8"},{"key":"2023020206421798700_btx298-B9","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1126\/science.aaa9375","article-title":"STATISTICS. The reusable holdout: preserving validity in adaptive data analysis","volume":"349","author":"Dwork","year":"2015","journal-title":"Science"},{"key":"2023020206421798700_btx298-B10","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1561\/0400000042","article-title":"The algorithmic foundations of differential privacy","volume":"9","author":"Dwork","year":"2013","journal-title":"Found. Trends\u00ae Theor. Comput. Sci"},{"key":"2023020206421798700_btx298-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1749603.1749605","article-title":"Privacy-preserving data publishing","volume":"42","author":"Fung","year":"2010","journal-title":"Survey Recent Dev. ACM Comput. Surv"},{"key":"2023020206421798700_btx298-B12","doi-asserted-by":"crossref","first-page":"2711","DOI":"10.1093\/brain\/aws160","article-title":"Fractionation of social brain circuits in autism spectrum disorders","volume":"135","author":"Gotts","year":"2012","journal-title":"Brain J. Neurol"},{"key":"2023020206421798700_btx298-B13","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/1756-0381-2-5","article-title":"Spatially uniform relieff (SURF) for computationally-efficient filtering of gene-gene interactions","volume":"2","author":"Greene","year":"2009","journal-title":"BioData Mining"},{"key":"2023020206421798700_btx298-B14","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: data Mining, Inference, and Prediction","author":"Hastie","year":"2009"},{"key":"2023020206421798700_btx298-B15","doi-asserted-by":"crossref","first-page":"e1000167","DOI":"10.1371\/journal.pgen.1000167","article-title":"Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays","volume":"4","author":"Homer","year":"2008","journal-title":"PLoS Genet"},{"key":"2023020206421798700_btx298-B16","doi-asserted-by":"crossref","first-page":"620","DOI":"10.1103\/PhysRev.106.620","article-title":"Information theory and statistical mechanics","volume":"106","author":"Jaynes","year":"1957","journal-title":"Phys. Rev"},{"key":"2023020206421798700_btx298-B17","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1007\/3-540-57868-4_57","article-title":"Estimating attributes: analysis and extensions of RELIEF","volume":"784","author":"Kononenko","year":"1994","journal-title":"Machine Learn. ECML-94 Lecture Notes Comp. Sci"},{"key":"2023020206421798700_btx298-B18","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1023\/A:1008280620621","article-title":"Overcoming the myopia of inductive learning algorithms with RELIEFF","volume":"7","author":"Kononenko","year":"1997","journal-title":"Appl. Intel"},{"key":"2023020206421798700_btx298-B19","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.artmed.2015.11.001","article-title":"The feature selection bias problem in relation to high-dimensional gene data","volume":"66","author":"Krawczuk","year":"2016","journal-title":"Artif. Intel. Med"},{"key":"2023020206421798700_btx298-B20","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/s13040-015-0040-x","article-title":"Differential co-expression network centrality and machine learning feature selection for identifying susceptibility hubs in networks with scale-free structure","volume":"8","author":"Lareau","year":"2015","journal-title":"BioData Mining"},{"key":"2023020206421798700_btx298-B21","doi-asserted-by":"crossref","first-page":"1724","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet"},{"key":"2023020206421798700_btx298-B22","doi-asserted-by":"crossref","first-page":"e79999","DOI":"10.1371\/journal.pone.0079999","article-title":"Identify changes of brain regional homogeneity in bipolar disorder and unipolar depression using resting-state FMRI","volume":"8","author":"Liang","year":"2013","journal-title":"PloS One"},{"key":"2023020206421798700_btx298-B23","first-page":"930.","article-title":"Insular dysfunction within the salience network is associated with severity of symptoms and aberrant inter-network connectivity in major depressive disorder","volume":"7","author":"Manoliu","year":"2013","journal-title":"Front. Human Neurosci"},{"key":"2023020206421798700_btx298-B24","doi-asserted-by":"crossref","first-page":"e1000432","DOI":"10.1371\/journal.pgen.1000432","article-title":"Capturing the spectrum of interaction effects in genetic association studies by simulated evaporative cooling network analysis","volume":"5","author":"McKinney","year":"2009","journal-title":"PLoS Genet"},{"key":"2023020206421798700_btx298-B25","doi-asserted-by":"crossref","first-page":"2113","DOI":"10.1093\/bioinformatics\/btm317","article-title":"Evaporative cooling feature selection for genotypic data involving interactions","volume":"23","author":"McKinney","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020206421798700_btx298-B26","doi-asserted-by":"crossref","first-page":"e81527","DOI":"10.1371\/journal.pone.0081527","article-title":"ReliefSeq: a gene-wise adaptive-K nearest-neighbor feature selection tool for finding gene-gene interactions and main effects in mRNA-Seq gene expression data","volume":"8","author":"McKinney","year":"2013","journal-title":"PloS One"},{"first-page":"94","year":"2007","author":"McSherry","key":"2023020206421798700_btx298-B27"},{"key":"2023020206421798700_btx298-B28","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1016\/j.neubiorev.2015.07.014","article-title":"Resting-state functional connectivity in major depressive disorder: a review","volume":"56","author":"Mulders","year":"2015","journal-title":"Neurosci. Biobehav. Rev"},{"key":"2023020206421798700_btx298-B29","doi-asserted-by":"crossref","first-page":"2041","DOI":"10.1017\/S0033291713002596","article-title":"Revisiting default mode network function in major depression: evidence for disrupted subsystem connectivity","volume":"44","author":"Sambataro","year":"2014","journal-title":"Psychol. Med"},{"key":"2023020206421798700_btx298-B30","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1093\/cercor\/bhr099","article-title":"Decoding subject-driven cognitive states with whole-brain connectivity patterns","volume":"22","author":"Shirer","year":"2012","journal-title":"Cereb. Cortex"},{"key":"2023020206421798700_btx298-B31","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1023\/A:1025667309714","article-title":"Theoretical and empirical analysis of ReliefF and RReliefF","volume":"53","author":"Sikonja","year":"2003","journal-title":"Machine Learn"},{"key":"2023020206421798700_btx298-B32","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1093\/jnci\/95.1.14","article-title":"Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification","volume":"95","author":"Simon","year":"2003","journal-title":"J. Natl. Cancer Inst"},{"key":"2023020206421798700_btx298-B33","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1186\/1471-2105-7-91","article-title":"Bias in error estimation when using cross-validation for model selection","volume":"7","author":"Varma","year":"2006","journal-title":"BMC Bioinform"},{"key":"2023020206421798700_btx298-B34","doi-asserted-by":"crossref","first-page":"534","DOI":"10.1145\/1653662.1653726","volume-title":"Proceedings of the 16th ACM Conference on Computer and Communications Security","author":"Wang","year":"2009"},{"key":"2023020206421798700_btx298-B35","doi-asserted-by":"crossref","first-page":"430","DOI":"10.1016\/j.jad.2008.10.013","article-title":"Regional homogeneity in depression and its relationship with separate depressive symptom clusters: a resting-state fMRI study","volume":"115","author":"Yao","year":"2009","journal-title":"J. Affect. Disorders"},{"key":"2023020206421798700_btx298-B36","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1472-6947-14-S1-S3","article-title":"Scalable privacy-preserving data sharing methodology for genome-wide association studies: an application to iDASH healthcare privacy protection challenge","volume":"14 (Suppl 1)","author":"Yu","year":"2014","journal-title":"BMC Med. Inform. Decision Making"},{"key":"2023020206421798700_btx298-B37","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1016\/j.biopsych.2011.10.035","article-title":"Evidence of a dissociation pattern in resting-state default mode network connectivity in first-episode, treatment-naive major depression patients","volume":"71","author":"Zhu","year":"2012","journal-title":"Biol. Psychiatr"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/18\/2906\/49041392\/bioinformatics_33_18_2906.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/18\/2906\/49041392\/bioinformatics_33_18_2906.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T06:42:52Z","timestamp":1675320172000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/18\/2906\/3796394"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2017,5,4]]},"references-count":37,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2017,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx298","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2017,9,15]]},"published":{"date-parts":[[2017,5,4]]}}}