{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T01:30:33Z","timestamp":1778463033227,"version":"3.51.4"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2020,1,13]],"date-time":"2020-01-13T00:00:00Z","timestamp":1578873600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["GM121312"],"award-info":[{"award-number":["GM121312"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["GM103456"],"award-info":[{"award-number":["GM103456"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>Machine learning feature selection methods are needed to detect complex interaction-network effects in complicated modeling scenarios in high-dimensional data, such as GWAS, gene expression, eQTL and structural\/functional neuroimage studies for case\u2013control or continuous outcomes. In addition, many machine learning methods have limited ability to address the issues of controlling false discoveries and adjusting for covariates. 
To address these challenges, we develop a new feature selection technique called Nearest-neighbor Projected-Distance Regression (NPDR) that calculates the importance of each predictor using generalized linear model regression of distances between nearest-neighbor pairs projected onto the predictor dimension. NPDR captures the underlying interaction structure of data using nearest-neighbors in high dimensions, handles both dichotomous and continuous outcomes and predictor data types, statistically corrects for covariates, and permits statistical inference and penalized regression. We use realistic simulations with interactions and other effects to show that NPDR has better precision-recall than standard Relief-based feature selection and random forest importance, with the additional benefit of covariate adjustment and multiple testing correction. Using RNA-Seq data from a study of major depressive disorder (MDD), we show that NPDR with covariate adjustment removes spurious associations due to confounding. 
We apply NPDR to eQTL data to identify potentially interacting variants that regulate transcripts associated with MDD and demonstrate NPDR\u2019s utility for GWAS and continuous outcomes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Available at: https:\/\/insilico.github.io\/npdr\/.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa024","type":"journal-article","created":{"date-parts":[[2020,1,8]],"date-time":"2020-01-08T23:20:05Z","timestamp":1578525605000},"page":"2770-2777","source":"Crossref","is-referenced-by-count":13,"title":["Nearest-neighbor Projected-Distance Regression (NPDR) for detecting network interactions with adjustments for multiple tests and confounding"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3737-6565","authenticated-orcid":false,"given":"Trang T","family":"Le","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bryan A","family":"Dawkins","sequence":"additional","affiliation":[{"name":"Department of Mathematics"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9494-8833","authenticated-orcid":false,"given":"Brett A","family":"McKinney","sequence":"additional","affiliation":[{"name":"Department of Mathematics"},{"name":"Tandy School of Computer Science, University of Tulsa, Tulsa, OK 74104, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,1,13]]},"reference":[{"key":"2023013111481380300_btaa024-B1","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/s13040-018-0186-4","article-title":"Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS","volume":"11","author":"Arabnejad","year":"2018","journal-title":"BioData Min"},{"key":"2023013111481380300_btaa024-B2","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1038\/mp.2013.2","article-title":"Genome-wide study of association and interaction with maternal cytomegalovirus infection suggests new schizophrenia loci","volume":"19","author":"B\u00f8rglum","year":"2014","journal-title":"Mol. Psychiatry"},{"key":"2023013111481380300_btaa024-B3","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1038\/nature11510","article-title":"Epistasis as the primary factor in molecular evolution","volume":"490","author":"Breen","year":"2012","journal-title":"Nature"},{"key":"2023013111481380300_btaa024-B4","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1016\/j.ajhg.2016.02.012","article-title":"Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models","volume":"98","author":"Chen","year":"2016","journal-title":"Am. J. Hum. 
Genet"},{"key":"2023013111481380300_btaa024-B5","doi-asserted-by":"crossref","first-page":"326","DOI":"10.1016\/j.tig.2010.05.001","article-title":"From differential expression to differential networking\u2013identification of dysfunctional regulatory networks in diseases","volume":"26","author":"De la Fuente","year":"2010","journal-title":"Trends Genet"},{"key":"2023013111481380300_btaa024-B6","first-page":"1","author":"Granizo-Mackenzie","year":"2013"},{"key":"2023013111481380300_btaa024-B7","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/1756-0381-2-5","article-title":"Spatially uniform reliefF (SURF) for computationally-efficient filtering of gene-gene interactions","volume":"2","author":"Greene","year":"2009","journal-title":"BioData Min"},{"key":"2023013111481380300_btaa024-B8","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1080\/07853890601083808","article-title":"Are exposure to cytomegalovirus and genetic variation on chromosome 6p joint risk factors for schizophrenia?","volume":"39","author":"Kim","year":"2007","journal-title":"Ann. Med"},{"key":"2023013111481380300_btaa024-B9","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1023\/A:1008280620621","article-title":"Overcoming the myopia of inductive learning algorithms with RELIEFF","volume":"7","author":"Kononenko","year":"1997","journal-title":"Appl. 
Intell"},{"key":"2023013111481380300_btaa024-B10","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/s13040-015-0040-x","article-title":"Differential co-expression network centrality and machine learning feature selection for identifying susceptibility hubs in networks with scale-free structure","volume":"8","author":"Lareau","year":"2015","journal-title":"BioData Min"},{"key":"2023013111481380300_btaa024-B11","doi-asserted-by":"crossref","first-page":"2906","DOI":"10.1093\/bioinformatics\/btx298","article-title":"Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests","volume":"33","author":"Le","year":"2017","journal-title":"Bioinformatics"},{"key":"2023013111481380300_btaa024-B12","doi-asserted-by":"crossref","first-page":"317","DOI":"10.3389\/fnagi.2018.00317","article-title":"A nonlinear simulation framework supports adjusting for age when analyzing BrainAGE","volume":"10","author":"Le","year":"2018","journal-title":"Front. Aging Neurosci"},{"key":"2023013111481380300_btaa024-B13","doi-asserted-by":"crossref","first-page":"1358","DOI":"10.1093\/bioinformatics\/bty788","article-title":"STatistical Inference Relief (STIR) feature selection","volume":"35","author":"Le","year":"2018","journal-title":"Bioinformatics"},{"key":"2023013111481380300_btaa024-B14","doi-asserted-by":"crossref","first-page":"i342","DOI":"10.1093\/bioinformatics\/btr204","article-title":"ccSVM: correcting Support Vector Machines for confounding factors in biological data classification","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2023013111481380300_btaa024-B15","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1515\/ijb-2015-0030","article-title":"Addressing confounding in predictive models with an application to neuroimaging","volume":"12","author":"Linn","year":"2016","journal-title":"Int. J. 
Biostat"},{"key":"2023013111481380300_btaa024-B16","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/1471-2199-11-33","article-title":"The human RPS4 paralogue on Yq11.223 encodes a structurally conserved ribosomal protein and is preferentially expressed during spermatogenesis","volume":"11","author":"Lopes","year":"2010","journal-title":"BMC Mol. Biol"},{"key":"2023013111481380300_btaa024-B17","doi-asserted-by":"crossref","first-page":"109","DOI":"10.3389\/fgene.2011.00109","article-title":"Six degrees of epistasis: statistical network models for GWAS","volume":"2","author":"McKinney","year":"2012","journal-title":"Front. Genet"},{"key":"2023013111481380300_btaa024-B18","doi-asserted-by":"crossref","first-page":"e1000432","DOI":"10.1371\/journal.pgen.1000432","article-title":"Capturing the spectrum of interaction effects in genetic association studies by simulated evaporative cooling network analysis","volume":"5","author":"McKinney","year":"2009","journal-title":"PLoS Genet"},{"key":"2023013111481380300_btaa024-B19","doi-asserted-by":"crossref","first-page":"e81527","DOI":"10.1371\/journal.pone.0081527","article-title":"ReliefSeq: a gene-wise adaptive-K nearest-neighbor feature selection tool for finding gene-gene interactions and main effects in mRNA-Seq gene expression data","volume":"8","author":"McKinney","year":"2013","journal-title":"PLoS One"},{"key":"2023013111481380300_btaa024-B20","doi-asserted-by":"crossref","first-page":"2854","DOI":"10.1093\/hmg\/ddm244","article-title":"Genetic association of CTNNA3 with late-onset Alzheimer\u2019s disease in females","volume":"16","author":"Miyashita","year":"2007","journal-title":"Hum. Mol. 
Genet"},{"key":"2023013111481380300_btaa024-B21","doi-asserted-by":"crossref","first-page":"1267","DOI":"10.1038\/mp.2013.161","article-title":"Type I interferon signaling genes in recurrent major depression: increased expression detected by whole-blood RNA sequencing","volume":"19","author":"Mostafavi","year":"2014","journal-title":"Mol. Psychiatry"},{"key":"2023013111481380300_btaa024-B22","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1016\/j.neuroimage.2017.01.066","article-title":"Predictive modelling using neuroimaging data in the presence of confounds","volume":"150","author":"Rao","year":"2017","journal-title":"Neuroimage"},{"key":"2023013111481380300_btaa024-B23","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/s41592-018-0138-4","article-title":"Deep generative models of genetic variation capture the effects of mutations","volume":"15","author":"Riesselman","year":"2018","journal-title":"Nat. Methods"},{"key":"2023013111481380300_btaa024-B24","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1023\/A:1025667309714","article-title":"Theoretical and empirical analysis of ReliefF and RReliefF","volume":"53","author":"Robnik-\u0160ikonja","year":"2003","journal-title":"Mach. Learn"},{"key":"2023013111481380300_btaa024-B25","first-page":"190","article-title":"Statistical properties of multivariate distance matrix regression for high-dimensional data analysis","volume":"3","author":"Schork","year":"2012","journal-title":"Front. Genet"},{"key":"2023013111481380300_btaa024-B26","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.jbi.2018.07.015","article-title":"Benchmarking relief-based feature selection methods for bioinformatics data mining","volume":"85","author":"Urbanowicz","year":"2018","journal-title":"J. Biomed. 
Inform"},{"key":"2023013111481380300_btaa024-B27","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.jbi.2018.07.014","article-title":"Relief-based feature selection: introduction and review","volume":"85","author":"Urbanowicz","year":"2018","journal-title":"J. Biomed. Inform"},{"key":"2023013111481380300_btaa024-B28","doi-asserted-by":"crossref","first-page":"528","DOI":"10.1038\/nature07999","article-title":"Common genetic variants on 5p14.1 associate with autism spectrum disorders","volume":"459","author":"Wang","year":"2009","journal-title":"Nature"},{"key":"2023013111481380300_btaa024-B29","doi-asserted-by":"crossref","first-page":"700","DOI":"10.1016\/j.gde.2013.10.007","article-title":"Should evolutionary geneticists worry about higher-order epistasis?","volume":"23","author":"Weinreich","year":"2013","journal-title":"Curr. Opin. Genet. Dev"},{"key":"2023013111481380300_btaa024-B30","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1186\/1471-2105-13-164","article-title":"SNP interaction detection with Random Forests in high-dimensional genetic data","volume":"13","author":"Winham","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023013111481380300_btaa024-B31","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Series B Stat. 
Methodol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa024\/32450143\/btaa024.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/9\/2770\/48985011\/btaa024.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/9\/2770\/48985011\/btaa024.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T15:51:52Z","timestamp":1675180312000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/9\/2770\/5701651"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,1,13]]},"references-count":31,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2020,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa024","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/861492","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,5,1]]},"published":{"date-parts":[[2020,1,13]]}}}