{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:09Z","timestamp":1772138049946,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2021,12,2]],"date-time":"2021-12-02T00:00:00Z","timestamp":1638403200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,27]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are likely to be in linkage disequilibrium. Our method features two interpretable parameters that control the time\/performance trade-off without requiring parameter tuning. 
In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least two orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with \u223c10<jats:sup>7<\/jats:sup> variants in a matter of minutes while taking the dependencies between the variants into account.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Macarons is available in Matlab and Python at https:\/\/github.com\/serhan-yilmaz\/macarons.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab803","type":"journal-article","created":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T15:20:34Z","timestamp":1637767234000},"page":"908-917","source":"Crossref","is-referenced-by-count":1,"title":["Uncovering complementary sets of variants for predicting quantitative phenotypes"],"prefix":"10.1093","volume":"38","author":[{"given":"Serhan","family":"Yilmaz","sequence":"first","affiliation":[{"name":"Department of Computer and Data Sciences, Case Western Reserve University , Cleveland, OH 44106, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohamad","family":"Fakhouri","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Bilkent University , Ankara 06800, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mehmet","family":"Koyut\u00fcrk","sequence":"additional","affiliation":[{"name":"Department of Computer and Data Sciences, Case Western Reserve University , Cleveland, OH 44106, USA"},{"name":"Center for 
Proteomics and Bioinformatics, Case Western Reserve University , Cleveland, OH 44106, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"A Erc\u00fcment","family":"\u00c7i\u00e7ek","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Bilkent University , Ankara 06800, Turkey"},{"name":"Department of Computational Biology, Carnegie Mellon University , Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7058-5372","authenticated-orcid":false,"given":"Oznur","family":"Tastan","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2021,12,2]]},"reference":[{"key":"2023020108530214700_btab803-B1","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1038\/nrg777","article-title":"Patterns of linkage disequilibrium in the human genome","volume":"3","author":"Ardlie","year":"2002","journal-title":"Nat. Rev. 
Genet"},{"key":"2023020108530214700_btab803-B2","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1038\/nature08800","article-title":"Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines","volume":"465","author":"Atwell","year":"2010","journal-title":"Nature"},{"key":"2023020108530214700_btab803-B3","doi-asserted-by":"crossref","first-page":"i171","DOI":"10.1093\/bioinformatics\/btt238","article-title":"Efficient network-guided multi-locus association mapping with graph cuts","volume":"29","author":"Azencott","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020108530214700_btab803-B4","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1089\/cmb.2020.0429","article-title":"Potpourri: an epistasis test prioritization algorithm via diverse SNP selection","volume":"28","author":"Caylak","year":"2020","journal-title":"J. Comput. Biol"},{"key":"2023020108530214700_btab803-B5","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1038\/nrg2579","article-title":"Detecting gene\u2013gene interactions that underlie human diseases","volume":"10","author":"Cordell","year":"2009","journal-title":"Nat. Rev. Genet"},{"key":"2023020108530214700_btab803-B6","doi-asserted-by":"crossref","first-page":"e131","DOI":"10.1093\/nar\/gkx505","article-title":"Prioritizing tests of epistasis through hierarchical representation of genomic redundancies","volume":"45","author":"Cowman","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023020108530214700_btab803-B7","first-page":"1057","author":"Das","year":"2011"},{"key":"2023020108530214700_btab803-B8","first-page":"1583","article-title":"Selecting diverse features via spectral regularization","volume":"25","author":"Das","year":"2012","journal-title":"Adv. Neural Inf. Process. 
Syst"},{"key":"2023020108530214700_btab803-B9","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1109\/TCBB.2014.2363459","article-title":"Searching high-order SNP combinations for complex diseases based on energy distribution difference","volume":"12","author":"Ding","year":"2015","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinf. (TCBB)"},{"key":"2023020108530214700_btab803-B10","doi-asserted-by":"crossref","first-page":"250","DOI":"10.3835\/plantgenome2011.08.0024","article-title":"Ridge regression and other kernels for genomic selection with r package rrblup","volume":"4","author":"Endelman","year":"2011","journal-title":"Plant Genome"},{"key":"2023020108530214700_btab803-B11","doi-asserted-by":"crossref","first-page":"e157","DOI":"10.1371\/journal.pgen.0020157","article-title":"Two-stage two-locus models in genome-wide association","volume":"2","author":"Evans","year":"2006","journal-title":"PLoS Genet"},{"key":"2023020108530214700_btab803-B12","doi-asserted-by":"crossref","first-page":"e33531","DOI":"10.1371\/journal.pone.0033531","article-title":"High-order SNP combinations associated with complex diseases: efficient discovery, statistical power and functional interactions","volume":"7","author":"Fang","year":"2012","journal-title":"PLoS One"},{"key":"2023020108530214700_btab803-B13","doi-asserted-by":"crossref","first-page":"1696","DOI":"10.1056\/NEJMp0806284","article-title":"Common genetic variation and human traits","volume":"360","author":"Goldstein","year":"2009","journal-title":"N. Engl. J. Med"},{"key":"2023020108530214700_btab803-B14","first-page":"2187","article-title":"Trace lasso: a trace norm regularization for correlated designs","volume":"24","author":"Grave","year":"2011","journal-title":"Adv. Neural Inf. Process. 
Syst"},{"key":"2023020108530214700_btab803-B15","doi-asserted-by":"crossref","first-page":"e89204","DOI":"10.1371\/journal.pone.0089204","article-title":"opensnp\u2014a crowdsourced web resource for personal genomics","volume":"9","author":"Greshake","year":"2014","journal-title":"PLoS One"},{"key":"2023020108530214700_btab803-B16","first-page":"433","author":"Jacob","year":"2009"},{"key":"2023020108530214700_btab803-B17","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1093\/bioinformatics\/btq615","article-title":"dmGWAS: dense module searching for genome-wide association studies in protein\u2013protein interaction networks","volume":"27","author":"Jia","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020108530214700_btab803-B18","doi-asserted-by":"crossref","first-page":"1175","DOI":"10.1093\/bioinformatics\/btn081","article-title":"Network-constrained regularization and variable selection for analysis of genomic data","volume":"24","author":"Li","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020108530214700_btab803-B19","doi-asserted-by":"crossref","first-page":"1536","DOI":"10.1093\/bioinformatics\/btx004","article-title":"Sigmod: an exact and efficient method to identify a strongly interconnected disease-associated module in a gene network","volume":"33","author":"Liu","year":"2017","journal-title":"Bioinformatics"},{"key":"2023020108530214700_btab803-B20","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1086\/518312","article-title":"A generalized combinatorial approach for detecting gene-by-gene and gene-by-environment interactions with application to nicotine dependence","volume":"80","author":"Lou","year":"2007","journal-title":"Am. J. Hum. 
Genet"},{"key":"2023020108530214700_btab803-B21","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1038\/nature08494","article-title":"Finding the missing heritability of complex diseases","volume":"461","author":"Manolio","year":"2009","journal-title":"Nature"},{"key":"2023020108530214700_btab803-B22","doi-asserted-by":"crossref","first-page":"10532","DOI":"10.1038\/ncomms10532","article-title":"Open access resources for genome-wide association mapping in rice","volume":"7","author":"McCouch","year":"2016","journal-title":"Nat. Commun"},{"key":"2023020108530214700_btab803-B23","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1111\/j.1467-9868.2007.00627.x","article-title":"The group lasso for logistic regression","volume":"70","author":"Meier","year":"2008","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"2023020108530214700_btab803-B24","doi-asserted-by":"crossref","DOI":"10.1201\/9781420035933","volume-title":"Subset Selection in Regression","author":"Miller","year":"2002"},{"key":"2023020108530214700_btab803-B25","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1137\/S0097539792240406","article-title":"Sparse approximate solutions to linear systems","volume":"24","author":"Natarajan","year":"1995","journal-title":"SIAM J. Comput"},{"key":"2023020108530214700_btab803-B26","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1101\/gr.172901","article-title":"A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation","volume":"11","author":"Nelson","year":"2001","journal-title":"Genome Res"},{"key":"2023020108530214700_btab803-B27","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1198\/016214508000000337","article-title":"The Bayesian lasso","volume":"103","author":"Park","year":"2008","journal-title":"J. Am. Stat. 
Assoc"},{"key":"2023020108530214700_btab803-B28","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1534\/genetics.114.164442","article-title":"Genome-wide regression and prediction with the BGLR statistical package","volume":"198","author":"P\u00e9rez","year":"2014","journal-title":"Genetics"},{"key":"2023020108530214700_btab803-B29","doi-asserted-by":"crossref","first-page":"855","DOI":"10.1038\/nrg2452","article-title":"Epistasis\u2014the essential role of gene interactions in the structure and evolution of genetic systems","volume":"9","author":"Phillips","year":"2008","journal-title":"Nat. Rev. Genet"},{"key":"2023020108530214700_btab803-B30","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2164-13-S7-S2","article-title":"iLOCi: a SNP interaction prioritization technique for detecting epistasis in genome-wide association studies","volume":"13","author":"Piriyapongsa","year":"2012","journal-title":"BMC Genomics"},{"key":"2023020108530214700_btab803-B31","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1038\/ng1847","article-title":"Principal components analysis corrects for stratification in genome-wide association studies","volume":"38","author":"Price","year":"2006","journal-title":"Nat. Genet"},{"key":"2023020108530214700_btab803-B32","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1086\/321276","article-title":"Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer","volume":"69","author":"Ritchie","year":"2001","journal-title":"Am. J. Hum. Genet"},{"key":"2023020108530214700_btab803-B33","doi-asserted-by":"crossref","first-page":"825","DOI":"10.1038\/ng.2314","article-title":"An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations","volume":"44","author":"Segura","year":"2012","journal-title":"Nat. 
Genet"},{"key":"2023020108530214700_btab803-B34","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020108530214700_btab803-B35","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1007\/s13721-012-0006-6","article-title":"Threshold-based feature selection techniques for high-dimensional bioinformatics data","volume":"1","author":"Van Hulse","year":"2012","journal-title":"Network Model. Anal. Health Inf. Bioinf"},{"key":"2023020108530214700_btab803-B36","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/j.ajhg.2017.06.005","article-title":"10 years of GWAS discovery: biology, function, and translation","volume":"101","author":"Visscher","year":"2017","journal-title":"Am. J. Hum. Genet"},{"key":"2023020108530214700_btab803-B37","doi-asserted-by":"crossref","first-page":"e11384","DOI":"10.1371\/journal.pone.0011384","article-title":"A general model for multilocus epistatic interactions in case-control studies","volume":"5","author":"Wang","year":"2010","journal-title":"PLoS One"},{"key":"2023020108530214700_btab803-B38","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1038\/nrg3747","article-title":"Detecting epistasis in human complex traits","volume":"15","author":"Wei","year":"2014","journal-title":"Nat. Rev. Genet"},{"key":"2023020108530214700_btab803-B39","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/j.ajhg.2011.05.029","article-title":"Rare-variant association testing for sequencing data with the sequence kernel association test","volume":"89","author":"Wu","year":"2011","journal-title":"Am. J. Hum. 
Genet"},{"key":"2023020108530214700_btab803-B40","doi-asserted-by":"crossref","first-page":"1208","DOI":"10.1109\/TCBB.2019.2935437","article-title":"Spadis: an algorithm for selecting predictive and diverse SNPs in GWAS","volume":"18","author":"Yilmaz","year":"2019","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinf"},{"key":"2023020108530214700_btab803-B41","doi-asserted-by":"crossref","first-page":"e91","DOI":"10.1093\/bioinformatics\/btl298","article-title":"A supervised approach for identifying discriminating genotype patterns and its application to breast cancer data","volume":"23","author":"Yosef","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020108530214700_btab803-B42","first-page":"1151","author":"Zhao","year":"2007"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab803\/41904550\/btab803.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/4\/908\/49008691\/btab803.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/4\/908\/49008691\/btab803.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,12]],"date-time":"2024-09-12T19:22:24Z","timestamp":1726168944000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/4\/908\/6448209"}},"subtitle":[],"editor":[{"given":"Tobias","family":"Marschall","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2021,12,2]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,1,27]]}},"URL":"
https:\/\/doi.org\/10.1093\/bioinformatics\/btab803","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.12.11.419952","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,2,15]]},"published":{"date-parts":[[2021,12,2]]}}}