{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T21:50:13Z","timestamp":1772661013222,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"8","funder":[{"name":"Intramural Research Program"},{"DOI":"10.13039\/100000054","name":"National Cancer Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Summary<\/jats:title>\n                  <jats:p>A concern when conducting genome-wide association studies (GWAS) is the potential for population stratification, i.e. ancestry-based genetic differences between cases and controls, that if not properly accounted for, could lead to biased association results. We developed PCAmatchR as an open source R package for performing optimal case\u2013control matching using principal component analysis (PCA) to aid in selecting controls that are well matched by ancestry to cases. PCAmatchR takes user supplied PCA outputs and selects matching controls for cases by utilizing a weighted Mahalanobis distance metric which weights each principal component by the percentage of genetic variation explained. Results from the 1000 Genomes Project data demonstrate both the functionality and performance of PCAmatchR for selecting matching controls for case populations as well as reducing inflation of association test statistics. PCAmatchR improves genomic similarity between matched cases and controls, which minimizes the effects of population stratification in GWAS analyses.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>PCAmatchR is freely available for download on GitHub (https:\/\/github.com\/machiela-lab\/PCAmatchR) or through CRAN (https:\/\/CRAN.R-project.org\/package=PCAmatchR).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa784","type":"journal-article","created":{"date-parts":[[2020,9,3]],"date-time":"2020-09-03T03:21:58Z","timestamp":1599103318000},"page":"1178-1181","source":"Crossref","is-referenced-by-count":28,"title":["PCAmatchR: a flexible R package for optimal case\u2013control matching using weighted principal components"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8393-1713","authenticated-orcid":false,"given":"Derek W","family":"Brown","sequence":"first","affiliation":[{"name":"Division of Cancer Epidemiology and Genetics, National Cancer Institute , Rockville, MD 20850, USA"},{"name":"Cancer Prevention Fellowship Program, Division of Cancer Prevention, National Cancer Institute , Rockville, MD 20850, USA"}]},{"given":"Timothy A","family":"Myers","sequence":"additional","affiliation":[{"name":"Division of Cancer Epidemiology and Genetics, National Cancer Institute , Rockville, MD 20850, USA"}]},{"given":"Mitchell J","family":"Machiela","sequence":"additional","affiliation":[{"name":"Division of Cancer Epidemiology and Genetics, National Cancer Institute , Rockville, MD 20850, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,12,13]]},"reference":[{"key":"2023051612063449900_btaa784-B1","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Auton","year":"2015","journal-title":"Nature"},{"key":"2023051612063449900_btaa784-B2","author":"Brown","year":"2020"},{"key":"2023051612063449900_btaa784-B3","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1186\/s12864-017-4166-8","article-title":"Ancestry inference using principal component analysis and spatial analysis: a distance-based analysis to account for population substructure","volume":"18","author":"Byun","year":"2017","journal-title":"BMC Genomics"},{"key":"2023051612063449900_btaa784-B4","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1002\/gepi.21611","article-title":"Stratification-score matching improves correction for confounding by population stratification in case-control association studies","volume":"36","author":"Epstein","year":"2012","journal-title":"Genet. Epidemiol"},{"key":"2023051612063449900_btaa784-B5","first-page":"18","article-title":"Optmatch: flexible, optimal matching for observational studies","volume":"7","author":"Hansen","year":"2007","journal-title":"New Funct. Multivar. Anal"},{"key":"2023051612063449900_btaa784-B6","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1198\/106186006X137047","article-title":"Optimal full matching and related designs via network flows","volume":"15","author":"Hansen","year":"2006","journal-title":"J. Comput. Graph. Stat"},{"key":"2023051612063449900_btaa784-B7","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1016\/j.ajhg.2015.03.004","article-title":"Mixed model with correction for case\u2013control ascertainment increases association power","volume":"96","author":"Hayeck","year":"2015","journal-title":"Am. J. Hum. Genet"},{"key":"2023051612063449900_btaa784-B8","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1086\/381716","article-title":"Matching strategies for genetic association studies in structured populations","volume":"74","author":"Hinds","year":"2004","journal-title":"Am. J. Hum. Genet"},{"key":"2023051612063449900_btaa784-B9","first-page":"182","article-title":"Fault diagnosis of analogue circuits with weighted Mahalanobis distance based on entropy theory","volume":"7","author":"Hu","year":"2013","journal-title":"Int. J. Digit. Content Technol. Appl"},{"key":"2023051612063449900_btaa784-B10","first-page":"20150202","article-title":"Principal component analysis: a review and recent developments","volume":"374","author":"Jolliffe","year":"2016","journal-title":"Philos. Trans. R. Soc. Math. Phys. Eng. Sci"},{"key":"2023051612063449900_btaa784-B11","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1016\/0031-3203(87)90066-5","article-title":"A valuation of state of object based on weighted Mahalanobis distance","volume":"20","author":"Krusi\u0144ska","year":"1987","journal-title":"Pattern Recognit"},{"key":"2023051612063449900_btaa784-B12","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/s12859-015-0521-4","article-title":"Novel genetic matching methods for handling population stratification in genome-wide association studies","volume":"16","author":"Lacour","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023051612063449900_btaa784-B13","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.ajhg.2007.11.003","article-title":"On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants","volume":"82","author":"Luca","year":"2008","journal-title":"Am. J. Hum. Genet"},{"key":"2023051612063449900_btaa784-B14","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1002\/gepi.21742","article-title":"Recommended joint and meta-analysis strategies for case\u2013control association testing of single low-count variants","volume":"37","author":"Ma","year":"2013","journal-title":"Genet. Epidemiol"},{"key":"2023051612063449900_btaa784-B15","doi-asserted-by":"crossref","first-page":"3184","DOI":"10.1038\/s41467-018-05537-2","article-title":"Genome-wide association study identifies multiple new loci associated with Ewing sarcoma susceptibility","volume":"9","author":"Machiela","year":"2018","journal-title":"Nat. Commun"},{"key":"2023051612063449900_btaa784-B16","first-page":"3555","article-title":"LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants","volume":"31","author":"Machiela","year":"2015","journal-title":"Bioinf. Oxf. Engl"},{"key":"2023051612063449900_btaa784-B17","doi-asserted-by":"crossref","first-page":"e236","DOI":"10.1371\/journal.pgen.0030236","article-title":"Discerning the ancestry of European Americans in Genetic Association Studies","volume":"4","author":"Price","year":"2008","journal-title":"PLOS Genet"},{"key":"2023051612063449900_btaa784-B18","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1038\/ng1847","article-title":"Principal components analysis corrects for stratification in genome-wide association studies","volume":"38","author":"Price","year":"2006","journal-title":"Nat. Genet"},{"key":"2023051612063449900_btaa784-B19","doi-asserted-by":"crossref","first-page":"1024","DOI":"10.1080\/01621459.1989.10478868","article-title":"Optimal matching for observational studies","volume":"84","author":"Rosenbaum","year":"1989","journal-title":"J. Am. Stat. Assoc"},{"key":"2023051612063449900_btaa784-B20","first-page":"318","article-title":"Using multivariate matched sampling and regression adjustment to control bias in observational studies","volume":"74","author":"Rubin","year":"1979","journal-title":"J. Am. Stat. Assoc"},{"key":"2023051612063449900_btaa784-B21","first-page":"1","article-title":"Matching methods for causal inference: a review and a look forward","volume":"25","author":"Stuart","year":"2010","journal-title":"Stat. Sci. Rev. J. Inst. Math. Stat"},{"key":"2023051612063449900_btaa784-B22","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1038\/nature15394","article-title":"An integrated map of structural variation in 2,504 human genomes","volume":"526","author":"Sudmant","year":"2015","journal-title":"Nature"},{"key":"2023051612063449900_btaa784-B23","doi-asserted-by":"crossref","first-page":"e4","DOI":"10.1371\/journal.pgen.0040004","article-title":"Analysis and application of European genetic substructure using 300 K SNP information","volume":"4","author":"Tian","year":"2008","journal-title":"PLOS Genet"},{"key":"2023051612063449900_btaa784-B24","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1016\/j.ajhg.2010.11.011","article-title":"GCTA: a tool for genome-wide complex trait analysis","volume":"88","author":"Yang","year":"2011","journal-title":"Am. J. Hum. Genet"},{"key":"2023051612063449900_btaa784-B25","doi-asserted-by":"crossref","first-page":"e2551","DOI":"10.1371\/journal.pone.0002551","article-title":"Population substructure and control selection in genome-wide association studies","volume":"3","author":"Yu","year":"2008","journal-title":"PLoS One"},{"key":"2023051612063449900_btaa784-B26","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1162\/003465304323023705","article-title":"Using matching to estimate treatment effects: data requirements, matching metrics, and Monte Carlo evidence","volume":"86","author":"Zhao","year":"2004","journal-title":"Rev. Econ. Stat"},{"key":"2023051612063449900_btaa784-B27","doi-asserted-by":"crossref","first-page":"1335","DOI":"10.1038\/s41588-018-0184-y","article-title":"Efficiently controlling for case\u2013control imbalance and sample relatedness in large-scale genetic association studies","volume":"50","author":"Zhou","year":"2018","journal-title":"Nat. Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa784\/34883646\/btaa784.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/8\/1178\/50340755\/btaa784.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/8\/1178\/50340755\/btaa784.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T12:08:01Z","timestamp":1684238881000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/8\/1178\/5905474"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,12,13]]},"references-count":27,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2021,5,23]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa784","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,4,15]]},"published":{"date-parts":[[2020,12,13]]}}}