{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T11:08:02Z","timestamp":1740136082905,"version":"3.37.3"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2019,3,14]],"date-time":"2019-03-14T00:00:00Z","timestamp":1552521600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000060","name":"National Institute of Allergy and Infectious Diseases","doi-asserted-by":"publisher","award":["U24AI117966"],"award-info":[{"award-number":["U24AI117966"]}],"id":[{"id":"10.13039\/100000060","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Genetic ancestry is a critical co-factor to study phenotype-genotype associations using cohorts of human subjects. Most publicly available molecular datasets are, however, missing this information or only share self-reported race and ethnicity, representing a limitation to identify and repurpose datasets to investigate the contribution of ancestry to diseases and traits. We propose an analytical framework to enrich the metadata from publicly available cohorts with genetic ancestry information and a resulting diversity score at continental resolution, calculated directly from the data. We illustrate this framework using The Cancer Genome Atlas datasets searched through the DataMed Data Discovery Index. 
Data repositories and contributors can use this framework to provide genetic diversity measurements for controlled access datasets, minimizing the work involved in requesting a dataset that may ultimately prove inadequate for a researcher\u2019s purpose. With the increasing global scale of human genetics research, studies on disease risk and susceptibility would benefit greatly from the adequate estimation and sharing of genetic diversity in publicly available datasets following a framework such as the one presented.<\/jats:p>","DOI":"10.1093\/jamia\/ocy194","type":"journal-article","created":{"date-parts":[[2018,12,29]],"date-time":"2018-12-29T04:26:06Z","timestamp":1546057566000},"page":"457-461","source":"Crossref","is-referenced-by-count":9,"title":["Evaluating and sharing global genetic ancestry in biomedical datasets"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8098-9888","authenticated-orcid":false,"given":"Olivier","family":"Harismendy","sequence":"first","affiliation":[{"name":"Health Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA"},{"name":"Moores Cancer Center, University of California, San Diego, La Jolla, California, USA"}]},{"given":"Jihoon","family":"Kim","sequence":"additional","affiliation":[{"name":"Health Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA"}]},{"given":"Xiaojun","family":"Xu","sequence":"additional","affiliation":[{"name":"Health Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA"}]},{"given":"Lucila","family":"Ohno-Machado","sequence":"additional","affiliation":[{"name":"Health Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA"},{"name":"UC San Diego Health Department of Biomedical Informatics"},{"name":"Health Services Research Division, San Diego Veterans Health 
Administration"}]}],"member":"286","published-online":{"date-parts":[[2019,3,14]]},"reference":[{"issue":"6","key":"2020110613061256000_ocy194-B1","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/ng.3864","article-title":"Finding useful data across multiple biomedical data repositories using DataMed","volume":"49","author":"Ohno-Machado","year":"2017","journal-title":"Nat Genet"},{"issue":"1","key":"2020110613061256000_ocy194-B2","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1037\/0003-066X.60.1.77","article-title":"The use of race variables in genetic studies of complex traits and the goal of reducing health disparities: a transdisciplinary perspective","volume":"60","author":"Shields","year":"2005","journal-title":"Am Psychol"},{"issue":"7","key":"2020110613061256000_ocy194-B3","doi-asserted-by":"crossref","first-page":"61.","DOI":"10.1186\/gm465","article-title":"Genome science and health disparities: a growing success story?","volume":"5","author":"Rotimi","year":"2013","journal-title":"Genome Med"},{"issue":"18","key":"2020110613061256000_ocy194-B4","doi-asserted-by":"crossref","first-page":"1831","DOI":"10.1001\/jama.2017.3096","article-title":"Genomics, health disparities, and missed opportunities for the nation\u2019s research agenda","volume":"317","author":"West","year":"2017","journal-title":"JAMA"},{"issue":"1","key":"2020110613061256000_ocy194-B5","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/s40246-014-0023-x","article-title":"Self-reported race\/ethnicity in the age of genomic research: its potential impact on understanding health disparities","volume":"9","author":"Mersha","year":"2015","journal-title":"Hum Genomics"},{"key":"2020110613061256000_ocy194-B6","doi-asserted-by":"crossref","first-page":"12376.","DOI":"10.1038\/srep12376","article-title":"Ancestry, admixture and fitness in Colombian genomes","volume":"5","author":"Rishishwar","year":"2015","journal-title":"Sci 
Rep"},{"issue":"3","key":"2020110613061256000_ocy194-B7","doi-asserted-by":"crossref","first-page":"e30950","DOI":"10.1371\/journal.pone.0030950","article-title":"Genetic ancestry, self-reported race and ethnicity in African Americans and European Americans in the PCaP cohort","volume":"7","author":"Sucheston","year":"2012","journal-title":"PLoS One"},{"key":"2020110613061256000_ocy194-B8","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Auton","year":"2015","journal-title":"Nature"},{"issue":"9","key":"2020110613061256000_ocy194-B9","doi-asserted-by":"crossref","first-page":"1655","DOI":"10.1101\/gr.094052.109","article-title":"Fast model-based estimation of ancestry in unrelated individuals","volume":"19","author":"Alexander","year":"2009","journal-title":"Genome Res"},{"key":"2020110613061256000_ocy194-B10","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1186\/s12859-014-0418-7","article-title":"Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations","volume":"16","author":"Bansal","year":"2015","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"2020110613061256000_ocy194-B11","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1186\/1479-7364-1-1-52","article-title":"Measuring and using admixture to study the genetics of complex diseases","volume":"1","author":"Halder","year":"2003","journal-title":"Hum Genomics"},{"issue":"8","key":"2020110613061256000_ocy194-B12","doi-asserted-by":"crossref","first-page":"1907","DOI":"10.1093\/hmg\/ddr617","article-title":"Admixture mapping identifies a locus on 6q25 associated with breast cancer risk in US Latinas","volume":"21","author":"Fejerman","year":"2012","journal-title":"Hum Mol 
Genet"},{"issue":"10","key":"2020110613061256000_ocy194-B13","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The Cancer Genome Atlas Pan-Cancer analysis project","volume":"45","author":"Cancer Genome Atlas Research Network","year":"2013","journal-title":"Nat Genet"},{"volume-title":"DataMed-Admixture.TCGA.txt","year":"2017","author":"Harismendy","key":"2020110613061256000_ocy194-B14"},{"issue":"10","key":"2020110613061256000_ocy194-B15","doi-asserted-by":"crossref","first-page":"1253","DOI":"10.1038\/ng.237","article-title":"Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs","volume":"40","author":"Korn","year":"2008","journal-title":"Nat Genet"},{"issue":"3","key":"2020110613061256000_ocy194-B16","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1086\/519795","article-title":"PLINK: a tool set for whole-genome association and population-based linkage analyses","volume":"81","author":"Purcell","year":"2007","journal-title":"Am J Hum Genet"},{"key":"2020110613061256000_ocy194-B17","first-page":"9","author":"Garrison","year":"2012","journal-title":"Haplotype-based variant detection from short-read sequencing. 
arXiv"},{"issue":"9","key":"2020110613061256000_ocy194-B18","doi-asserted-by":"crossref","first-page":"1760","DOI":"10.1101\/gr.135350.111","article-title":"GENCODE: the reference human genome annotation for the ENCODE Project","volume":"22","author":"Harrow","year":"2012","journal-title":"Genome Res"},{"key":"2020110613061256000_ocy194-B19","article-title":"Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks","author":"Hausser","year":"2008","journal-title":"arXiv"},{"volume-title":"DataMed-Admixture Code Repository","year":"2017","author":"Kim","key":"2020110613061256000_ocy194-B20"},{"issue":"36","key":"2020110613061256000_ocy194-B21","doi-asserted-by":"crossref","first-page":"e4733","DOI":"10.1097\/MD.0000000000004733","article-title":"Genetically determined ancestry is more informative than self-reported race in HIV-infected and -exposed children","volume":"95","author":"Spector","year":"2016","journal-title":"Medicine (Baltimore)"},{"issue":"2","key":"2020110613061256000_ocy194-B22","doi-asserted-by":"crossref","first-page":"R22","DOI":"10.1186\/gb-2014-15-2-r22","article-title":"Genetic ancestry of participants in the National Children\u2019s Study","volume":"15","author":"Smith","year":"2014","journal-title":"Genome Biol"},{"issue":"4","key":"2020110613061256000_ocy194-B23","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1007\/s12041-010-0060-8","article-title":"Comparing genetic ancestry and self-reported race\/ethnicity in a multiethnic population in New York City","volume":"89","author":"Lee","year":"2010","journal-title":"J Genet"},{"issue":"9","key":"2020110613061256000_ocy194-B24","doi-asserted-by":"crossref","first-page":"e73971","DOI":"10.1371\/journal.pone.0073971","article-title":"Determining ancestry proportions in complex admixture scenarios in South Africa using a novel proxy ancestry selection method","volume":"8","author":"Chimusa","year":"2013","journal-title":"PLoS 
One"},{"key":"2020110613061256000_ocy194-B25","doi-asserted-by":"crossref","first-page":"603\u20137","DOI":"10.1038\/nature11003","article-title":"The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity","volume":"483","author":"Barretina","year":"2012","journal-title":"Nature"},{"key":"2020110613061256000_ocy194-B26","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1038\/nature04226","article-title":"A haplotype map of the human genome","volume":"437","author":"The International HapMap Consortium","year":"2005","journal-title":"Nature"},{"issue":"4","key":"2020110613061256000_ocy194-B27","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1016\/j.stemcr.2017.03.012","article-title":"iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types","volume":"8","author":"Panopoulos","year":"2017","journal-title":"Stem Cell Reports"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/5\/457\/34151501\/ocy194.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/5\/457\/34151501\/ocy194.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T19:06:36Z","timestamp":1604689596000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/26\/5\/457\/5380688"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,14]]},"references-count":27,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2019,3,14]]},"published-print":{"date-parts":[[2019,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocy194","relation":{},"ISSN":["1527-974X"],"issn-type":[{"type":"electronic","value":"1527-974X"}],"subject":[],"published-other":{"date-parts":[[2019,5]]},"published":{"date-parts":[[2019,3,14]]}}}