{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T10:33:20Z","timestamp":1761561200579},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"23","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivations: The tag SNP approach is a valuable tool in whole genome association studies, and a variety of algorithms have been proposed to identify the optimal tag SNP set. Currently, most tag SNP selection is based on two-marker (pairwise) linkage disequilibrium (LD). Recent literature has shown that multiple-marker LD also contains useful information that can further increase the genetic coverage of the tag SNP set. Thus, tag SNP selection methods that incorporate multiple-marker LD are expected to have advantages in terms of genetic coverage and statistical power.<\/jats:p><jats:p>Results: We propose a novel algorithm to select tag SNPs in an iterative procedure. In each iteration loop, the SNP that captures the most neighboring SNPs (through pair-wise and multiple-marker LD) is selected as a tag SNP. We optimize the algorithm and computer program to make our approach feasible on today's typical workstations. Benchmarked using HapMap release 21, our algorithm outperforms standard pair-wise LD approach in several aspects. (i) It improves genetic coverage (e.g. by 7.2% for 200 K tag SNPs in HapMap CEU) compared to its conventional pair-wise counterpart, when conditioning on a fixed tag SNP number. (ii) It saves genotyping costs substantially when conditioning on fixed genetic coverage (e.g. 34.1% saving in HapMap CEU at 90% coverage). 
(iii) Tag SNPs identified using multiple-marker LD have good portability across closely related ethnic groups and (iv) show higher statistical power in association tests than those selected using conventional methods.<\/jats:p><jats:p>Availability: A computer software suite, multiTag, has been developed based on this novel algorithm. The program is freely available by written request to the author at ke_hao@merck.com<\/jats:p><jats:p>Contact: ke_hao@163.com<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm496","type":"journal-article","created":{"date-parts":[[2007,11,16]],"date-time":"2007-11-16T01:43:16Z","timestamp":1195177396000},"page":"3178-3184","source":"Crossref","is-referenced-by-count":15,"title":["Genome-wide selection of tag SNPs using multiple-marker correlation"],"prefix":"10.1093","volume":"23","author":[{"given":"K.","family":"Hao","sequence":"first","affiliation":[{"name":"Algorithm and Data Analysis, Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California, USA"}]}],"member":"286","published-online":{"date-parts":[[2007,11,15]]},"reference":[{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1038\/nature02168","article-title":"The International HapMap Project","volume":"426","author":"The International HapMap Consortium","year":"2003","journal-title":"Nature"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1038\/ng1801","article-title":"Evaluating coverage of genome-wide association studies","volume":"38","author":"Barrett","year":"2006","journal-title":"Nat. 
Genet"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1093\/bioinformatics\/bth457","article-title":"Haploview: analysis and visualization of LD and haplotype maps","volume":"21","author":"Barrett","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1016\/S0895-4356(00)00314-0","article-title":"Adjusting for multiple testing\u2013when and how?","volume":"54","author":"Bender","year":"2001","journal-title":"J. Clin. Epidemiol"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1038\/ng1128","article-title":"Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans","volume":"33","author":"Carlson","year":"2003","journal-title":"Nat. Genet"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1086\/381000","article-title":"Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium","volume":"74","author":"Carlson","year":"2004","journal-title":"Am. J. Hum. Genet"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1002\/gepi.20162","article-title":"Resampling-based multiple hypothesis testing procedures for genetic case-control association studies","volume":"30","author":"Chen","year":"2006","journal-title":"Genet. Epidemiol"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"1217","DOI":"10.1038\/ng1669","article-title":"Efficiency and power in genetic association studies","volume":"37","author":"de Bakker","year":"2005","journal-title":"Nat. 
Genet"},{"key":"2023041107520958100_","first-page":"478","article-title":"Transferability of tag SNPs to capture common genetic variation in DNA repair genes across multiple populations","volume":"11","author":"de Bakker","year":"2006","journal-title":"Pac. Symp. Biocomput"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1101\/gr.4138406","article-title":"The portability of tagSNPs across populations: a worldwide survey","volume":"16","author":"Gonzalez-Neira","year":"2006","journal-title":"Genome Res"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"i195","DOI":"10.1093\/bioinformatics\/bti1021","article-title":"Tag SNP selection in genotype data for maximizing SNP prediction accuracy","volume":"21","author":"Halperin","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1002\/gepi.10293","article-title":"Power estimation of multiple SNP association test of case-control study and application","volume":"26","author":"Hao","year":"2004","journal-title":"Genet. Epidemiol"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1002\/gepi.20095","article-title":"A sparse marker extension tree algorithm for selecting the best set of haplotype tagging single nucleotide polymorphisms","volume":"29","author":"Hao","year":"2005","journal-title":"Genet. 
Epidemiol"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1093\/bioinformatics\/btl574","article-title":"LdCompare: rapid computation of single- and multiple-marker r2 and genetic coverage","volume":"23","author":"Hao","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1126\/science.1124779","article-title":"A common genetic variant is associated with adult and childhood obesity","volume":"312","author":"Herbert","year":"2006","journal-title":"Science"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1002\/gepi.10292","article-title":"Principal component analysis for selection of optimal SNP-sets that capture intragenic genetic variation","volume":"26","author":"Horne","year":"2004","journal-title":"Genet. Epidemiol"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1007\/s00439-006-0182-5","article-title":"Efficient selection of tagging single-nucleotide polymorphisms in multiple populations","volume":"120","author":"Howie","year":"2006","journal-title":"Hum. Genet"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1038\/85776","article-title":"Variation is the spice of life","volume":"27","author":"Kruglyak","year":"2001","journal-title":"Nat. Genet"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"850","DOI":"10.1086\/425587","article-title":"Finding haplotype tagging SNPs by use of principal components analysis","volume":"75","author":"Lin","year":"2004","journal-title":"Am. J. Hum. Genet"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1086\/500808","article-title":"A comparison of phasing algorithms for trios and unrelated individuals","volume":"78","author":"Marchini","year":"2006","journal-title":"Am. J. Hum. 
Genet"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"663","DOI":"10.1038\/ng1816","article-title":"Evaluating and improving power in whole-genome association studies using fixed marker sets","volume":"38","author":"Pe\u2019er","year":"2006","journal-title":"Nat. Genet"},{"key":"2023041107520958100_","first-page":"301","article-title":"Choosing SNPs using feature selection","author":"Phuong","year":"2005"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1093\/bioinformatics\/bti762","article-title":"An efficient comprehensive search algorithm for tagSNP selection using linkage disequilibrium criteria","volume":"22","author":"Qin","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"3134","DOI":"10.1002\/sim.2407","article-title":"Multiple hypothesis testing strategies for genetic case-control association studies","volume":"25","author":"Rosenberg","year":"2006","journal-title":"Stat. Med"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"9900","DOI":"10.1073\/pnas.1633613100","article-title":"Minimal haplotype tagging","volume":"100","author":"Sebastiani","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1086\/319501","article-title":"A new statistical method for haplotype reconstruction from population data","volume":"68","author":"Stephens","year":"2001","journal-title":"Am. J. Hum. Genet"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1159\/000071807","article-title":"Choosing haplotype-tagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study","volume":"55","author":"Stram","year":"2003","journal-title":"Hum. 
Hered"},{"key":"2023041107520958100_","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1007\/s10038-006-0393-6","article-title":"A two-stage design for multiple testing in large-scale association studies","volume":"51","author":"Wen","year":"2006","journal-title":"J. Hum. Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/23\/3178\/49824350\/bioinformatics_23_23_3178.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/23\/3178\/49824350\/bioinformatics_23_23_3178.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,14]],"date-time":"2023-05-14T19:17:28Z","timestamp":1684091848000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/23\/3178\/290076"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,11,15]]},"references-count":28,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2007,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm496","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,12,1]]},"published":{"date-parts":[[2007,11,15]]}}}