{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T16:20:06Z","timestamp":1751473206839},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Summary: Genotype calling from high-throughput platforms such as Illumina and Affymetrix is a critical step in data processing, so that accurate information on genetic variants can be obtained for phenotype\u2013genotype association studies. A number of algorithms have been developed to infer genotypes from data generated through the Illumina BeadStation platform, including GenCall, GenoSNP, Illuminus and CRLMM. Most of these algorithms are built on population-based statistical models to genotype every SNP in turn, such as GenCall with the GenTrain clustering algorithm, and require a large reference population to perform well. These approaches may not work well for rare variants where only a small proportion of the individuals carry the variant. A fundamentally different approach, implemented in GenoSNP, adopts a single nucleotide polymorphism (SNP)-based model to infer genotypes of all the SNPs in one individual, making it an appealing alternative to call rare variants. However, compared to the population-based strategies, more SNPs in GenoSNP may fail the Hardy\u2013Weinberg Equilibrium test. To take advantage of both strategies, we propose a two-stage SNP calling procedure, named the modified mixture model (M3), to improve call accuracy for both common and rare variants. The effectiveness of our approach is demonstrated through applications to genotype calling on a set of HapMap samples used for quality control purpose in a large case\u2013control study of cocaine dependence. The increase in power with M3 is greater for rare variants than for common variants depending on the model.<\/jats:p>\n               <jats:p>Availability: M3 algorithm: http:\/\/bioinformatics.med.yale.edu\/group.<\/jats:p>\n               <jats:p>Contact: \u00a0name@bio.com; hongyu.zhao@yale.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr673","type":"journal-article","created":{"date-parts":[[2011,12,10]],"date-time":"2011-12-10T02:00:28Z","timestamp":1323482428000},"page":"358-365","source":"Crossref","is-referenced-by-count":25,"title":["M3: an improved SNP calling algorithm for Illumina BeadArray data"],"prefix":"10.1093","volume":"28","author":[{"given":"Gengxin","family":"Li","sequence":"first","affiliation":[{"name":"1 Biostatistics Division, Department of Epidemiology and Public Health, Yale University, 2Department of Psychiatry, Yale University, New Haven, CT 06520 and 3Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA"}]},{"given":"Joel","family":"Gelernter","sequence":"additional","affiliation":[{"name":"1 Biostatistics Division, Department of Epidemiology and Public Health, Yale University, 2Department of Psychiatry, Yale University, New Haven, CT 06520 and 3Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA"}]},{"given":"Henry R.","family":"Kranzler","sequence":"additional","affiliation":[{"name":"1 Biostatistics Division, Department of Epidemiology and Public Health, Yale University, 2Department of Psychiatry, Yale University, New Haven, CT 06520 and 3Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA"}]},{"given":"Hongyu","family":"Zhao","sequence":"additional","affiliation":[{"name":"1 Biostatistics Division, Department of Epidemiology and Public Health, Yale University, 2Department of Psychiatry, Yale University, New Haven, CT 06520 and 3Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA"}]}],"member":"286","published-online":{"date-parts":[[2011,12,8]]},"reference":[{"key":"2023012512170221700_B1","article-title":"BRLMM: an improved genotype calling method for the GeneChip Human Mapping 500K Array Set","volume-title":"Technical Report, White Paper.","author":"AFFYMETRIX","year":"2006"},{"key":"2023012512170221700_B2","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1016\/j.ajhg.2009.11.004","article-title":"Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies","volume":"85","author":"Browning","year":"2009","journal-title":"Am. J. Hum. Genet."},{"key":"2023012512170221700_B3","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1093\/biostatistics\/kxl042","article-title":"Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data","volume":"8","author":"Carvalho","year":"2007","journal-title":"Biostatistics."},{"key":"2023012512170221700_B4","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1038\/tpj.2010.47","article-title":"An interactive effect of batch size and composition contributes to discordant results in GWAS with the CHIAMO genotyping algorithm","volume":"10","author":"Chierici","year":"2010","journal-title":"Pharmacogenomics J."},{"key":"2023012512170221700_B5","doi-asserted-by":"crossref","first-page":"2209","DOI":"10.1093\/bioinformatics\/btn386","article-title":"GenoSNP: a variational Bayes within-sample SNP genotyping algorithm that does not require a reference population","volume":"24","author":"Giannoulatou","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512170221700_B6","article-title":"Illumina GenCall Data Analysis Software","author":"Illumina Inc.","year":"2005","journal-title":"TECHNOLOGY SPOTLIGHT."},{"key":"2023012512170221700_B7","article-title":"Improved Cluster Generation with Gentrain2","author":"Illumina Inc.","year":"2009","journal-title":"Technical Note: DNA Analysis."},{"key":"2023012512170221700_B8","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1126\/science.1109557","article-title":"Complement factor H polymorphism in age-related macular degeneration","volume":"308","author":"Klein","year":"2005","journal-title":"Science"},{"key":"2023012512170221700_B9","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1038\/ng2088","article-title":"A new multipoint method for genome-wide association studies by imputation of genotypes","volume":"39","author":"Marchini","year":"2007","journal-title":"Nat. Genet."},{"key":"2023012512170221700_B10"},{"key":"2023012512170221700_B11","article-title":"Finite Mixture Models","volume-title":"Wiley Series in Probability and Statistics","author":"McLachlan","year":"2000"},{"key":"2023012512170221700_B12","first-page":"421","article-title":"Computing Issues for the EM Algorithm in Mixture Models","volume-title":"In Computing Science and Statistics","author":"McLachlan","year":"1999"},{"key":"2023012512170221700_B13","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1007\/BF02834632","article-title":"Mahalanobis distance","volume":"4","author":"McLachlan","year":"1999","journal-title":"Resonance"},{"key":"2023012512170221700_B14","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1093\/bioinformatics\/bti741","article-title":"A genotype calling algorithm for Affymetrix SNP arrays","volume":"22","author":"Rabbee","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512170221700_B15","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1038\/ng1133","article-title":"Quality and completeness of SNP databases","volume":"33","author":"Reich","year":"2003","journal-title":"Nat. Genet."},{"key":"2023012512170221700_B16","doi-asserted-by":"crossref","first-page":"2621","DOI":"10.1093\/bioinformatics\/btp470","article-title":"R\/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips","volume":"25","author":"Ritchie","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512170221700_B17","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1186\/1471-2105-12-68","article-title":"Comparing genotyping algorithms for Illumina's Infinium whole-genome SNP BeadChips","volume":"12","author":"Ritchie","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012512170221700_B18","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1038\/nature05616","article-title":"A genomewide association study identifies novel risk loci for type 2 diabetes","volume":"445","author":"Sladek","year":"2007","journal-title":"Nature"},{"key":"2023012512170221700_B19","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nmeth842","article-title":"Whole-genome genotyping with the single-base extension assay","volume":"3","author":"Steemers","year":"2006","journal-title":"Nat. Methods"},{"key":"2023012512170221700_B20","doi-asserted-by":"crossref","first-page":"2741","DOI":"10.1093\/bioinformatics\/btm443","article-title":"A genotype calling algorithm for the Illumina BeadArray platform","volume":"23","author":"Teo","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512170221700_B21","doi-asserted-by":"crossref","first-page":"851","DOI":"10.1038\/nature06258","article-title":"A second generation human haplotype map of over 3.1 million SNPs","volume":"449","author":"The International HapMap Consortium","year":"2007","journal-title":"Nature"},{"key":"2023012512170221700_B22","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1038\/nature05911","article-title":"Genome-wide association study of 14 000 cases of seven common diseases and 3000 shared controls","volume":"447","author":"The Wellcome Trust Case Control Consortium","year":"2007","journal-title":"Nature"},{"key":"2023012512170221700_B23","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1038\/tpj.2010.27","article-title":"Assessment of variability in GWAS with CRLMM genotyping algorithm on WTCCC coronary artery disease","volume":"10","author":"Zhang","year":"2010","journal-title":"Pharmacogenomics J."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/3\/358\/48879783\/bioinformatics_28_3_358.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/3\/358\/48879783\/bioinformatics_28_3_358.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T14:48:18Z","timestamp":1674658098000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/3\/358\/189253"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,12,8]]},"references-count":23,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr673","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,2,1]]},"published":{"date-parts":[[2011,12,8]]}}}