{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T22:22:19Z","timestamp":1761862939408},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"22","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2203,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,11,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Next-generation sequencing presents several statistical challenges, with one of the most fundamental being determining an individual's genotype from multiple aligned short read sequences at a position. Some simple approaches for genotype calling apply fixed filters, such as calling a heterozygote if more than a specified percentage of the reads have variant nucleotide calls. Other genotype-calling methods, such as MAQ and SOAPsnp, are implementations of Bayes classifiers in that they classify genotypes using posterior genotype probabilities.<\/jats:p>\n               <jats:p>Results: Here, we propose a novel genotype-calling algorithm that, in contrast to the other methods, estimates parameters underlying the posterior probabilities in an adaptive way rather than arbitrarily specifying them a priori. The algorithm, which we call SeqEM, applies the well-known Expectation-Maximization algorithm to an appropriate likelihood for a sample of unrelated individuals with next-generation sequence data, leveraging information from the sample to estimate genotype probabilities and the nucleotide-read error rate. We demonstrate using analytic calculations and simulations that SeqEM results in genotype-call error rates as small as or smaller than filtering approaches and MAQ. We also apply SeqEM to exome sequence data in eight related individuals and compare the results to genotypes from an Illumina SNP array, showing that SeqEM behaves well in real data that deviates from idealized assumptions.<\/jats:p>\n               <jats:p>Conclusion: SeqEM offers an improved, robust and flexible genotype-calling approach that can be widely applied in the next-generation sequencing studies.<\/jats:p>\n               <jats:p>Availability and implementation: Software for SeqEM is freely available from our website: www.hihg.org under Software Download.<\/jats:p>\n               <jats:p>Contact: \u00a0emartin1@med.miami.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq526","type":"journal-article","created":{"date-parts":[[2010,9,23]],"date-time":"2010-09-23T00:31:47Z","timestamp":1285201907000},"page":"2803-2810","source":"Crossref","is-referenced-by-count":66,"title":["SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies"],"prefix":"10.1093","volume":"26","author":[{"given":"E. R.","family":"Martin","sequence":"first","affiliation":[{"name":"John P. Hussman Institute for Human Genomics and the Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, USA"}]},{"given":"D. D.","family":"Kinnamon","sequence":"additional","affiliation":[{"name":"John P. Hussman Institute for Human Genomics and the Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, USA"}]},{"given":"M. A.","family":"Schmidt","sequence":"additional","affiliation":[{"name":"John P. Hussman Institute for Human Genomics and the Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, USA"}]},{"given":"E. H.","family":"Powell","sequence":"additional","affiliation":[{"name":"John P. Hussman Institute for Human Genomics and the Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, USA"}]},{"given":"S.","family":"Zuchner","sequence":"additional","affiliation":[{"name":"John P. Hussman Institute for Human Genomics and the Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, USA"}]},{"given":"R. W.","family":"Morris","sequence":"additional","affiliation":[{"name":"John P. Hussman Institute for Human Genomics and the Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,9,21]]},"reference":[{"key":"2023012507561142700_B1","first-page":"472","volume-title":"Statistical Inference.","author":"Casella","year":"2002","edition":"2"},{"key":"2023012507561142700_B2","first-page":"1","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Series B Methodol."},{"key":"2023012507561142700_B3","first-page":"172","volume-title":"Principles of Population Genetics.","author":"Hartl","year":"2007","edition":"2"},{"key":"2023012507561142700_B4","doi-asserted-by":"crossref","first-page":"e8232","DOI":"10.1371\/journal.pone.0008232","article-title":"Exome sequencing of a multigenerational human pedigree","volume":"4","author":"Hedges","year":"2009","journal-title":"PLoS ONE"},{"key":"2023012507561142700_B5","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res."},{"key":"2023012507561142700_B6","doi-asserted-by":"crossref","first-page":"1606","DOI":"10.1101\/gr.092213.109","article-title":"Multiplex padlock targeted sequencing reveal human hypermutable CpG variations","volume":"19","author":"Li","year":"2009","journal-title":"Genome Res."},{"key":"2023012507561142700_B7","doi-asserted-by":"crossref","first-page":"1124","DOI":"10.1101\/gr.088013.108","article-title":"SNP detection for massively parallel whole-genome resequencing","volume":"19","author":"Li","year":"2009","journal-title":"Genome Res."},{"key":"2023012507561142700_B8","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1016\/j.ajhg.2007.11.004","article-title":"Simple and efficient analysis of disease association with missing genotype data","volume":"82","author":"Lin","year":"2008","journal-title":"Am. J. Hum. Genet."},{"key":"2023012507561142700_B9","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1038\/ng2088","article-title":"A new multipoint method for genome-wide association studies by imputation of genotypes","volume":"39","author":"Marchini","year":"2007","journal-title":"Nat. Genet."},{"key":"2023012507561142700_B10","doi-asserted-by":"crossref","first-page":"899","DOI":"10.1080\/01621459.1991.10475130","article-title":"Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm","volume":"86","author":"Meng","year":"1991","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012507561142700_B11","first-page":"154","article-title":"Bayesian learning","volume-title":"Machine Learning.","author":"Mitchell","year":"1997"},{"key":"2023012507561142700_B12","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1038\/ng.499","article-title":"Exome sequencing identifies the cause of a mendelian disorder","volume":"42","author":"Ng","year":"2010","journal-title":"Nat. Genet."},{"key":"2023012507561142700_B13","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth1110","article-title":"Multiplex amplification of large sets of human exons","volume":"4","author":"Porreca","year":"2007","journal-title":"Nat. Methods"},{"key":"2023012507561142700_B14","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1086\/338688","article-title":"Score tests for association between traits and haplotypes when linkage phase is ambiguous","volume":"70","author":"Schaid","year":"2002","journal-title":"Am. J. Hum. Genet."},{"key":"2023012507561142700_B15","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/j.ajhg.2009.06.022","article-title":"Massively parallel sequencing: the next big thing in genetic medicine","volume":"85","author":"Tucker","year":"2009","journal-title":"Am. J. Hum. Genet."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/22\/2803\/48852441\/bioinformatics_26_22_2803.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/22\/2803\/48852441\/bioinformatics_26_22_2803.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T07:56:42Z","timestamp":1674633402000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/22\/2803\/227284"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,9,21]]},"references-count":15,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2010,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq526","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,11,15]]},"published":{"date-parts":[[2010,9,21]]}}}