{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T01:23:39Z","timestamp":1773278619720,"version":"3.50.1"},"reference-count":18,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2004,12,1]],"date-time":"2004-12-01T00:00:00Z","timestamp":1101859200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"},{"start":{"date-parts":[[2004,12,1]],"date-time":"2004-12-01T00:00:00Z","timestamp":1101859200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>Increasingly researchers are turning to the use of haplotype analysis as a tool in population studies, the investigation of linkage disequilibrium, and candidate gene analysis. When the phase of the data is unknown, computational methods, in particular those employing the Expectation-Maximisation (EM) algorithm, are frequently used for estimating the phase and frequency of the underlying haplotypes. These methods have proved very successful, predicting the phase-known frequencies from data for which the phase is unknown with a high degree of accuracy. Recently there has been much speculation as to the effect of unknown, or missing allelic data \u2013 a common phenomenon even with modern automated DNA analysis techniques \u2013 on the performance of EM-based methods. To this end an EM-based program, modified to accommodate missing data, has been developed, incorporating non-parametric bootstrapping for the calculation of accurate confidence intervals.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>Here we present the results of the analyses of various data sets in which randomly selected known alleles have been relabelled as missing. Remarkably, we find that the absence of up to 30% of the data in both biallelic and multiallelic data sets with moderate to strong levels of linkage disequilibrium can be tolerated. Additionally, the frequencies of haplotypes which predominate in the complete data analysis remain essentially the same after the addition of the random noise caused by missing data.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusions<\/jats:title>\n                        <jats:p>These findings have important implications for the area of data gathering. It may be concluded that small levels of drop out in the data do not affect the overall accuracy of haplotype analysis perceptibly, and that, given recent findings on the effect of inaccurate data, ambiguous data points are best treated as unknown.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-5-188","type":"journal-article","created":{"date-parts":[[2005,1,13]],"date-time":"2005-01-13T09:24:41Z","timestamp":1105608281000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Haplotype frequency estimation error analysis in the presence of missing genotype data"],"prefix":"10.1186","volume":"5","author":[{"given":"Enda D","family":"Kelly","sequence":"first","affiliation":[]},{"given":"Fabian","family":"Sievers","sequence":"additional","affiliation":[]},{"given":"Ross","family":"McManus","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2004,12,1]]},"reference":[{"key":"304_CR1","doi-asserted-by":"publisher","first-page":"4841","DOI":"10.1093\/nar\/24.23.4841","volume":"24","author":"S Michalatos-Beloin","year":"1996","unstructured":"Michalatos-Beloin S, Tishkoff SA, Bentley KL, Kidd KK, Ruano G: Molecular haplotyping of genetic markers 10 kb apart by allele-specific long-range PCR.\n                           Nucleic Acids Res 1996, 24: 4841\u20134843. 10.1093\/nar\/24.23.4841","journal-title":"Nucleic Acids Res"},{"key":"304_CR2","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1038\/hdy.1974.89","volume":"33","author":"WG Hill","year":"1974","unstructured":"Hill WG: Estimation of linkage disequilibrium in randomly mating populations.\n                           Heredity 1974, 33: 229\u2013239. 10.1038\/hdy.1974.89","journal-title":"Heredity"},{"key":"304_CR3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"AP Dempster","year":"1977","unstructured":"Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm.\n                           J Royal Stat Soc B 1977, 39: 1\u201338.","journal-title":"J Royal Stat Soc B"},{"key":"304_CR4","first-page":"799","volume":"56","author":"JC Long","year":"1995","unstructured":"Long JC, Williams RC, Urbanek M: An E-M algorithm and testing strategy for multiple-locus haplotypes.\n                           Am J Hum Genet 1995, 56: 799\u2013810.","journal-title":"Am J Hum Genet"},{"key":"304_CR5","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1093\/oxfordjournals.jhered.a111613","volume":"86","author":"ME Hawley","year":"1995","unstructured":"Hawley ME, Kidd KK: HAPLO: A program using the EM algorithm to estimate the frequencies of multi-site haplotypes.\n                           J Hered 1995, 86: 409\u2013411.","journal-title":"J Hered"},{"key":"304_CR6","first-page":"921","volume":"12","author":"L Excoffier","year":"1995","unstructured":"Excoffier L, Slatkin M: Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population.\n                           Mol Biol Evol 1995, 12: 921\u2013927.","journal-title":"Mol Biol Evol"},{"key":"304_CR7","doi-asserted-by":"publisher","first-page":"947","DOI":"10.1086\/303069","volume":"67","author":"D Fallin","year":"2000","unstructured":"Fallin D, Schork NJ: Accuracy of haplotype frequency estimation for biallelic loci, via the Expectation-Maximisation algorithm for unphased diploid genotype data.\n                           Am J Hum Genet 2000, 67: 947\u2013959. 10.1086\/303069","journal-title":"Am J Hum Genet"},{"key":"304_CR8","doi-asserted-by":"publisher","first-page":"518","DOI":"10.1086\/303000","volume":"67","author":"SA Tishkoff","year":"2000","unstructured":"Tishkoff SA, Pakstis AJ, Ruano G, Kidd KK: The accuracy of statistical methods for estimation of haplotype frequencies: An example from the CD4 locus.\n                           Am J Hum Genet 2000, 67: 518\u2013522. 10.1086\/303000","journal-title":"Am J Hum Genet"},{"key":"304_CR9","doi-asserted-by":"publisher","first-page":"1694","DOI":"10.1093\/bioinformatics\/18.12.1694","volume":"18","author":"JH Zhao","year":"2002","unstructured":"Zhao JH, Lissarrague S, Essioux L, Sham PC: GENECOUNTING: haplotype analysis with missing genotypes.\n                           Bioinformatics 2002, 18: 1694\u20131695. 10.1093\/bioinformatics\/18.12.1694","journal-title":"Bioinformatics"},{"key":"304_CR10","unstructured":"SNPHAP: A program for estimating frequencies of large haplotypes of SNPs.[http:\/\/www-gene.cimr.cam.ac.uk\/clayton\/software\/snphap.txt]"},{"key":"304_CR11","doi-asserted-by":"publisher","first-page":"616","DOI":"10.1038\/sj.ejhg.5200855","volume":"10","author":"KM Kirk","year":"2002","unstructured":"Kirk KM, Cardon LR: The impact of genotyping error on haplotype reconstruction and frequency estimation.\n                           Eur J Hum Genet 2002, 10: 616\u2013622. 10.1038\/sj.ejhg.5200855","journal-title":"Eur J Hum Genet"},{"key":"304_CR12","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1093\/genetics\/49.1.49","volume":"49","author":"RC Lewontin","year":"1964","unstructured":"Lewontin RC: The interaction of selection and linkage I. General considerations; heterotic models.\n                           Genetics 1964, 49: 49\u201367.","journal-title":"Genetics"},{"key":"304_CR13","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1086\/338446","volume":"70","author":"T Niu","year":"2002","unstructured":"Niu T, Qin ZS, Xu X, Liu JS: Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms.\n                           Am J Hum Genet 2002, 70: 157\u2013169. 10.1086\/338446","journal-title":"Am J Hum Genet"},{"key":"304_CR14","doi-asserted-by":"publisher","first-page":"1073","DOI":"10.1126\/science.2570460","volume":"245","author":"B Kerem","year":"1989","unstructured":"Kerem B, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui LC: Identification of the cystic fibrosis gene: genetic analysis.\n                           Science 1989, 245: 1073\u20131080. 10.1126\/science.2570460","journal-title":"Science"},{"key":"304_CR15","doi-asserted-by":"publisher","first-page":"361","DOI":"10.2307\/2532296","volume":"48","author":"SW Guo","year":"1992","unstructured":"Guo SW, Thompson EA: Performing the exact test of Hardy-Weinberg proportion for multiple alleles.\n                           Biometrics 1992, 48: 361\u2013372. 10.2307\/2532296","journal-title":"Biometrics"},{"key":"304_CR16","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1111\/j.1469-1809.1955.tb01360.x","volume":"20","author":"R Ceppellini","year":"1955","unstructured":"Ceppellini R, Siniscalco M, Smith CAB: The estimation of gene frequencies in a random mating population.\n                           Ann Hum Genet 1955, 20: 97\u2013115. 10.1111\/j.1469-1809.1955.tb01360.x","journal-title":"Ann Hum Genet"},{"key":"304_CR17","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1111\/j.1469-1809.1972.tb00287.x","volume":"21","author":"CAB Smith","year":"1957","unstructured":"Smith CAB: Counting methods in genetical statistics.\n                           Ann Hum Genet 1957, 21: 254\u2013276.","journal-title":"Ann Hum Genet"},{"key":"304_CR18","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-4541-9","volume-title":"An Introduction to the Bootstrap","author":"B Efron","year":"1993","unstructured":"Efron B, Tibshirani RJ: An Introduction to the Bootstrap. New York: Chapman and Hall; 1993."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-188.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-5-188\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-188.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:21:28Z","timestamp":1728303688000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-5-188"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,12,1]]},"references-count":18,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2004,12]]}},"alternative-id":["304"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-5-188","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2004,12,1]]},"assertion":[{"value":"29 July 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 December 2004","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 December 2004","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"188"}}