{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T02:17:37Z","timestamp":1771035457261,"version":"3.50.1"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"17","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Pooling DNA is a cost-effective alternative to individual genotyping method. It is often used for initial screening in genome-wide association analysis. In some studies, large pools with sizes up to several hundreds were applied in order to significantly reduce genotyping cost. However, method for estimating haplotype frequencies from large DNA pools has not been available due to computational complexity involved.<\/jats:p>\n               <jats:p>Methods: We propose a novel constrained EM algorithm, PoooL, to estimate frequencies of single-nucleotide polymorphism (SNP) haplotypes from DNA pools. A quantity called importance factor is introduced to measure the contribution of a haplotype to the likelihood. Under the assumption of asymptotic normality of the estimated allele frequencies and a system of linear constraints on haplotype frequencies the importance factor remains a constant in the iterative maximization process. The maximization problem in the EM algorithm is then formulated into a constrained maximum entropy model and solved by the improved iterative scaling method.<\/jats:p>\n               <jats:p>Results: Simulation study shows that our algorithm can efficiently estimate haplotype frequencies from DNA pools with arbitrarily large sizes. The algorithm works equally well for large pools with sizes up to hundreds or thousands and for pools with sizes as small as one or two individuals. 
The computational complexity of the PoooL algorithm is independent of pool sizes, and the computational efficiency for large pools is thus substantially improved over existing estimating methods. Simulation results also show that the proposed method is robust to genotype errors and population admixture.<\/jats:p>\n               <jats:p>Availability: \u00a0http:\/\/staff.ustc.edu.cn\/~ynyang\/poool<\/jats:p>\n               <jats:p>Contact: \u00a0zhanghan@mail.ustc.edu.cn; ynyang@ustc.edu.cn<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn324","type":"journal-article","created":{"date-parts":[[2008,6,24]],"date-time":"2008-06-24T00:13:49Z","timestamp":1214266429000},"page":"1942-1948","source":"Crossref","is-referenced-by-count":20,"title":["PoooL: an efficient method for estimating haplotype frequencies from large DNA pools"],"prefix":"10.1093","volume":"24","author":[{"given":"Han","family":"Zhang","sequence":"first","affiliation":[{"name":"1 Department of Statistics and Finance, University of Science and Technology of China, Anhui 230026 and 2Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan"}]},{"given":"Hsin-Chou","family":"Yang","sequence":"additional","affiliation":[{"name":"1 Department of Statistics and Finance, University of Science and Technology of China, Anhui 230026 and 2Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan"}]},{"given":"Yaning","family":"Yang","sequence":"additional","affiliation":[{"name":"1 Department of Statistics and Finance, University of Science and Technology of China, Anhui 230026 and 2Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan"}]}],"member":"286","published-online":{"date-parts":[[2008,6,23]]},"reference":[{"key":"2023020211091346600_B1","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1086\/515512","article-title":"g of disease loci, by use of a pooled DNA genomic screen","volume":"61","author":"Barcellos","year":"1997","journal-title":"Am. J. Hum. 
Genet."},{"key":"2023020211091346600_B2","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1046\/j.1469-1809.2002.00125.x","article-title":"Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design","volume":"66","author":"Barratt","year":"2002","journal-title":"Ann. Hum. Genet."},{"key":"2023020211091346600_B3","first-page":"39","article-title":"A maximum entropy approach to natural language processing","volume":"22","author":"Berger","year":"1996","journal-title":"Comput. Lingui."},{"key":"2023020211091346600_B4","first-page":"146","article-title":"I-divergence geometry of probability distributions and minimization problems","volume":"3","author":"Csisz\u00e1r","year":"1975","journal-title":"Ann. Prob."},{"key":"2023020211091346600_B5","doi-asserted-by":"crossref","first-page":"1409","DOI":"10.1214\/aos\/1176347279","article-title":"A geometric interpretation of Darroch and Ratcliff's generalized iterative scaling","volume":"17","author":"Csisz\u00e1r","year":"1989","journal-title":"Ann. Stat."},{"key":"2023020211091346600_B6","doi-asserted-by":"crossref","first-page":"1470","DOI":"10.1214\/aoms\/1177692379","article-title":"Generalized iterative scaling for log-linear models","volume":"43","author":"Darroch","year":"1972","journal-title":"Ann. Math. Statist."},{"key":"2023020211091346600_B7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/34.588021","article-title":"Inducing features of random fields","volume":"19","author":"Della Pietra","year":"1997","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023020211091346600_B8","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1086\/346116","article-title":"Estimation of haplotype frequencies, linkage-disequilibrium measures, and combination of haplotype copies in each pool by use of pooled DNA data","volume":"72","author":"Ito","year":"2003","journal-title":"Am. J. Hum. 
Genet."},{"key":"2023020211091346600_B9","doi-asserted-by":"crossref","first-page":"36889","DOI":"10.1074\/jbc.M204732200","article-title":"Angiotensinogen gene polymorphism at -217 affects basal promoter activity and is associated with hypertension in African\u2013Americans","volume":"277","author":"Jain","year":"2002","journal-title":"J. Biol. Chem."},{"key":"2023020211091346600_B10","doi-asserted-by":"crossref","first-page":"620","DOI":"10.1103\/PhysRev.106.620","article-title":"Information theory and statistical mechanics","volume":"106","author":"Jaynes","year":"1957","journal-title":"Phys. Rev."},{"key":"2023020211091346600_B11","doi-asserted-by":"crossref","first-page":"3048","DOI":"10.1093\/bioinformatics\/btm435","article-title":"HaploPool: improving haplotype frequency estimation through DNA pools and phylogenetic modeling","volume":"23","author":"Kirkpatrick","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020211091346600_B12","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1198\/016214505000000808","article-title":"Likelihood-based inference on haplotype effects in genetic association studies","volume":"101","author":"Lin","year":"2006","journal-title":"J. Am. Stat. Assoc."},{"key":"2023020211091346600_B13","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1086\/338446","article-title":"Bayesian haplotype inference for multiple linked single\u2013nucleotide polymorphisms","volume":"70","author":"Niu","year":"2002","journal-title":"Am. J. Hum. Genet."},{"key":"2023020211091346600_B14","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1002\/gepi.20024","article-title":"Algorithms for inferring haplotypes","volume":"27","author":"Niu","year":"2004","journal-title":"Genet. 
Epidemiol."},{"key":"2023020211091346600_B15","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1080\/07853890310021724","article-title":"DNA pooling as a tool for large-scale association studies in complex traits","volume":"36","author":"Norton","year":"2004","journal-title":"Ann. Med."},{"key":"2023020211091346600_B16","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1086\/510686","article-title":"Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies","volume":"80","author":"Pearson","year":"2007","journal-title":"Am. J. Hum. Genet."},{"key":"2023020211091346600_B17","first-page":"237","article-title":"Resolution of haplotypes and haplotype frequencies from SNP genotypes of pooled samples","volume-title":"Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology (RECOMB2003)","author":"Pe'er","year":"2003"},{"key":"2023020211091346600_B18","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1101\/gr.8.12.1273","article-title":"The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases I. DNA pooling","volume":"8","author":"Risch","year":"1998","journal-title":"Genome Res."},{"key":"2023020211091346600_B19","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1038\/nrg930","article-title":"DNA pooling: a tool for large-scale association studies","volume":"3","author":"Sham","year":"2002","journal-title":"Nat. Rev. 
Genet."},{"key":"2023020211091346600_B20","doi-asserted-by":"crossref","first-page":"949","DOI":"10.2337\/diacare.21.6.949","article-title":"Mapping genes for NIDDM: design of the Finland-United States Investigation of NIDDM Genetics (FUSION) study","volume":"21","author":"Valle","year":"1998","journal-title":"Diabetes Care"},{"key":"2023020211091346600_B21","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1002\/gepi.10195","article-title":"On the use of DNA pooling to estimate haplotype frequencies","volume":"24","author":"Wang","year":"2003","journal-title":"Genet. Epidemiol."},{"key":"2023020211091346600_B22","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1186\/1471-2105-7-233","article-title":"PDA: pooled DNA analyzer","volume":"7","author":"Yang","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023020211091346600_B23","doi-asserted-by":"crossref","first-page":"7225","DOI":"10.1073\/pnas.1237858100","article-title":"Efficiency of SNP haplotype estimation from pooled DNA","volume":"100","author":"Yang","year":"2003","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020211091346600_B24","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1002\/gepi.20212","article-title":"Statistical methods for haplotype-based matched case-control association studies","volume":"31","author":"Zhang","year":"2007","journal-title":"Genet. 
Epidemiol."},{"key":"2023020211091346600_B25","doi-asserted-by":"crossref","first-page":"1747","DOI":"10.1534\/genetics.105.042648","article-title":"Two-stage designs in case-control association analysis","volume":"173","author":"Zuo","year":"2006","journal-title":"Genetics."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/17\/1942\/49050561\/bioinformatics_24_17_1942.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/17\/1942\/49050561\/bioinformatics_24_17_1942.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T13:07:06Z","timestamp":1675343226000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/17\/1942\/261023"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,6,23]]},"references-count":25,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2008,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn324","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,9,1]]},"published":{"date-parts":[[2008,6,23]]}}}