{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,12]],"date-time":"2025-09-12T19:21:36Z","timestamp":1757704896481},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status\u2014an epigenetic mark describing chromatin accessibility\u2014from population-scale haplotype data.<\/jats:p>\n               <jats:p>Results: We fit the MVB model to real data from 59 individuals on whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R2=0.12 in cross-validations). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R2 than the two competing methods in empirical data.<\/jats:p>\n               <jats:p>Availability and implementation: Software implementing the method described can be downloaded at http:\/\/bogdan.bioinformatics.ucla.edu\/software\/.<\/jats:p>\n               <jats:p>Contact: \u00a0shihuwenbo@ucla.edu or pasaniuc@ucla.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv397","type":"journal-article","created":{"date-parts":[[2015,7,3]],"date-time":"2015-07-03T19:10:45Z","timestamp":1435950645000},"page":"3514-3521","source":"Crossref","is-referenced-by-count":3,"title":["A multivariate Bernoulli model to predict DNaseI hypersensitivity status from haplotype data"],"prefix":"10.1093","volume":"31","author":[{"given":"Huwenbo","family":"Shi","sequence":"first","affiliation":[{"name":"1 Bioinformatics Interdepartmental Program, University of California, Los Angeles,"}]},{"given":"Bogdan","family":"Pasaniuc","sequence":"additional","affiliation":[{"name":"1 Bioinformatics Interdepartmental Program, University of California, Los Angeles,"},{"name":"2 Department of Pathology and Laboratory Medicine,"},{"name":"3 Department of Human Genetics and"}]},{"given":"Kenneth L.","family":"Lange","sequence":"additional","affiliation":[{"name":"1 Bioinformatics Interdepartmental Program, University of California, Los Angeles,"},{"name":"3 Department of Human Genetics and"},{"name":"4 Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90024, USA"}]}],"member":"286","published-online":{"date-parts":[[2015,7,2]]},"reference":[{"key":"2023020202331762800_btv397-B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project Consortium et\u00a0al","year":"2010","journal-title":"Nature"},{"key":"2023020202331762800_btv397-B2","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/j.cell.2007.12.014","article-title":"High-resolution mapping and characterization of open chromatin across the genome","volume":"132","author":"Boyle","year":"2008","journal-title":"Cell"},{"key":"2023020202331762800_btv397-B3","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1016\/j.ajhg.2011.01.010","article-title":"A fast, powerful method for detecting identity by descent","volume":"88","author":"Browning","year":"2011","journal-title":"Am. J. Hum. Genet."},{"key":"2023020202331762800_btv397-B4","doi-asserted-by":"crossref","first-page":"1084","DOI":"10.1086\/521987","article-title":"Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering","volume":"81","author":"Browning","year":"2007","journal-title":"Am. J. Hum. Genet."},{"key":"2023020202331762800_btv397-B5","doi-asserted-by":"crossref","first-page":"680","DOI":"10.1038\/ng.2634","article-title":"Meta-analysis identifies four new loci associated with testicular germ cell tumor","volume":"45","author":"Chung","year":"2013","journal-title":"Nat. Genet."},{"key":"2023020202331762800_btv397-B6","doi-asserted-by":"crossref","first-page":"1465","DOI":"10.3150\/12-BEJSP10","article-title":"Multivariate Bernoulli distribution","volume":"19","author":"Dai","year":"2013","journal-title":"Bernoulli"},{"key":"2023020202331762800_btv397-B7","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1038\/ng1001-229","article-title":"High-resolution haplotype structure in the human genome","volume":"29","author":"Daly","year":"2001","journal-title":"Nat. Genet."},{"key":"2023020202331762800_btv397-B8","doi-asserted-by":"crossref","first-page":"e1003608","DOI":"10.1371\/journal.pgen.1003608","article-title":"Prediction of complex human traits using the genomic best linear unbiased predictor","volume":"9","author":"de los Campos","year":"2013","journal-title":"PLoS Genet."},{"key":"2023020202331762800_btv397-B9","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1038\/nature10808","article-title":"DNase I sensitivity QTLs are a major determinant of human expression variation","volume":"482","author":"Degner","year":"2012","journal-title":"Nature"},{"key":"2023020202331762800_btv397-B10","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1038\/nature02168","article-title":"The international hapmap project","volume":"426","author":"Gibbs","year":"2003","journal-title":"Nature"},{"key":"2023020202331762800_btv397-B11","doi-asserted-by":"crossref","first-page":"e1000529","DOI":"10.1371\/journal.pgen.1000529","article-title":"A flexible and accurate genotype imputation method for the next generation of genome-wide association studies","volume":"5","author":"Howie","year":"2009","journal-title":"PLoS Genet."},{"key":"2023020202331762800_btv397-B12","doi-asserted-by":"crossref","first-page":"955","DOI":"10.1038\/ng.2354","article-title":"Fast and accurate genotype imputation in genome-wide association studies through pre-phasing","volume":"44","author":"Howie","year":"2012","journal-title":"Nat. Genet."},{"key":"2023020202331762800_btv397-B13","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1038\/9642","article-title":"Prospects for whole-genome linkage disequilibrium mapping of common disease genes","volume":"22","author":"Kruglyak","year":"1999","journal-title":"Nat. Genet."},{"key":"2023020202331762800_btv397-B14","volume-title":"Applied Probability. Springer Texts in Statistics","author":"Lange","year":"2010"},{"key":"2023020202331762800_btv397-B15","volume-title":"Optimization. Springer Texts in Statistics","author":"Lange","year":"2013"},{"key":"2023020202331762800_btv397-B16","doi-asserted-by":"crossref","first-page":"e1002453","DOI":"10.1371\/journal.pgen.1002453","article-title":"Inference of population structure using dense haplotype data","volume":"8","author":"Lawson","year":"2012","journal-title":"PLoS Genet."},{"key":"2023020202331762800_btv397-B17","doi-asserted-by":"crossref","first-page":"2213","DOI":"10.1093\/genetics\/165.4.2213","article-title":"Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data","volume":"165","author":"Li","year":"2003","journal-title":"Genetics"},{"key":"2023020202331762800_btv397-B18","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1002\/gepi.20533","article-title":"Mach: using sequence and genotype data to estimate haplotypes and unobserved genotypes","volume":"34","author":"Li","year":"2010","journal-title":"Genet. Epidemiol."},{"key":"2023020202331762800_btv397-B19","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1534\/genetics.108.099275","article-title":"Methods for human demographic inference using haplotype patterns from genomewide single-nucleotide polymorphism data","volume":"182","author":"Lohmueller","year":"2009","journal-title":"Genetics"},{"key":"2023020202331762800_btv397-B20","doi-asserted-by":"crossref","DOI":"10.3389\/fgene.2012.00230","article-title":"Current bioinformatic approaches to identify DNase I hypersensitive sites and genomic footprints from DNase-seq data","volume":"3","author":"Madrigal","year":"2012","journal-title":"Front. Genet."},{"key":"2023020202331762800_btv397-B21","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1038\/ng2088","article-title":"A new multipoint method for genome-wide association studies by imputation of genotypes","volume":"39","author":"Marchini","year":"2007","journal-title":"Nat. Genet."},{"key":"2023020202331762800_btv397-B22","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1086\/508264","article-title":"A flexible Bayesian framework for modeling haplotype association with disease, allowing for dominance effects of the underlying causative variants","volume":"79","author":"Morris","year":"2006","journal-title":"Am. J. Hum. Genet."},{"key":"2023020202331762800_btv397-B23","doi-asserted-by":"crossref","first-page":"i213","DOI":"10.1093\/bioinformatics\/btp197","article-title":"Inference of locus-specific ancestry in closely related populations","volume":"25","author":"Pasaniuc","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020202331762800_btv397-B24","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1101\/gr.079509.108","article-title":"Population genetic inference from genomic sequence variation","volume":"20","author":"Pool","year":"2010","journal-title":"Genome Res."},{"key":"2023020202331762800_btv397-B25","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/j.ajhg.2008.06.005","article-title":"Long-range ld can confound genome scans in admixed populations","volume":"83","author":"Price","year":"2008","journal-title":"Am. J. Hum. Genet."},{"key":"2023020202331762800_btv397-B26","doi-asserted-by":"crossref","first-page":"e1000519","DOI":"10.1371\/journal.pgen.1000519","article-title":"Sensitive detection of chromosomal segments of distinct ancestry in admixed populations","volume":"5","author":"Price","year":"2009","journal-title":"PLoS Genet."},{"key":"2023020202331762800_btv397-B27","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1038\/ng.2645","article-title":"Genome-wide association study identifies two susceptibility loci for osteosarcoma","volume":"45","author":"Savage","year":"2013","journal-title":"Nat. Genet."},{"key":"2023020202331762800_btv397-B28","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1086\/502802","article-title":"A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase","volume":"78","author":"Scheet","year":"2006","journal-title":"Am. J. Hum. Genet."},{"key":"2023020202331762800_btv397-B29","doi-asserted-by":"crossref","first-page":"2304","DOI":"10.1093\/bioinformatics\/btr341","article-title":"Hapgen2: simulation of multiple disease SNPs","volume":"27","author":"Su","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020202331762800_btv397-B30","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1002\/ajpa.20351","article-title":"Haplotype trees and modern human origins","volume":"128","author":"Templeton","year":"2005","journal-title":"Am. J. Phys. Anthropol."},{"key":"2023020202331762800_btv397-B31","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1038\/nrg1123","article-title":"Haplotype blocks and linkage disequilibrium in the human genome","volume":"4","author":"Wall","year":"2003","journal-title":"Nat. Rev. Genet."},{"key":"2023020202331762800_btv397-B32","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1007\/978-3-319-05269-4_30","article-title":"A spatial-aware haplotype copying model with applications to genotype imputation","volume-title":"Research in Computational Molecular Biology","author":"Yang","year":"2014"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/21\/3514\/49036105\/bioinformatics_31_21_3514.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/21\/3514\/49036105\/bioinformatics_31_21_3514.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T03:53:14Z","timestamp":1675309994000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/21\/3514\/194833"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,7,2]]},"references-count":32,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2015,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv397","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,11,1]]},"published":{"date-parts":[[2015,7,2]]}}}