{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T10:54:40Z","timestamp":1768992880304,"version":"3.49.0"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,6,1]]},"abstract":"<jats:p>Summary: The Dirichlet-multinomial (DMN) distribution is a fundamental model for multicategory count data with overdispersion. This distribution has many uses in bioinformatics, including applications to metagenomic data, transcriptomics and alternative splicing. The DMN distribution reduces to the multinomial distribution when the overdispersion parameter \u03c8 is 0. Unfortunately, numerical computation of the DMN log-likelihood function by conventional methods results in instability in the neighborhood of \u03c8 = 0. An alternative formulation circumvents this instability, but it leads to long runtimes that make it impractical for large count data common in bioinformatics. We have developed a new method for computation of the DMN log-likelihood to solve the instability problem without incurring long runtimes. The new approach is composed of a novel formula and an algorithm to extend its applicability. Our numerical experiments show that this new method improves both the accuracy of log-likelihood evaluation and the runtime by several orders of magnitude, especially in high-count data situations that are common in deep sequencing data. Using real metagenomic data, our method achieves a manyfold runtime improvement. Our method increases the feasibility of using the DMN distribution to model many high-throughput problems in bioinformatics. We have included in our work an R package giving access to this method and a vignette applying this approach to metagenomic data.<\/jats:p><jats:p>Availability and implementation: An implementation of the algorithm together with a vignette describing its use is available in Supplementary Data.<\/jats:p><jats:p>Contact: \u00a0pengyu.bio@gmail.com or cashaw@bcm.edu<\/jats:p><jats:p>Supplementary information:\u2003Supplementary Data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu079","type":"journal-article","created":{"date-parts":[[2014,2,12]],"date-time":"2014-02-12T01:35:39Z","timestamp":1392168939000},"page":"1547-1554","source":"Crossref","is-referenced-by-count":27,"title":["An efficient algorithm for accurate computation of the Dirichlet-multinomial log-likelihood function"],"prefix":"10.1093","volume":"30","author":[{"given":"Peng","family":"Yu","sequence":"first","affiliation":[{"name":"1 \u00a01Department of Electrical and Computer Engineering & TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering (CBGSE), Texas A&M University, College Station, TX 77843, USA and 2Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA"}]},{"given":"Chad A.","family":"Shaw","sequence":"additional","affiliation":[{"name":"1 \u00a01Department of Electrical and Computer Engineering & TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering (CBGSE), Texas A&M University, College Station, TX 77843, USA and 2Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA"}]}],"member":"286","published-online":{"date-parts":[[2014,2,11]]},"reference":[{"key":"2023012710564498900_btu079-B1","doi-asserted-by":"crossref","DOI":"10.1002\/0471249688","volume-title":"Categorical Data Analysis. 
Wiley Series in Probability and Statistics","author":"Agresti","year":"2002","edition":"2nd edn"},{"key":"2023012710564498900_btu079-B2","doi-asserted-by":"crossref","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count data","volume":"11","author":"Anders","year":"2010","journal-title":"Genome Biol."},{"key":"2023012710564498900_btu079-B3","volume-title":"Pattern Recognition and Machine Learning. Information Science and Statistics","author":"Bishop","year":"2006"},{"key":"2023012710564498900_btu079-B4","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1109\/TKDE.2007.190726","article-title":"Clustering of count data using generalized Dirichlet multinomial distributions","volume":"20","author":"Bouguila","year":"2008","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"2023012710564498900_btu079-B5","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1093\/biomet\/67.3.591","article-title":"Analysis of contingency tables under cluster sampling","volume":"67","author":"Brier","year":"1980","journal-title":"Biometrika."},{"key":"2023012710564498900_btu079-B6","article-title":"Using Dirichlet mixture priors to derive hidden Markov models for protein families","author":"Brown","year":"1993"},{"key":"2023012710564498900_btu079-B7","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139013567","volume-title":"Regression Analysis of Count Data. Econometric Society Monographs","author":"Cameron","year":"2013","edition":"2nd edn"},{"key":"2023012710564498900_btu079-B8","article-title":"Duxbury advanced series in statistics and decision sciences","volume-title":"Statistical inference","author":"Casella","year":"2002"},{"key":"2023012710564498900_btu079-B9","doi-asserted-by":"crossref","first-page":"34","DOI":"10.2307\/2346223","article-title":"Beta-binomial ANOVA for proportions","volume":"27","author":"Crowder","year":"1978","journal-title":"Appl. 
Stat."},{"key":"2023012710564498900_btu079-B10","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1093\/biostatistics\/kxs050","article-title":"Dirichlet negative multinomial regression for overdispersed correlated count data","volume":"14","author":"Farewell","year":"2013","journal-title":"Biostatistics."},{"key":"2023012710564498900_btu079-B11","volume-title":"Statistical Methods for Research Workers","author":"Fisher","year":"1973","edition":"14th edn"},{"key":"2023012710564498900_btu079-B12","volume-title":"Complex Analysis","author":"Freitag","year":"2009","edition":"2nd edn"},{"key":"2023012710564498900_btu079-B13","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/1471-2180-12-122","article-title":"Metagenome analyses of corroded concrete wastewater pipe biofilms reveal a complex microbial system","volume":"12","author":"Gomez-Alvarez","year":"2012","journal-title":"BMC Microbiol."},{"key":"2023012710564498900_btu079-B14","doi-asserted-by":"crossref","first-page":"281","DOI":"10.2307\/2529950","article-title":"Analysis of dichotomous response data from certain toxicological experiments","volume":"35","author":"Haseman","year":"1979","journal-title":"Biometrics."},{"key":"2023012710564498900_btu079-B15","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511973420","volume-title":"Negative Binomial Regression","author":"Hilbe","year":"2011","edition":"2nd edn"},{"key":"2023012710564498900_btu079-B16","volume-title":"IEEE 754-2008, Standard for Floating-Point Arithmetic","author":"IEEE Task P754","year":"2008"},{"key":"2023012710564498900_btu079-B17","doi-asserted-by":"crossref","first-page":"711","DOI":"10.2307\/2532338","article-title":"Testing goodness of fit of a multinomial model against overdispersed alternatives","volume":"48","author":"Kim","year":"1992","journal-title":"Biometrics."},{"key":"2023012710564498900_btu079-B18","first-page":"46","article-title":"Proportions with extraneous variance: single and independent 
sample","volume":"68","author":"Kleinman","year":"1973","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012710564498900_btu079-B19","doi-asserted-by":"crossref","first-page":"e52078","DOI":"10.1371\/journal.pone.0052078","article-title":"Hypothesis testing and power calculations for taxonomic-based human microbiome data","volume":"7","author":"La Rosa","year":"2012","journal-title":"PLoS One"},{"key":"2023012710564498900_btu079-B20","article-title":"Optimizing polynomials for floating-point implementation","author":"Lauter","year":"2008","journal-title":"In: Proceedings of the 8th Conference on Real Numbers and Computers, Santiago de Compostela, Spain"},{"key":"2023012710564498900_btu079-B21","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1177\/002224378402100110","article-title":"The Dirichlet multinomial distribution as a magazine exposure model","volume":"21","author":"Leckenby","year":"1984","journal-title":"J. Mark. Res."},{"key":"2023012710564498900_btu079-B22","volume-title":"aod: Analysis of Overdispersed Data","author":"Lesnoff","year":"2012"},{"key":"2023012710564498900_btu079-B23","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1038\/ncb2839","article-title":"Son connects the splicing-regulatory network with pluripotency in human embryonic stem cells","volume":"15","author":"Lu","year":"2013","journal-title":"Nat. Cell Biol."},{"key":"2023012710564498900_btu079-B24","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1002\/wrna.47","article-title":"Alternative mRNA polyadenylation in eukaryotes: an effective regulator of gene expression","volume":"2","author":"Lutz","year":"2011","journal-title":"Wiley Interdiscip. Rev. RNA"},{"key":"2023012710564498900_btu079-B25","first-page":"1","article-title":"A hierarchical Dirichlet language model","volume":"1","author":"MacKay","year":"1994","journal-title":"Nat. Lang. 
Eng."},{"key":"2023012710564498900_btu079-B26","article-title":"Modeling word burstiness using the Dirichlet distribution","author":"Madsen","year":"2005","journal-title":"In: Proceedings of the 22nd International Conference on Machine Learning"},{"key":"2023012710564498900_btu079-B27","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-3242-6","volume-title":"Generalized Linear Models. Monographs on Statistics and Applied Probability","author":"McCullagh","year":"1989"},{"key":"2023012710564498900_btu079-B28","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1145\/6497.214326","article-title":"Algorithm 643: Fexact: a fortran subroutine for Fisher\u2019s exact test on unordered contingency tables","volume":"12","author":"Mehta","year":"1986","journal-title":"ACM Trans. Math. Softw."},{"key":"2023012710564498900_btu079-B29","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies \u2014 the next generation","volume":"11","author":"Metzker","year":"2010","journal-title":"Nat. Rev. 
Genet."},{"key":"2023012710564498900_btu079-B30","volume-title":"Topic Models Conditioned on Arbitrary Features with Dirichlet-Multinomial Regression","author":"Mimno","year":"2008"},{"key":"2023012710564498900_btu079-B31","first-page":"65","article-title":"On the compound multinomial distribution, the multivariate \u03b2-distribution, and correlations among proportions","volume":"49","author":"Mosimann","year":"1962","journal-title":"Biometrika."},{"key":"2023012710564498900_btu079-B32","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1093\/biomet\/50.1-2.47","article-title":"On the compound negative multinomial distribution and correlations among inversely sampled pollen counts","volume":"50","author":"Mosimann","year":"1963","journal-title":"Biometrika."},{"key":"2023012710564498900_btu079-B33","article-title":"Human Microbiome Project 16S rRNA Clinical Production Pilot (ID: 48335)","author":"The NCBI BioProject website","year":"2010"},{"key":"2023012710564498900_btu079-B34","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.csda.2004.05.007","article-title":"An improved method for the computation of maximum likelihood estimates for multinomial overdispersion models","volume":"49","author":"Neerchal","year":"2005","journal-title":"Comput. Stat. Data Anal."},{"key":"2023012710564498900_btu079-B35","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1002\/bimj.200410103","article-title":"Fisher information matrix of the Dirichlet-multinomial distribution","volume":"47","author":"Paul","year":"2005","journal-title":"Biom. J."},{"key":"2023012710564498900_btu079-B36","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1111\/1467-9574.00094","article-title":"On modelling overdispersion of counts","volume":"53","author":"Poortema","year":"1999","journal-title":"Stat. 
Neerl."},{"key":"2023012710564498900_btu079-B37","volume-title":"R: A Language and Environment for Statistical Computing","author":"R Core Team","year":"2013"},{"key":"2023012710564498900_btu079-B38","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics."},{"key":"2023012710564498900_btu079-B39","volume-title":"HMP: Hypothesis Testing and Power Calculations for Comparing Metagenomic Samples from HMP","author":"Rosa","year":"2013"},{"key":"2023012710564498900_btu079-B40","doi-asserted-by":"crossref","first-page":"10","DOI":"10.2307\/1968409","article-title":"A proof of the asymptotic series for and","volume":"32","author":"Rowe","year":"1931","journal-title":"Ann. Math., Second Ser"},{"key":"2023012710564498900_btu079-B41","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1177\/002224378402100109","article-title":"The mixed-media Dirichlet multinomial distribution: a model for evaluating television-magazine advertising schedules","volume":"21","author":"Rust","year":"1984","journal-title":"J. Mark. Res."},{"key":"2023012710564498900_btu079-B42","first-page":"327","article-title":"Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology","volume":"12","author":"Sj\u00f6lander","year":"1996","journal-title":"Comput. Appl. Biosci."},{"key":"2023012710564498900_btu079-B43","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1111\/j.2517-6161.1948.tb00014.x","article-title":"A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials","volume":"10","author":"Skellam","year":"1948","journal-title":"J. R. Stat. Soc. Ser. 
B Methodol."},{"key":"2023012710564498900_btu079-B44","volume-title":"Sage Mathematics Software (Version 5.0.1)","author":"Stein","year":"2012"},{"key":"2023012710564498900_btu079-B45","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1093\/biomet\/66.3.585","article-title":"Testing the goodness of fit of the binomial distribution","volume":"66","author":"Tarone","year":"1979","journal-title":"Biometrika."},{"key":"2023012710564498900_btu079-B46","volume-title":"dirmult: Estimation in Dirichlet-Multinomial Distribution","author":"Tvedebrink","year":"2009"},{"key":"2023012710564498900_btu079-B47","doi-asserted-by":"crossref","DOI":"10.17077\/etd.a6sywkpm","article-title":"Global analysis of alternative polyadenylation regulation using high-throughput sequencing","author":"Wan","year":"2012"},{"key":"2023012710564498900_btu079-B48","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1038\/nature07509","article-title":"Alternative isoform regulation in human tissue transcriptomes","volume":"456","author":"Wang","year":"2008","journal-title":"Nature."},{"key":"2023012710564498900_btu079-B49","volume-title":"A Course of Modern Analysis","author":"Whittaker","year":"1927","edition":"4th edn"},{"key":"2023012710564498900_btu079-B50","volume-title":"Econometric Analysis of Count Data","author":"Winkelmann","year":"2008","edition":"5th edn"},{"key":"2023012710564498900_btu079-B51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v032.i10","article-title":"The VGAM package for categorical data analysis","volume":"32","author":"Yee","year":"2010","journal-title":"J. Stat. 
Softw."},{"key":"2023012710564498900_btu079-B52","volume-title":"VGAM: Vector Generalized Linear and Additive Models","author":"Yee","year":"2012"},{"key":"2023012710564498900_btu079-B53","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1111\/j.2517-6161.1996.tb02095.x","article-title":"Vector generalized additive models","volume":"58","author":"Yee","year":"1996","journal-title":"J. R. Stat. Soc. B"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/11\/1547\/48928586\/bioinformatics_30_11_1547.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/11\/1547\/48928586\/bioinformatics_30_11_1547.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T05:01:29Z","timestamp":1716526889000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/11\/1547\/2748132"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,2,11]]},"references-count":53,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2014,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu079","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2014,6,1]]},"published":{"date-parts":[[2014,2,11]]}}}