{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:51Z","timestamp":1772138091769,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"18","license":[{"start":{"date-parts":[[2019,2,7]],"date-time":"2019-02-07T00:00:00Z","timestamp":1549497600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000038","name":"NSERC","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000196","name":"Canada Foundation for Innovation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000196","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Ontario Ministry of Research and Innovation"},{"name":"Connaught International Scholarships"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Mammalian genomes can contain thousands of enhancers but only a subset are actively driving gene expression in a given cellular context. Integrated genomic datasets can be harnessed to predict active enhancers. One challenge in integration of large genomic datasets is the increasing heterogeneity: continuous, binary and discrete features may all be relevant. Coupled with the typically small numbers of training examples, semi-supervised approaches for heterogeneous data are needed; however, current enhancer prediction methods are not designed to handle heterogeneous data in the semi-supervised paradigm.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We implemented a Dirichlet Process Heterogeneous Mixture model that infers Gaussian, Bernoulli and Poisson distributions over features. We derived a novel variational inference algorithm to handle semi-supervised learning tasks where certain observations are forced to cluster together. We applied this model to enhancer candidates in mouse heart tissues based on heterogeneous features. We constrained a small number of known active enhancers to appear in the same cluster, and 47 additional regions clustered with them. Many of these are located near heart-specific genes. The model also predicted 1176 active promoters, suggesting that it can discover new enhancers and promoters.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>We created the \u2018dphmix\u2019 Python package: https:\/\/pypi.org\/project\/dphmix\/.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz064","type":"journal-article","created":{"date-parts":[[2019,2,6]],"date-time":"2019-02-06T15:14:20Z","timestamp":1549466060000},"page":"3232-3239","source":"Crossref","is-referenced-by-count":1,"title":["Variational infinite heterogeneous mixture model for semi-supervised clustering of heart enhancers"],"prefix":"10.1093","volume":"35","author":[{"given":"Tahmid F","family":"Mehdi","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Toronto , Toronto, ON, Canada"}]},{"given":"Gurdeep","family":"Singh","sequence":"additional","affiliation":[{"name":"Department of Cell & Systems Biology, University of Toronto , Toronto, ON, Canada"}]},{"given":"Jennifer A","family":"Mitchell","sequence":"additional","affiliation":[{"name":"Department of Cell & Systems Biology, University of Toronto , Toronto, ON, Canada"},{"name":"Centre for the Analysis of Genome Evolution and Function, University of Toronto , Toronto, ON, Canada"}]},{"given":"Alan M","family":"Moses","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Toronto , Toronto, ON, Canada"},{"name":"Department of Cell & Systems Biology, University of Toronto , Toronto, ON, Canada"},{"name":"Centre for the Analysis of Genome Evolution and Function, University of Toronto , Toronto, ON, Canada"}]}],"member":"286","published-online":{"date-parts":[[2019,2,7]]},"reference":[{"key":"2023013108003543200_btz064-B1","author":"Beal","year":"2003"},{"key":"2023013108003543200_btz064-B2","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop","year":"2006"},{"key":"2023013108003543200_btz064-B3","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1214\/06-BA104","article-title":"Variational inference for dirichlet process mixtures","volume":"1","author":"Blei","year":"2006","journal-title":"Bayesian Anal"},{"key":"2023013108003543200_btz064-B4","first-page":"65","volume-title":"Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI\u201910","author":"Blundell","year":"2010"},{"key":"2023013108003543200_btz064-B5","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/j.cell.2011.01.024","article-title":"Functional and mechanistic diversity of distal transcription enhancers","volume":"144","author":"Bulger","year":"2011","journal-title":"Cell"},{"key":"2023013108003543200_btz064-B6","doi-asserted-by":"crossref","first-page":"825","DOI":"10.1016\/j.molcel.2013.01.038","article-title":"Modification of enhancer chromatin: what, how, and why?","volume":"49","author":"Calo","year":"2013","journal-title":"Mol. Cell"},{"key":"2023013108003543200_btz064-B7","doi-asserted-by":"crossref","first-page":"2550","DOI":"10.1016\/j.celrep.2016.05.027","article-title":"Defining the minimal factors required for erythropoiesis through direct lineage conversion","volume":"15","author":"Capellera-Garcia","year":"2016","journal-title":"Cell Rep"},{"key":"2023013108003543200_btz064-B8","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1101\/gad.310367.117","article-title":"Assessing sufficiency and necessity of enhancer activities for gene expression and the mechanisms of transcription activation","volume":"32","author":"Catarino","year":"2018","journal-title":"Genes Dev"},{"key":"2023013108003543200_btz064-B9","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.celrep.2015.08.065","article-title":"Sequential binding of meis1 and nkx2-5 on the popdc2 gene: a mechanism for spatiotemporal regulation of enhancers during cardiogenesis","volume":"13","author":"Dupays","year":"2015","journal-title":"Cell Rep"},{"key":"2023013108003543200_btz064-B10","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of dna elements in the human genome","volume":"489","year":"2012","journal-title":"Nature"},{"key":"2023013108003543200_btz064-B11","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1038\/nmeth.1906","article-title":"Chromhmm: automating chromatin-state discovery and characterization","volume":"9","author":"Ernst","year":"2012","journal-title":"Nat. Methods"},{"key":"2023013108003543200_btz064-B12","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1038\/nature09906","article-title":"Mapping and analysis of chromatin state dynamics in nine human cell types","volume":"473","author":"Ernst","year":"2011","journal-title":"Nature"},{"key":"2023013108003543200_btz064-B13","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1016\/S0014-5793(98)01476-8","article-title":"Cardiac specific expression of the green fluorescent protein during early murine embryonic development","volume":"440","author":"Fleischmann","year":"1998","journal-title":"FEBS Lett"},{"key":"2023013108003543200_btz064-B14","doi-asserted-by":"crossref","first-page":"E1633","DOI":"10.1073\/pnas.1618353114","article-title":"Improved regulatory element prediction based on tissue-specific local epigenomic signatures","volume":"114","author":"He","year":"2017","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013108003543200_btz064-B15","doi-asserted-by":"crossref","first-page":"D590","DOI":"10.1093\/nar\/gkj144","article-title":"The ucsc genome browser database: update 2006","volume":"34","author":"Hinrichs","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023013108003543200_btz064-B16","first-page":"1303","article-title":"Stochastic variational inference","volume":"14","author":"Hoffman","year":"2013","journal-title":"J. Mach. Learn. Res"},{"key":"2023013108003543200_btz064-B17","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1038\/nmeth.1937","article-title":"Unsupervised pattern discovery in human chromatin structure through genomic segmentation","volume":"9","author":"Hoffman","year":"2012","journal-title":"Nat. Methods"},{"key":"2023013108003543200_btz064-B18","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classif"},{"key":"2023013108003543200_btz064-B19","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1198\/016214501750332758","article-title":"Gibbs sampling methods for stick breaking priors","volume":"96","author":"Ishwaran","year":"2001","journal-title":"J. Am. Stat. Assoc"},{"key":"2023013108003543200_btz064-B20","first-page":"3581","volume-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems","author":"Kingma","year":"2014"},{"key":"2023013108003543200_btz064-B21","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1186\/s12859-018-2187-1","article-title":"Genome-wide prediction of cis-regulatory regions using supervised deep learning methods","volume":"19","author":"Li","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023013108003543200_btz064-B22","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.ijar.2017.11.001","article-title":"Fast approximation of variational bayes dirichlet process mixture using the maximization\u2013maximization algorithm","volume":"93","author":"Lim","year":"2018","journal-title":"Int. J. Appr. Reason"},{"key":"2023013108003543200_btz064-B23","doi-asserted-by":"crossref","first-page":"219.","DOI":"10.1186\/s13059-017-1345-5","article-title":"Functional assessment of human enhancer activities using whole-genome STARR-sequencing","volume":"18","author":"Liu","year":"2017","journal-title":"Genome Biol"},{"key":"2023013108003543200_btz064-B24","doi-asserted-by":"crossref","first-page":"e49274.","DOI":"10.1371\/journal.pone.0049274","article-title":"Nuclear rna sequencing of the mouse erythroid cell transcriptome","volume":"7","author":"Mitchell","year":"2012","journal-title":"PLoS One"},{"key":"2023013108003543200_btz064-B25","first-page":"249","article-title":"Markov chain sampling methods for dirichlet process mixture models","volume":"9","author":"Neal","year":"2000","journal-title":"J. Comput. Graph. Stat"},{"key":"2023013108003543200_btz064-B26","doi-asserted-by":"crossref","first-page":"170112","DOI":"10.1038\/sdata.2017.112","article-title":"Fantom5 cage profiles of human and mouse samples","volume":"4","author":"Noguchi","year":"2017","journal-title":"Sci. Data"},{"key":"2023013108003543200_btz064-B27","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1101\/gr.5972507","article-title":"Predicting tissue-specific enhancers in the human genome","volume":"17","author":"Pennacchio","year":"2007","journal-title":"Genome Res"},{"key":"2023013108003543200_btz064-B28","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1080\/21541264.2016.1253529","article-title":"Causal role of histone acetylations in enhancer function","volume":"8","author":"Pradeepa","year":"2017","journal-title":"Transcription"},{"key":"2023013108003543200_btz064-B29","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/nature14248","article-title":"Integrative analysis of 111 reference human epigenomes","volume":"518","year":"2015","journal-title":"Nature"},{"key":"2023013108003543200_btz064-B30","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math"},{"key":"2023013108003543200_btz064-B31","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1126\/science.281.5373.108","article-title":"Congenital heart disease caused by mutations in the transcription factor NKX2-5","volume":"281","author":"Schott","year":"1998","journal-title":"Science"},{"key":"2023013108003543200_btz064-B32","doi-asserted-by":"crossref","first-page":"126","DOI":"10.4161\/nucl.19232","article-title":"Chromatin signatures of active enhancers","volume":"3","author":"Spicuglia","year":"2012","journal-title":"Nucleus"},{"key":"2023013108003543200_btz064-B33","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1242\/dev.126.6.1269","article-title":"The cardiac homeobox gene Csx\/Nkx2.5 lies genetically upstream of multiple genes essential for heart development","volume":"126","author":"Tanaka","year":"1999","journal-title":"Development"},{"key":"2023013108003543200_btz064-B34","doi-asserted-by":"crossref","first-page":"D88","DOI":"10.1093\/nar\/gkl822","article-title":"Vista enhancer browser\u2013a database of tissue-specific human enhancers","volume":"35","author":"Visel","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023013108003543200_btz064-B35","doi-asserted-by":"crossref","first-page":"74","DOI":"10.3115\/1705415.1705425","volume-title":"Proceedings of the Workshop on Geometrical Models of Natural Language Semantics, GEMS \u201909","author":"Vlachos","year":"2009"},{"key":"2023013108003543200_btz064-B36","first-page":"577","volume-title":"Proceedings of the Eighteenth International Conference on Machine Learning, ICML \u201901","author":"Wagstaff","year":"2001"},{"key":"2023013108003543200_btz064-B37","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1101\/gr.122382.111","article-title":"Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions","volume":"21","author":"Zentner","year":"2011","journal-title":"Genome Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/18\/3232\/48975377\/bioinformatics_35_18_3232.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/18\/3232\/48975377\/bioinformatics_35_18_3232.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T06:25:17Z","timestamp":1675146317000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/18\/3232\/5308600"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,2,7]]},"references-count":37,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2019,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz064","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/442392","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,9,15]]},"published":{"date-parts":[[2019,2,7]]}}}