{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T01:11:11Z","timestamp":1773277871832,"version":"3.50.1"},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Identifying regulatory elements is a fundamental problem in the field of gene transcription. Motif discovery\u2014the task of identifying the sequence preference of transcription factor proteins, which bind to these elements\u2014is an important step in this challenge. MEME is a popular motif discovery algorithm. Unfortunately, MEME\u2019s running time scales poorly with the size of the dataset. Experiments such as ChIP-Seq and DNase-Seq are providing a rich amount of information on the binding preference of transcription factors. MEME cannot discover motifs in data from these experiments in a practical amount of time without a compromising strategy such as discarding a majority of the sequences.<\/jats:p><jats:p>Results: We present EXTREME, a motif discovery algorithm designed to find DNA-binding motifs in ChIP-Seq and DNase-Seq data. Unlike MEME, which uses the expectation-maximization algorithm for motif discovery, EXTREME uses the online expectation-maximization algorithm to discover motifs. EXTREME can discover motifs in large datasets in a practical amount of time without discarding any sequences. Using EXTREME on ChIP-Seq and DNase-Seq data, we discover many motifs, including some novel and infrequent motifs that can only be discovered by using the entire dataset. 
Conservation analysis of one of these novel infrequent motifs confirms that it is evolutionarily conserved and possibly functional.<\/jats:p><jats:p>Availability and implementation: All source code is available at the Github repository http:\/\/github.com\/uci-cbcl\/EXTREME.<\/jats:p><jats:p>Contact: \u00a0xhx@ics.uci.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu093","type":"journal-article","created":{"date-parts":[[2014,2,15]],"date-time":"2014-02-15T02:04:22Z","timestamp":1392429862000},"page":"1667-1673","source":"Crossref","is-referenced-by-count":45,"title":["EXTREME: an online EM algorithm for motif discovery"],"prefix":"10.1093","volume":"30","author":[{"given":"Daniel","family":"Quang","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, University of California, Irvine, CA 92697, USA and 2Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA"}]},{"given":"Xiaohui","family":"Xie","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of California, Irvine, CA 92697, USA and 2Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA"}]}],"member":"286","published-online":{"date-parts":[[2014,2,14]]},"reference":[{"key":"2023012711064222000_btu093-B1","doi-asserted-by":"crossref","first-page":"1653","DOI":"10.1093\/bioinformatics\/btr261","article-title":"DREME: motif discovery in transcription factor ChIP-seq 
data","volume":"27","author":"Bailey","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012711064222000_btu093-B2","first-page":"28","article-title":"Fitting a mixture model by expectation maximization to discover motifs in bipolymers","volume":"2","author":"Bailey","year":"1994","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"2023012711064222000_btu093-B3","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1007\/BF00993379","article-title":"Unsupervised learning of multiple motifs in biopolymers using expectation maximization","volume":"21","author":"Bailey","year":"1995","journal-title":"Mach. Learn."},{"key":"2023012711064222000_btu093-B4","first-page":"21","article-title":"The value of prior knowledge in discovering motifs with MEME","volume":"3","author":"Bailey","year":"1995","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"2023012711064222000_btu093-B5","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1186\/1471-2105-11-179","article-title":"The value of position-specific priors in motif discovery using MEME","volume":"11","author":"Bailey","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012711064222000_btu093-B6","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem repeats finder: a program to analyze DNA sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023012711064222000_btu093-B7","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1038\/nature05874","article-title":"Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project","volume":"447","author":"Birney","year":"2007","journal-title":"Nature"},{"key":"2023012711064222000_btu093-B8","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1111\/j.1467-9868.2009.00698.x","article-title":"On-line expectation\u2013maximization algorithm for latent data 
models","volume":"71","author":"Capp\u00e9","year":"2009","journal-title":"J. R. Stat. Soc. Series B Stat. Methodol."},{"key":"2023012711064222000_btu093-B9","doi-asserted-by":"crossref","first-page":"901","DOI":"10.1101\/gr.3577405","article-title":"Distribution and intensity of constraint in mammalian genomic sequence","volume":"15","author":"Cooper","year":"2005","journal-title":"Genome Res."},{"issue":"1","key":"2023012711064222000_btu093-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Series B Methodol."},{"key":"2023012711064222000_btu093-B11","doi-asserted-by":"crossref","first-page":"R24","DOI":"10.1186\/gb-2007-8-2-r24","article-title":"Quantifying similarity between motifs","volume":"8","author":"Gupta","year":"2007","journal-title":"Genome Biol."},{"key":"2023012711064222000_btu093-B12","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1038\/nmeth.1313","article-title":"Global mapping of protein-DNA interactions in vivo by digital genomic footprinting","volume":"6","author":"Hesselberth","year":"2009","journal-title":"Nat. 
Methods"},{"key":"2023012711064222000_btu093-B13","doi-asserted-by":"crossref","first-page":"1497","DOI":"10.1126\/science.1141319","article-title":"Genome-wide mapping of in vivo protein-DNA interactions","volume":"316","author":"Johnson","year":"2007","journal-title":"Science"},{"key":"2023012711064222000_btu093-B14","doi-asserted-by":"crossref","first-page":"1696","DOI":"10.1093\/bioinformatics\/btr189","article-title":"MEME-ChIP: motif analysis of large DNA datasets","volume":"27","author":"Machanick","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012711064222000_btu093-B15","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1038\/nature11212","article-title":"An expansive human regulatory lexicon encoded in transcription factor footprints","volume":"489","author":"Neph","year":"2012","journal-title":"Nature"},{"key":"2023012711064222000_btu093-B16","first-page":"498","article-title":"Threshold for positional weight matrix","volume":"16","author":"Pan","year":"2009","journal-title":"Eng. 
Lett."},{"key":"2023012711064222000_btu093-B17","doi-asserted-by":"crossref","first-page":"e126","DOI":"10.1093\/nar\/gkr574","article-title":"STEME: efficient EM to find motifs in large data sets","volume":"39","author":"Reid","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012711064222000_btu093-B18","doi-asserted-by":"crossref","first-page":"D91","DOI":"10.1093\/nar\/gkh012","article-title":"JASPAR: an open-access database for eukaryotic transcription factor binding profiles","volume":"32","author":"Sandelin","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012711064222000_btu093-B19","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/11851561_19","article-title":"Accelerating motif discovery: motif matching on parallel hardware","volume-title":"Algorithms in Bioinformatics","author":"Sandve","year":"2006"},{"key":"2023012711064222000_btu093-B20","author":"Smit","year":"1996\u20132010"},{"key":"2023012711064222000_btu093-B21","doi-asserted-by":"crossref","first-page":"W86","DOI":"10.1093\/nar\/gkr377","article-title":"RSAT 2011: regulatory sequence analysis tools","volume":"39","author":"Thomas-Chollier","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012711064222000_btu093-B22","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/nature03441","article-title":"Systematic discovery of regulatory motifs in human promoters and 3\u2019 UTRs by comparison of several 
mammals","volume":"434","author":"Xie","year":"2005","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/12\/1667\/48927259\/bioinformatics_30_12_1667.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/12\/1667\/48927259\/bioinformatics_30_12_1667.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T07:48:38Z","timestamp":1716536918000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/12\/1667\/381282"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,2,14]]},"references-count":22,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2014,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu093","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,6,15]]},"published":{"date-parts":[[2014,2,14]]}}}