{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T19:50:28Z","timestamp":1770839428868,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"24","license":[{"start":{"date-parts":[[2017,8,28]],"date-time":"2017-08-28T00:00:00Z","timestamp":1503878400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Transcription factors play a crucial role in gene regulation by binding to specific regulatory sequences. The sequence motifs recognized by a transcription factor can be described in terms of position frequency matrices. When scanning a sequence for matches to a position frequency matrix, one needs to determine a cut-off, which then in turn results in a certain number of hits. In this paper we describe how to compute the distribution of match scores and of the number of motif hits, which are the prerequisites to perform motif hit enrichment analysis.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We put forward an improved compound Poisson model that supports general order-d Markov background models and which computes the number of motif-hits more accurately than earlier models. We compared the accuracy of the improved compound Poisson model with previously proposed models across a range of parameters and motifs, demonstrating the improvement. The importance of the order-d model is supported in a case study using CpG-island sequences.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The method is available as a Bioconductor package named \u2019motifcounter\u2019 https:\/\/bioconductor.org\/packages\/motifcounter.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx539","type":"journal-article","created":{"date-parts":[[2017,8,25]],"date-time":"2017-08-25T19:09:59Z","timestamp":1503688199000},"page":"3929-3937","source":"Crossref","is-referenced-by-count":8,"title":["An improved compound Poisson model for the number of motif hits in DNA sequences"],"prefix":"10.1093","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0376-0032","authenticated-orcid":false,"given":"Wolfgang","family":"Kopp","sequence":"first","affiliation":[{"name":"Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany"}]},{"given":"Martin","family":"Vingron","sequence":"additional","affiliation":[{"name":"Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany"}]}],"member":"286","published-online":{"date-parts":[[2017,8,28]]},"reference":[{"key":"2023020207504560300_btx539-B1","volume-title":"Molecular Biology of the Cell","author":"Alberts","year":"2002","edition":"4th edn"},{"key":"2023020207504560300_btx539-B2","first-page":"gkp335","article-title":"Meme suite: tools for motif discovery and searching","author":"Bailey","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023020207504560300_btx539-B3","doi-asserted-by":"crossref","first-page":"2933","DOI":"10.1093\/bioinformatics\/bti473","article-title":"Matinspector and beyond: promoter analysis based on transcription factor binding sites","volume":"21","author":"Cartharius","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020207504560300_btx539-B4","first-page":"563","article-title":"Matrix search 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices","volume":"11","author":"Chen","year":"1995","journal-title":"Comput. Appl. Biosci. CABIOS"},{"key":"2023020207504560300_btx539-B30"},{"key":"2023020207504560300_btx539-B5","doi-asserted-by":"crossref","first-page":"1372","DOI":"10.1093\/nar\/gkh299","article-title":"Detection of functional DNA motifs via statistical over-representation","volume":"32","author":"Frith","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023020207504560300_btx539-B6","doi-asserted-by":"crossref","first-page":"1017","DOI":"10.1093\/bioinformatics\/btr064","article-title":"Fimo: scanning for occurrences of a given motif","volume":"27","author":"Grant","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020207504560300_btx539-B7","first-page":"151","author":"Kemp","year":"1967"},{"key":"2023020207504560300_btx539-B8","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at ucsc","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res"},{"key":"2023020207504560300_btx539-B9","doi-asserted-by":"crossref","first-page":"D195","DOI":"10.1093\/nar\/gks1089","article-title":"Hocomoco: a comprehensive collection of human transcription factor binding sites models","volume":"41","author":"Kulakovskiy","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020207504560300_btx539-B10","doi-asserted-by":"crossref","first-page":"8.","DOI":"10.1186\/1748-7188-1-8","article-title":"Analysis of computational approaches for motif discovery","volume":"1","author":"Li","year":"2006","journal-title":"Algorithms Mol. Biol"},{"key":"2023020207504560300_btx539-B11","author":"Marschall","year":"2010"},{"key":"2023020207504560300_btx539-B12","doi-asserted-by":"crossref","first-page":"165.","DOI":"10.1186\/1471-2105-11-165","article-title":"Motif enrichment analysis: a unified framework and an evaluation on chip data","volume":"11","author":"McLeay","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020207504560300_btx539-B13","author":"Neyman","year":"1933"},{"key":"2023020207504560300_btx539-B14","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1089\/cmb.2007.0084","article-title":"Compound poisson approximation of the number of occurrences of a position frequency matrix (pfm) on both strands","volume":"15","author":"Pape","year":"2008","journal-title":"J. Comput. Biol"},{"key":"2023020207504560300_btx539-B15","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1032","article-title":"On the power of profiles for transcription factor binding site detection","volume":"2","author":"Rahmann","year":"2003","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023020207504560300_btx539-B16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1089\/10665270050081360","article-title":"Probabilistic and statistical properties of words: an overview","volume":"7","author":"Reinert","year":"2000","journal-title":"J. Comput. Biol"},{"key":"2023020207504560300_btx539-B17","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1093\/bioinformatics\/btl565","article-title":"Predicting transcription factor affinities to DNA from a biophysical model","volume":"23","author":"Roider","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020207504560300_btx539-B18","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1093\/bioinformatics\/btn627","article-title":"Pastaa: identifying transcription factors associated with sets of co-regulated genes","volume":"25","author":"Roider","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020207504560300_btx539-B19","doi-asserted-by":"crossref","first-page":"D91","DOI":"10.1093\/nar\/gkh012","article-title":"Jaspar: an open-access database for eukaryotic transcription factor binding profiles","volume":"32","author":"Sandelin","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023020207504560300_btx539-B20","doi-asserted-by":"crossref","first-page":"6097","DOI":"10.1093\/nar\/18.20.6097","article-title":"Sequence logos: a new way to display consensus sequences","volume":"18","author":"Schneider","year":"1990","journal-title":"Nucleic Acids Res"},{"key":"2023020207504560300_btx539-B22","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1093\/bioinformatics\/16.1.16","article-title":"DNA binding sites: representation and discovery","volume":"16","author":"Stormo","year":"2000","journal-title":"Bioinformatics"},{"key":"2023020207504560300_btx539-B23","doi-asserted-by":"crossref","first-page":"W119","DOI":"10.1093\/nar\/gkn304","article-title":"Rsat: regulatory sequence analysis tools","volume":"36","author":"Thomas-Chollier","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023020207504560300_btx539-B24","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1038\/nature11232","article-title":"The accessible chromatin landscape of the human genome","volume":"489","author":"Thurman","year":"2012","journal-title":"Nature"},{"key":"2023020207504560300_btx539-B25","doi-asserted-by":"crossref","first-page":"1748","DOI":"10.1186\/1748-7188-2-15","article-title":"Efficient and accurate p-value computation for position weight matrices","volume":"2","author":"Touzet","year":"2007","journal-title":"Algorithms Mol. Biol"},{"key":"2023020207504560300_btx539-B26","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-6846-3","volume-title":"Introduction to Computational Biology: Maps, Sequences and Genomes","author":"Waterman","year":"1995"},{"key":"2023020207504560300_btx539-B27","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1093\/nar\/24.1.238","article-title":"Transfac: a database on transcription factors and their DNA binding sites","volume":"24","author":"Wingender","year":"1996","journal-title":"Nucleic Acids Res"},{"key":"2023020207504560300_btx539-B28","doi-asserted-by":"crossref","first-page":"W247","DOI":"10.1093\/nar\/gkp464","article-title":"Pscan: finding over-represented transcription factor binding site motifs in sequences from co-regulated or co-expressed genes","volume":"37","author":"Zambelli","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023020207504560300_btx539-B29","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1093\/bioinformatics\/btl662","article-title":"Computing exact p-values for DNA motifs","volume":"23","author":"Zhang","year":"2007","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/24\/3929\/49041952\/bioinformatics_33_24_3929.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/24\/3929\/49041952\/bioinformatics_33_24_3929.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T07:52:13Z","timestamp":1675324333000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/24\/3929\/4096361"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,8,28]]},"references-count":29,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2017,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx539","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,12,15]]},"published":{"date-parts":[[2017,8,28]]}}}