{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T12:31:17Z","timestamp":1773664277372,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Gene expression arrays enable measurements of transcription values for a large number of, or all, genes in the genome. In order to better interpret these results and to use them to reconstruct transcription networks, information on the location of binding sites for regulatory proteins in the entire genome is needed. In particular, this represents an open problem in Escherichia coli.<\/jats:p><jats:p>Results: We describe the first implementation of dictionary-style models to the study of transcription factor binding sites in an entire genome. Vocabulon's unique feature is that it can both reconstruct binding sites characterized by unknown motifs and impute locations of known binding sites in long sequences by simultaneous search. On one hand, the dictionary model specifies a probability for the entire sequence taking simultaneously into account all the possible binding sites. This greatly reduces the number of false positives. On the other hand, the possibility of refining motif description, as an increasing number of binding sites are identified, augments the sensitivity of the method. We illustrate these properties with examples in E. coli. 
The results of gene expression arrays are used both to guide the search and corroborate it.<\/jats:p><jats:p>Availability: For a copy of the Vocabulon program and other details, please contact csabatti@mednet.ucla.edu<\/jats:p><jats:p>Contact: \u00a0csabatti@mednet.ucla.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti083","type":"journal-article","created":{"date-parts":[[2004,10,28]],"date-time":"2004-10-28T00:23:01Z","timestamp":1098922981000},"page":"922-931","source":"Crossref","is-referenced-by-count":14,"title":["Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites"],"prefix":"10.1093","volume":"21","author":[{"given":"Chiara","family":"Sabatti","sequence":"first","affiliation":[]},{"given":"Lars","family":"Rohlin","sequence":"additional","affiliation":[]},{"given":"Kenneth","family":"Lange","sequence":"additional","affiliation":[]},{"given":"James C.","family":"Liao","sequence":"additional","affiliation":[]}],"member":"286","published-online":{"date-parts":[[2004,10,27]]},"reference":[{"key":"2023013107281357400_B1","doi-asserted-by":"crossref","unstructured":"Avison, M.B., Horton, R.E., Walsh, T.R., Bennett, P.M. 2001Escherichia coli CreBC is a global regulator of gene expression that responds to growth in minimal media. J. Biol. Chem.2926955\u201326961","DOI":"10.1074\/jbc.M011186200"},{"key":"2023013107281357400_B2","unstructured":"Baum, L.E. 1972\u2018An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes.\u2019. Inequalities31\u20138"},{"key":"2023013107281357400_B3","unstructured":"Blattner, F.R., Plunkett, G., Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F., et al. 1997The complete genome sequence of Escherichia coli K-12. Science2771453\u20131474"},{"key":"2023013107281357400_B4","doi-asserted-by":"crossref","unstructured":"Bussemaker, H.J., Li, H., Siggia, E.D. 
2000Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. Proc. Natl Acad. Sci.9710096\u201310100","DOI":"10.1073\/pnas.180265397"},{"key":"2023013107281357400_B5","doi-asserted-by":"crossref","unstructured":"Bussemaker, H.J., Li, H., Siggia, E.D. 2001Regulatory element detection using correlation with expression. Nat. Genet.27167\u2013171","DOI":"10.1145\/369133.369174"},{"key":"2023013107281357400_B6","unstructured":"Colon, E., Liu, X., Lieb, J., Liu, J.S. 2003Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl Acad. Sci.1003339\u20133344"},{"key":"2023013107281357400_B7","doi-asserted-by":"crossref","unstructured":"Courcelle, J., Khodursky, A., Peter, B., Brown, P.O., Hanawalt, P.C. 2001Comparative gene expression profiles following UV exposure in wild-type and SOS-deficient Escherichia coli. Genetics15841\u201364","DOI":"10.1093\/genetics\/158.1.41"},{"key":"2023013107281357400_B8","unstructured":"Devijver, P.A. 1985Baum's forward\u2013backward algorithm revisited. Pattern Recogn. Lett.3369\u2013373"},{"key":"2023013107281357400_B9","doi-asserted-by":"crossref","unstructured":"Djordjevic, M., Sengupta, A.M., Shraiman, B.I. 2003A biophysical approach to transcription factor binding site discovery. Genome Res.132381\u20132390","DOI":"10.1101\/gr.1271603"},{"key":"2023013107281357400_B10","unstructured":"Gupta, M. and Liu, J.S. 2003Discovery of conserved sequence patterns using a stochastic dictionary model. J. Am. Statist. Assoc.9855\u201366"},{"key":"2023013107281357400_B11","doi-asserted-by":"crossref","unstructured":"Jennings, M. and Beacham, I.R. 1993Co-dependent positive regulation of the ansB promoter of Escherichia coli by CRP and the FNR protein: a molecular analysis. Mol. Microbiol.9155\u2013164","DOI":"10.1111\/j.1365-2958.1993.tb01677.x"},{"key":"2023013107281357400_B12","unstructured":"Keles, M., van der Laan, M., Eisen, M. 
2002Identification of regulatory elements using a feature selection method. Bioinformatics181167\u20131175"},{"key":"2023013107281357400_B13","unstructured":"Lange, K., Hunter, D.R., Yang, I. 2000Optimization transfer using surrogate objective functions (with discussion). J. Comput. Graph. Statist.91\u201359"},{"key":"2023013107281357400_B14","doi-asserted-by":"crossref","unstructured":"Lawrence, C.E. and Reilly, A.A. 1990An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins741\u201351","DOI":"10.1002\/prot.340070105"},{"key":"2023013107281357400_B15","unstructured":"Lawrence, C.E., Altschul, S.F., Bogouski, M.S., Liu, J.S., Neuwald, A.F., Wooten, J.C. 1993Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science262208\u2013214"},{"key":"2023013107281357400_B16","doi-asserted-by":"crossref","unstructured":"Liao, J., Boscolo, R., Yang, Y., Tran, L., Sabatti, C., Roychowdhury, V. 2003Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl Acad. Sci.10015522\u201315527","DOI":"10.1073\/pnas.2136632100"},{"key":"2023013107281357400_B17","doi-asserted-by":"crossref","unstructured":"McCue, L.A., Thompson, W., Carmack, C.S., Ryan, M.P., Liu, J.S., Derbyshire, V., Lawrence, C.E. 2001Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res.29774\u2013782","DOI":"10.1093\/nar\/29.3.774"},{"key":"2023013107281357400_B18","doi-asserted-by":"crossref","unstructured":"Park, K., Choi, S., Ko, M., Park, C. 2001Novel F-dependent genes of Escherichia coli found using a specified promoter consensus. FEMS Microbiol. Lett.202243\u2013250","DOI":"10.1111\/j.1574-6968.2001.tb10811.x"},{"key":"2023013107281357400_B19","doi-asserted-by":"crossref","unstructured":"Quandt, K., Frech, K., Karas, H., Wingender, E., Werner, T. 
1995MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res.234878\u20134884","DOI":"10.1093\/nar\/23.23.4878"},{"key":"2023013107281357400_B20","doi-asserted-by":"crossref","unstructured":"Robison, K., McGuire, A.M., Church, G.M. 1998A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K12 genome. J. Mol. Biol.284241\u2013254","DOI":"10.1006\/jmbi.1998.2160"},{"key":"2023013107281357400_B21","doi-asserted-by":"crossref","unstructured":"Sabatti, C. and Lange, K. 2002Genomewide motif identification using a dictionary model. IEEE Proc.901803\u20131810","DOI":"10.1109\/JPROC.2002.804689"},{"key":"2023013107281357400_B22","doi-asserted-by":"crossref","unstructured":"Sabatti, C., Rohlin, L., Oh, M., Liao, J. 2002Co-expression pattern from DNA microarray experiments as a tool for operon prediction. Nucleic Acids Res.302886\u20132893","DOI":"10.1093\/nar\/gkf388"},{"key":"2023013107281357400_B23","doi-asserted-by":"crossref","unstructured":"Schneider, T.D. and Stephens, R.M. 1990Sequence logos: a new way to display consensus sequences. 
Nucleic Acids Res.186097\u20136100","DOI":"10.1093\/nar\/18.20.6097"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/7\/922\/48966536\/bioinformatics_21_7_922.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/7\/922\/48966536\/bioinformatics_21_7_922.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,29]],"date-time":"2023-04-29T18:41:09Z","timestamp":1682793669000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/7\/922\/268825"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,10,27]]},"references-count":23,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2005,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti083","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2005,4,1]]},"published":{"date-parts":[[2004,10,27]]}}}