{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T15:13:52Z","timestamp":1764688432012},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2004,10,28]],"date-time":"2004-10-28T00:00:00Z","timestamp":1098921600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"},{"start":{"date-parts":[[2004,10,28]],"date-time":"2004-10-28T00:00:00Z","timestamp":1098921600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>This paper addresses the problem of discovering transcription factor binding sites in <jats:italic>heterogeneous<\/jats:italic> sequence data, which includes regulatory sequences of one or more genes, as well as their orthologs in other species.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>We propose an algorithm that integrates two important aspects of a motif's significance \u2013 <jats:italic>overrepresentation<\/jats:italic> and <jats:italic>cross-species conservation<\/jats:italic> \u2013 into one probabilistic score. The algorithm allows the input orthologous sequences to be related by any user-specified phylogenetic tree. It is based on the Expectation-Maximization technique, and scales well with the number of species and the length of input sequences. 
We evaluate the algorithm on synthetic data, and also present results for data sets from yeast, fly, and human.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusions<\/jats:title>\n                        <jats:p>The results demonstrate that the new approach improves motif discovery by exploiting multiple species information.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-5-170","type":"journal-article","created":{"date-parts":[[2004,11,11]],"date-time":"2004-11-11T07:29:58Z","timestamp":1100158198000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":127,"title":["PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences"],"prefix":"10.1186","volume":"5","author":[{"given":"Saurabh","family":"Sinha","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mathieu","family":"Blanchette","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Tompa","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2004,10,28]]},"reference":[{"issue":"1\u20132","key":"286_CR1","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1023\/A:1022617714621","volume":"21","author":"TL Bailey","year":"1995","unstructured":"Bailey TL, Elkan C: Unsupervised learning of multiple motifs in biopolymers using expectation maximization.\n                           Machine Learning 1995, 21(1\u20132):51\u201380. 
10.1023\/A:1022617714621","journal-title":"Machine Learning"},{"issue":"2","key":"286_CR2","first-page":"81","volume":"6","author":"GZ Hertz","year":"1990","unstructured":"Hertz GZ, Hartzell GW III, Stormo GD: Identification of Consensus Patterns in Unaligned DNA Sequences Known to be Functionally Related.\n                           Computer Applications in the Biosciences 1990, 6(2):81\u201392.","journal-title":"Computer Applications in the Biosciences"},{"key":"286_CR3","doi-asserted-by":"publisher","first-page":"208","DOI":"10.1126\/science.8211139","volume":"262","author":"CE Lawrence","year":"1993","unstructured":"Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC: Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment.\n                           Science 1993, 262: 208\u2013214.","journal-title":"Science"},{"key":"286_CR4","doi-asserted-by":"publisher","first-page":"939","DOI":"10.1038\/nbt1098-939","volume":"16","author":"FP Roth","year":"1998","unstructured":"Roth FP, Hughes JD, Estep PW, Church GM: Finding DNA Regulatory Motifs Within Unaligned Noncoding Sequences Clustered by Whole-Genome mRNA Quantitation.\n                           Nature Biotechnology 1998, 16: 939\u2013945. 
10.1038\/nbt1098-939","journal-title":"Nature Biotechnology"},{"key":"286_CR5","first-page":"344","volume-title":"In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology: August 2000; La Jolla","author":"S Sinha","year":"2000","unstructured":"Sinha S, Tompa M: A Statistical Method for Finding Transcription Factor Binding Sites.\n                           In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology: August 2000; La Jolla 2000, 344\u2013354."},{"issue":"5","key":"286_CR6","doi-asserted-by":"publisher","first-page":"827","DOI":"10.1006\/jmbi.1998.1947","volume":"281","author":"J van Helden","year":"1998","unstructured":"van Helden J, Andr\u00e9 B, Collado-Vides J: Extracting Regulatory Sites from the Upstream Region of Yeast Genes by Computational Analysis of Oligonucleotide Frequencies.\n                           Journal of Molecular Biology 1998, 281(5):827\u2013842. 10.1006\/jmbi.1998.1947","journal-title":"Journal of Molecular Biology"},{"key":"286_CR7","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1101\/gr.6902","volume":"12","author":"M Blanchette","year":"2002","unstructured":"Blanchette M, Tompa M: Discovery of Regulatory elements by a Computational Method for Phylogenetic Footprinting.\n                           Genome Research 2002, 12: 739\u2013748. 10.1101\/gr.6902","journal-title":"Genome Research"},{"issue":"5","key":"286_CR8","doi-asserted-by":"publisher","first-page":"832","DOI":"10.1101\/gr.225502","volume":"12","author":"G Loots","year":"2002","unstructured":"Loots G, Ovcharenko I, Pachter L, Dubchak I, Rubin E: rVista for comparative sequence-based discovery of functional transcription factor binding sites.\n                           Genome Research 2002, 12(5):832\u20139. 10.1101\/gr.225502. 
Article published online before print in April 2002","journal-title":"Genome Research"},{"issue":"3","key":"286_CR9","doi-asserted-by":"publisher","first-page":"695","DOI":"10.1093\/nar\/28.3.695","volume":"28","author":"M Gelfand","year":"2000","unstructured":"Gelfand M, Koonin E, Mironov A: Prediction of transcription regulatory sites in Archea by a comparative genomic approach.\n                           Nucleic Acids Research 2000, 28(3):695\u2013705. 10.1093\/nar\/28.3.695","journal-title":"Nucleic Acids Research"},{"key":"286_CR10","doi-asserted-by":"publisher","first-page":"744","DOI":"10.1101\/gr.10.6.744","volume":"10","author":"AM McGuire","year":"2000","unstructured":"McGuire AM, Hughes JD, Church GM: Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes.\n                           Genome Research 2000, 10: 744\u2013757. 10.1101\/gr.10.6.744","journal-title":"Genome Research"},{"key":"286_CR11","doi-asserted-by":"publisher","first-page":"1175","DOI":"10.1101\/gr.182901","volume":"11","author":"P Cliften","year":"2001","unstructured":"Cliften P, Hillier L, Fulton L, Graves T, Miner T, Gish W, Waterston R, Johnston M: Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis.\n                           Genome Research 2001, 11: 1175\u20131186. 10.1101\/gr.182901","journal-title":"Genome Research"},{"issue":"6937","key":"286_CR12","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1038\/nature01644","volume":"423","author":"M Kellis","year":"2003","unstructured":"Kellis M, Patterson N, Endrizzi M, Birren B, Lander E: Sequencing and comparison of yeast species to identify genes and regulatory elements.\n                           Nature 2003, 423(6937):241\u201354. 
10.1038\/nature01644","journal-title":"Nature"},{"key":"286_CR13","doi-asserted-by":"publisher","first-page":"451","DOI":"10.1101\/gr.1327604","volume":"14","author":"Y Liu","year":"2004","unstructured":"Liu Y, Liu XS, Wei L, Altman R, Batzoglou S: Eukaryotic Regulatory Element Conservation Analysis and Identification Using Comparative Genomics.\n                           Genome Research 2004, 14: 451\u2013458. 10.1101\/gr.1327604","journal-title":"Genome Research"},{"key":"286_CR14","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1101\/gr.228902","volume":"12","author":"D GuhaThakurta","year":"2002","unstructured":"GuhaThakurta D, Palomar L, Stormo G, Tedesco P, Johnson T, Walker D, Lithgow G, Kim S, Link C: Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods.\n                           Genome Research 2002, 12: 701\u201312. 10.1101\/gr.228902","journal-title":"Genome Research"},{"key":"286_CR15","first-page":"348","volume-title":"In Pacific Symposium on Biocomputing: January 2004; Hawaii","author":"A Prakash","year":"2004","unstructured":"Prakash A, Blanchette M, Sinha S, Tompa M: Motif discovery in heterogeneous sequence data.\n                           In Pacific Symposium on Biocomputing: January 2004; Hawaii 2004, 348\u2013359."},{"key":"286_CR16","doi-asserted-by":"crossref","unstructured":"Emberly E, Rajewsky N, Siggia E: Conservation of regulatory elements between two species of Drosophila.\n                           BMC Bioinformatics 2003., 4(57):","DOI":"10.1186\/1471-2105-4-57"},{"key":"286_CR17","volume-title":"In RECOMB Satellite Workshop on Regulatory Genomics","author":"R Siddharthan","year":"2004","unstructured":"Siddharthan R, van Nimwegen E, Siggia E: PhyloGibbs: Incorporating phylogeny and tracking-based significance assessment in a Gibbs sampler.\n                           In RECOMB Satellite Workshop on 
Regulatory Genomics 2004."},{"key":"286_CR18","first-page":"324","volume-title":"In Pacific Symposium on Biocomputing: January 2004; Hawaii","author":"A Moses","year":"2004","unstructured":"Moses A, Chiang D, Eisen M: Phylogenetic motif detection by expectation-maximization on evolutionary mixtures.\n                           In Pacific Symposium on Biocomputing: January 2004; Hawaii 2004, 324\u2013335."},{"key":"286_CR19","volume-title":"In Mammalian Protein Metabolism","author":"T Jukes","year":"1969","unstructured":"Jukes T, Cantor C: Evolution of protein molecules. In Mammalian Protein Metabolism. Edited by: Munro MN. Academic Press; 1969."},{"key":"286_CR20","doi-asserted-by":"publisher","first-page":"2369","DOI":"10.1093\/bioinformatics\/btg329","volume":"19","author":"T Wang","year":"2003","unstructured":"Wang T, Stormo G: Combining phylogenetic data with co-regulated genes to identify regulatory motifs.\n                           Bioinformatics 2003, 19: 2369\u20132380. 10.1093\/bioinformatics\/btg329","journal-title":"Bioinformatics"},{"key":"286_CR21","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1101\/gr.926603","volume":"13","author":"M Brudno","year":"2003","unstructured":"Brudno M, Do C, Cooper G, Kim M, Davydov E, Green E, Sidow A, Batzoglou S, NISC Comparative Sequencing Program: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.\n                           Genome Research 2003, 13(4):721\u201331. 
10.1101\/gr.926603","journal-title":"Genome Research"},{"key":"286_CR22","first-page":"292","volume-title":"In Proceedings of the Eleventh International Conference on Intelligent Systems for Molecular Biology: July 2003; Brisbane","author":"S Sinha","year":"2003","unstructured":"Sinha S, van Nimwegen E, Siggia E: A Probabilistic Method to Detect Regulatory Modules.\n                           In Proceedings of the Eleventh International Conference on Intelligent Systems for Molecular Biology: July 2003; Brisbane 2003, 292\u2013301."},{"key":"286_CR23","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological sequences analysis","author":"R Durbin","year":"1998","unstructured":"Durbin R, Eddy S, Krogh A, Mitchison G: Biological sequences analysis. Cambridge University Press; 1998."},{"issue":"12","key":"286_CR24","doi-asserted-by":"publisher","first-page":"3580","DOI":"10.1093\/nar\/gkg608","volume":"31","author":"W Thompson","year":"2003","unstructured":"Thompson W, Rouchka E, Lawrence C: Gibbs Recursive Sampler: finding transcription factor binding sites.\n                           Nucleic Acids Research 2003, 31(12):3580\u20133585. 10.1093\/nar\/gkg608","journal-title":"Nucleic Acids Research"},{"issue":"7\/8","key":"286_CR25","doi-asserted-by":"publisher","first-page":"607","DOI":"10.1093\/bioinformatics\/15.7.607","volume":"15","author":"J Zhu","year":"1999","unstructured":"Zhu J, Zhang MQ: SCPD: a Promoter Database of the Yeast\n                           Saccharomyces cerevisiae\n                           .\n                           Bioinformatics 1999, 15(7\/8):607\u2013611. 
[http:\/\/cgsigma.cshl.org\/jian\/] 10.1093\/bioinformatics\/15.7.607","journal-title":"Bioinformatics"},{"key":"286_CR26","doi-asserted-by":"crossref","unstructured":"Rajewsky N, Vergassola M, Gaul U, Siggia E: Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo.\n                           BMC Bioinformatics 2002., 3(30):","DOI":"10.1186\/1471-2105-3-30"},{"key":"286_CR27","unstructured":"WebLogo[http:\/\/weblogo.berkeley.edu\/]"},{"key":"286_CR28","doi-asserted-by":"publisher","first-page":"708","DOI":"10.1101\/gr.1933104","volume":"14","author":"M Blanchette","year":"2004","unstructured":"Blanchette M, Kent J, Riemer C, Elnitski L, Smit A, Roskin K, Baertsch R, Rosenbloom K, Clawson H, Green E, Haussler D, Miller W: Aligning multiple genomic sequences with the threaded blockset aligner.\n                           Genome Research 2004, 14: 708\u2013715. 10.1101\/gr.1933104","journal-title":"Genome Research"},{"key":"286_CR29","doi-asserted-by":"publisher","first-page":"368","DOI":"10.1007\/BF01734359","volume":"17","author":"J Felsenstein","year":"1981","unstructured":"Felsenstein J: Evolutionary trees from DNA sequences: maximum likelihood approach.\n                           Journal of Molecular Evolution 1981, 17: 368\u2013376.","journal-title":"Journal of Molecular Evolution"},{"issue":"13","key":"286_CR30","doi-asserted-by":"publisher","first-page":"3593","DOI":"10.1093\/nar\/gkg567","volume":"31","author":"J van Helden","year":"2003","unstructured":"van Helden J: Regulatory sequence analysis tools.\n                           Nucleic Acids Research 2003, 31(13):3593\u20136. 
10.1093\/nar\/gkg567","journal-title":"Nucleic Acids Research"},{"key":"286_CR31","first-page":"41","volume":"10","author":"GJ Olsen","year":"1994","unstructured":"Olsen GJ, Matsuda H, Hagstrom R, Overbeek R: fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood.\n                           Comput Appl Biosci 1994, 10: 41\u201348.","journal-title":"Comput Appl Biosci"},{"key":"286_CR32","doi-asserted-by":"crossref","unstructured":"Celniker S, Wheeler D, Kronmiller B, Carlson J, Halpern A, Patel S, Adams M, Champe M, Dugan S, Frise E, Hodgson A, George R, Hoskins R, Laverty T, Muzny D, Nelson C, Pacleb J, Park S, Pfeiffer B, Richards S, Sodergren E, Svirskas R, Tabor P, Wan K, Stapleton M, Sutton G, Venter C, Weinstock G, Scherer S, Myers E, Gibbs R, Rubin G: Finishing a whole genome shotgun: Release 3 of the Drosophila melanogaster euchromatic genome sequence.\n                           Genome Biology 2002., 3(12):","DOI":"10.1186\/gb-2002-3-12-research0079"},{"key":"286_CR33","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1093\/nar\/24.1.238","volume":"24","author":"E Wingender","year":"1996","unstructured":"Wingender E, Dietze P, Karas H, Kn\u00fcppel R: TRANSFAC: a Database on Transcription Factors and their DNA Binding Sites.\n                           Nucleic Acids Research 1996, 24: 238\u2013241. 
[http:\/\/transfac.gbf.de] 10.1093\/nar\/24.1.238","journal-title":"Nucleic Acids Research"},{"key":"286_CR34","unstructured":"HomoloGene[http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?db=homologene]"},{"key":"286_CR35","unstructured":"UCSC Genome Browser[http:\/\/genome.ucsc.edu\/]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-170.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-5-170\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-170.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:20:28Z","timestamp":1728303628000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-5-170"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,10,28]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2004,12]]}},"alternative-id":["286"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-5-170","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2004,10,28]]},"assertion":[{"value":"6 May 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 October 2004","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 October 2004","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"170"}}