{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,23]],"date-time":"2024-06-23T05:19:05Z","timestamp":1719119945574},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Some recent comparative studies have revealed that regulatory regions can retain function over large evolutionary distances, even though the DNA sequences are divergent and difficult to align. It is also known that such enhancers can drive very similar expression patterns. This poses a challenge for the in silico detection of biologically related sequences, as they can only be discovered using alignment-free methods.<\/jats:p><jats:p>Results: Here, we present a new computational framework called Regulatory Region Scoring (RRS) model for the detection of functional conservation of regulatory sequences using predicted occupancy levels of transcription factors of interest. We demonstrate that our model can detect the functional and\/or evolutionary links between some non-alignable enhancers with a strong statistical significance. We also identify groups of enhancers that are likely to be similarly regulated. Our model is motivated by previous work on prediction of expression patterns and it can capture similarity by strong binding sites, weak binding sites and even the statistically significant absence of sites. Our results support the hypothesis that weak binding sites contribute to the functional similarity of sequences.<\/jats:p><jats:p>Our model fills a gap between two families of models: detailed, data-intensive models for the prediction of precise spatio-temporal expression patterns on the one side, and crude, generally applicable models on the other side. Our model borrows some of the strengths of each group and addresses their drawbacks.<\/jats:p><jats:p>Availability: The RRS source code is freely available upon publication of this manuscript: http:\/\/www2.warwick.ac.uk\/fac\/sci\/systemsbiology\/staff\/ott\/tools_and_software\/rrs<\/jats:p><jats:p>Contact: \u00a0s.ott@warwick.ac.uk; hashem.koohy@warwick.ac.uk<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq453","type":"journal-article","created":{"date-parts":[[2010,8,10]],"date-time":"2010-08-10T02:04:44Z","timestamp":1281405884000},"page":"2391-2397","source":"Crossref","is-referenced-by-count":6,"title":["An alignment-free model for comparison of regulatory sequences"],"prefix":"10.1093","volume":"26","author":[{"given":"Hashem","family":"Koohy","sequence":"first","affiliation":[{"name":"1 MOAC Doctoral Training Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, 2MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Robinson Way, Cambridge, CB2 0SR, 3Department of Biological Sciences, Gibbet Hill Campus and 4Warwick Systems Biology Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, UK"}]},{"given":"Nigel P.","family":"Dyer","sequence":"additional","affiliation":[{"name":"1 MOAC Doctoral Training Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, 2MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Robinson Way, Cambridge, CB2 0SR, 3Department of Biological Sciences, Gibbet Hill Campus and 4Warwick Systems Biology Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, UK"}]},{"given":"John E.","family":"Reid","sequence":"additional","affiliation":[{"name":"1 MOAC Doctoral Training Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, 2MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Robinson Way, Cambridge, CB2 0SR, 3Department of Biological Sciences, Gibbet Hill Campus and 4Warwick Systems Biology Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, UK"}]},{"given":"Georgy","family":"Koentges","sequence":"additional","affiliation":[{"name":"1 MOAC Doctoral Training Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, 2MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Robinson Way, Cambridge, CB2 0SR, 3Department of Biological Sciences, Gibbet Hill Campus and 4Warwick Systems Biology Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, UK"}]},{"given":"Sascha","family":"Ott","sequence":"additional","affiliation":[{"name":"1 MOAC Doctoral Training Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, 2MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Robinson Way, Cambridge, CB2 0SR, 3Department of Biological Sciences, Gibbet Hill Campus and 4Warwick Systems Biology Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, UK"}]}],"member":"286","published-online":{"date-parts":[[2010,8,9]]},"reference":[{"issue":"Suppl. 2","key":"2023012508171464100_B1","doi-asserted-by":"crossref","first-page":"ii5","DOI":"10.1093\/bioinformatics\/btg1052","article-title":"Computational detection of cis-regulatory modules","volume":"19","author":"Aerts","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508171464100_B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023012508171464100_B3","doi-asserted-by":"crossref","first-page":"5155","DOI":"10.1073\/pnas.83.14.5155","article-title":"A measure of the similarity of sets of sequences not requiring sequence alignment","volume":"83","author":"Blaisdell","year":"1986","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508171464100_B4","doi-asserted-by":"crossref","first-page":"2381","DOI":"10.1101\/gr.1271603","article-title":"A biophysical approach to transcription factor binding site discovery","volume":"13","author":"Djordjevic","year":"2003","journal-title":"Genome Res."},{"key":"2023012508171464100_B5","doi-asserted-by":"crossref","first-page":"e141","DOI":"10.1093\/bioinformatics\/btl223","article-title":"Statistical mechanical modeling of genome-wide transcription factor occupancy data by matrixreduce","volume":"22","author":"Foat","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012508171464100_B6","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1038\/nature07521","article-title":"Analysis of combinatorial cis-regulation in synthetic and genomic promoters","volume":"457","author":"Gertz","year":"2009","journal-title":"Nature"},{"key":"2023012508171464100_B7","doi-asserted-by":"crossref","first-page":"e1000106","DOI":"10.1371\/journal.pgen.1000106","article-title":"Sepsid even-skipped enhancers are functionally conserved in drosophila despite lack of sequence conservation","volume":"4","author":"Hare","year":"2008","journal-title":"PLoS Genet."},{"key":"2023012508171464100_B8","doi-asserted-by":"crossref","first-page":"i249","DOI":"10.1093\/bioinformatics\/btm211","article-title":"A statistical method for alignment-free comparison of regulatory sequences","volume":"23","author":"Kantorovitz","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508171464100_B9","doi-asserted-by":"crossref","first-page":"e6901","DOI":"10.1371\/journal.pone.0006901","article-title":"Identifying cis-regulatory sequences by word profile similarity","volume":"4","author":"Leung","year":"2009","journal-title":"PLoS One"},{"key":"2023012508171464100_B10","doi-asserted-by":"crossref","first-page":"13980","DOI":"10.1073\/pnas.202468099","article-title":"Distributional regimes for the number of k-word matches between two random sequences","volume":"99","author":"Lippert","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508171464100_B11","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1093\/bib\/bbp025","article-title":"Computational methods for the detection of cis-regulatory modules","volume":"10","author":"Loo","year":"2009","journal-title":"Brief. Bioinform."},{"key":"2023012508171464100_B12","doi-asserted-by":"crossref","first-page":"e93","DOI":"10.1371\/journal.pbio.0030093","article-title":"Functional evolution of a cis-regulatory module","volume":"3","author":"Ludwig","year":"2005","journal-title":"PLoS Biol."},{"key":"2023012508171464100_B13","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/nar\/gkg108","article-title":"TRANSFAC: transcriptional regulation, from patterns to profiles","volume":"31","author":"Matys","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023012508171464100_B14","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023012508171464100_B15","doi-asserted-by":"crossref","first-page":"4960","DOI":"10.1073\/pnas.0500373102","article-title":"The role of binding site cluster strength in Bicoid-dependent patterning in drosophila","volume":"102","author":"Ochoa-Espinosa","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508171464100_B16","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1093\/bioinformatics\/btl565","article-title":"Predicting transcription factor affinities to DNA from a biophysical model","volume":"23","author":"Roider","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508171464100_B17","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1038\/nrg2591","article-title":"From DNA sequence to transcriptional behaviour: a quantitative approach","volume":"10","author":"Segal","year":"2009","journal-title":"Nat. Rev. Genet."},{"key":"2023012508171464100_B18","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1038\/nature06496","article-title":"Predicting expression patterns from regulatory sequence in drosophila segmentation","volume":"451","author":"Segal","year":"2008","journal-title":"Nature"},{"key":"2023012508171464100_B19","doi-asserted-by":"crossref","first-page":"855","DOI":"10.1016\/S0092-8674(94)90622-X","article-title":"Synergy between the hunchback and bicoid morphogens is required for anterior patterning in drosophila","volume":"78","author":"Simpson-Brose","year":"1994","journal-title":"Cell"},{"key":"2023012508171464100_B20","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023012508171464100_B21","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1101\/gr.5113606","article-title":"Extensive low-affinity transcriptional interactions in the yeast genome","volume":"16","author":"Tanay","year":"2006","journal-title":"Genome Res."},{"key":"2023012508171464100_B22","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1093\/bioinformatics\/btg425","article-title":"Metrics for comparing regulatory sequences on the basis of pattern counts","volume":"20","author":"van Helden","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012508171464100_B23","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison-a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508171464100_B24","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1038\/nature08531","article-title":"Combinatorial binding predicts spatio-temporal cis-regulatory activity","volume":"462","author":"Zinzen","year":"2009","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/19\/2391\/48856764\/bioinformatics_26_19_2391.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/19\/2391\/48856764\/bioinformatics_26_19_2391.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T18:33:43Z","timestamp":1685730823000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/19\/2391\/229951"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,8,9]]},"references-count":24,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2010,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq453","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,10,1]]},"published":{"date-parts":[[2010,8,9]]}}}