{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T13:38:38Z","timestamp":1742391518974},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Identification of motifs is one of the critical stages in studying the regulatory interactions of genes. Motifs can have complicated patterns. In particular, spaced motifs, an important class of motifs, consist of several short segments separated by spacers of different lengths. Locating spaced motifs is not trivial. Existing motif-finding algorithms are either designed for monad motifs (short contiguous patterns with some mismatches) or have assumptions on the spacer lengths or can only handle at most two segments. An effective motif finder for generic spaced motifs is highly desirable.<\/jats:p><jats:p>Results: This article proposes a novel approach for identifying spaced motifs with any number of spacers of different lengths. We introduce the notion of submotifs to capture the segments in the spaced motif and formulate the motif-finding problem as a frequent submotif mining problem. We provide an algorithm called SPACE to solve the problem. 
Based on experiments on real biological datasets, synthetic datasets and the motif assessment benchmarks by Tompa et al., we show that our algorithm performs better than existing tools for spaced motifs, with improvements in both sensitivity and specificity, and that for monads SPACE performs as well as other tools.<\/jats:p><jats:p>Availability: The source code is available upon request from the authors.<\/jats:p><jats:p>Contact: ksung@comp.nus.edu.sg<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm118","type":"journal-article","created":{"date-parts":[[2007,5,6]],"date-time":"2007-05-06T00:28:57Z","timestamp":1178411337000},"page":"1476-1485","source":"Crossref","is-referenced-by-count":23,"title":["Detection of generic spaced motifs using submotif pattern mining"],"prefix":"10.1093","volume":"23","author":[{"given":"Edward","family":"Wijaya","sequence":"first","affiliation":[{"name":"1 Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, 2School of Computing, National University of Singapore, Singapore 119260, 3Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672 and 4Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong"},{"name":"1 Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, 2School of Computing, National University of Singapore, Singapore 119260, 3Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672 and 4Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kanagasabai","family":"Rajaraman","sequence":"additional","affiliation":[{"name":"1 Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, 2School of Computing, National University of Singapore, Singapore 119260, 3Genome Institute of Singapore, 
60 Biopolis Street, #02-01 Genome, Singapore 138672 and 4Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siu-Ming","family":"Yiu","sequence":"additional","affiliation":[{"name":"1 Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, 2School of Computing, National University of Singapore, Singapore 119260, 3Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672 and 4Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wing-Kin","family":"Sung","sequence":"additional","affiliation":[{"name":"1 Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, 2School of Computing, National University of Singapore, Singapore 119260, 3Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672 and 4Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong"},{"name":"1 Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, 2School of Computing, National University of Singapore, Singapore 119260, 3Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672 and 4Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2007,5,5]]},"reference":[{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"1743","DOI":"10.1126\/science.1102216","article-title":"Environmentally induced foregut remodeling by PHA-4\/FoxA and DAF-12\/NHR","volume":"305","author":"Ao","year":"2001","journal-title":"Science"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1007\/BF00993379","article-title":"Unsupervised learning of multiple motifs in biopolymers 
using expectation maximization","volume":"21","author":"Bailey","year":"1995","journal-title":"Machine Learning"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1007\/BF02110333","article-title":"Yeast multidrug resistance: the PDR network","volume":"27","author":"Balzi","year":"1995","journal-title":"J. Bioenerg. Biomembr"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1046\/j.1365-2958.1998.00916.x","article-title":"A nonameric core sequence is required upstream of the LYS genes of Saccharomyces cerevisiae for Lys14p-mediated activation and apparent repression by lysine","volume":"29","author":"Becker","year":"1998","journal-title":"Mol. Microbiol"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1101\/gr.6902","article-title":"Discovery of regulatory elements by a computational method for phylogenetic Footprinting","volume":"12","author":"Blanchette","year":"2002","journal-title":"Genome Res"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"D63","DOI":"10.1093\/nar\/gkj116","article-title":"ABS: a database of annotated regulatory binding sites from orthologous promoters","volume":"34","author":"Blanco","year":"2006","journal-title":"Nucleic Acid Res"},{"key":"2023041105083374900_","first-page":"273","article-title":"A highly scalable algorithm for the extraction of cis-regulatory regions","author":"Carvalho","year":"2003"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"1114","DOI":"10.1093\/oxfordjournals.molbev.a004169","article-title":"Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover","volume":"19","author":"Dermitzakis","year":"2002","journal-title":"Mol. Biol. 
Evol"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"P7","DOI":"10.1186\/gb-2005-6-5-p7","article-title":"All motifs are not created equal: structural properties of transcription factor - DNA interaction and the inference of sequence specificity","volume":"6","author":"Eisen","year":"2005","journal-title":"Genome Biol"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"S354","DOI":"10.1093\/bioinformatics\/18.suppl_1.S354","article-title":"Finding composite regulatory patterns in DNA sequences","volume":"18","author":"Eskin","year":"2002","journal-title":"Bioinformatics"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"2240","DOI":"10.1093\/bioinformatics\/bti336","article-title":"A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length","volume":"21","author":"Favorov","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1093\/bioinformatics\/17.7.608","article-title":"Identifying target sites for cooperatively binding factors","volume":"17","author":"GuhaThakurta","year":"2001","journal-title":"Bioinformatics"},{"key":"2023041105083374900_","first-page":"230","article-title":"Data Mining: concepts and techniques","volume-title":"Morgan Kaufmann.","author":"Han","year":"2000"},{"key":"2023041105083374900_","first-page":"4472","article-title":"Mapping of epidermal growth factor-, serum-, and phorbol ester-responsive sequence elements in the c-jun promoter","volume":"12","author":"Han","year":"1992","journal-title":"Mol. Cell. 
Biol"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1038\/nature02800","article-title":"Transcription regulatory code of a eukaryotic genome","volume":"431","author":"Harbison","year":"2004","journal-title":"Nature"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"2229","DOI":"10.1073\/pnas.050586197","article-title":"Identification of CDK4 as a target of c-MYC","volume":"97","author":"Hermeking","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1093\/bioinformatics\/15.7.563","article-title":"Identifying DNA and protein patterns with statistically significant alignments of multiple sequences","volume":"15","author":"Hertz","year":"1999","journal-title":"Bioinformatics"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1093\/bioinformatics\/bti745","article-title":"A generic motif discovery algorithm for sequential data","volume":"22","author":"Jensen","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041105083374900_","first-page":"193","article-title":"Regulation of carbon and phosphate utilisation, In Molecular and Cellular Biology of the Yeast Saccharomyces: Gene Expression","author":"Johnston","year":"1992"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"R56","DOI":"10.1186\/gb-2004-5-8-r56","article-title":"Identifying combinatorial regulation of transcription factors and binding motifs","volume":"5","author":"Kato","year":"2004","journal-title":"Genome Biol"},{"key":"2023041105083374900_","first-page":"133","article-title":"Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment","volume":"1993","author":"Lawrence","year":"1993","journal-title":"Science"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/1475-4924-2-13","article-title":"Identification of 
conserved regulatory elements by comparative genome analysis","volume":"2","author":"Lenhard","year":"2003","journal-title":"J. Biol"},{"key":"2023041105083374900_","first-page":"127","article-title":"BioProspector: discovering DNA motifs in upstream regulatory regions of co-expressed genes","author":"Liu","year":"2001"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"12588","DOI":"10.1074\/jbc.M313454200","article-title":"Probing ArcA-P modulon of Escherichia coli by whole genome transcriptional analysis and sequence recognition profiling","volume":"279","author":"Liu","year":"2004","journal-title":"J. Biol. Chem"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"6016","DOI":"10.1093\/nar\/gkg799","article-title":"Distance preferences in the arrangement of binding motifs and hierarchical levels in organization of transcription regulatory information","volume":"31","author":"Makeev","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1089\/106652700750050826","article-title":"Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification","volume":"7","author":"Marsan","year":"2000","journal-title":"J. Comp. Biol"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1046\/j.1365-2958.1999.01347.x","article-title":"A weight matrix for binding recognition by the redox-response regulator ArcA-P of Escherichia coli","volume":"32","author":"McGuire","year":"1999","journal-title":"Molecular Microbiology"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"809","DOI":"10.1007\/s000180050043","article-title":"Origins and evolutionary diversification of nuclear receptor superfamily","volume":"57","author":"Owen","year":"2000","journal-title":"Cell Mol. Life. 
Sci"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"S207","DOI":"10.1093\/bioinformatics\/17.suppl_1.S207","article-title":"An algorithm for finding signals of unknown length in DNA sequences","volume":"17","author":"Pavesi","year":"2001","journal-title":"Bioinformatics"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"6379","DOI":"10.1093\/nar\/gkl658","article-title":"Identification of degenerate motifs using position restricted selection and hybrid ranking combination","volume":"34","author":"Peng","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023041105083374900_","first-page":"792","article-title":"Escherichia coli. RNA polymerase \u03c370 promoters, and the kinetics of the steps of transcription initiation","volume":"1","author":"Record","year":"1996","journal-title":"Escherichia Coli and Salmonella"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1093\/bioinformatics\/14.1.55","article-title":"Combinatorial pattern discovery in biological sequences","volume":"14","author":"Rigoutsos","year":"1998","journal-title":"Bioinformatics"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"4599","DOI":"10.1093\/nar\/24.23.4599","article-title":"Comparative amino acid sequence analysis of the C6 zinc cluster family of transcriptional regulators","volume":"24","author":"Schjerling","year":"1996","journal-title":"Nucleic Acid Research"},{"key":"2023041105083374900_","first-page":"344","article-title":"A statistical method for finding transcription factor binding sites","author":"Sinha","year":"2000"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"1439","DOI":"10.1002\/yea.320111502","article-title":"Compilation and characteristics of dedicated transcription factors in Saccharomyces 
cerevisiae","volume":"11","author":"Svetlov","year":"1995","journal-title":"Yeast"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1038\/10343","article-title":"Systematic determination of genetic network architecture","volume":"22","author":"Tavazoie","year":"1999","journal-title":"Nat. Genet"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1089\/10665270252935566","article-title":"A Gibbs sampling method to detect over-represented motifs in the upstream regions of co-expressed genes","volume":"9","author":"Thijs","year":"2002","journal-title":"J. Comput. Biol"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1038\/nbt1053","article-title":"Assessing computational tools for the discovery of transcription factor binding sites","volume":"23","author":"Tompa","year":"2005","journal-title":"Nat. Biotechnol"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"3539","DOI":"10.1093\/nar\/gkg567","article-title":"Regulatory sequence analysis tools","volume":"31","author":"van Helden","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"1808","DOI":"10.1093\/nar\/28.8.1808","article-title":"Discovering regulatory elements in non-coding sequences by analysis of spaced dyads","volume":"28","author":"van Helden","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1006\/jmbi.1998.1700","article-title":"Identification of regulatory regions which confer muscle-specific gene expression","volume":"278","author":"Wasserman","year":"1998","journal-title":"J. Mol. 
Biol"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"1577","DOI":"10.1093\/bioinformatics\/btl147","article-title":"GAME: detecting cis-regulatory elements using a genetic algorithm","volume":"22","author":"Wei","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1007\/s003359900963","article-title":"Models for prediction and recognition of eukaryotic promoters","volume":"10","author":"Werner","year":"1999","journal-title":"Mamm. Genome"},{"key":"2023041105083374900_","doi-asserted-by":"crossref","first-page":"18759","DOI":"10.1074\/jbc.270.32.18759","article-title":"Regulation of the mouse histone H2A.X gene promoter by the transcription factor E2F and CCAAT binding protein","volume":"270","author":"Yagi","year":"1995","journal-title":"J. Biol. Chem"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/12\/1476\/49814701\/bioinformatics_23_12_1476.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/12\/1476\/49814701\/bioinformatics_23_12_1476.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,14]],"date-time":"2024-02-14T03:40:08Z","timestamp":1707882008000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/12\/1476\/223103"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,5,5]]},"references-count":44,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2007,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm118","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"publishe
d-other":{"date-parts":[[2007,6,15]]},"published":{"date-parts":[[2007,5,5]]}}}