{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T07:36:43Z","timestamp":1778657803357,"version":"3.51.4"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>It has been increasingly appreciated that coding sequences harbor regulatory sequence motifs in addition to encoding for protein. These sequence motifs are expected to be overrepresented in nucleotide sequences bound by a common protein or small RNA. However, detecting overrepresented motifs has been difficult because of interference by constraints at the protein level. Sampling-based approaches to solve this problem based on codon-shuffling have been limited to exploring only an infinitesimal fraction of the sequence space and by their use of parametric approximations.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We present a novel <jats:italic>O<\/jats:italic>(<jats:italic>N<\/jats:italic>(log <jats:italic>N<\/jats:italic>)<jats:sup>2<\/jats:sup>)-time algorithm, CodingMotif, to identify nucleotide-level motifs of unusual copy number in protein-coding regions. Using a new dynamic programming algorithm we are able to exhaustively calculate the distribution of the number of occurrences of a motif over all possible coding sequences that encode the same amino acid sequence, given a background model for codon usage and dinucleotide biases. Our method takes advantage of the sparseness of loci where a given motif can occur, greatly speeding up the required convolution calculations. 
Knowledge of the distribution allows one to assess the exact non-parametric p-value of whether a given motif is over- or under- represented. We demonstrate that our method identifies known functional motifs more accurately than sampling and parametric-based approaches in a variety of coding datasets of various size, including ChIP-seq data for the transcription factors NRSF and GABP.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>CodingMotif provides a theoretically and empirically-demonstrated advance for the detection of motifs overrepresented in coding sequences. We expect CodingMotif to be useful for identifying motifs in functional genomic datasets such as DNA-protein binding, RNA-protein binding, or microRNA-RNA binding within coding regions. A software implementation is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/bioinformatics.bc.edu\/chuanglab\/codingmotif.tar\" ext-link-type=\"uri\">http:\/\/bioinformatics.bc.edu\/chuanglab\/codingmotif.tar<\/jats:ext-link>\n            <\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-13-32","type":"journal-article","created":{"date-parts":[[2012,2,14]],"date-time":"2012-02-14T13:14:24Z","timestamp":1329225264000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["CodingMotif: exact determination of overrepresented nucleotide motifs in coding sequences"],"prefix":"10.1186","volume":"13","author":[{"given":"Yang","family":"Ding","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"William A","family":"Lorenz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeffrey 
H","family":"Chuang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2012,2,14]]},"reference":[{"key":"5055_CR1","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1261\/rna.262607","volume":"13","author":"A Jambhekar","year":"2007","unstructured":"Jambhekar A, Derisi J: Cis-acting determinants of asymmetric, cytoplasmic RNA transport. RNA 2007, 13: 625\u2013642. 10.1261\/rna.262607","journal-title":"RNA"},{"key":"5055_CR2","doi-asserted-by":"publisher","first-page":"1281","DOI":"10.1093\/nar\/15.3.1281","volume":"15","author":"P Sharp","year":"1987","unstructured":"Sharp P, Li W: The codon Adaptation Index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research 1987, 15: 1281\u20131295. 10.1093\/nar\/15.3.1281","journal-title":"Nucleic Acids Research"},{"key":"5055_CR3","doi-asserted-by":"publisher","first-page":"1007","DOI":"10.1126\/science.1073774","volume":"297","author":"W Fairbrother","year":"2002","unstructured":"Fairbrother W, Yeh R, Sharp P, Burge C: Predictive identification of exonic splicing enhancers in human genes. Science 2002, 297: 1007\u20131013. 10.1126\/science.1073774","journal-title":"Science"},{"key":"5055_CR4","doi-asserted-by":"publisher","first-page":"e180","DOI":"10.1371\/journal.pbio.0040180","volume":"4","author":"G Kudla","year":"2006","unstructured":"Kudla G, Lipinski L, Caffin F, Helwak A, Zylicz M: High guanine and cytosine content increases mRNA levels in mammalian cells. PLoS Biology 2006, 4: e180. 
10.1371\/journal.pbio.0040180","journal-title":"PLoS Biology"},{"key":"5055_CR5","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1126\/science.1131262","volume":"314","author":"A Nackley","year":"2006","unstructured":"Nackley A, Shabalina S, Tchivileva I, Satterfield K, Korchynskyi O, Makarov S, Maixner W, Diatchenko L: Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 2006, 314: 1930\u20131933. 10.1126\/science.1131262","journal-title":"Science"},{"key":"5055_CR6","doi-asserted-by":"publisher","first-page":"e255","DOI":"10.1371\/journal.pbio.0060255","volume":"6","author":"D Hogan","year":"2008","unstructured":"Hogan D, Riordan D, Gerber A, Herschlag D, Brown P: Diverse RNA-binding proteins interact with functionally related sets of RNAs, suggesting an extensive regulatory system. PLoS Biology 2008, 6: e255. 10.1371\/journal.pbio.0060255","journal-title":"PLoS Biology"},{"key":"5055_CR7","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1038\/nature08170","volume":"460","author":"S Chi","year":"2009","unstructured":"Chi S, Zang J, Mele A, Darnell R: Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 2009, 460: 479\u2013486.","journal-title":"Nature"},{"key":"5055_CR8","doi-asserted-by":"publisher","first-page":"2085","DOI":"10.1038\/msb.2009.42","volume":"5","author":"T Koide","year":"2009","unstructured":"Koide T, Reiss D, Bare J, Pang W, Facciotti M, Schmid A, Marzolf MPB, Van P, Lo F, Pratap A, Deutsch E, Peterson A, Martin D, Baliga N: Prevalence of transcription promoters within archaeal operons and coding sequences. 
Molecular Systems Biology 2009, 5: 2085.","journal-title":"Molecular Systems Biology"},{"key":"5055_CR9","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1038\/nature05874","volume":"447","author":"ENCODE","year":"2007","unstructured":"ENCODE: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007, 447: 799. 10.1038\/nature05874","journal-title":"Nature"},{"key":"5055_CR10","doi-asserted-by":"publisher","first-page":"e27","DOI":"10.1371\/journal.pbio.0060027","volume":"6","author":"X Li","year":"2008","unstructured":"Li X, MacArthur S, Bourgon R, Nix D, Pollard D, Iyer V, Hechmer A, Simirenko LMMS, Hendriks CL, Chu H, Ogawa N, Inwood W, Sementchenko V, Beaton A, Weiszmann R, Celniker S, Knowles D, Gingeras T, Speed TMBME, Biggin M: Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm. PLoS Biology 2008, 6: e27. 10.1371\/journal.pbio.0060027","journal-title":"PLoS Biology"},{"key":"5055_CR11","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s00284-003-4120-7","volume":"48","author":"S Boycheva","year":"2004","unstructured":"Boycheva S, Bachvarov B, Berzal-Heranz A, Ivanov I: Effect of 3' Terminal Codon Pairs with Different Frequency of Occurrence on the Expression of cat Gene in Escherichia coli. Current Microbiology 2004, 48: 97. 10.1007\/s00284-003-4120-7","journal-title":"Current Microbiology"},{"key":"5055_CR12","doi-asserted-by":"publisher","first-page":"R133","DOI":"10.1186\/gb-2009-10-11-r133","volume":"10","author":"D Kural","year":"2009","unstructured":"Kural D, Ding Y, Wu J, Korpi A, Chuang J: COMIT: identification of noncoding motifs under selection in coding sequences. Genome Biology 2009, 10: R133. 
10.1186\/gb-2009-10-11-r133","journal-title":"Genome Biology"},{"key":"5055_CR13","doi-asserted-by":"publisher","first-page":"15751","DOI":"10.1073\/pnas.1006172107","volume":"107","author":"M Schnall-Levin","year":"2010","unstructured":"Schnall-Levin M, Zhao Y, Perrimon N, Berger B: Conserved microRNA targeting in Drosophila is as widespread in coding regions as in 3'UTRs. PNAS 2010, 107: 15751\u201315756. 10.1073\/pnas.1006172107","journal-title":"PNAS"},{"key":"5055_CR14","doi-asserted-by":"publisher","first-page":"14879","DOI":"10.1073\/pnas.0803230105","volume":"105","author":"J Forman","year":"2008","unstructured":"Forman J, Legesse-Miller A, Coller H: A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence. PNAS 2008, 105: 14879. 10.1073\/pnas.0803230105","journal-title":"PNAS"},{"key":"5055_CR15","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1186\/1471-2105-7-419","volume":"7","author":"T Down","year":"2006","unstructured":"Down T, Leong B, Hubbard T: A machine learning strategy to identify candidate binding sites in human protein-coding sequence. BMC Bioinformatics 2006, 7: 419. 10.1186\/1471-2105-7-419","journal-title":"BMC Bioinformatics"},{"key":"5055_CR16","doi-asserted-by":"publisher","first-page":"8370","DOI":"10.1128\/JB.187.24.8370-8374.2005","volume":"187","author":"H Robins","year":"2005","unstructured":"Robins H, Krasnitz M, Barak H, Levine A: A relative-entropy algorithm for genomic fingerprinting captures host-phage similarities. J Bacteriol 2005, 187: 8370\u20138374. 10.1128\/JB.187.24.8370-8374.2005","journal-title":"J Bacteriol"},{"key":"5055_CR17","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1101\/gr.105072.110","volume":"20","author":"S Itzkovitz","year":"2010","unstructured":"Itzkovitz S, Hodis E, Segal E: Overlapping codes within protein-coding sequences. 
Genome Research 2010, 20: 158.","journal-title":"Genome Research"},{"key":"5055_CR18","doi-asserted-by":"publisher","first-page":"3390","DOI":"10.1093\/nar\/gki615","volume":"33","author":"L Brocchieri","year":"2005","unstructured":"Brocchieri L, Karlin S: Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Research 2005, 33: 3390. 10.1093\/nar\/gki615","journal-title":"Nucleic Acids Research"},{"key":"5055_CR19","doi-asserted-by":"publisher","first-page":"665","DOI":"10.3181\/0704-MR-97","volume":"233","author":"H Robins","year":"2008","unstructured":"Robins H, Krasnitz M, Levine A: The computational detection of functional nucleotide sequence motifs in the coding regions of organisms. Exp Biol Med 2008, 233: 665\u2013673. 10.3181\/0704-MR-97","journal-title":"Exp Biol Med"},{"key":"5055_CR20","doi-asserted-by":"publisher","first-page":"e191","DOI":"10.1371\/journal.pgen.0020191","volume":"2","author":"M Stadler","year":"2006","unstructured":"Stadler M, Shomron N, Yeo G, Schneider A, Xiao X, Burge C: Inference of splicing regulatory activities by sequence neighborhood analysis. PLoS Genetics 2006, 2: e191. 10.1371\/journal.pgen.0020191","journal-title":"PLoS Genetics"},{"key":"5055_CR21","doi-asserted-by":"publisher","first-page":"18005","DOI":"10.1073\/pnas.0509229102","volume":"102","author":"A Jambhekar","year":"2005","unstructured":"Jambhekar A, McDermott K, Sorber K, Shepard K, Vale R, Takizawa P, DeRisi J: Unbiased selection of localization elements reveals cis-acting determinants of mRNA bud localization in Saccharomyces cerevisiae. PNAS 2005, 102: 18005\u201318010. 10.1073\/pnas.0509229102","journal-title":"PNAS"},{"key":"5055_CR22","first-page":"28","volume-title":"Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology","author":"TL Bailey","year":"1994","unstructured":"Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. 
Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology 1994, 28\u201336."},{"key":"5055_CR23","doi-asserted-by":"publisher","first-page":"2042","DOI":"10.1101\/gr.1257503","volume":"13","author":"L Katz","year":"2003","unstructured":"Katz L, Burge C: Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Research 2003, 13: 2042\u20132051. 10.1101\/gr.1257503","journal-title":"Genome Research"},{"key":"5055_CR24","doi-asserted-by":"publisher","first-page":"987","DOI":"10.1093\/bioinformatics\/btg082","volume":"19","author":"S Boycheva","year":"2003","unstructured":"Boycheva S, Chkodrov G, Ivanov I: Codon pairs in the genome of Escherichia coli. Bioinformatics 2003, 19: 987. 10.1093\/bioinformatics\/btg082","journal-title":"Bioinformatics"},{"key":"5055_CR25","doi-asserted-by":"publisher","first-page":"R28","DOI":"10.1186\/gb-2005-6-3-r28","volume":"6","author":"G Moura","year":"2005","unstructured":"Moura G, Pinheiro M, Silva R, Miranda I, Afreixo V, Dias G, Freitas A, Oliveira J, Santos M: Comparative context analysis of codon pairs on an ORFeome scale. Genome Biology 2005, 6: R28. 10.1186\/gb-2005-6-3-r28","journal-title":"Genome Biology"},{"key":"5055_CR26","doi-asserted-by":"publisher","first-page":"e847","DOI":"10.1371\/journal.pone.0000847","volume":"9","author":"G Moura","year":"2007","unstructured":"Moura G, Pinheiro M, Arrais J, Gomes A, Carreto L, Freitas A, Oliveira J, Santos M: Large Scale Comparative Codon-Pair Context Analysis Unveils General Rules that Fine-Tune Evolution of mRNA Primary Structure. PLoS ONE 2007, 9: e847.","journal-title":"PLoS ONE"},{"key":"5055_CR27","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1006\/jmbi.1997.0951","volume":"268","author":"C Burge","year":"1997","unstructured":"Burge C, Karlin S: Prediction of Complete Gene Structures in Human Genomic DNA. J Mol Biol 1997, 268: 78. 
10.1006\/jmbi.1997.0951","journal-title":"J Mol Biol"},{"key":"5055_CR28","doi-asserted-by":"publisher","first-page":"1360","DOI":"10.1101\/gr.119628.110","volume":"21","author":"S Ke","year":"2011","unstructured":"Ke S, Shang S, Kalachikov S, Morozova I, Yu L, Russo J, Ju J, Chasin L: Quantitative evaluation of all hexamers as exonic splicing elements. Genome Research 2011, 21: 1360. 10.1101\/gr.119628.110","journal-title":"Genome Research"},{"key":"5055_CR29","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1016\/j.cell.2009.01.002","volume":"136","author":"D Bartel","year":"2009","unstructured":"Bartel D: MicroRNAs: Target Recognition and Regulatory Functions. Cell 2009, 136: 215. 10.1016\/j.cell.2009.01.002","journal-title":"Cell"},{"key":"5055_CR30","doi-asserted-by":"publisher","first-page":"2322","DOI":"10.1093\/bioinformatics\/bti376","volume":"21","author":"P Arndt","year":"2005","unstructured":"Arndt P, Hwa T: Identification and measurement of neighbor-dependent nucleotide substitution processes. Bioinformatics 2005, 21: 2322. 10.1093\/bioinformatics\/bti376","journal-title":"Bioinformatics"},{"key":"5055_CR31","doi-asserted-by":"publisher","first-page":"829","DOI":"10.1038\/nmeth.1246","volume":"5","author":"A Valouev","year":"2008","unstructured":"Valouev A, Johnson D, Sundquist A, Medina C, Anton E, Batzoglou S, Myers R, Sidow A: Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nature Methods 2008, 5: 829. 10.1038\/nmeth.1246","journal-title":"Nature Methods"},{"key":"5055_CR32","doi-asserted-by":"publisher","first-page":"1916","DOI":"10.1101\/gr.108753.110","volume":"21","author":"M Lin","year":"2011","unstructured":"Lin M, Kheradpour P, Washietl S, Parker B, Pedersen J, Kellis M: Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Research 2011, 21: 1916. 
10.1101\/gr.108753.110","journal-title":"Genome Research"},{"key":"5055_CR33","doi-asserted-by":"publisher","first-page":"1720","DOI":"10.1126\/science.1162327","volume":"324","author":"G Badis","year":"2009","unstructured":"Badis G, Berger M, Philippakis A, Talukder S, Gehrke A, Jaeger S, Chan E, Metzler G, Vedenko A, Chen X, Kuznetsov H, Wang C, Coburn D, Newburger D, Morris Q, Hughes T, Bulyk M: Diversity and Complexity in DNA Recognition by Transcription Factors. Science 2009, 324: 1720. 10.1126\/science.1162327","journal-title":"Science"},{"key":"5055_CR34","first-page":"898","volume-title":"Introduction to Algorithms","author":"T Cormen","year":"2009","unstructured":"Cormen T, Rivest R, Leiserson C, Stein C: Polynomials and the FFT. In Introduction to Algorithms. 3rd edition. Cambridge: MIT Press; 2009:898\u2013925.","edition":"3"},{"key":"5055_CR35","doi-asserted-by":"publisher","first-page":"R86","DOI":"10.1186\/gb-2010-11-8-r86","volume":"11","author":"J Goecks","year":"2010","unstructured":"Goecks J, Nekrutenko A, Taylor J: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology 2010, 11: R86. 
10.1186\/gb-2010-11-8-r86","journal-title":"Genome Biology"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-32.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T18:23:39Z","timestamp":1630520619000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-32"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,2,14]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["5055"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-32","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,2,14]]},"assertion":[{"value":"7 September 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 February 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 February 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"32"}}