{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:31:12Z","timestamp":1760243472214,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2013,5,22]],"date-time":"2013-05-22T00:00:00Z","timestamp":1369180800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>In biology, the notion of degenerate pattern plays a central role for describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g., [FY]DPC[LIM][ASG]C[ASG], are, in general, represented by degenerate patterns with character classes. Researchers have developed several approaches over the years to discover degenerate patterns. Although these methods have been exhaustively and successfully tested on genomes and proteins, their outcomes often far exceed the size of the original input, making the output hard to be managed and to be interpreted by refined analysis requiring manual inspection. In this paper, we discuss a characterization of degenerate patterns with character classes, without gaps, and we introduce the concept of pattern priority for comparing and ranking different patterns. We define the class of underlying patterns for filtering any set of degenerate patterns into a new set that is linear in the size of the input sequence. We present some preliminary results on the detection of subtle signals in protein families. 
Results show that our approach drastically reduces the number of patterns in output for a tool for protein analysis, while retaining the representative patterns.<\/jats:p>","DOI":"10.3390\/a6020352","type":"journal-article","created":{"date-parts":[[2013,5,22]],"date-time":"2013-05-22T12:45:12Z","timestamp":1369226712000},"page":"352-370","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Filtering Degenerate Patterns with Application to Protein Sequence Analysis"],"prefix":"10.3390","volume":"6","author":[{"given":"Matteo","family":"Comin","sequence":"first","affiliation":[{"name":"Department of Information Engineering, University of Padova, Padova 35131, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5152-4662","authenticated-orcid":false,"given":"Davide","family":"Verzotto","sequence":"additional","affiliation":[{"name":"Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore"}]}],"member":"1968","published-online":{"date-parts":[[2013,5,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"D245","DOI":"10.1093\/nar\/gkm977","article-title":"The 20 years of PROSITE","volume":"36","author":"Hulo","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"ref_2","unstructured":"Parida, L. (2007). Pattern Discovery in Bioinformatics: Theory and Algorithms, Mathematical and Computational Biology, Chapman and Hall\/CRC."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1093\/bioinformatics\/bti745","article-title":"A generic motif discovery algorithm for sequential data","volume":"22","author":"Jensen","year":"2006","journal-title":"Bioinformatics"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1039","DOI":"10.1137\/0216067","article-title":"Generalized string matching","volume":"16","author":"Abrahamson","year":"1987","journal-title":"SIAM J. 
Comput."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1089\/106652703322756140","article-title":"Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching","volume":"10","author":"Navarro","year":"2003","journal-title":"J. Comput. Biol."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1007\/s10791-008-9054-z","article-title":"Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance","volume":"11","author":"Fredriksson","year":"2008","journal-title":"Inf. Retr."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1145\/135239.135244","article-title":"Fast text searching: Allowing errors","volume":"35","author":"Wu","year":"1992","journal-title":"Commun. ACM"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1016\/0167-8655(94)00095-K","article-title":"Searching for flexible repeated patterns using a non-transitive similarity relation","volume":"16","author":"Soldano","year":"1995","journal-title":"Pattern Recognit. Lett."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/11496656_20","article-title":"Incremental inference of relational motifs with a degenerate Alphabet","volume":"3537","author":"Pisanti","year":"2005","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Frith, M.C., Saunders, N.F.W., Kobe, B., and Bailey, T.L. (2008). Discovering sequence motifs with arbitrary insertions and deletions. PLoS Comput. 
Biol., 4.","DOI":"10.1371\/journal.pcbi.1000071"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"5549","DOI":"10.1093\/nar\/gkf669","article-title":"Discovery of novel transcription factor binding sites by statistical overrepresentation","volume":"30","author":"Sinha","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"752","DOI":"10.1109\/TCBB.2008.123","article-title":"VARUN: Discovering extensible motifs under saturation constraints","volume":"7","author":"Apostolico","year":"2010","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinforma."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/TCBB.2005.5","article-title":"Bases of motifs for generating repeated patterns with wild cards","volume":"2","author":"Pisanti","year":"2005","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinforma."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"793","DOI":"10.1007\/11889342_51","article-title":"Bridging lossy and lossless compression by motif pattern discovery","volume":"4123","author":"Apostolico","year":"2006","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_15","unstructured":"Apostolico, A., Comin, M., and Parida, L. (2004, January 23\u201325). Motifs in Ziv-Lempel-Welch Clef. Proceedings of IEEE DCC Data Compression Conference, Snowbird, UT, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Apostolico, A., Comin, M., and Parida, L. (2006). Mining, compressing and classifying with extensible motifs. Algorithms Mol. Biol., 1.","DOI":"10.1186\/1748-7188-1-4"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.1089\/cmb.2010.0171","article-title":"The Irredundant Class method for remote homology detection of protein sequences","volume":"18","author":"Comin","year":"2011","journal-title":"J. Comput. Biol."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Comin, M., and Verzotto, D. (2010). 
Classification of protein sequences by means of irredundant patterns. BMC Bioinforma., 11.","DOI":"10.1186\/1471-2105-11-S1-S16"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Comin, M., and Verzotto, D. (2012). Alignment-Free phylogeny of whole genomes using underlying subwords. BMC Algorithms Mol. Biol., 7.","DOI":"10.1186\/1748-7188-7-34"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1016\/j.tcs.2008.01.017","article-title":"Detection of subtle variations as consensus motifs","volume":"395","author":"Comin","year":"2008","journal-title":"Theory Comput. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Comin, M., and Parida, L. (2007, January 14\u201317). Subtle Motif Discovery for Detection of Dna Regulatory Sites. Proceedings of the 5th Asia-Pacific Bioinformatics Conference, APBC, Hong Kong.","DOI":"10.1142\/9781860947995_0006"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1093\/bioinformatics\/bti745","article-title":"A generic motif discovery algorithm for sequential data","volume":"22","author":"Jensen","year":"2006","journal-title":"Bioinformatics"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1093\/bioinformatics\/btg431","article-title":"Mismatch string kernels for discriminative protein classification","volume":"20","author":"Leslie","year":"2004","journal-title":"Bioinformatics"},{"key":"ref_24","unstructured":"Dipartimento Di Ingegneria Dell\u2019Informazione. 
Available online: http:\/\/www.dei.unipd.it\/\u223cciompin\/main\/filtering.html."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1093\/bioinformatics\/bti1051","article-title":"Conservative extraction of over-represented extensible motifs","volume":"21","author":"Apostolico","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2996","DOI":"10.1093\/bioinformatics\/btl537","article-title":"MUSA: A parameter free algorithm for the identification of biologically significant motifs","volume":"22","author":"Mendes","year":"2006","journal-title":"Bioinformatics"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"6379","DOI":"10.1093\/nar\/gkl658","article-title":"Identification of degenerate motifs using position restricted selection and hybrid ranking combination","volume":"34","author":"Peng","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"W417","DOI":"10.1093\/nar\/gki459","article-title":"ARGO: A web system for the detection of degenerate motifs and large-scale recognition of eukaryotic promoters","volume":"33","author":"Vishnevsky","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1029","DOI":"10.1093\/bioinformatics\/btm041","article-title":"SPACER: Identification of cis-regulatory elements with non-contiguous critical residues","volume":"23","author":"Chakravarty","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wu, R., Chaivorapol, C., Zheng, J., Li, H., and Liang, S. (2007). fREDUCE: Detection of degenerate regulatory elements using correlation with expression. 
BMC Bioinforma., 8.","DOI":"10.1186\/1471-2105-8-399"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"W412","DOI":"10.1093\/nar\/gki492","article-title":"WordSpy: Identifying transcription factor binding motifs by building a dictionary and learning a grammar","volume":"33","author":"Wang","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"4341","DOI":"10.1016\/j.tcs.2009.07.015","article-title":"Maximal and minimal representations of gapped and non-gapped motifs of a string","volume":"410","author":"Ukkonen","year":"2009","journal-title":"Theoret. Comput. Sci."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"W217","DOI":"10.1093\/nar\/gkm376","article-title":"WebMOTIFS: Automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches","volume":"35","author":"Romer","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1093\/bioinformatics\/btn609","article-title":"ARCS-Motif: Discovering correlated motifs from unaligned biological sequences","volume":"25","author":"Zhang","year":"2009","journal-title":"Bioinformatics"},{"key":"ref_35","unstructured":"Coatney, M., and Parthasarathy, S. (2003, January 10\u201312). MotifMiner: A General Toolkit for Efficiently Identifying Common Substructures in Molecules. 
Proceedings of the 3rd IEEE BIBE, Maryland, MD, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"2288","DOI":"10.1093\/bioinformatics\/btn420","article-title":"MotifVoter: A novel ensemble method for fine-grained integration of generic motif finders","volume":"24","author":"Wijaya","year":"2008","journal-title":"Bioinformatics"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1038\/nbt1053","article-title":"Assessing computational tools for the discovery of transcription factor binding sites","volume":"23","author":"Tompa","year":"2005","journal-title":"Nat. Biotechnol."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.1093\/bioinformatics\/btn105","article-title":"CompariMotif: Quick and easy comparisons of sequence motifs","volume":"24","author":"Edwards","year":"2008","journal-title":"Bioinformatics"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1007\/978-1-4419-5913-3_3","article-title":"Searching Maximal Degenerate Motifs Guided by a Compact Suffix Tree","volume":"Volume 680","author":"Arabnia","year":"2010","journal-title":"Advances in Computational Biology"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"13763","DOI":"10.1073\/pnas.231499798","article-title":"Degeneracy and complexity in biological systems","volume":"98","author":"Edelman","year":"2001","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1093\/bioinformatics\/btg1079","article-title":"Finding optimal degenerate patterns in DNA sequences","volume":"19","author":"Shinozaki","year":"2003","journal-title":"Bioinformatics"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1093\/nar\/gkl198","article-title":"MEME: Discovering and analyzing DNA and protein sequence motifs","volume":"34","author":"Bailey","year":"2006","journal-title":"Nucleic Acids Res."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/6\/2\/352\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:46:55Z","timestamp":1760219215000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/6\/2\/352"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,5,22]]},"references-count":42,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2013,6]]}},"alternative-id":["a6020352"],"URL":"https:\/\/doi.org\/10.3390\/a6020352","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2013,5,22]]}}}