{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T18:44:25Z","timestamp":1773773065688,"version":"3.50.1"},"reference-count":29,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2020,8,14]],"date-time":"2020-08-14T00:00:00Z","timestamp":1597363200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>The consensus string is a significant feature of a deoxyribonucleic acid (DNA) sequence. The median string algorithm is one of the most popular exact algorithms for finding a DNA consensus. A DNA sequence is represented using the alphabet \u03a3 = {a, c, g, t}. To search for a motif of length l, the algorithm generates the set of all 4<jats:sup>l<\/jats:sup> possible motifs, or l-mers, over this alphabet and selects the consensus from among them. The algorithm is guaranteed to return the consensus, but the problem is NP-complete and the runtime increases with the l-mer size. Using transition probabilities from a Markov chain, the proposed algorithm symmetrically generates four subsets of l-mers, each containing a small number of l-mers that start with a particular letter. We use these reduced sets of l-mers instead of all 4<jats:sup>l<\/jats:sup> l-mers. Experimental results show that the proposed algorithm produces far fewer l-mers and takes less time to execute. For l-mers of length 7, the proposed algorithm is 48 times faster than the median string algorithm and produces only 2.5% as many l-mers. 
Compared with the recently proposed voting algorithm, the proposed algorithm is 4.4 times faster for longer l-mers, such as l = 9.<\/jats:p>","DOI":"10.3390\/sym12081363","type":"journal-article","created":{"date-parts":[[2020,8,14]],"date-time":"2020-08-14T13:00:18Z","timestamp":1597410018000},"page":"1363","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A Modified Median String Algorithm for Gene Regulatory Motif Classification"],"prefix":"10.3390","volume":"12","author":[{"given":"Mohammad Shibli","family":"Kaysar","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh"}]},{"given":"Mohammad Ibrahim","family":"Khan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh"}]}],"member":"1968","published-online":{"date-parts":[[2020,8,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1089\/1066527041410319","article-title":"Methods in Comparative Genomics: Genome Correspondence, Gene Identification and Regulatory Motif Discovery","volume":"11","author":"Kellis","year":"2004","journal-title":"J. Comput. Biol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","article-title":"CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice","volume":"22","author":"Thompson","year":"1994","journal-title":"Nucleic Acids Res."},{"key":"ref_3","first-page":"111","article-title":"Consensus sequence Zen","volume":"1","author":"Schneider","year":"2002","journal-title":"Appl. 
Bioinform."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1126\/science.8211139","article-title":"Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment","volume":"262","author":"Lawrence","year":"1993","journal-title":"Science"},{"key":"ref_5","unstructured":"Bailey, T.L., and Elkan, C. (1994, January 14\u201317). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the International Conference on Intelligent Systems for Molecular Biology, Stanford, CA, USA."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1350009","DOI":"10.1142\/S0219720013500091","article-title":"A heuristic cluster-based EM algorithm for the planted (l, d) problem","volume":"11","author":"Zhang","year":"2013","journal-title":"J. Bioinform. Comput. Biol."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-11-S8-S1","article-title":"Efficient motif finding algorithms for large-alphabet inputs","volume":"11","author":"Kuksa","year":"2010","journal-title":"BMC Bioinform."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1137\/0149012","article-title":"Trees, stars, and multiple sequence alignment","volume":"49","author":"Altschul","year":"1989","journal-title":"SIAM J. Appl. Math."},{"key":"ref_9","unstructured":"Gramm, J., H\u00fcffner, F., and Niedermeier, R. (2002, January 18\u201321). Closest strings, primer design, and motif search. Proceedings of the 6th Annual International Conference on Computational Biology (RECOMB 2002), Washington, DC, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Gramm, J., Niedermeier, R., and Rossmanith, P. (2001, January 19\u201321). Exact solutions for closest string and related problems. 
Proceedings of the 12th International Symposium on Algorithms and Computation, Christchurch, New Zealand.","DOI":"10.1007\/3-540-45678-3_38"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Karp, R.M. (1993, January 16\u201318). Mapping the genome: Some combinatorial problems arising in molecular biology. Proceedings of the 25th Annual ACM Symposium on Theory of Computing, San Diego, CA, USA.","DOI":"10.1145\/167088.167170"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1145\/506147.506150","article-title":"On the closest string and substring problems","volume":"49","author":"Li","year":"2002","journal-title":"J. ACM"},{"key":"ref_13","unstructured":"Mauch, H., Melzer, M.J., and Hu, J.S. (2003, January 11\u201314). Genetic algorithm approach for the closest string problem. Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference, Stanford, CA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1287\/ijoc.1040.0090","article-title":"Optimal Solutions for the Closest-String Problem via Integer Programming","volume":"16","author":"Meneses","year":"2004","journal-title":"INFORMS J. Comput."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Nicolas, F., and Rivals, E. (2003, January 25\u201327). Complexities of the centre and median string problems. Proceedings of the 14th Symposium on Combinatorial Pattern Matching, Michoacan, Mexico.","DOI":"10.1007\/3-540-44888-8_23"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s00453-003-1028-3","article-title":"Fixed-Parameter Algorithms for Closest String and Related Problems","volume":"37","author":"Gramm","year":"2003","journal-title":"Algorithmica"},{"key":"ref_17","unstructured":"Ma, B., and Sun, X. (April, January 30). More efficient algorithms for closest string and substring problems. 
Proceedings of the 12th Annual International Conference on Research in Computational Molecular Biology, Singapore."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Stojanovic, N., Berman, P., Gumucio, D., Hardison, R., and Miller, W. (1997, January 6\u20138). A linear-time algorithm for the 1-mismatch problem. Proceedings of the 5th International Workshop on Algorithms and Data Structures, NS, Canada.","DOI":"10.1007\/3-540-63307-3_53"},{"key":"ref_19","unstructured":"Ben-Dor, A., Lancia, G., Perone, J., and Ravi, R. (July, January 30). Banishing bias from consensus sequences. Proceedings of the 8th Symposium on Combinatorial Pattern Matching, Aarhus, Denmark."},{"key":"ref_20","unstructured":"Gasieniec, L., Jansson, J., and Lingas, A. (1999, January 17\u201319). Efficient approximation algorithms for the Hamming center problem. Proceedings of the 10th ACM-SIAM Symposium on Discrete Algorithms, Baltimore, MD, USA."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1016\/S1570-8667(03)00079-0","article-title":"Approximation algorithms for Hamming clustering problems","volume":"2","author":"Jansson","year":"2004","journal-title":"J. Discret. Algorithms"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/S0890-5401(03)00057-9","article-title":"Distinguishing string selection problems","volume":"185","author":"Lanctot","year":"2003","journal-title":"Inf. Comput."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Boucher, C., Brown, D., and Durocher, S. (2008, January 11\u201312). On the structure of small motif recognition instances. Proceedings of the 15th Symposium on String Processing and Information Retrieval, Melbourne, Australia.","DOI":"10.1007\/978-3-540-89097-3_26"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sze, S., Lu, S., and Chen, J. (2004, January 17\u201321). Integrating sample-driven and pattern-driven approaches in motif finding. 
Proceedings of the 4th Workshop on Algorithms in Bioinformatics, Bergen, Norway.","DOI":"10.1007\/978-3-540-30219-3_37"},{"key":"ref_25","first-page":"130","article-title":"Review of different sequence motif finding algorithms","volume":"11","author":"Fatma","year":"2019","journal-title":"Avicenna J. Med. Biotechnol."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2641","DOI":"10.1093\/bioinformatics\/btr459","article-title":"Tree-structured algorithm for long weak motif discovery","volume":"27","author":"Sun","year":"2011","journal-title":"Bioinformatics"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Jia, C., Carson, M.B., Wang, Y., Lin, Y., and Lu, H. (2014). A New Exhaustive Method and Strategy for Finding Motifs in ChIP-Enriched Regions. PLoS ONE, 9.","DOI":"10.1371\/journal.pone.0086044"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1504\/IJBRA.2014.062990","article-title":"PMS6: A fast algorithm for motif discovery","volume":"10","author":"Bandyopadhyay","year":"2014","journal-title":"Int. J. Bioinform. Res. Appl."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1109\/TCBB.2014.2306842","article-title":"Improved Exact Enumerative Algorithms for the Planted (l, d)-Motif Search Problem","volume":"11","author":"Tanaka","year":"2014","journal-title":"IEEE\/ACM Trans. Comput. Biol. 
Bioinform."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/8\/1363\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:01:15Z","timestamp":1760176875000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/8\/1363"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,14]]},"references-count":29,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2020,8]]}},"alternative-id":["sym12081363"],"URL":"https:\/\/doi.org\/10.3390\/sym12081363","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,14]]}}}