{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T22:18:47Z","timestamp":1775254727797,"version":"3.50.1"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T00:00:00Z","timestamp":1661212800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T00:00:00Z","timestamp":1661212800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"ICT Division, Ministry of Posts, Telecommunications and Information Technology, Government of Bangladesh","award":["SL 311, 1st Round, 2020-2021"],"award-info":[{"award-number":["SL 311, 1st Round, 2020-2021"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>DNA sequence similarity analysis is necessary for enormous purposes including genome analysis, extracting biological information, finding the evolutionary relationship of species. There are two types of sequence analysis which are alignment-based (AB) and alignment-free (AF). AB is effective for small homologous sequences but becomes <jats:italic>NP<\/jats:italic>-hard problem for long sequences. However, AF algorithms can solve the major limitations of AB. But most of the existing AF methods show high time complexity and memory consumption, less precision, and less performance on benchmark datasets. 
To minimize these limitations, we develop an AF algorithm using a 2D <jats:inline-formula><jats:alternatives><jats:tex-math>$$k-mer$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mi>k<\/mml:mi>\n                    <mml:mo>-<\/mml:mo>\n                    <mml:mi>m<\/mml:mi>\n                    <mml:mi>e<\/mml:mi>\n                    <mml:mi>r<\/mml:mi>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> count matrix inspired by the CGR approach. Then we shrink the matrix by analyzing the neighbors and then measure similarities using the best combinations of pairwise distance (PD) and phylogenetic tree methods. We also dynamically choose the value of <jats:italic>k<\/jats:italic> for <jats:inline-formula><jats:alternatives><jats:tex-math>$$k-mer$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mi>k<\/mml:mi>\n                    <mml:mo>-<\/mml:mo>\n                    <mml:mi>m<\/mml:mi>\n                    <mml:mi>e<\/mml:mi>\n                    <mml:mi>r<\/mml:mi>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>. We develop an efficient system for finding the positions of <jats:inline-formula><jats:alternatives><jats:tex-math>$$k-mer$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mi>k<\/mml:mi>\n                    <mml:mo>-<\/mml:mo>\n                    <mml:mi>m<\/mml:mi>\n                    <mml:mi>e<\/mml:mi>\n                    <mml:mi>r<\/mml:mi>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> in the count matrix. We apply our system in six different datasets. 
We achieve the top rank for two benchmark datasets from AFproject, 100% accuracy for two datasets (16S Ribosomal, 18 Eutherian), and achieve a milestone for time complexity and memory consumption in comparison to the existing study datasets (HEV, HIV-1). Therefore, the comparative results of the benchmark datasets and existing studies demonstrate that our method is highly effective, efficient, and accurate. Thus, our method can be used with the top level of authenticity for DNA sequence similarity measurement.<\/jats:p>","DOI":"10.1007\/s40747-022-00846-y","type":"journal-article","created":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T05:02:35Z","timestamp":1661230955000},"page":"1265-1280","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["A fast and efficient algorithm for DNA sequence similarity identification"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1572-8606","authenticated-orcid":false,"given":"Machbah","family":"Uddin","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5432-1406","authenticated-orcid":false,"given":"Mohammad Khairul","family":"Islam","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9368-6942","authenticated-orcid":false,"given":"Md. 
Rakib","family":"Hassan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9991-9575","authenticated-orcid":false,"given":"Farah","family":"Jahan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4606-6737","authenticated-orcid":false,"given":"Joong Hwan","family":"Baek","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,8,23]]},"reference":[{"key":"846_CR1","doi-asserted-by":"crossref","unstructured":"Adetiba E, Badejo JA, Thakur S, Matthews VO, Adebiyi MO, Adebiyi EF (2017) Experimental investigation of frequency chaos game representation for in silico and accurate classification of viral pathogens from genomic sequences. In: International conference on bioinformatics and biomedical engineering, pp 155\u2013164. Springer, New York","DOI":"10.1007\/978-3-319-56148-6_13"},{"issue":"5","key":"846_CR2","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1093\/bioinformatics\/17.5.429","volume":"17","author":"JS Almeida","year":"2001","unstructured":"Almeida JS, Carrico JA, Maretzek A, Noble PA, Fletcher M (2001) Analysis of genomic sequences by chaos game representation. Bioinformatics 17(5):429\u2013437","journal-title":"Bioinformatics"},{"issue":"2","key":"846_CR3","first-page":"218","volume":"66","author":"M Bogusz","year":"2017","unstructured":"Bogusz M, Whelan S (2017) Phylogenetic tree estimation with and without alignment: new distance methods and benchmarking. Syst Biol 66(2):218\u2013231","journal-title":"Syst Biol"},{"issue":"10","key":"846_CR4","first-page":"1","volume":"21","author":"S Briand","year":"2020","unstructured":"Briand S, Dessimoz C, El-Mabrouk N, Lafond M, Lobinska G (2020) A generalized robinson-foulds distance for labeled trees. 
BMC Genom 21(10):1\u201313","journal-title":"BMC Genom"},{"key":"846_CR5","doi-asserted-by":"publisher","first-page":"319","DOI":"10.3389\/fgene.2018.00319","volume":"9","author":"S Cai","year":"2018","unstructured":"Cai S, Georgakilas GK, Johnson JL, Vahedi G (2018) A cosine similarity-based method to infer variability of chromatin accessibility at the single-cell level. Front Genet 9:319","journal-title":"Front Genet"},{"issue":"2","key":"846_CR6","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1109\/TCBB.2020.2973084","volume":"18","author":"W Chen","year":"2020","unstructured":"Chen W, Li W (2020) Definition and usage of texture feature for biological sequence. IEEE\/ACM Trans Comput Biol Bioinf 18(2):773\u2013776","journal-title":"IEEE\/ACM Trans Comput Biol Bioinf"},{"key":"846_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.jtbi.2018.07.001","volume":"455","author":"W Chen","year":"2018","unstructured":"Chen W, Liao B, Li W (2018) Use of image texture analysis to find dna sequence similarities. J Theor Biol 455:1\u20136","journal-title":"J Theor Biol"},{"key":"846_CR8","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmgm.2020.107603","volume":"99","author":"E Deliba\u015f","year":"2020","unstructured":"Deliba\u015f E, Arslan A (2020) Dna sequence similarity analysis using image texture analysis based on first-order statistics. J Mol Graph Model 99:107603","journal-title":"J Mol Graph Model"},{"key":"846_CR9","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmgm.2020.107693","volume":"100","author":"E Deliba\u015f","year":"2020","unstructured":"Deliba\u015f E, Arslan A, \u015eeker A, Diri B (2020) A novel alignment-free dna sequence similarity analysis approach based on top-k n-gram match-up. J Mol Graph Model 100:107693","journal-title":"J Mol Graph Model"},{"key":"846_CR10","doi-asserted-by":"crossref","unstructured":"Dick K, Green JR (2020) Chaos game representations & deep learning for proteome-wide protein prediction. 
In: 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 115\u2013121. IEEE","DOI":"10.1109\/BIBE50027.2020.00027"},{"key":"846_CR11","doi-asserted-by":"crossref","unstructured":"Emam M, Ali A, Abdelrazik E, Elattar M, El-Hadidi M (2020) Detection of mammalian coding sequences using a hybrid approach of chaos game representation and machine learning. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 2949\u20132951","DOI":"10.1109\/BIBM49941.2020.9313497"},{"issue":"7","key":"846_CR12","doi-asserted-by":"publisher","first-page":"685","DOI":"10.1093\/oxfordjournals.molbev.a025808","volume":"14","author":"O Gascuel","year":"1997","unstructured":"Gascuel O (1997) Bionj: an improved version of the nj algorithm based on a simple model of sequence data. Mol Biol Evol 14(7):685\u2013695","journal-title":"Mol Biol Evol"},{"issue":"5","key":"846_CR13","doi-asserted-by":"publisher","first-page":"1229","DOI":"10.1093\/molbev\/mst012","volume":"30","author":"BG Hall","year":"2013","unstructured":"Hall BG (2013) Building phylogenetic trees from molecular data with mega. Mol Biol Evol 30(5):1229\u20131235","journal-title":"Mol Biol Evol"},{"key":"846_CR14","doi-asserted-by":"publisher","first-page":"342","DOI":"10.1016\/j.jmgm.2017.07.019","volume":"76","author":"X Jin","year":"2017","unstructured":"Jin X, Jiang Q, Chen Y, Lee SJ, Nie R, Yao S, Zhou D, He K (2017) Similarity\/dissimilarity calculation methods of dna sequences: A survey. J Mol Graph Model 76:342\u2013355","journal-title":"J Mol Graph Model"},{"key":"846_CR15","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1016\/j.physa.2016.05.004","volume":"461","author":"X Jin","year":"2016","unstructured":"Jin X, Nie R, Zhou D, Yao S, Chen Y, Yu J, Wang Q (2016) A novel dna sequence similarity calculation based on simplified pulse-coupled neural network and huffman coding. 
Phys A 461:325\u2013338","journal-title":"Phys A"},{"issue":"1","key":"846_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-7-243","volume":"7","author":"J Joseph","year":"2006","unstructured":"Joseph J, Sasikumar R (2006) Chaos game representation for comparison of whole genomes. BMC Bioinf 7(1):1\u201310","journal-title":"BMC Bioinf"},{"issue":"3","key":"846_CR17","doi-asserted-by":"publisher","first-page":"1428","DOI":"10.1016\/j.ygeno.2021.03.015","volume":"113","author":"A Kania","year":"2021","unstructured":"Kania A, Sarapata K (2021) The robustness of the chaos game representation to mutations and its application in free-alignment methods. Genomics 113(3):1428\u20131437","journal-title":"Genomics"},{"issue":"7","key":"846_CR18","doi-asserted-by":"publisher","first-page":"2040","DOI":"10.1093\/bioinformatics\/btz903","volume":"36","author":"F Kl\u00f6tzl","year":"2020","unstructured":"Kl\u00f6tzl F, Haubold B (2020) Phylonium: fast estimation of evolutionary distances from large samples of similar genomes. Bioinformatics 36(7):2040\u20132046","journal-title":"Bioinformatics"},{"issue":"1","key":"846_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-019-3330-3","volume":"20","author":"D Lichtblau","year":"2019","unstructured":"Lichtblau D (2019) Alignment-free genomic sequence comparison using fcgr and signal processing. BMC Bioinformatics 20(1):1\u201317","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"846_CR20","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1093\/bioinformatics\/btz493","volume":"36","author":"HF L\u00f6chel","year":"2020","unstructured":"L\u00f6chel HF, Eger D, Sperlea T, Heider D (2020) Deep learning on chaos game representation for proteins. 
Bioinformatics 36(1):272\u2013279","journal-title":"Bioinformatics"},{"issue":"2","key":"846_CR21","first-page":"1","volume":"18","author":"B Lu","year":"2017","unstructured":"Lu B, Zhang L, Leong HW (2017) A program to compute the soft robinson-foulds distance between phylogenetic networks. BMC Genom 18(2):1\u201310","journal-title":"BMC Genom"},{"issue":"W1","key":"846_CR22","doi-asserted-by":"publisher","first-page":"W554","DOI":"10.1093\/nar\/gkx351","volume":"45","author":"YY Lu","year":"2017","unstructured":"Lu YY, Tang K, Ren J, Fuhrman JA, Waterman MS, Sun F (2017) Cafe: accelerated alignment-free sequence analysis. Nucleic Acids Res 45(W1):W554\u2013W559","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"846_CR23","doi-asserted-by":"publisher","first-page":"1222","DOI":"10.1093\/bib\/bbx161","volume":"20","author":"BB Luczak","year":"2019","unstructured":"Luczak BB, James BT, Girgis HZ (2019) A survey and evaluations of histogram-based statistics in alignment-free sequence comparison. Brief Bioinform 20(4):1222\u20131237","journal-title":"Brief Bioinform"},{"issue":"5","key":"846_CR24","doi-asserted-by":"publisher","first-page":"863","DOI":"10.1109\/TCBB.2014.2315991","volume":"11","author":"I Messaoudi","year":"2014","unstructured":"Messaoudi I, Elloumi-Oueslati A, Lachiri Z (2014) Building specific signals from frequency chaos game and revealing periodicities using a smoothed fourier analysis. IEEE\/ACM Trans Comput Biol Bioinf 11(5):863\u2013877","journal-title":"IEEE\/ACM Trans Comput Biol Bioinf"},{"key":"846_CR25","doi-asserted-by":"crossref","unstructured":"Ni H, Mu H, Qi D (2021) Applying frequency chaos game representation with perceptual image hashing to gene sequence phylogenetic analyses. J Mol Graph Model p. 
107942 (2021)","DOI":"10.1016\/j.jmgm.2021.107942"},{"issue":"1","key":"846_CR26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13059-016-0997-x","volume":"17","author":"BD Ondov","year":"2016","unstructured":"Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM (2016) Mash: fast genome and metagenome distance estimation using minhash. Genom Biol 17(1):1\u201314","journal-title":"Genom Biol"},{"key":"846_CR27","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1016\/j.gdata.2016.01.001","volume":"7","author":"CS Rao","year":"2016","unstructured":"Rao CS, Raju SV (2016) Similarity analysis between chromosomes of homo sapiens and monkeys with correlation coefficient, rank correlation coefficient and cosine similarity measures. Genom data 7:202\u2013209","journal-title":"Genom data"},{"key":"846_CR28","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1146\/annurev-biodatasci-080917-013431","volume":"1","author":"J Ren","year":"2018","unstructured":"Ren J, Bai X, Lu YY, Tang K, Wang Y, Reinert G, Sun F (2018) Alignment-free sequence analysis and applications. Annu Rev Biomed Data Sci 1:93\u2013114","journal-title":"Annu Rev Biomed Data Sci"},{"key":"846_CR29","doi-asserted-by":"crossref","unstructured":"Rizzo R, Fiannaca A, La\u00a0Rosa M, Urso A (2016) Classification experiments of dna sequences by using a deep neural network and chaos game representation. In: Proceedings of the 17th International Conference on Computer Systems and Technologies 2016, pp 222\u2013228","DOI":"10.1145\/2983468.2983489"},{"key":"846_CR30","doi-asserted-by":"crossref","unstructured":"Safoury S, Hussein W (2019) Enriched dna strands classification using cgr images and convolutional neural network. 
In: Proceedings of the 2019 8th international conference on bioinformatics and biomedical science, pp 87\u201392 (2019)","DOI":"10.1145\/3369166.3369176"},{"issue":"4","key":"846_CR31","first-page":"406","volume":"4","author":"N Saitou","year":"1987","unstructured":"Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4(4):406\u2013425","journal-title":"Mol Biol Evol"},{"key":"846_CR32","doi-asserted-by":"publisher","first-page":"105","DOI":"10.13053\/rcs-148-3-9","volume":"148","author":"MRL Somodevilla","year":"2019","unstructured":"Somodevilla MRL, Rossainz M et al (2019) Dna sequence recognition using image representation. Res Comput Sci 148:105\u2013114","journal-title":"Res Comput Sci"},{"issue":"9","key":"846_CR33","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0222271","volume":"14","author":"A Tampuu","year":"2019","unstructured":"Tampuu A, Bzhalava Z, Dillner J, Vicente R (2019) Viraminer: Deep learning on raw dna sequences for identifying viral genomes in human samples. PLoS ONE 14(9):e0222271","journal-title":"PLoS ONE"},{"key":"846_CR34","doi-asserted-by":"publisher","first-page":"1032","DOI":"10.3389\/fbioe.2020.01032","volume":"8","author":"A Yang","year":"2020","unstructured":"Yang A, Zhang W, Wang J, Yang K, Han Y, Zhang L (2020) Review on the application of machine learning algorithms in the sequence data mining of dna. Front Bioeng Biotechnol 8:1032","journal-title":"Front Bioeng Biotechnol"},{"issue":"7","key":"846_CR35","doi-asserted-by":"publisher","first-page":"e75","DOI":"10.1093\/nar\/gkt003","volume":"41","author":"H Yi","year":"2013","unstructured":"Yi H, Jin L (2013) Co-phylog: an assembly-free phylogenomic approach for closely related organisms. 
Nucl Acids Res 41(7):e75\u2013e75","journal-title":"Nucl Acids Res"},{"issue":"2","key":"846_CR36","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1089\/cmb.2018.0173","volume":"26","author":"C Yin","year":"2019","unstructured":"Yin C (2019) Encoding and decoding dna sequences by integer chaos game representation. J Comput Biol 26(2):143\u2013151","journal-title":"J Comput Biol"},{"key":"846_CR37","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1016\/j.jtbi.2014.05.043","volume":"359","author":"C Yin","year":"2014","unstructured":"Yin C, Chen Y, Yau SST (2014) A measure of dna sequence similarity by fourier transform with applications on hierarchical clustering. J Theor Biol 359:18\u201328","journal-title":"J Theor Biol"},{"issue":"5","key":"846_CR38","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1007872","volume":"16","author":"K Zheng","year":"2020","unstructured":"Zheng K, You ZH, Li JQ, Wang L, Guo ZH, Huang YA (2020) icda-cgr: Identification of circrna-disease associations based on chaos game representation. PLoS Comput Biol 16(5):e1007872","journal-title":"PLoS Comput Biol"},{"key":"846_CR39","doi-asserted-by":"publisher","DOI":"10.1016\/j.chaos.2021.110649","volume":"144","author":"Q Zhou","year":"2021","unstructured":"Zhou Q, Qi S, Ren C (2021) Gene essentiality prediction based on chaos game representation and spiking neural networks. Chaos Solit Fract 144:110649","journal-title":"Chaos Solit Fract"},{"key":"846_CR40","doi-asserted-by":"crossref","unstructured":"Zielezinski A, Girgis HZ, Bernard G, Leimeister CA, Tang K, Dencker T, Lau AK, R\u00f6hling S, Choi JJ, Waterman MS et al (2019) Benchmarking of alignment-free sequence comparison methods. 
Genome Biol 20(1):1\u201318","DOI":"10.1186\/s13059-019-1755-7"},{"issue":"1","key":"846_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13059-017-1319-7","volume":"18","author":"A Zielezinski","year":"2017","unstructured":"Zielezinski A, Vinga S, Almeida J, Karlowski WM (2017) Alignment-free sequence comparison: benefits, applications, and tools. Genom Biol 18(1):1\u201317","journal-title":"Genom Biol"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00846-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00846-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00846-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,18]],"date-time":"2023-04-18T09:22:33Z","timestamp":1681809753000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00846-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,23]]},"references-count":41,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["846"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00846-y","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,23]]},"assertion":[{"value":"2 September 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 August 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"23 August 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}