{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,27]],"date-time":"2025-07-27T07:22:14Z","timestamp":1753600934539},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"S6","license":[{"start":{"date-parts":[[2020,11,1]],"date-time":"2020-11-01T00:00:00Z","timestamp":1604188800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,11,18]],"date-time":"2020-11-18T00:00:00Z","timestamp":1605657600000},"content-version":"vor","delay-in-days":17,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the <jats:italic>average common substring<\/jats:italic> measure, <jats:italic>ACS<\/jats:italic>, and its <jats:italic>k<\/jats:italic>-mismatch counterpart, <jats:italic>ACS<\/jats:italic><jats:sub><jats:italic>k<\/jats:italic><\/jats:sub>, have been shown to produce results as effective as multiple-sequence alignment based methods for reconstruction of phylogeny trees. Since computing <jats:italic>ACS<\/jats:italic><jats:sub><jats:italic>k<\/jats:italic><\/jats:sub> takes <jats:italic>O<\/jats:italic>(<jats:italic>n<\/jats:italic> log<jats:italic>k<\/jats:italic><jats:italic>n<\/jats:italic>) time and hence impractical for large datasets, multiple heuristics that can approximate <jats:italic>ACS<\/jats:italic><jats:sub><jats:italic>k<\/jats:italic><\/jats:sub> have been introduced.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>In this paper, we present a novel linear-time heuristic to approximate <jats:italic>ACS<\/jats:italic><jats:sub><jats:italic>k<\/jats:italic><\/jats:sub>, which is faster than computing the exact <jats:italic>ACS<\/jats:italic><jats:sub><jats:italic>k<\/jats:italic><\/jats:sub> while being closer to the exact <jats:italic>ACS<\/jats:italic><jats:sub><jats:italic>k<\/jats:italic><\/jats:sub> values compared to previously published linear-time greedy heuristics. Using four real datasets, containing both DNA and protein sequences, we evaluate our algorithm in terms of accuracy, runtime and demonstrate its applicability for phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods, while being comparable in its applications to phylogeny reconstruction.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>Our method produces a better approximation for <jats:italic>ACS<\/jats:italic><jats:sub><jats:italic>k<\/jats:italic><\/jats:sub> and is applicable for the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in Rust programming language and the source code is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/srirampc\/adyar-rs\">https:\/\/github.com\/srirampc\/adyar-rs<\/jats:ext-link>.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12859-020-03738-5","type":"journal-article","created":{"date-parts":[[2020,11,18]],"date-time":"2020-11-18T04:14:07Z","timestamp":1605672847000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["An alignment-free heuristic for fast sequence comparisons with applications to phylogeny reconstruction"],"prefix":"10.1186","volume":"21","author":[{"given":"Sriram P.","family":"Chockalingam","sequence":"first","affiliation":[]},{"given":"Jodh","family":"Pannu","sequence":"additional","affiliation":[]},{"given":"Sahar","family":"Hooshmand","sequence":"additional","affiliation":[]},{"given":"Sharma V.","family":"Thankachan","sequence":"additional","affiliation":[]},{"given":"Srinivas","family":"Aluru","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,11,18]]},"reference":[{"issue":"4","key":"3738_CR1","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","volume":"19","author":"S Vinga","year":"2003","unstructured":"Vinga S, Almeida J. Alignment-free sequence comparison\u2013a review. Bioinformatics. 2003; 19(4):513\u201323.","journal-title":"Bioinformatics"},{"issue":"1","key":"3738_CR2","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1186\/s13059-017-1319-7","volume":"18","author":"A Zielezinski","year":"2017","unstructured":"Zielezinski A, Vinga S, Almeida J, Karlowski WM. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol. 2017; 18(1):186.","journal-title":"Genome Biol"},{"key":"3738_CR3","first-page":"1409","volume":"28","author":"RR Sokal","year":"1958","unstructured":"Sokal RR. A statistical method for evaluating systematic relationship. Univ Kans Sci Bull. 1958; 28:1409\u201338.","journal-title":"Univ Kans Sci Bull"},{"issue":"4","key":"3738_CR4","first-page":"406","volume":"4","author":"N Saitou","year":"1987","unstructured":"Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987; 4(4):406\u201325.","journal-title":"Mol Biol Evol"},{"issue":"1","key":"3738_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s00239-003-2493-7","volume":"58","author":"J Qi","year":"2004","unstructured":"Qi J, Wang B, Hao B-I. Whole proteome prokaryote phylogeny without sequence alignment: a k-string composition approach. J Mol Evol. 2004; 58(1):1\u201311.","journal-title":"J Mol Evol"},{"issue":"8","key":"3738_CR6","doi-asserted-by":"publisher","first-page":"2677","DOI":"10.1073\/pnas.0813249106","volume":"106","author":"GE Sims","year":"2009","unstructured":"Sims GE, Jun S-R, Wu GA, Kim S-H. Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc Natl Acad Sci. 2009; 106(8):2677\u201382.","journal-title":"Proc Natl Acad Sci"},{"issue":"W1","key":"3738_CR7","doi-asserted-by":"publisher","first-page":"554","DOI":"10.1093\/nar\/gkx351","volume":"45","author":"YY Lu","year":"2017","unstructured":"Lu YY, Tang K, Ren J, Fuhrman JA, Waterman MS, Sun F. CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic Acids Res. 2017; 45(W1):554\u20139.","journal-title":"Nucleic Acids Res"},{"issue":"W1","key":"3738_CR8","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1093\/nar\/gku398","volume":"42","author":"S Horwege","year":"2014","unstructured":"Horwege S, Lindner S, Boden M, Hatje K, Kollmar M, Leimeister C-A, Morgenstern B. Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches. Nucleic Acids Res. 2014; 42(W1):7\u201311.","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"3738_CR9","doi-asserted-by":"publisher","first-page":"336","DOI":"10.1089\/cmb.2006.13.336","volume":"13","author":"I Ulitsky","year":"2006","unstructured":"Ulitsky I, Burstein D, Tuller T, Chor B. The average common substring approach to phylogenomic reconstruction. J Comput Biol. 2006; 13(2):336\u201350.","journal-title":"J Comput Biol"},{"issue":"14","key":"3738_CR10","doi-asserted-by":"publisher","first-page":"2000","DOI":"10.1093\/bioinformatics\/btu331","volume":"30","author":"C-A Leimeister","year":"2014","unstructured":"Leimeister C-A, Morgenstern B. Kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics. 2014; 30(14):2000\u20138.","journal-title":"Bioinformatics"},{"key":"3738_CR11","doi-asserted-by":"crossref","unstructured":"Aluru S, Apostolico A, Thankachan SV. Efficient alignment free sequence comparison with bounded mismatches. In: International Conference on Research in Computational Molecular Biology. Springer: 2015. p. 1\u201312.","DOI":"10.1007\/978-3-319-16706-0_1"},{"key":"3738_CR12","doi-asserted-by":"crossref","unstructured":"Thankachan SV, Aluru C, Chockalingam SP, Aluru S. Algorithmic framework for approximate matching under bounded edits with applications to sequence analysis. In: International Conference on Research in Computational Molecular Biology. Springer: 2018. p. 211\u201324.","DOI":"10.1007\/978-3-319-89929-9_14"},{"issue":"8","key":"3738_CR13","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1186\/s12859-017-1658-0","volume":"18","author":"SV Thankachan","year":"2017","unstructured":"Thankachan SV, Chockalingam SP, Liu Y, Krishnan A, Aluru S. A greedy alignment-free distance estimator for phylogenetic inference. BMC Bioinformatics. 2017; 18(8):238.","journal-title":"BMC Bioinformatics"},{"key":"3738_CR14","doi-asserted-by":"crossref","unstructured":"Matsakis ND, Klock II FS. The rust language. In: ACM SIGAda Ada Letters. ACM: 2014. p. 103\u20134.","DOI":"10.1145\/2692956.2663188"},{"key":"3738_CR15","unstructured":"Mori Y. Libdivsufsort. 2006. https:\/\/github.com\/y-256\/libdivsufsort. Accessed on 9 Sept 2020."},{"issue":"7","key":"3738_CR16","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1093\/nar\/gkt003","volume":"41","author":"H Yi","year":"2013","unstructured":"Yi H, Jin L. Co-phylog: an assembly-free phylogenomic approach for closely related organisms. Nucleic Acids Res. 2013; 41(7):75.","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"3738_CR17","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1002\/prot.20527","volume":"61","author":"JD Thompson","year":"2005","unstructured":"Thompson JD, Koehl P, Ripp R, Poch O. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins Struct Funct Bioinforma. 2005; 61(1):127\u201336.","journal-title":"Proteins Struct Funct Bioinforma"},{"issue":"6","key":"3738_CR18","doi-asserted-by":"publisher","first-page":"784","DOI":"10.1038\/ismej.2009.150","volume":"4","author":"RJ Newton","year":"2010","unstructured":"Newton RJ, Griffin LE, Bowles KM, Meile C, Gifford S, Givens CE, Howard EC, King E, Oakley CA, Reisch CR, et al. Genome characteristics of a generalist marine bacterial lineage. ISME journal. 2010; 4(6):784\u201398.","journal-title":"ISME journal"},{"key":"3738_CR19","unstructured":"Felsenstein J. PHYLIP (phylogeny Inference Package), Version 3.5 C: Joseph Felsenstein; 1993."},{"issue":"6","key":"3738_CR20","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1089\/cmb.2015.0217","volume":"23","author":"SV Thankachan","year":"2016","unstructured":"Thankachan SV, Chockalingam SP, Liu Y, Apostolico A, Aluru S. Alfred: a practical method for alignment-free distance computation. J Comput Biol. 2016; 23(6):452\u201360.","journal-title":"J Comput Biol"},{"issue":"1","key":"3738_CR21","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1186\/s13015-016-0072-x","volume":"11","author":"C Pizzi","year":"2016","unstructured":"Pizzi C. MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics. Algorithms Mol Biol. 2016; 11(1):6.","journal-title":"Algorithms Mol Biol"},{"key":"3738_CR22","doi-asserted-by":"publisher","first-page":"192","DOI":"10.3389\/fpls.2012.00192","volume":"3","author":"K Hatje","year":"2012","unstructured":"Hatje K, Kollmar M. A phylogenetic analysis of the brassicales clade based on an alignment-free sequence comparison method. Frontiers Plant Sci. 2012; 3:192.","journal-title":"Frontiers Plant Sci"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03738-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-020-03738-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03738-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,18]],"date-time":"2020-11-18T04:17:05Z","timestamp":1605673025000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-03738-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11]]},"references-count":22,"journal-issue":{"issue":"S6","published-print":{"date-parts":[[2020,11]]}},"alternative-id":["3738"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-03738-5","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11]]},"assertion":[{"value":"26 August 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 September 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 November 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"404"}}