{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T14:29:12Z","timestamp":1749479352407,"version":"3.37.3"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T00:00:00Z","timestamp":1679875200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T00:00:00Z","timestamp":1679875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001823","name":"Ministerstvo \u0160kolstv\u00ed, Ml\u00e1de\u017ee a T\u011blov\u00fdchovy","doi-asserted-by":"publisher","award":["CZ.02.1.01\/0.0\/0.0\/16_019\/0000765","CZ.02.1.01\/0.0\/0.0\/16_019\/0000765"],"award-info":[{"award-number":["CZ.02.1.01\/0.0\/0.0\/16_019\/0000765","CZ.02.1.01\/0.0\/0.0\/16_019\/0000765"]}],"id":[{"id":"10.13039\/501100001823","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BioData Mining"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Motivation<\/jats:title>\n                <jats:p>Clustering of genetic sequences is one of the key parts of bioinformatics analyses. Resulting phylogenetic trees are beneficial for solving many research questions, including tracing the history of species, studying migration in the past, or tracing a source of a virus outbreak. At the same time, biologists provide more data in the raw form of reads or only on contig-level assembly. Therefore, tools that are able to process those data without supervision need to be developed.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>In this paper, we present a tool for reference-free phylogeny capable of handling data where no mature-level assembly is available. The tool allows distance calculation for raw reads, contigs, and the combination of the latter. The tool provides an estimation of the Levenshtein distance between the sequences, which in turn estimates the number of mutations between the organisms. Compared to the previous research, the novelty of the method lies in a newly proposed combination of the read and contig measures, a new method for read-contig mapping, and an efficient embedding of contigs.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s13040-023-00329-x","type":"journal-article","created":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T21:03:05Z","timestamp":1679950985000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Reference-free phylogeny from sequencing data"],"prefix":"10.1186","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6597-6616","authenticated-orcid":false,"given":"Petr","family":"Ry\u0161av\u00fd","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9780-3376","authenticated-orcid":false,"given":"Filip","family":"\u017delezn\u00fd","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,3,27]]},"reference":[{"key":"329_CR1","doi-asserted-by":"publisher","unstructured":"$$1000$$ Genomes Project Consortium, et\u00a0al. A global reference for human genetic variation. Nature. 2015;526(7571):68\u201374. https:\/\/doi.org\/10.1038\/nature15393.","DOI":"10.1038\/nature15393"},{"issue":"6","key":"329_CR2","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1145\/360825.360855","volume":"18","author":"AV Aho","year":"1975","unstructured":"Aho AV, Corasick MJ. Efficient String Matching: An Aid to Bibliographic Search. Commun ACM. 1975;18(6):333\u201340. https:\/\/doi.org\/10.1145\/360825.360855.","journal-title":"Commun ACM."},{"issue":"1","key":"329_CR3","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1145\/214174.214183","volume":"14","author":"H Berghel","year":"1996","unstructured":"Berghel H, Roach D. An Extension of Ukkonen\u2019s Enhanced Dynamic Programming ASM Algorithm. ACM Trans Inf Syst. 1996;14(1):94\u2013106. https:\/\/doi.org\/10.1145\/214174.214183.","journal-title":"ACM Trans Inf Syst."},{"issue":"14","key":"329_CR4","doi-asserted-by":"publisher","first-page":"5155","DOI":"10.1073\/pnas.83.14.5155","volume":"83","author":"BE Blaisdell","year":"1986","unstructured":"Blaisdell BE. A measure of the similarity of sets of sequences not requiring sequence alignment. Proc Natl Acad Sci. 1986;83(14):5155\u20139.","journal-title":"Proc Natl Acad Sci."},{"key":"329_CR5","doi-asserted-by":"publisher","first-page":"327","DOI":"10.1007\/978-3-319-59826-0_14","volume-title":"Algorithms for Next-Generation Sequencing Data: Techniques, Approaches, and Applications","author":"M Comin","year":"2017","unstructured":"Comin M, Schimd M. Assembly-Free Techniques for NGS Data. In: Elloumi M, editor. Algorithms for Next-Generation Sequencing Data: Techniques, Approaches, and Applications. Cham: Springer; 2017. p. 327\u201355. https:\/\/doi.org\/10.1007\/978-3-319-59826-0_14."},{"issue":"3","key":"329_CR6","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1093\/sysbio\/45.3.323","volume":"45","author":"DE Critchlow","year":"1996","unstructured":"Critchlow DE, Pearl DK, Qian C. The Triples Distance for Rooted Bifurcating Phylogenetic Trees. Syst Biol. 1996;45(3):323\u201334. https:\/\/doi.org\/10.1093\/sysbio\/45.3.323.","journal-title":"Syst Biol."},{"issue":"383","key":"329_CR7","doi-asserted-by":"publisher","first-page":"553","DOI":"10.1080\/01621459.1983.10478008","volume":"78","author":"EB Fowlkes","year":"1983","unstructured":"Fowlkes EB, Mallows CL. A Method for Comparing Two Hierarchical Clusterings. J Am Stat Assoc. 1983;78(383):553\u201369. https:\/\/doi.org\/10.1080\/01621459.1983.10478008.","journal-title":"J Am Stat Assoc."},{"issue":"1","key":"329_CR8","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1016\/0022-0000(80)90004-5","volume":"20","author":"J Gallant","year":"1980","unstructured":"Gallant J, Maier D, Astorer J. On finding minimal length superstrings. J Comput Syst Sci. 1980;20(1):50\u20138. https:\/\/doi.org\/10.1016\/0022-0000(80)90004-5.","journal-title":"J Comput Syst Sci."},{"issue":"6202","key":"329_CR9","doi-asserted-by":"publisher","first-page":"1369","DOI":"10.1126\/science.1259657","volume":"345","author":"SK Gire","year":"2014","unstructured":"Gire SK, Goba A, et al. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science. 2014;345(6202):1369\u201372. https:\/\/doi.org\/10.1126\/science.1259657.","journal-title":"Science."},{"issue":"2","key":"329_CR10","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1016\/j.ajhg.2017.06.013","volume":"101","author":"M Haber","year":"2017","unstructured":"Haber M, Doumet-Serhal C, et al. Continuity and Admixture in the Last Five Millennia of Levantine History from Ancient Canaanite and Present-Day Lebanese Genome Sequences. Am J Hum Genet. 2017;101(2):274\u201382. https:\/\/doi.org\/10.1016\/j.ajhg.2017.06.013.","journal-title":"Am J Hum Genet."},{"issue":"5","key":"329_CR11","doi-asserted-by":"publisher","first-page":"802","DOI":"10.1101\/gr.072033.107","volume":"18","author":"D Hernandez","year":"2008","unstructured":"Hernandez D, Fran\u00e7ois P, Farinelli L, \u00d8ster\u00e5s M, Schrenzel J. De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer. Genome Res. 2008;18(5):802\u20139. https:\/\/doi.org\/10.1101\/gr.072033.107.","journal-title":"Genome Res."},{"issue":"4","key":"329_CR12","doi-asserted-by":"publisher","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","volume":"28","author":"W Huang","year":"2012","unstructured":"Huang W, Li L, et al. ART: a next-generation sequencing read simulator. Bioinformatics. 2012;28(4):593\u20134. https:\/\/doi.org\/10.1093\/bioinformatics\/btr708.","journal-title":"Bioinformatics."},{"issue":"13","key":"329_CR13","doi-asserted-by":"publisher","first-page":"i249","DOI":"10.1093\/bioinformatics\/btm211","volume":"23","author":"MR Kantorovitz","year":"2007","unstructured":"Kantorovitz MR, Robinson GE, Sinha S. A statistical method for alignment-free comparison of regulatory sequences. Bioinformatics. 2007;23(13):i249\u201355. https:\/\/doi.org\/10.1093\/bioinformatics\/btm211.","journal-title":"Bioinformatics."},{"key":"329_CR14","volume-title":"Algorithm Design","author":"J Kleinberg","year":"2005","unstructured":"Kleinberg J, Tardos E. Algorithm Design. Boston: Addison-Wesley Longman Publishing Co., Inc.; 2005."},{"issue":"suppl-1","key":"329_CR15","doi-asserted-by":"publisher","first-page":"D28","DOI":"10.1093\/nar\/gkq967","volume":"39","author":"R Leinonen","year":"2011","unstructured":"Leinonen R, Akhtar R, Birney E, et al. The European Nucleotide Archive. Nucleic Acids Res. 2011;39(suppl-1):D28\u201331. https:\/\/doi.org\/10.1093\/nar\/gkq967.","journal-title":"Nucleic Acids Res."},{"issue":"8","key":"329_CR16","first-page":"707","volume":"10","author":"VI Levenshtein","year":"1966","unstructured":"Levenshtein VI. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys Dokl. 1966;10(8):707.","journal-title":"Soviet Phys Dokl."},{"issue":"5","key":"329_CR17","doi-asserted-by":"publisher","first-page":"582","DOI":"10.1186\/s12864-020-06982-4","volume":"21","author":"A Melnyk","year":"2020","unstructured":"Melnyk A, Knyazev S, Vannberg F, Bunimovich L, Skums P, Zelikovsky A. Using earth mover\u2019s distance for viral outbreak investigations. BMC Genomics. 2020;21(5):582. https:\/\/doi.org\/10.1186\/s12864-020-06982-4.","journal-title":"BMC Genomics."},{"key":"329_CR18","first-page":"267","volume-title":"The Field Matching Problem: Algorithms and Applications. KDD\u201996","author":"AE Monge","year":"1996","unstructured":"Monge AE, Elkan CP. The Field Matching Problem: Algorithms and Applications. KDD\u201996. Portland: AAAI Press; 1996. p. 267\u201370."},{"issue":"1","key":"329_CR19","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1145\/375360.375365","volume":"33","author":"G Navarro","year":"2001","unstructured":"Navarro G. A Guided Tour to Approximate String Matching. ACM Comput Surv. 2001;33(1):31\u201388. https:\/\/doi.org\/10.1145\/375360.375365.","journal-title":"ACM Comput Surv."},{"issue":"3","key":"329_CR20","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","volume":"48","author":"SB Needleman","year":"1970","unstructured":"Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443\u201353. https:\/\/doi.org\/10.1016\/0022-2836(70)90057-4.","journal-title":"J Mol Biol."},{"key":"329_CR21","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1007\/978-3-642-37195-0_13","volume-title":"Research in Computational Molecular Biology: 17th Annual International Conference, RECOMB 2013, Beijing, China, April 7-10, 2013. Proceedings","author":"S Nurk","year":"2013","unstructured":"Nurk S, Bankevich A, et al. Assembling Genomes and Mini-metagenomes from Highly Chimeric Reads. In: Deng M, Jiang R, Sun F, Zhang X, editors. Research in Computational Molecular Biology: 17th Annual International Conference, RECOMB 2013, Beijing, China, April 7-10, 2013. Proceedings. Berlin: Springer Berlin Heidelberg; 2013. p. 158\u201370. https:\/\/doi.org\/10.1007\/978-3-642-37195-0_13."},{"key":"329_CR22","doi-asserted-by":"publisher","first-page":"140","DOI":"10.1038\/317140a0","volume":"317","author":"SJ O\u2019Brien","year":"1985","unstructured":"O\u2019Brien SJ, Nash WG, et al. A molecular solution to the riddle of the giant panda\u2019s phylogeny. Nature. 1985;317:140\u20134. https:\/\/doi.org\/10.1038\/317140a0.","journal-title":"Nature."},{"issue":"1","key":"329_CR23","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","volume":"17","author":"BD Ondov","year":"2016","unstructured":"Ondov BD, Treangen TJ, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17(1):132. https:\/\/doi.org\/10.1186\/s13059-016-0997-x.","journal-title":"Genome Biol."},{"key":"329_CR24","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1007\/978-1-4939-7471-9_12","volume-title":"MiSeq: A Next Generation Sequencing Platform for Genomic Analysis","author":"RK Ravi","year":"2018","unstructured":"Ravi RK, Walton K, Khosroheidari M. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis. New York: Springer New York; 2018. p. 223\u201332. https:\/\/doi.org\/10.1007\/978-1-4939-7471-9_12."},{"issue":"12","key":"329_CR25","doi-asserted-by":"publisher","first-page":"1615","DOI":"10.1089\/cmb.2009.0198","volume":"16","author":"G Reinert","year":"2009","unstructured":"Reinert G, Chew D, Sun F, Waterman MS. Alignment-free sequence comparison (I): statistics and power. J Comput Biol. 2009;16(12):1615\u201334. https:\/\/doi.org\/10.1089\/cmb.2009.0198.","journal-title":"J Comput Biol."},{"issue":"1","key":"329_CR26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10618-018-0584-8","volume":"33","author":"P Ry\u0161av\u00fd","year":"2019","unstructured":"Ry\u0161av\u00fd P, \u017delezn\u00fd F. Estimating sequence similarity from read sets for clustering next-generation sequencing data. Data Min Knowl Discov. 2019;33(1):1\u201323. https:\/\/doi.org\/10.1007\/s10618-018-0584-8.","journal-title":"Data Min Knowl Discov."},{"key":"329_CR27","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1007\/978-3-319-68765-0_23","volume-title":"Advances in Intelligent Data Analysis XVI","author":"P Ry\u0161av\u00fd","year":"2017","unstructured":"Ry\u0161av\u00fd P, \u017delezn\u00fd F, et al. Estimating Sequence Similarity from Contig Sets. In: Adams N, et al., editors. Advances in Intelligent Data Analysis XVI. Cham: Springer; 2017. p. 272\u201383. https:\/\/doi.org\/10.1007\/978-3-319-68765-0_23."},{"key":"329_CR28","doi-asserted-by":"publisher","unstructured":"Ry\u0161av\u00fd P, \u017delezn\u00fd F. Estimating Sequence Similarity from Read Sets for Clustering Sequencing Data. In: Bostr\u00f6m H, et\u00a0al., editors. Advances in Intelligent Data Analysis XV. Cham: Springer; 2016. p. 204\u2013214. (BEST PAPER AWARD). https:\/\/doi.org\/10.1007\/978-3-319-46349-0_18.","DOI":"10.1007\/978-3-319-46349-0_18"},{"issue":"4","key":"329_CR29","doi-asserted-by":"publisher","first-page":"406","DOI":"10.1093\/oxfordjournals.molbev.a040454","volume":"4","author":"N Saitou","year":"1987","unstructured":"Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4(4):406\u201325. https:\/\/doi.org\/10.1093\/oxfordjournals.molbev.a040454.","journal-title":"Mol Biol Evol."},{"issue":"6","key":"329_CR30","doi-asserted-by":"publisher","first-page":"1117","DOI":"10.1101\/gr.089532.108","volume":"19","author":"JT Simpson","year":"2009","unstructured":"Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol \u0130. ABySS: A parallel assembler for short read sequence data. Genome Res. 2009;19(6):1117\u201323. https:\/\/doi.org\/10.1101\/gr.089532.108.","journal-title":"Genome Res."},{"issue":"16","key":"329_CR31","doi-asserted-by":"publisher","first-page":"3673","DOI":"10.1093\/nar\/8.16.3673","volume":"8","author":"R Staden","year":"1980","unstructured":"Staden R. A mew computer method for the storage and manipulation of DNA gel reading data. Nucleic Acids Res. 1980;8(16):3673\u201394. https:\/\/doi.org\/10.1093\/nar\/8.16.3673.","journal-title":"Nucleic Acids Res."},{"issue":"1","key":"329_CR32","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1016\/S0019-9958(85)80046-2","volume":"64","author":"E Ukkonen","year":"1985","unstructured":"Ukkonen E. Algorithms for approximate string matching. Inf Control. 1985;64(1):100\u201318. https:\/\/doi.org\/10.1016\/S0019-9958(85)80046-2.","journal-title":"Inf Control."},{"issue":"1","key":"329_CR33","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1016\/0304-3975(92)90143-4","volume":"92","author":"E Ukkonen","year":"1992","unstructured":"Ukkonen E. Approximate string-matching with q-grams and maximal matches. Theor Comput Sci. 1992;92(1):191\u2013211. https:\/\/doi.org\/10.1016\/0304-3975(92)90143-4.","journal-title":"Theor Comput Sci."},{"issue":"1","key":"329_CR34","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1145\/321796.321811","volume":"21","author":"RA Wagner","year":"1974","unstructured":"Wagner RA, Fischer MJ. The String-to-String Correction Problem. J Assoc Comput Mach. 1974;21(1):168\u201373. https:\/\/doi.org\/10.1145\/321796.321811.","journal-title":"J Assoc Comput Mach."},{"issue":"4","key":"329_CR35","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1093\/bioinformatics\/btl629","volume":"23","author":"RL Warren","year":"2007","unstructured":"Warren RL, Sutton GG, Jones SJM, Holt RA. Assembling millions of short DNA sequences using SSAKE. Bioinformatics. 2007;23(4):500\u20131. https:\/\/doi.org\/10.1093\/bioinformatics\/btl629.","journal-title":"Bioinformatics."},{"issue":"4","key":"329_CR36","doi-asserted-by":"publisher","first-page":"502","DOI":"10.1093\/bioinformatics\/btq696","volume":"27","author":"Z Wu","year":"2010","unstructured":"Wu Z, Wang X, Zhang X. Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq. Bioinformatics. 2010;27(4):502\u20138. https:\/\/doi.org\/10.1093\/bioinformatics\/btq696.","journal-title":"Bioinformatics."},{"issue":"7","key":"329_CR37","doi-asserted-by":"publisher","first-page":"e75","DOI":"10.1093\/nar\/gkt003","volume":"41","author":"H Yi","year":"2013","unstructured":"Yi H, Jin L. Co-phylog: an assembly-free phylogenomic approach for closely related organisms. Nucleic Acids Res. 2013;41(7):e75. https:\/\/doi.org\/10.1093\/nar\/gkt003.","journal-title":"Nucleic Acids Res."},{"issue":"5","key":"329_CR38","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1101\/gr.074492.107","volume":"18","author":"DR Zerbino","year":"2008","unstructured":"Zerbino DR, Birney E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821\u20139. https:\/\/doi.org\/10.1101\/gr.074492.107.","journal-title":"Genome Res."},{"issue":"1","key":"329_CR39","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1186\/s13059-017-1319-7","volume":"18","author":"A Zielezinski","year":"2017","unstructured":"Zielezinski A, Vinga S, Almeida J, Karlowski WM. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol. 2017;18(1):186. https:\/\/doi.org\/10.1186\/s13059-017-1319-7.","journal-title":"Genome Biol."}],"container-title":["BioData Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-023-00329-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13040-023-00329-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-023-00329-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T21:03:33Z","timestamp":1679951013000},"score":1,"resource":{"primary":{"URL":"https:\/\/biodatamining.biomedcentral.com\/articles\/10.1186\/s13040-023-00329-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,27]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["329"],"URL":"https:\/\/doi.org\/10.1186\/s13040-023-00329-x","relation":{},"ISSN":["1756-0381"],"issn-type":[{"type":"electronic","value":"1756-0381"}],"subject":[],"published":{"date-parts":[[2023,3,27]]},"assertion":[{"value":"14 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 March 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"13"}}