{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T23:49:29Z","timestamp":1773100169906,"version":"3.50.1"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2024,3,28]],"date-time":"2024-03-28T00:00:00Z","timestamp":1711584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2021YFF1000900"],"award-info":[{"award-number":["2021YFF1000900"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32222019"],"award-info":[{"award-number":["32222019"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32100459"],"award-info":[{"award-number":["32100459"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Constructing a phylogenetic tree requires calculating the evolutionary distance between samples or species via large-scale resequencing data, a process that is both time-consuming and computationally demanding. Striking the right balance between accuracy and efficiency is a significant challenge.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To address this, we introduce a new algorithm, MIKE (MinHash-based k-mer algorithm). This algorithm is designed for the swift calculation of the Jaccard coefficient directly from raw sequencing reads and enables the construction of phylogenetic trees based on the resultant Jaccard coefficient. Simulation results highlight the superior speed of MIKE compared to existing state-of-the-art methods. We used MIKE to reconstruct a phylogenetic tree, incorporating 238 yeast, 303 Zea, 141 Ficus, 67 Oryza, and 43 Saccharum spontaneum samples. MIKE demonstrated accurate performance across varying evolutionary scales, reproductive modes, and ploidy levels, proving itself as a powerful tool for phylogenetic tree construction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>MIKE is publicly available on Github at https:\/\/github.com\/Argonum-Clever2\/mike.git.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae154","type":"journal-article","created":{"date-parts":[[2024,3,28]],"date-time":"2024-03-28T19:34:20Z","timestamp":1711654460000},"source":"Crossref","is-referenced-by-count":13,"title":["MIKE: an ultrafast, assembly-, and alignment-free approach for phylogenetic tree construction"],"prefix":"10.1093","volume":"40","author":[{"given":"Fang","family":"Wang","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Taiyuan University of Technology , Taiyuan, Shanxi 030024, China"},{"name":"National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences , Shenzhen, Guangdong 518120, China"}]},{"given":"Yibin","family":"Wang","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences , Shenzhen, Guangdong 518120, China"}]},{"given":"Xiaofei","family":"Zeng","sequence":"additional","affiliation":[{"name":"Department of Human Cell Biology and Genetics, Joint Laboratory of Guangdong-Hong Kong Universities for Vascular Homeostasis and Diseases, School of Medicine, Southern University of Science and Technology , Shenzhen, Guangdong 508055, China"}]},{"given":"Shengcheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences , Shenzhen, Guangdong 518120, China"}]},{"given":"Jiaxin","family":"Yu","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences , Shenzhen, Guangdong 518120, China"}]},{"given":"Dongxi","family":"Li","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Taiyuan University of Technology , Taiyuan, Shanxi 030024, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5207-0882","authenticated-orcid":false,"given":"Xingtan","family":"Zhang","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences , Shenzhen, Guangdong 518120, China"}]}],"member":"286","published-online":{"date-parts":[[2024,3,28]]},"reference":[{"key":"2024040400420104500_btae154-B1","doi-asserted-by":"crossref","first-page":"D1023","DOI":"10.1093\/nar\/gku1039","article-title":"SNP-seek database of SNPs derived from 3000 rice genomes","volume":"43","author":"Alexandrov","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2024040400420104500_btae154-B2","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/978-0-387-36011-9_6","volume-title":"Association Mapping in Plants","author":"Batley","year":"2007"},{"key":"2024040400420104500_btae154-B3","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/nbt.3238","article-title":"Assembling large genomes with single-molecule sequencing and locality-sensitive hashing","volume":"33","author":"Berlin","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2024040400420104500_btae154-B4","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/j.dci.2004.07.007","article-title":"Using models of nucleotide evolution to build phylogenetic trees","volume":"29","author":"Bos","year":"2005","journal-title":"Dev Comp Immunol"},{"key":"2024040400420104500_btae154-B5","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1093\/bioinformatics\/17.5.419","article-title":"Efficient large-scale sequence comparison by locality-sensitive hashing","volume":"17","author":"Buhler","year":"2001","journal-title":"Bioinformatics"},{"key":"2024040400420104500_btae154-B6","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1186\/1471-2164-15-581","article-title":"Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef)","volume":"15","author":"Cannarozzi","year":"2014","journal-title":"BMC Genomics"},{"key":"2024040400420104500_btae154-B7","doi-asserted-by":"crossref","first-page":"1736","DOI":"10.1038\/s41588-022-01184-y","article-title":"Genome sequencing reveals evidence of adaptive variation in the genus Zea","volume":"54","author":"Chen","year":"2022","journal-title":"Nat Genet"},{"key":"2024040400420104500_btae154-B8","doi-asserted-by":"crossref","first-page":"giab008","DOI":"10.1093\/gigascience\/giab008","article-title":"Twelve years of SAMtools and BCFtools","volume":"10","author":"Danecek","year":"2021","journal-title":"Gigascience"},{"key":"2024040400420104500_btae154-B9","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1038\/s41576-021-00367-3","article-title":"Towards population-scale long-read sequencing","volume":"22","author":"De Coster","year":"2021","journal-title":"Nat Rev Genet"},{"key":"2024040400420104500_btae154-B10","doi-asserted-by":"crossref","first-page":"bbaa227","DOI":"10.1093\/bib\/bbaa227","article-title":"LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files","volume":"22","author":"Dong","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024040400420104500_btae154-B11","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1038\/s41587-023-01753-4","article-title":"Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree","volume":"42","author":"Dylus","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2024040400420104500_btae154-B12","doi-asserted-by":"crossref","first-page":"438","DOI":"10.1080\/10511970903487705","article-title":"The jukes-cantor model of molecular evolution","volume":"20","author":"Erickson","year":"2010","journal-title":"Primus"},{"key":"2024040400420104500_btae154-B13","doi-asserted-by":"crossref","first-page":"522","DOI":"10.1186\/s12864-015-1647-5","article-title":"An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data","volume":"16","author":"Fan","year":"2015","journal-title":"BMC Genomics"},{"key":"2024040400420104500_btae154-B14","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1093\/oxfordjournals.molbev.a025808","article-title":"BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data","volume":"14","author":"Gascuel","year":"1997","journal-title":"Mol Biol Evol"},{"key":"2024040400420104500_btae154-B15","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nrg.2016.49","article-title":"Coming of age: ten years of next-generation sequencing technologies","volume":"17","author":"Goodwin","year":"2016","journal-title":"Nat Rev Genet"},{"key":"2024040400420104500_btae154-B16","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2024040400420104500_btae154-B17","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1038\/s41576-020-0233-0","article-title":"Phylogenetic tree building in the genomic age","volume":"21","author":"Kapli","year":"2020","journal-title":"Nat Rev Genet"},{"key":"2024040400420104500_btae154-B18","doi-asserted-by":"crossref","first-page":"2759","DOI":"10.1093\/bioinformatics\/btx304","article-title":"KMC 3: counting and manipulating k-mer statistics","volume":"33","author":"Kokot","year":"2017","journal-title":"Bioinformatics"},{"key":"2024040400420104500_btae154-B20","doi-asserted-by":"crossref","first-page":"W293","DOI":"10.1093\/nar\/gkab301","article-title":"Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation","volume":"49","author":"Letunic","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024040400420104500_btae154-B21","author":"Li","year":"2013"},{"key":"2024040400420104500_btae154-B22","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies\u2014the next generation","volume":"11","author":"Metzker","year":"2010","journal-title":"Nat Rev Genet"},{"key":"2024040400420104500_btae154-B23","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.tree.2004.01.009","article-title":"SNPs in ecology, evolution and conservation","volume":"19","author":"Morin","year":"2004","journal-title":"Trends in Ecology & Evolution"},{"key":"2024040400420104500_btae154-B24","first-page":"380","author":"Niwattanakul","year":"2013"},{"key":"2024040400420104500_btae154-B25","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2024040400420104500_btae154-B26","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1093\/bioinformatics\/bty633","article-title":"Ape 5.0: an environment for modern phylogenetics and evolutionary analyses","volume":"35","author":"Paradis","year":"2019","journal-title":"Bioinformatics"},{"key":"2024040400420104500_btae154-B27","article-title":"Scaling accurate genetic variant discovery to tens of thousands of samples","author":"Poplin","year":"2017","journal-title":"Genomics"},{"key":"2024040400420104500_btae154-B28","doi-asserted-by":"crossref","first-page":"2399","DOI":"10.2174\/1389557521666210111150036","article-title":"Brassica oleracea var. capitata f. alba: a review on its botany, traditional uses, phytochemistry and pharmacological activities","volume":"21","author":"Ray","year":"2021","journal-title":"Mini Rev Med Chem"},{"key":"2024040400420104500_btae154-B29","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1111\/j.2041-210X.2011.00169.x","article-title":"Phytools: an R package for phylogenetic comparative biology (and other things): phytools: R package","volume":"3","author":"Revell","year":"2012","journal-title":"Methods Ecol Evol"},{"key":"2024040400420104500_btae154-B19","first-page":"210","volume-title":"The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing","year":"2009"},{"key":"2024040400420104500_btae154-B30","first-page":"406","article-title":"The neighbor-joining method: a new method for reconstructing phylogenetic trees","volume":"4","author":"Saitou","year":"1987","journal-title":"Molecul Biol Evol"},{"key":"2024040400420104500_btae154-B31","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/s13059-019-1632-4","article-title":"Skmer: assembly-free and alignment-free sample identification using genome skims","volume":"20","author":"Sarmashghi","year":"2019","journal-title":"Genome Biol"},{"key":"2024040400420104500_btae154-B32","doi-asserted-by":"crossref","first-page":"e0163962","DOI":"10.1371\/journal.pone.0163962","article-title":"SeqKit: a cross-platform and ultrafast toolkit for FASTA\/Q file manipulation","volume":"11","author":"Shen","year":"2016","journal-title":"PLoS One"},{"key":"2024040400420104500_btae154-B33","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1016\/j.cell.2018.10.023","article-title":"Tempo and mode of genome evolution in the budding yeast subphylum","volume":"175","author":"Shen","year":"2018","journal-title":"Cell"},{"key":"2024040400420104500_btae154-B34","first-page":"3154","author":"Shrivastava","year":"2017"},{"key":"2024040400420104500_btae154-B35","author":"Smith","year":"2020"},{"key":"2024040400420104500_btae154-B36","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1016\/j.tig.2014.07.001","article-title":"Ten years of next-generation sequencing technology","volume":"30","author":"Van Dijk","year":"2014","journal-title":"Trends Genet"},{"key":"2024040400420104500_btae154-B37","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1111\/1755-0998.13490","article-title":"Arabis alpina: a perennial model plant for ecological genomics and life-history evolution","volume":"22","author":"W\u00f6tzel","year":"2022","journal-title":"Mol Ecol Resour"},{"key":"2024040400420104500_btae154-B38","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/s13059-021-02303-4","article-title":"Kssd: sequence dimensionality reduction by k-mer substring space sampling enables real-time large-scale datasets analysis","volume":"22","author":"Yi","year":"2021","journal-title":"Genome Biol"},{"key":"2024040400420104500_btae154-B39","doi-asserted-by":"crossref","first-page":"D801","DOI":"10.1093\/nar\/gkv1204","article-title":"InsectBase: a resource for insect genomes and transcriptomes","volume":"44","author":"Yin","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2024040400420104500_btae154-B40","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1016\/j.cell.2020.09.043","article-title":"Genomes of the banyan tree and pollinator wasp provide insights into fig-wasp coevolution","volume":"183","author":"Zhang","year":"2020","journal-title":"Cell"},{"key":"2024040400420104500_btae154-B41","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1038\/s41588-022-01084-1","article-title":"Genomic insights into the recent chromosome reduction of autopolyploid sugarcane Saccharum spontaneum","volume":"54","author":"Zhang","year":"2022","journal-title":"Nat Genet"},{"key":"2024040400420104500_btae154-B42","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1038\/s41588-018-0041-z","article-title":"Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice","volume":"50","author":"Zhao","year":"2018","journal-title":"Nat Genet"},{"key":"2024040400420104500_btae154-B43","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1093\/bioinformatics\/bty651","article-title":"BinDash, software for fast genome distance estimation on a typical personal laptop","volume":"35","author":"Zhao","year":"2019","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae154\/57114667\/btae154.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae154\/57149237\/btae154.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae154\/57149237\/btae154.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,4]],"date-time":"2024-04-04T00:42:34Z","timestamp":1712191354000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae154\/7636962"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,3,28]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,3,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae154","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,4,1]]},"published":{"date-parts":[[2024,3,28]]},"article-number":"btae154"}}