{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:59Z","timestamp":1772138039508,"version":"3.50.1"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2020,8,7]],"date-time":"2020-08-07T00:00:00Z","timestamp":1596758400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61673324"],"award-info":[{"award-number":["61673324"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2018YFD0901401"],"award-info":[{"award-number":["2018YFD0901401"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"U.S. 
National Science Foundation","doi-asserted-by":"crossref","award":["DMS-1518001"],"award-info":[{"award-number":["DMS-1518001"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003392","name":"Natural Science Foundation of Fujian","doi-asserted-by":"publisher","award":["2018J01097"],"award-info":[{"award-number":["2018J01097"]}],"id":[{"id":"10.13039\/501100003392","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Open Fund of Engineering Research Center for Medical Data Mining and Application of Fujian Province","award":["MDM2018002"],"award-info":[{"award-number":["MDM2018002"]}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,4,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Rapid developments in sequencing technologies have boosted generating high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We report CRAFT, a general genomic\/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. 
Specifically, given genome or high throughput sequencing data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With 10<jats:sup>2<\/jats:sup>\u221210<jats:sup>4<\/jats:sup>-fold reduction of storage space, CRAFT performs fast query for gigabytes of data within seconds or minutes, achieving comparable performance as six state-of-the-art alignment-free measures.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https:\/\/github.com\/jiaxingbai\/CRAFT.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa699","type":"journal-article","created":{"date-parts":[[2020,8,4]],"date-time":"2020-08-04T15:31:10Z","timestamp":1596555070000},"page":"155-161","source":"Crossref","is-referenced-by-count":4,"title":["CRAFT: Compact genome Representation toward large-scale Alignment-Free daTabase"],"prefix":"10.1093","volume":"37","author":[{"given":"Yang Young","family":"Lu","sequence":"first","affiliation":[{"name":"Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California , Los Angeles, CA 90089, USA"}]},{"given":"Jiaxing","family":"Bai","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , Xiamen 361000, China"}]},{"given":"Yiwen","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , Xiamen 361000, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8766-5950","authenticated-orcid":false,"given":"Ying","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , Xiamen 361000, China"},{"name":"Xiamen Key Lab. of Big Data Intelligent Analysis and Decision , Xiamen 361000, China"}]},{"given":"Fengzhu","family":"Sun","sequence":"additional","affiliation":[{"name":"Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California , Los Angeles, CA 90089, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,8,7]]},"reference":[{"key":"2023051511003871400_btaa699-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023051511003871400_btaa699-B2","doi-asserted-by":"crossref","first-page":"28970","DOI":"10.1038\/srep28970","article-title":"Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer","volume":"6","author":"Bernard","year":"2016","journal-title":"Sci. 
Rep"},{"key":"2023051511003871400_btaa699-B3","doi-asserted-by":"crossref","first-page":"e00257","DOI":"10.1128\/mSystems.00257-18","article-title":"k-mer similarity, networks of microbial genomes, and taxonomic rank","volume":"3","author":"Bernard","year":"2018","journal-title":"mSystems"},{"key":"2023051511003871400_btaa699-B4","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1093\/biomet\/53.3-4.325","article-title":"Some distance properties of latent root and vector methods used in multivariate analysis","volume":"53","author":"Gower","year":"1966","journal-title":"Biometrika"},{"key":"2023051511003871400_btaa699-B5","author":"Landauer","year":"2006"},{"key":"2023051511003871400_btaa699-B6","doi-asserted-by":"crossref","first-page":"W554","DOI":"10.1093\/nar\/gkx351","article-title":"CAFE: aCcelerated Alignment-FrEe sequence analysis","volume":"45","author":"Lu","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051511003871400_btaa699-B7","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1093\/bioinformatics\/btw290","article-title":"COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge","volume":"33","author":"Lu","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051511003871400_btaa699-B8","doi-asserted-by":"crossref","first-page":"1797","DOI":"10.1101\/gr.6761107","article-title":"28-way vertebrate alignment and conservation track in the UCSC Genome Browser","volume":"17","author":"Miller","year":"2007","journal-title":"Genome Res"},{"key":"2023051511003871400_btaa699-B9","doi-asserted-by":"crossref","first-page":"5127","DOI":"10.1073\/pnas.0700429104","article-title":"Distinctive features of large complex virus genomes and proteomes","volume":"104","author":"Mr\u00e1zek","year":"2007","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023051511003871400_btaa699-B10","doi-asserted-by":"crossref","first-page":"970","DOI":"10.1126\/science.1198719","article-title":"Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans","volume":"332","author":"Muegge","year":"2011","journal-title":"Science"},{"key":"2023051511003871400_btaa699-B11","first-page":"101","article-title":"Complexities of hierarchic clustering algorithms: state of the art","volume":"1","author":"Murtagh","year":"1984","journal-title":"Comput. Stat. Q"},{"key":"2023051511003871400_btaa699-B12","doi-asserted-by":"crossref","first-page":"1416","DOI":"10.1093\/nar\/gks1285","article-title":"One size does not fit all: on how Markov model order dictates performance of genomic sequence analyses","volume":"41","author":"Narlikar","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023051511003871400_btaa699-B13","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. 
Biol"},{"key":"2023051511003871400_btaa699-B14","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2023051511003871400_btaa699-B15","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1093\/bioinformatics\/btg412","article-title":"APE: analyses of phylogenetics and evolution in R language","volume":"20","author":"Paradis","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051511003871400_btaa699-B16","first-page":"1532","author":"Pennington","year":"2014"},{"key":"2023051511003871400_btaa699-B17","doi-asserted-by":"crossref","first-page":"e1001342","DOI":"10.1371\/journal.pgen.1001342","article-title":"A molecular phylogeny of living primates","volume":"7","author":"Perelman","year":"2011","journal-title":"PLoS Genet"},{"key":"2023051511003871400_btaa699-B18","doi-asserted-by":"crossref","first-page":"D501","DOI":"10.1093\/nar\/gki025","article-title":"NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"33","author":"Pruitt","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023051511003871400_btaa699-B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s00239-003-2493-7","article-title":"Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach","volume":"58","author":"Qi","year":"2004","journal-title":"J. Mol. Evol"},{"key":"2023051511003871400_btaa699-B20","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1146\/annurev-biodatasci-080917-013431","article-title":"Alignment-free sequence analysis and applications","volume":"1","author":"Ren","year":"2018","journal-title":"Annu. Rev. Biomed. 
Data Sci"},{"key":"2023051511003871400_btaa699-B21","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/0025-5564(81)90043-2","article-title":"Comparison of phylogenetic trees","volume":"53","author":"Robinson","year":"1981","journal-title":"Math. Biosci"},{"key":"2023051511003871400_btaa699-B22","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/s13059-019-1632-4","article-title":"Skmer: assembly-free and alignment-free sample identification using genome skims","volume":"20","author":"Sarmashghi","year":"2019","journal-title":"Genome Biol"},{"key":"2023051511003871400_btaa699-B23","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023051511003871400_btaa699-B24","doi-asserted-by":"crossref","first-page":"e84348","DOI":"10.1371\/journal.pone.0084348","article-title":"Comparison of metatranscriptomic samples based on k-tuple frequencies","volume":"9","author":"Wang","year":"2014","journal-title":"PLoS One"},{"key":"2023051511003871400_btaa699-B25","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2023051511003871400_btaa699-B26","doi-asserted-by":"crossref","first-page":"e75","DOI":"10.1093\/nar\/gkt003","article-title":"Co-phylog: an assembly-free phylogenomic approach for closely related organisms","volume":"41","author":"Yi","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023051511003871400_btaa699-B27","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: benefits, applications, and 
tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome Biol"},{"key":"2023051511003871400_btaa699-B28","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/s13059-019-1755-7","article-title":"Benchmarking of alignment-free sequence comparison methods","volume":"20","author":"Zielezinski","year":"2019","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa699\/34127088\/btaa699.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/2\/155\/50321819\/btaa699.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/2\/155\/50321819\/btaa699.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T07:01:50Z","timestamp":1684134110000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/2\/155\/5885078"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,8,7]]},"references-count":28,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,4,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa699","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.07.10.196741","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,1,15]]},"published":{"date-parts":[[2020,8,7]]}}}