{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T15:28:46Z","timestamp":1768318126226,"version":"3.49.0"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,3,12]],"date-time":"2020-03-12T00:00:00Z","timestamp":1583971200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,3,12]],"date-time":"2020-03-12T00:00:00Z","timestamp":1583971200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["HySim"],"award-info":[{"award-number":["HySim"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003329","name":"Ministerio de Econom\u00eda y Competitividad","doi-asserted-by":"publisher","award":["RTI2018-093336-B-C21"],"award-info":[{"award-number":["RTI2018-093336-B-C21"]}],"id":[{"id":"10.13039\/501100003329","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010801","name":"Xunta de Galicia","doi-asserted-by":"publisher","award":["ED481B 2018\/013 and ED431C 2018\/19"],"award-info":[{"award-number":["ED481B 2018\/013 and ED431C 2018\/19"]}],"id":[{"id":"10.13039\/501100010801","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Federal Office for Agriculture and Food","award":["2816503814"],"award-info":[{"award-number":["2816503814"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>All-Food-Sequencing (AFS) is an untargeted metagenomic sequencing method that 
allows for the detection and quantification of food ingredients including animals, plants, and microbiota. While this approach avoids some of the shortcomings of targeted PCR-based methods, it requires the comparison of sequence reads to large collections of reference genomes. The steadily increasing amount of available reference genomes establishes the need for efficient big data approaches.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>We introduce an alignment-free <jats:italic>k<\/jats:italic>-mer based method for detection and quantification of species composition in food and other complex biological matters. It is orders-of-magnitude faster than our previous alignment-based AFS pipeline. In comparison to the established tools CLARK, Kraken2, and Kraken2+Bracken it is superior in terms of false-positive rate and quantification accuracy. Furthermore, the usage of an efficient database partitioning scheme allows for the processing of massive collections of reference genomes with reduced memory requirements on a workstation (AFS-MetaCache) or on a Spark-based compute cluster (MetaCacheSpark).<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>We present a fast yet accurate screening method for whole genome shotgun sequencing-based biosurveillance applications such as food testing. By relying on a big data approach it can scale efficiently towards large-scale collections of complex eukaryotic and bacterial reference genomes. AFS-MetaCache and MetaCacheSpark are suitable tools for broad-scale metagenomic screening applications. 
They are available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/muellan.github.io\/metacache\/afs.html\">https:\/\/muellan.github.io\/metacache\/afs.html<\/jats:ext-link>\n(C++ version for a workstation) and <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/jmabuin\/MetaCacheSpark\">https:\/\/github.com\/jmabuin\/MetaCacheSpark<\/jats:ext-link>\n(Spark version for big data clusters).<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12859-020-3429-6","type":"journal-article","created":{"date-parts":[[2020,3,12]],"date-time":"2020-03-12T15:02:57Z","timestamp":1584025377000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["A big data approach to metagenomics for all-food-sequencing"],"prefix":"10.1186","volume":"21","author":[{"given":"Robin","family":"Kobus","sequence":"first","affiliation":[]},{"given":"Jos\u00e9 M.","family":"Abu\u00edn","sequence":"additional","affiliation":[]},{"given":"Andr\u00e9","family":"M\u00fcller","sequence":"additional","affiliation":[]},{"given":"S\u00f6ren Lukas","family":"Hellmann","sequence":"additional","affiliation":[]},{"given":"Juan C.","family":"Pichel","sequence":"additional","affiliation":[]},{"given":"Tom\u00e1s F.","family":"Pena","sequence":"additional","affiliation":[]},{"given":"Andreas","family":"Hildebrandt","sequence":"additional","affiliation":[]},{"given":"Thomas","family":"Hankeln","sequence":"additional","affiliation":[]},{"given":"Bertil","family":"Schmidt","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,3,12]]},"reference":[{"issue":"2","key":"3429_CR1","doi-asserted-by":"publisher","first-page":"425","DOI":"10.1111\/1541-4337.12419","volume":"18","author":"M Esteki","year":"2019","unstructured":"Esteki M, Regueiro J, Simal-G\u00e1ndara J. 
Tackling fraudsters with global strategies to expose fraud in the food chain. Compr Rev Food Sci Food Saf. 2019; 18(2):425\u201340.","journal-title":"Compr Rev Food Sci Food Saf"},{"issue":"1","key":"3429_CR2","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1007\/s00217-010-1371-y","volume":"232","author":"R K\u00f6ppel","year":"2011","unstructured":"K\u00f6ppel R, Ruf J, Rentsch J. Multiplex real-time PCR for the detection and quantification of DNA from beef, pork, horse and sheep. Eur Food Res Technol. 2011; 232(1):151\u20135.","journal-title":"Eur Food Res Technol"},{"issue":"1","key":"3429_CR3","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1007\/s00217-018-3147-8","volume":"245","author":"R K\u00f6ppel","year":"2019","unstructured":"K\u00f6ppel R, Ganeshan A, van Velsen F, Weber S, Schmid J, Graf C, Hochegger R. Digital duplex versus real-time PCR for the determination of meat proportions from sausages containing pork and beef. Eur Food Res Technol. 2019; 245(1):151\u20137.","journal-title":"Eur Food Res Technol"},{"issue":"12","key":"3429_CR4","doi-asserted-by":"publisher","first-page":"83761","DOI":"10.1371\/journal.pone.0083761","volume":"8","author":"AO Tillmar","year":"2013","unstructured":"Tillmar AO, Dell\u2019Amico B, Welander J, Holmlund G. A universal method for species identification of mammals utilizing next generation sequencing for the analysis of DNA mixtures. PLoS ONE. 2013; 8(12):83761.","journal-title":"PLoS ONE"},{"key":"3429_CR5","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1186\/1471-2164-15-639","volume":"15","author":"F Ripp","year":"2014","unstructured":"Ripp F, Krombholz CF, Liu Y, et al. All-food-seq (AFS): a quantifiable screen for species in biological samples by deep DNA sequencing. BMC Genomics. 
2014; 15:639.","journal-title":"BMC Genomics"},{"key":"3429_CR6","doi-asserted-by":"publisher","unstructured":"Liu Y, Ripp F, Koeppel R, et al. AFS: identification and quantification of species composition by metagenomic sequencing. Bioinformatics. 2017:822. https:\/\/doi.org\/10.1093\/bioinformatics\/btw822.","DOI":"10.1093\/bioinformatics\/btw822"},{"issue":"14","key":"3429_CR7","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25(14):1754\u201360.","journal-title":"Bioinformatics"},{"issue":"5","key":"3429_CR8","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","volume":"26","author":"H Li","year":"2010","unstructured":"Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010; 26(5):589\u201395.","journal-title":"Bioinformatics"},{"key":"3429_CR9","unstructured":"Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2. 2013."},{"issue":"4","key":"3429_CR10","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1038\/nmeth.1923","volume":"9","author":"B Langmead","year":"2012","unstructured":"Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012; 9(4):357.","journal-title":"Nat Methods"},{"issue":"14","key":"3429_CR11","doi-asserted-by":"publisher","first-page":"1830","DOI":"10.1093\/bioinformatics\/bts276","volume":"28","author":"Y Liu","year":"2012","unstructured":"Liu Y, Schmidt B, Maskell DL. CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows\u2013Wheeler transform. Bioinformatics. 
2012; 28(14):1830\u20137.","journal-title":"Bioinformatics"},{"key":"3429_CR12","doi-asserted-by":"publisher","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","volume":"15","author":"DE Wood","year":"2014","unstructured":"Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014; 15:R46.","journal-title":"Genome Biol"},{"key":"3429_CR13","doi-asserted-by":"publisher","unstructured":"Lindgreen S, Adair KL, Gardner P. An evaluation of the accuracy and speed of metagenome analysis tools. Sci Rep. 2016; 6(19233). https:\/\/doi.org\/10.1038\/srep19233.","DOI":"10.1038\/srep19233"},{"key":"3429_CR14","unstructured":"Seppey M, Manni M, Zdobnov EM. LEMMI: A live evaluation of computational methods for metagenome investigation. bioRxiv. 2019. https:\/\/doi.org\/10.1101\/507731. https:\/\/www.biorxiv.org\/content\/early\/2019\/04\/16\/507731.full.pdf."},{"key":"3429_CR15","doi-asserted-by":"publisher","first-page":"104","DOI":"10.7717\/peerj-cs.104","volume":"3","author":"J Lu","year":"2017","unstructured":"Lu J, Breitwieser FP, Thielen P, Salzberg SL. Bracken: estimating species abundance in metagenomics data. PeerJ Comput Sci. 2017; 3:104.","journal-title":"PeerJ Comput Sci"},{"issue":"1","key":"3429_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12864-015-1419-2","volume":"16","author":"R Ounit","year":"2015","unstructured":"Ounit R, Wanamaker S, Close TJ, et al. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics. 2015; 16(1):1\u201313. https:\/\/doi.org\/10.1186\/s12864-015-1419-2.","journal-title":"BMC Genomics"},{"issue":"23","key":"3429_CR17","doi-asserted-by":"publisher","first-page":"3740","DOI":"10.1093\/bioinformatics\/btx520","volume":"33","author":"A M\u00fcller","year":"2017","unstructured":"M\u00fcller A, Hundt C, Hildebrandt A, Hankeln T, Schmidt B. 
MetaCache: context-aware classification of metagenomic reads using minhashing. Bioinformatics. 2017; 33(23):3740\u20138.","journal-title":"Bioinformatics"},{"issue":"10","key":"3429_CR18","doi-asserted-by":"publisher","first-page":"902","DOI":"10.1038\/nmeth.3589","volume":"12","author":"DT Truong","year":"2015","unstructured":"Truong DT, Franzosa EA, Tickle TL, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods. 2015; 12(10):902\u20133. https:\/\/doi.org\/10.1038\/nmeth.3589.","journal-title":"Nat Methods"},{"issue":"12","key":"3429_CR19","doi-asserted-by":"publisher","first-page":"1196","DOI":"10.1038\/nmeth.2693","volume":"10","author":"S Sunagawa","year":"2013","unstructured":"Sunagawa S, Mende DR, Zeller G, Izquierdo-Carrasco F, Berger SA, Kultima JR, Coelho LP, Arumugam M, Tap J, Nielsen HB, et al. Metagenomic species profiling using universal phylogenetic marker genes. Nat Methods. 2013; 10(12):1196.","journal-title":"Nat Methods"},{"issue":"5","key":"3429_CR20","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1038\/nmeth.f.303","volume":"7","author":"JG Caporaso","year":"2010","unstructured":"Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010; 7(5):335\u20136.","journal-title":"Nat Methods"},{"key":"3429_CR21","doi-asserted-by":"publisher","first-page":"11257","DOI":"10.1038\/ncomms11257","volume":"7","author":"P Menzel","year":"2016","unstructured":"Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. 2016; 7:11257.","journal-title":"Nat Commun"},{"key":"3429_CR22","first-page":"1","volume-title":"Combinatorial Pattern Matching","author":"Andrei Z. Broder","year":"2000","unstructured":"Broder AZ. Identifying and Filtering Near-Duplicate Documents. In: Proc. 11th Annual Symposium on Combinatorial Pattern Matching, COM \u201900: 2000. p. 1\u201310. 
http:\/\/dl.acm.org\/citation.cfm?id=647819.736184."},{"key":"3429_CR23","doi-asserted-by":"publisher","first-page":"623","DOI":"10.1038\/nbt.3238","volume":"33","author":"K Berlin","year":"2015","unstructured":"Berlin K, Koren S, Chin C-S, et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotech. 2015; 33:623\u201330. https:\/\/doi.org\/10.1038\/nbt.3238.","journal-title":"Nat Biotech"},{"issue":"1","key":"3429_CR24","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","volume":"17","author":"BD Ondov","year":"2016","unstructured":"Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016; 17(1):132. https:\/\/doi.org\/10.1186\/s13059-016-0997-x.","journal-title":"Genome Biol"},{"key":"3429_CR25","doi-asserted-by":"publisher","first-page":"15311","DOI":"10.1038\/ncomms15311","volume":"8","author":"V Popic","year":"2017","unstructured":"Popic V, Batzoglou S. A hybrid cloud read aligner based on MinHash and kmer voting that preserves privacy. Nat Commun. 2017; 8:15311.","journal-title":"Nat Commun"},{"issue":"1","key":"3429_CR26","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1186\/s13059-019-1841-x","volume":"20","author":"BD Ondov","year":"2019","unstructured":"Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. Mash Screen: high-throughput sequence containment estimation for genome discovery. Genome Biol. 2019; 20(1):232. https:\/\/doi.org\/10.1186\/s13059-019-1841-x.","journal-title":"Genome Biol"},{"issue":"11","key":"3429_CR27","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1145\/2934664","volume":"59","author":"M Zaharia","year":"2016","unstructured":"Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ, et al. Apache Spark: a unified engine for big data processing. Commun ACM. 
2016; 59(11):56\u201365.","journal-title":"Commun ACM"},{"key":"3429_CR28","doi-asserted-by":"publisher","first-page":"3138","DOI":"10.7717\/peerj.3138","volume":"5","author":"TH Dadi","year":"2017","unstructured":"Dadi TH, Renard BY, Wieler LH, Semmler T, Reinert K. SLIMM: species level identification of microorganisms from metagenomes. PeerJ. 2017; 5:3138.","journal-title":"PeerJ"},{"issue":"1","key":"3429_CR29","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1007\/s00217-010-1371-y","volume":"232","author":"R K\u00f6ppel","year":"2011","unstructured":"K\u00f6ppel R, Ruf J, Rentsch J. Multiplex real-time PCR for the detection and quantification of DNA from beef, pork, horse and sheep. Eur Food Res Technol. 2011; 232(1):151\u20135.","journal-title":"Eur Food Res Technol"},{"issue":"1","key":"3429_CR30","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1007\/s00217-009-1138-5","volume":"230","author":"A Eugster","year":"2009","unstructured":"Eugster A, Ruf J, Rentsch J, K\u00f6ppel R. Quantification of beef, pork, chicken and turkey proportions in sausages: use of matrix-adapted standards and comparison of single versus multiplex PCR in an interlaboratory trial. Eur Food Res Technol. 2009; 230(1):55.","journal-title":"Eur Food Res Technol"},{"issue":"1","key":"3429_CR31","doi-asserted-by":"publisher","first-page":"385","DOI":"10.1186\/1471-2105-12-385","volume":"12","author":"BD Ondov","year":"2011","unstructured":"Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a web browser. BMC Bioinformatics. 2011; 12(1):385.","journal-title":"BMC Bioinformatics"},{"issue":"7","key":"3429_CR32","doi-asserted-by":"publisher","first-page":"1002195","DOI":"10.1371\/journal.pbio.1002195","volume":"13","author":"ZD Stephens","year":"2015","unstructured":"Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, Iyer R, Schatz MC, Sinha S, Robinson GE. Big data: astronomical or genomical? PLoS Biol. 
2015; 13(7):1002195.","journal-title":"PLoS Biol"},{"issue":"4","key":"3429_CR33","doi-asserted-by":"publisher","first-page":"712","DOI":"10.1016\/j.drudis.2017.01.014","volume":"22","author":"B Schmidt","year":"2017","unstructured":"Schmidt B, Hildebrandt A. Next-generation sequencing: big data meets high performance computing. Drug Discov Today. 2017; 22(4):712\u20137.","journal-title":"Drug Discov Today"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3429-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-020-3429-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3429-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T00:06:59Z","timestamp":1615507619000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-3429-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,12]]},"references-count":33,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3429"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-3429-6","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,12]]},"assertion":[{"value":"19 August 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 February 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 March 
2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"102"}}