{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T09:22:55Z","timestamp":1715332975947},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Several technical challenges in metagenomic data analysis, including assembling metagenomic sequence data or identifying operational taxonomic units, are both significant and well known. These forms of analysis are increasingly cited as conceptually flawed, given the extreme variation within traditionally defined species and rampant horizontal gene transfer. Furthermore, computational requirements of such analysis have hindered content-based organization of metagenomic data at large scale.<\/jats:p>\n               <jats:p>Results: In this article, we introduce the Amordad database engine for alignment-free, content-based indexing of metagenomic datasets. Amordad places the metagenome comparison problem in a geometric context, and uses an indexing strategy that combines random hashing with a regular nearest neighbor graph. This framework allows refinement of the database over time by continual application of random hash functions, with the effect of each hash function encoded in the nearest neighbor graph. This eliminates the need to explicitly maintain the hash functions in order for query efficiency to benefit from the accumulated randomness. Results on real and simulated data show that Amordad can support logarithmic query time for identifying similar metagenomes even as the database size reaches into the millions.<\/jats:p>\n               <jats:p>Availability and implementation: Source code, licensed under the GNU general public license (version 3) is freely available for download from http:\/\/smithlabresearch.org\/amordad<\/jats:p>\n               <jats:p>Contact: \u00a0andrewds@usc.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu405","type":"journal-article","created":{"date-parts":[[2014,6,29]],"date-time":"2014-06-29T00:12:32Z","timestamp":1404000752000},"page":"2949-2955","source":"Crossref","is-referenced-by-count":7,"title":["The Amordad database engine for metagenomics"],"prefix":"10.1093","volume":"30","author":[{"given":"Ehsan","family":"Behnam","sequence":"first","affiliation":[{"name":"Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA"}]},{"given":"Andrew D.","family":"Smith","sequence":"additional","affiliation":[{"name":"Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2014,6,27]]},"reference":[{"key":"2023012711560085300_btu405-B1","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1038\/nature09944","article-title":"Enterotypes of the human gut microbiome","volume":"473","author":"Arumugam","year":"2011","journal-title":"Nature"},{"key":"2023012711560085300_btu405-B2","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1089\/cmb.2012.0280","article-title":"A geometric interpretation for local alignment-free sequence comparison","volume":"20","author":"Behnam","year":"2013","journal-title":"J. Comput. Biol."},{"key":"2023012711560085300_btu405-B3","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.1997.609451","article-title":"Shape indexing using approximate nearest-neighbour search in high-dimensional spaces","author":"Beis","year":"1997"},{"key":"2023012711560085300_btu405-B4","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1214\/aos\/1018031204","article-title":"Variable length Markov chains","volume":"27","author":"B\u00fchlmann","year":"1999","journal-title":"Ann. Stat."},{"key":"2023012711560085300_btu405-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1745-6150-8-3","article-title":"Next-generation phylogenomics","volume":"8","author":"Chan","year":"2013","journal-title":"Biol. Direct"},{"key":"2023012711560085300_btu405-B6","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1145\/509907.509965","article-title":"Similarity estimation techniques from rounding algorithms","volume-title":"Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing","author":"Charikar","year":"2002"},{"key":"2023012711560085300_btu405-B7","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1038\/nrmicro1160","article-title":"The metagenomics of soil","volume":"3","author":"Daniel","year":"2005","journal-title":"Nat. Rev. Microbiol."},{"key":"2023012711560085300_btu405-B8","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1145\/1963405.1963487","article-title":"Efficient k-nearest neighbor graph construction for generic similarity measures","volume-title":"Proceedings of the 20th International Conference on World Wide Web","author":"Dong","year":"2011"},{"key":"2023012711560085300_btu405-B9","first-page":"518","article-title":"Similarity search in high dimensions via hashing","volume-title":"VLDB","author":"Gionis","year":"1999"},{"key":"2023012711560085300_btu405-B10","doi-asserted-by":"crossref","first-page":"1115","DOI":"10.1145\/227683.227684","article-title":"Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming","volume":"42","author":"Goemans","year":"1995","journal-title":"J. ACM"},{"key":"2023012711560085300_btu405-B11","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1038\/nbt.1883","article-title":"Full-length transcriptome assembly from RNA-Seq data without a reference genome","volume":"29","author":"Grabherr","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"2023012711560085300_btu405-B12","doi-asserted-by":"crossref","first-page":"1552","DOI":"10.1101\/gr.120618.111","article-title":"Integrative analysis of environmental sequences using megan4","volume":"21","author":"Huson","year":"2011","journal-title":"Genome Res."},{"key":"2023012711560085300_btu405-B13","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1145\/202660.202666","article-title":"Implementing deletion in B+-trees","volume":"24","author":"Jannink","year":"1995","journal-title":"ACM Sigmod Rec."},{"key":"2023012711560085300_btu405-B14","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1239\/jap\/1189717545","article-title":"Asymptotic behavior of k-word matches between two uniformly distributed sequences","volume":"44","author":"Kantorovitz","year":"2007","journal-title":"J. Appl. Probab."},{"key":"2023012711560085300_btu405-B15","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1038\/nature12506","article-title":"Richness of human gut microbiome correlates with metabolic markers","volume":"500","author":"Le Chatelier","year":"2013","journal-title":"Nature"},{"key":"2023012711560085300_btu405-B16","doi-asserted-by":"crossref","first-page":"D28","DOI":"10.1093\/nar\/gkq967","article-title":"The european nucleotide archive","volume":"39","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012711560085300_btu405-B17","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/2047-217X-1-18","article-title":"SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler","volume":"1","author":"Luo","year":"2012","journal-title":"GigaScience"},{"key":"2023012711560085300_btu405-B18","first-page":"950","article-title":"Multi-probe LSH: efficient indexing for high-dimensional similarity search","volume-title":"Proceedings of the 33rd international conference on Very large data bases","author":"Lv","year":"2007"},{"key":"2023012711560085300_btu405-B19","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1038\/nmeth976","article-title":"Accurate phylogenetic classification of variable-length DNA fragments","volume":"4","author":"McHardy","year":"2007","journal-title":"Nat. Methods"},{"key":"2023012711560085300_btu405-B20","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1186\/1471-2105-9-386","article-title":"The metagenomics rast server\u2013a public resource for the automatic phylogenetic and functional analysis of metagenomes","volume":"9","author":"Meyer","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012711560085300_btu405-B21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/256292.256294","article-title":"Separators for sphere-packings and nearest neighbor graphs","volume":"44","author":"Miller","year":"1997","journal-title":"J. ACM"},{"key":"2023012711560085300_btu405-B22","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1145\/377939.377946","article-title":"A note on a method for generating points uniformly on n-dimensional spheres","volume":"2","author":"Muller","year":"1959","journal-title":"Commun. ACM"},{"key":"2023012711560085300_btu405-B23","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1471-2105-12-41","article-title":"RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles","volume":"12","author":"Nalbantoglu","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012711560085300_btu405-B24","first-page":"1186","article-title":"Entropy based nearest neighbor search in high dimensions","volume-title":"Proceedings of the seventeenth annual ACM-SIAM Symposium on Discrete Algorithm","author":"Panigrahy","year":"2006"},{"key":"2023012711560085300_btu405-B25","doi-asserted-by":"crossref","first-page":"1858","DOI":"10.1093\/bioinformatics\/btt313","article-title":"SPANNER: Taxonomic assignment of sequences using pyramid matching of similarity profiles","volume":"29","author":"Porter","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012711560085300_btu405-B26","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"Qin","year":"2010","journal-title":"Nature"},{"key":"2023012711560085300_btu405-B27","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1038\/nature11450","article-title":"A metagenome-wide association study of gut microbiota in type 2 diabetes","volume":"490","author":"Qin","year":"2012","journal-title":"Nature"},{"key":"2023012711560085300_btu405-B28","first-page":"622","article-title":"Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering","volume-title":"Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics","author":"Ravichandran","year":"2005"},{"key":"2023012711560085300_btu405-B29","doi-asserted-by":"crossref","first-page":"974","DOI":"10.1126\/science.253.5023.974","article-title":"Developments in automatic text retrieval","volume":"253","author":"Salton","year":"1991","journal-title":"Science"},{"key":"2023012711560085300_btu405-B30","volume-title":"Foundations of Multidimensional and Metric Data Structures","author":"Samet","year":"2006"},{"key":"2023012711560085300_btu405-B31","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1093\/bib\/bbt067","article-title":"New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing","volume":"15","author":"Song","year":"2014","journal-title":"Brief. Bioinformatics"},{"key":"2023012711560085300_btu405-B32","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1038\/nature06244","article-title":"The human microbiome project","volume":"449","author":"Turnbaugh","year":"2007","journal-title":"Nature"},{"key":"2023012711560085300_btu405-B33","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1038\/nature02340","article-title":"Community structure and metabolism through reconstruction of microbial genomes from the environment","volume":"428","author":"Tyson","year":"2004","journal-title":"Nature"},{"key":"2023012711560085300_btu405-B34","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison\u2013a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012711560085300_btu405-B35","doi-asserted-by":"crossref","first-page":"e1000667","DOI":"10.1371\/journal.pcbi.1000667","article-title":"A primer on metagenomics","volume":"6","author":"Wooley","year":"2010","journal-title":"PLoS Comput. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/20\/2949\/48929849\/bioinformatics_30_20_2949.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/20\/2949\/48929849\/bioinformatics_30_20_2949.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T12:43:17Z","timestamp":1674823397000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/20\/2949\/2422191"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6,27]]},"references-count":35,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2014,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu405","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,10,15]]},"published":{"date-parts":[[2014,6,27]]}}}