{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T00:36:20Z","timestamp":1774312580138,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,6,9]],"date-time":"2021-06-09T00:00:00Z","timestamp":1623196800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF BIGDATA","award":["1838177"],"award-info":[{"award-number":["1838177"]}]},{"name":"AFOSR-YIP","award":["FA9550-18-1-0152"],"award-info":[{"award-number":["FA9550-18-1-0152"]}]},{"name":"ONR BRC"},{"name":"ONR DURIP"},{"name":"NSF IIS","award":["1652131"],"award-info":[{"award-number":["1652131"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,9]]},"DOI":"10.1145\/3448016.3457333","type":"proceedings-article","created":{"date-parts":[[2021,6,18]],"date-time":"2021-06-18T17:22:30Z","timestamp":1624036950000},"page":"2226-2234","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)"],"prefix":"10.1145","author":[{"given":"Gaurav","family":"Gupta","sequence":"first","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"Minghao","family":"Yan","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"Benjamin","family":"Coleman","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"Bryce","family":"Kille","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"R. A. Leo","family":"Elworth","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"Tharun","family":"Medini","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"Todd","family":"Treangen","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"Anshumali","family":"Shrivastava","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,6,18]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"[n.d.]. Sample wikipedia corpus . Bitfunnel http:\/\/bitfunnel.org\/wikipedia-astest- corpus-for-bitfunnel.  [n.d.]. Sample wikipedia corpus . Bitfunnel http:\/\/bitfunnel.org\/wikipedia-astest- corpus-for-bitfunnel."},{"key":"e_1_3_2_2_2_1","unstructured":"[n.d.]. The ClueWeb09 Dataset. The Lemur Project https:\/\/www.lemurproject. org\/clueweb09.php\/.  [n.d.]. The ClueWeb09 Dataset. The Lemur Project https:\/\/www.lemurproject. org\/clueweb09.php\/."},{"key":"e_1_3_2_2_3_1","volume-title":"The European Bioinformatics Institute (EBI): European Nucleotide Archive (ENA) Resource","year":"2018","unstructured":"[n.d.]. The European Bioinformatics Institute (EBI): European Nucleotide Archive (ENA) Resource . The European Bioinformatics Institute (EBI) FTP Site , http: \/\/ftp.ebi.ac.uk\/pub\/software\/bigsi\/nat_biotech_ 2018 \/ctx\/. [n.d.]. The European Bioinformatics Institute (EBI): European Nucleotide Archive (ENA) Resource. The European Bioinformatics Institute (EBI) FTP Site, http: \/\/ftp.ebi.ac.uk\/pub\/software\/bigsi\/nat_biotech_2018\/ctx\/."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1224252.1224501"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-2836(05)80360-2"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"crossref","unstructured":"Timo Bingmann Phelim Bradley Florian Gauger and Zamin Iqbal. 2019. COBS: a Compact Bit-Sliced Signature Index. In SPIRE.  Timo Bingmann Phelim Bradley Florian Gauger and Zamin Iqbal. 2019. COBS: a Compact Bit-Sliced Signature Index. In SPIRE.","DOI":"10.1007\/978-3-030-32686-9_21"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-73951-7_13"},{"key":"e_1_3_2_2_9_1","volume-title":"Gil McVean, and Zamin Iqbal.","author":"Bradley Phelim","year":"2019","unstructured":"Phelim Bradley , Henk C den Bakker , Eduardo PC Rocha , Gil McVean, and Zamin Iqbal. 2019 . Ultrafast search of all deposited bacterial and viral genomic data. Nature biotechnology 37, 2 (2019), 152. Phelim Bradley, Henk C den Bakker, Eduardo PC Rocha, Gil McVean, and Zamin Iqbal. 2019. Ultrafast search of all deposited bacterial and viral genomic data. Nature biotechnology 37, 2 (2019), 152."},{"key":"e_1_3_2_2_10_1","volume-title":"Gil McVean, and Zamin Iqbal.","author":"Bradley Phelim","year":"2019","unstructured":"Phelim Bradley , Henk C den Bakker , Eduardo PC Rocha , Gil McVean, and Zamin Iqbal. 2019 . Ultrafast search of all deposited bacterial and viral genomic data. Nature biotechnology 37, 2 (2019), 152. Phelim Bradley, Henk C den Bakker, Eduardo PC Rocha, Gil McVean, and Zamin Iqbal. 2019. Ultrafast search of all deposited bacterial and viral genomic data. Nature biotechnology 37, 2 (2019), 152."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/800133.804332"},{"key":"e_1_3_2_2_12_1","volume-title":"Informed and automated k-mer size selection for genome assembly. Bioinformatics 30, 1 (06","author":"Chikhi Rayan","year":"2013","unstructured":"Rayan Chikhi and Paul Medvedev . 2013. Informed and automated k-mer size selection for genome assembly. Bioinformatics 30, 1 (06 2013 ), 31--37. https:\/\/doi.org\/10.1093\/bioinformatics\/ btt310 arXiv:https:\/\/academic.oup.com\/bioinformatics\/articlepdf\/ 30\/1\/31\/643259\/btt310.pdf 10.1093\/bioinformatics Rayan Chikhi and Paul Medvedev. 2013. Informed and automated k-mer size selection for genome assembly. Bioinformatics 30, 1 (06 2013), 31--37. https:\/\/doi.org\/10.1093\/bioinformatics\/ btt310 arXiv:https:\/\/academic.oup.com\/bioinformatics\/articlepdf\/ 30\/1\/31\/643259\/btt310.pdf"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2010.07.024"},{"key":"e_1_3_2_2_14_1","volume-title":"The Sanger FASTQ file format for sequences with quality scores, and the Solexa\/Illumina FASTQ variants. Nucleic acids research 38, 6","author":"Cock Peter JA","year":"2010","unstructured":"Peter JA Cock , Christopher J Fields , Naohisa Goto , Michael L Heuer , and Peter M Rice . 2010. The Sanger FASTQ file format for sequences with quality scores, and the Solexa\/Illumina FASTQ variants. Nucleic acids research 38, 6 ( 2010 ), 1767--1771. Peter JA Cock, Christopher J Fields, Naohisa Goto, Michael L Heuer, and Peter M Rice. 2010. The Sanger FASTQ file format for sequences with quality scores, and the Solexa\/Illumina FASTQ variants. Nucleic acids research 38, 6 (2010), 1767--1771."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872787"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.001"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2501928.2501931"},{"key":"e_1_3_2_2_18_1","volume-title":"Search engines: Information retrieval in practice","author":"Croft W Bruce","unstructured":"W Bruce Croft , Donald Metzler , and Trevor Strohman . [n.d.]. Search engines: Information retrieval in practice . Vol. 520 . W Bruce Croft, Donald Metzler, and Trevor Strohman. [n.d.]. Search engines: Information retrieval in practice. Vol. 520."},{"key":"e_1_3_2_2_19_1","volume-title":"Improved representation of sequence bloom trees. Bioinformatics (08","author":"Harris Robert S","year":"2019","unstructured":"Robert S Harris and Paul Medvedev . 2019. Improved representation of sequence bloom trees. Bioinformatics (08 2019 ). Robert S Harris and Paul Medvedev. 2019. Improved representation of sequence bloom trees. Bioinformatics (08 2019)."},{"key":"e_1_3_2_2_20_1","volume-title":"D1","author":"Kodama Yuichi","year":"2011","unstructured":"Yuichi Kodama , Martin Shumway , and Rasko Leinonen . 2011. The Sequence Read Archive: explosive growth of sequencing data. Nucleic acids research 40 , D1 ( 2011 ), D54--D56. Yuichi Kodama, Martin Shumway, and Rasko Leinonen. 2011. The Sequence Read Archive: explosive growth of sequencing data. Nucleic acids research 40, D1 (2011), D54--D56."},{"key":"e_1_3_2_2_21_1","unstructured":"Daniel Lemire. 2012. When is a bitmap faster than an integer list? https:\/\/lemire. me\/blog\/2012\/10\/23\/when-is-a-bitmap-faster-than-an-integer-list\/  Daniel Lemire. 2012. When is a bitmap faster than an integer list? https:\/\/lemire. me\/blog\/2012\/10\/23\/when-is-a-bitmap-faster-than-an-integer-list\/"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2002.803864"},{"key":"e_1_3_2_2_23_1","volume-title":"Mash: fast genome and metagenome distance estimation using MinHash. Genome biology 17, 1","author":"Ondov Brian D","year":"2016","unstructured":"Brian D Ondov , Todd J Treangen , P\u00e1ll Melsted , Adam B Mallonee , Nicholas H Bergman , Sergey Koren , and Adam M Phillippy . 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome biology 17, 1 ( 2016 ), 132. Brian D Ondov, Todd J Treangen, P\u00e1ll Melsted, Adam B Mallonee, Nicholas H Bergman, Sergey Koren, and Adam M Phillippy. 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome biology 17, 1 (2016), 132."},{"key":"e_1_3_2_2_24_1","volume-title":"Mantis: A fast, small, and exact large-scale sequence-search index. Cell systems 7, 2","author":"Pandey Prashant","year":"2018","unstructured":"Prashant Pandey , Fatemeh Almodaresi , Michael A Bender , Michael Ferdman , Rob Johnson , and Rob Patro . 2018 . Mantis: A fast, small, and exact large-scale sequence-search index. Cell systems 7, 2 (2018), 201--207. Prashant Pandey, Fatemeh Almodaresi, Michael A Bender, Michael Ferdman, Rob Johnson, and Rob Patro. 2018. Mantis: A fast, small, and exact large-scale sequence-search index. Cell systems 7, 2 (2018), 201--207."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1290672.1290680"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSPEC.2013.6545119"},{"key":"e_1_3_2_2_27_1","volume-title":"NISC Comparative Sequencing Program, et al","author":"Snitkin Evan S","year":"2012","unstructured":"Evan S Snitkin , Adrian M Zelazny , Pamela J Thomas , Frida Stock , David K Henderson , Tara N Palmore , Julia A Segre , NISC Comparative Sequencing Program, et al . 2012 . Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Science translational medicine 4, 148 (2012), 148ra116--148ra116. Evan S Snitkin, Adrian M Zelazny, Pamela J Thomas, Frida Stock, David K Henderson, Tara N Palmore, Julia A Segre, NISC Comparative Sequencing Program, et al. 2012. Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Science translational medicine 4, 148 (2012), 148ra116--148ra116."},{"key":"e_1_3_2_2_28_1","volume-title":"Fast search of thousands of short-read sequencing experiments. Nature biotechnology 34, 3","author":"Solomon Brad","year":"2016","unstructured":"Brad Solomon and Carl Kingsford . 2016. Fast search of thousands of short-read sequencing experiments. Nature biotechnology 34, 3 ( 2016 ), 300. Brad Solomon and Carl Kingsford. 2016. Fast search of thousands of short-read sequencing experiments. Nature biotechnology 34, 3 (2016), 300."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-56970-3_16"},{"key":"e_1_3_2_2_30_1","volume-title":"The public health impact of a publically available, environmental database of microbial genomes. Frontiers in microbiology 8","author":"Stevens Eric L","year":"2017","unstructured":"Eric L Stevens , Ruth Timme , Eric W Brown , Marc W Allard , Errol Strain , Kelly Bunning , and Steven Musser . 2017. The public health impact of a publically available, environmental database of microbial genomes. Frontiers in microbiology 8 ( 2017 ), 808. Eric L Stevens, Ruth Timme, Eric W Brown, Marc W Allard, Errol Strain, Kelly Bunning, and Steven Musser. 2017. The public health impact of a publically available, environmental database of microbial genomes. Frontiers in microbiology 8 (2017), 808."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1089\/cmb.2017.0258"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bty157"}],"event":{"name":"SIGMOD\/PODS '21: International Conference on Management of Data","location":"Virtual Event China","acronym":"SIGMOD\/PODS '21","sponsor":["SIGMOD ACM Special Interest Group on Management of Data"]},"container-title":["Proceedings of the 2021 International Conference on Management of Data"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3448016.3457333","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3448016.3457333","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:25:04Z","timestamp":1750195504000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3448016.3457333"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,9]]},"references-count":32,"alternative-id":["10.1145\/3448016.3457333","10.1145\/3448016"],"URL":"https:\/\/doi.org\/10.1145\/3448016.3457333","relation":{},"subject":[],"published":{"date-parts":[[2021,6,9]]},"assertion":[{"value":"2021-06-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}