{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T16:48:23Z","timestamp":1755794903741,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,7,20]],"date-time":"2025-07-20T00:00:00Z","timestamp":1752969600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/https:\/\/doi.org\/10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["SHF-2211815"],"award-info":[{"award-number":["SHF-2211815"]}],"id":[{"id":"10.13039\/https:\/\/doi.org\/10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,7,20]]},"DOI":"10.1145\/3690624.3709233","type":"proceedings-article","created":{"date-parts":[[2025,4,4]],"date-time":"2025-04-04T18:44:43Z","timestamp":1743792283000},"page":"1972-1983","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["IDentity with Locality: An Ideal Hash for Gene Sequence Search"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-1753-9933","authenticated-orcid":false,"given":"Tianyi","family":"Zhang","sequence":"first","affiliation":[{"name":"Computer Science, Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8302-4037","authenticated-orcid":false,"given":"Gaurav","family":"Gupta","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering, Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9111-9391","authenticated-orcid":false,"given":"Aditya","family":"Desai","sequence":"additional","affiliation":[{"name":"Electrical Engineering and Computer Sciences, 
University of California, Berkeley, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5042-2856","authenticated-orcid":false,"given":"Anshumali","family":"Shrivastava","sequence":"additional","affiliation":[{"name":"Computer Science, Rice University, Houston, USA, ThirdAI Corp., Houston, USA, xMAD.ai, Houston, USA, and Ken Kennedy Institute, Houston, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,7,20]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_3_2_2_1_1","DOI":"10.1016\/S0022-2836(05)80360-2"},{"unstructured":"Austin Appleby. 2018. smhasher. https:\/\/github.com\/aappleby\/smhasher\/blob\/master\/src\/MurmurHash3.cpp.","key":"e_1_3_2_2_2_1"},{"doi-asserted-by":"crossref","unstructured":"Timo Bingmann Phelim Bradley Florian Gauger and Zamin Iqbal. 2019. COBS: a Compact Bit-Sliced Signature Index. In SPIRE.","key":"e_1_3_2_2_3_1","DOI":"10.1007\/978-3-030-32686-9_21"},{"key":"e_1_3_2_2_4_1","volume-title":"Gil McVean, and Zamin Iqbal.","author":"Bradley Phelim","year":"2019","unstructured":"Phelim Bradley, Henk C den Bakker, Eduardo PC Rocha, Gil McVean, and Zamin Iqbal. 2019. Ultrafast search of all deposited bacterial and viral genomic data. Nature biotechnology 37, 2 (2019), 152."},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_5_1","DOI":"10.5555\/829502.830043"},{"doi-asserted-by":"publisher","unstructured":"Josephine Burgin Alisha Ahamed Carla Cummins Rajkumar Devraj Khadim Gueye Dipayan Gupta Vikas Gupta Muhammad Haseeb Maira Ihsan Eugene Ivanov Suran Jayathilaka Vishnukumar Balavenkataraman Kadhirvelu Manish Kumar Ankur Lathi Rasko Leinonen Milena Mansurova Jasmine McKinnon Colman O'Cathail Joana Paup\u00e9rio St\u00e9phane Pesant Nadim Rahman Gabriele Rinck Sandeep Selvakumar Swati Suman Senthilnathan Vijayaraja Zahra Waheed Peter Woollard David Yuan Ahmad Zyoud Tony Burdett and Guy Cochrane. 2022. The European Nucleotide Archive in 2022. Nucleic Acids Research 51 D1 (11 2022) D121--D125. 
https:\/\/doi.org\/10.1093\/nar\/gkac1051 arXiv:https:\/\/academic.oup.com\/nar\/articlepdf\/ 51\/D1\/D121\/48441175\/gkac1051.pdf","key":"e_1_3_2_2_6_1","DOI":"10.1093\/nar\/gkac1051"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_7_1","DOI":"10.1016\/0022-0000(79)90044-8"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_8_1","DOI":"10.1145\/509907.509965"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_9_1","DOI":"10.1093\/bioinformatics\/"},{"key":"e_1_3_2_2_10_1","volume-title":"The Sanger FASTQ file format for sequences with quality scores, and the Solexa\/Illumina FASTQ variants. Nucleic acids research 38, 6","author":"Cock Peter JA","year":"2010","unstructured":"Peter JA Cock, Christopher J Fields, Naohisa Goto, Michael L Heuer, and Peter M Rice. 2010. The Sanger FASTQ file format for sequences with quality scores, and the Solexa\/Illumina FASTQ variants. Nucleic acids research 38, 6 (2010), 1767--1771."},{"unstructured":"Yann Collet. 2021. xxHash. https:\/\/github.com\/Cyan4973\/xxHash.","key":"e_1_3_2_2_11_1"},{"key":"e_1_3_2_2_12_1","volume-title":"D1","author":"Cook Charles E","year":"2019","unstructured":"Charles E Cook, Rodrigo Lopez, Oana Stroe, Guy Cochrane, Cath Brooksbank, Ewan Birney, and Rolf Apweiler. 2019. The European Bioinformatics Institute in 2018: tools, infrastructure and training. Nucleic acids research 47, D1 (2019), D15--D22."},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_13_1","DOI":"10.1145\/2501928.2501931"},{"doi-asserted-by":"crossref","unstructured":"Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE computational science and engineering 5 1 (1998) 46--55.","key":"e_1_3_2_2_14_1","DOI":"10.1109\/99.660313"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_15_1","DOI":"10.1145\/997817.997857"},{"key":"e_1_3_2_2_16_1","volume-title":"To petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. 
Nucleic acids research 48, 10","author":"Leo Elworth RA","year":"2020","unstructured":"RA Leo Elworth, Qi Wang, Pavan K Kota, CJ Barberan, Benjamin Coleman, Advait Balaji, Gaurav Gupta, Richard G Baraniuk, Anshumali Shrivastava, and Todd J Treangen. 2020. To petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. Nucleic acids research 48, 10 (2020), 5217--5234."},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_17_1","DOI":"10.1145\/3448016.3457333"},{"doi-asserted-by":"crossref","unstructured":"Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC.","key":"e_1_3_2_2_18_1","DOI":"10.1145\/276698.276876"},{"key":"e_1_3_2_2_19_1","volume-title":"SODA","volume":"8","author":"Mitzenmacher Michael","year":"2008","unstructured":"Michael Mitzenmacher and Salil P Vadhan. 2008. Why simple hash functions work: exploiting the entropy in a data stream. In SODA, Vol. 8. Citeseer, 746--755."},{"unstructured":"NCBI. . SRAToolkit. https:\/\/github.com\/ncbi\/sra-tools\/wiki\/01.-Downloading-SRA-Toolkit.","key":"e_1_3_2_2_20_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_21_1","DOI":"10.1145\/1250734.1250746"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_22_1","DOI":"10.1089\/cmb.2016.0155"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_23_1","DOI":"10.1007\/978-3-540-72845-0_9"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_24_1","DOI":"10.1109\/MSPEC.2013.6545119"},{"key":"e_1_3_2_2_25_1","volume-title":"International Conference on Machine Learning. PMLR, 557--565","author":"Shrivastava Anshumali","year":"2014","unstructured":"Anshumali Shrivastava and Ping Li. 2014. Densifying one permutation hashing via rotation for fast near neighbor search. In International Conference on Machine Learning. 
PMLR, 557--565."},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_26_1","DOI":"10.1007\/978-3-319-56970-3_16"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_27_1","DOI":"10.1007\/978-3-319-56970-3_16"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_28_1","DOI":"10.1145\/3589334.3645672"}],"event":{"sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"],"acronym":"KDD '25","name":"KDD '25: The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Toronto ON Canada"},"container-title":["Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3690624.3709233","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3690624.3709233","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,16]],"date-time":"2025-08-16T15:40:37Z","timestamp":1755358837000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3690624.3709233"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,20]]},"references-count":28,"alternative-id":["10.1145\/3690624.3709233","10.1145\/3690624"],"URL":"https:\/\/doi.org\/10.1145\/3690624.3709233","relation":{},"subject":[],"published":{"date-parts":[[2025,7,20]]},"assertion":[{"value":"2025-07-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}