{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T06:29:03Z","timestamp":1778048943449,"version":"3.51.4"},"reference-count":64,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,7,31]],"date-time":"2022-07-31T00:00:00Z","timestamp":1659225600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Italian MIUR PRIN","award":["2017WR7SHH"],"award-info":[{"award-number":["2017WR7SHH"]}]},{"DOI":"10.13039\/501100009888","name":"Regione Toscana","doi-asserted-by":"crossref","award":["POR FSE 2014\/2020"],"award-info":[{"award-number":["POR FSE 2014\/2020"]}],"id":[{"id":"10.13039\/501100009888","id-type":"DOI","asserted-by":"crossref"}]},{"name":"SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics","award":["871042, and 952026"],"award-info":[{"award-number":["871042, and 952026"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Algorithms"],"published-print":{"date-parts":[[2022,7,31]]},"abstract":"<jats:p>\n            We address the problem of designing, implementing, and experimenting with compressed data structures that support\n            <jats:sans-serif>rank<\/jats:sans-serif>\n            and\n            <jats:sans-serif>select<\/jats:sans-serif>\n            queries over a dictionary of integers. We shine a new light on this classical problem by showing a connection between the input integers and the geometry of a set of points in a Cartesian plane suitably derived from them. We then build upon some results in computational geometry to introduce the first compressed rank\/select dictionary based on the idea of \u201clearning\u201d the distribution of such points via proper\n            <jats:italic>linear approximations<\/jats:italic>\n            (LA). We therefore call this novel data structure the\n            <jats:monospace>la_vector<\/jats:monospace>\n            .\n          <\/jats:p>\n          <jats:p>\n            We prove time and space complexities of the\n            <jats:monospace>la_vector<\/jats:monospace>\n            in several scenarios: in the worst case, in the case of input distributions with finite mean and variance, and taking into account the\n            <jats:italic>k<\/jats:italic>\n            th order entropy of some of its building blocks. We also discuss improved\n            <jats:italic>hybrid<\/jats:italic>\n            data structures, namely, ones that suitably orchestrate known compressed rank\/select dictionaries with the\n            <jats:monospace>la_vector<\/jats:monospace>\n            .\n          <\/jats:p>\n          <jats:p>\n            We corroborate our theoretical results with a large set of experiments over datasets originating from a variety of applications (Web search, DNA sequencing, information retrieval, and natural language processing) and show that our approach provides new interesting space-time tradeoffs with respect to many well-established compressed rank\/select dictionary implementations. In particular, we show that our\n            <jats:sans-serif>select<\/jats:sans-serif>\n            is the fastest, and our\n            <jats:sans-serif>rank<\/jats:sans-serif>\n            is on the space-time Pareto frontier.\n          <\/jats:p>","DOI":"10.1145\/3524060","type":"journal-article","created":{"date-parts":[[2022,3,17]],"date-time":"2022-03-17T13:36:20Z","timestamp":1647524180000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["A Learned Approach to Design Compressed Rank\/Select Data Structures"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8178-135X","authenticated-orcid":false,"given":"Antonio","family":"Boffa","sequence":"first","affiliation":[{"name":"University of Pisa, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1353-360X","authenticated-orcid":false,"given":"Paolo","family":"Ferragina","sequence":"additional","affiliation":[{"name":"University of Pisa, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0328-7791","authenticated-orcid":false,"given":"Giorgio","family":"Vinciguerra","sequence":"additional","affiliation":[{"name":"University of Pisa, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,10,11]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"337","volume-title":"Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI)","author":"Agarwal Rachit","year":"2015","unstructured":"Rachit Agarwal, Anurag Khandelwal, and Ion Stoica. 2015. Succinct: Enabling queries on compressed data. In Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 337\u2013350."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.14778\/2002974.2002975"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2018.05.007"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-32686-9_33"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/SCCC51225.2020.9281244"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.4230\/LIPIcs.STACS.2009.1814"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/1963190.2025378"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611976472.4"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/950620.950622"},{"key":"e_1_3_2_11_2","volume-title":"Compact Pat Trees","author":"Clark David Richard","year":"1996","unstructured":"David Richard Clark. 1996. Compact Pat Trees. Ph.D. Dissertation. University of Waterloo, Canada."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89097-3_18"},{"key":"e_1_3_2_13_2","volume-title":"Introduction to Algorithms (3rd ed.)","author":"Cormen Thomas H.","year":"2009","unstructured":"Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3rd ed.). The MIT Press."},{"key":"e_1_3_2_14_2","volume-title":"Elements of Information Theory (2nd ed.)","author":"Cover Thomas M.","year":"2006","unstructured":"Thomas M. Cover and Joy A. Thomas. 2006. Elements of Information Theory (2nd ed.). Wiley."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/321812.321820"},{"key":"e_1_3_2_16_2","volume-title":"On the Number of Bits Required to Implement an Associative Memory. Memo 61","author":"Fano Robert Mario","year":"1971","unstructured":"Robert Mario Fano. 1971. On the Number of Bits Required to Implement an Associative Memory. Memo 61. Massachusetts Institute of Technology, Project MAC."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-011-9535-0"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1201\/9781315119335-59"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/3524938.3525231"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2021.04.015"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1082036.1082039"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30213-1_23"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1240233.1240243"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-010-9437-6"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-43883-8_2"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389135"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/1198513.1198521"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07959-2_28"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1002\/spe.2198"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2007.07.041"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/1109557.1109599"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-013-9767-2"},{"key":"e_1_3_2_33_2","first-page":"27","volume-title":"Poster Proceedings Volume of the 4th Workshop on Efficient and Experimental Algorithms (WEA)","author":"Gonz\u00e1lez Rodrigo","year":"2005","unstructured":"Rodrigo Gonz\u00e1lez, Szymon Grabowski, Veli M\u00e4kinen, and Gonzalo Navarro. 2005. Practical implementation of rank and select queries. In Poster Proceedings Volume of the 4th Workshop on Efficient and Experimental Algorithms (WEA). 27\u201338."},{"key":"e_1_3_2_34_2","first-page":"841","volume-title":"Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)","author":"Grossi Roberto","year":"2003","unstructured":"Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. 2003. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 841\u2013850."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2007.07.042"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.1989.63533"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539797331105"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1002\/spe.2203"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139940023"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2007.07.013"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1089\/cmb.2009.0169"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-62034-6_35"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539799364092"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2535933"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781316588284"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/1216370.1216372"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-30850-5_26"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3409371"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972870.6"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/358746.358758"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609615"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3052773"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2008.83"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/1132516.1132551"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611973075.11"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4939-2864-4_332"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/1290672.1290680"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/1109557.1109693"},{"key":"e_1_3_2_59_2","unstructured":"Craig Silverstein. 2005. Google SparseHash. Retrieved fromhttps:\/\/github.com\/sparsehash\/sparsehash."},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/1871437.1871592"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-68552-4_12"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/2433396.2433409"},{"key":"e_1_3_2_63_2","volume-title":"Managing Gigabytes: Compressing and Indexing Documents and Images (2nd ed.)","author":"Witten Ian H.","year":"1999","unstructured":"Ian H. Witten, Alistair Moffat, and Timothy C. Bell. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images (2nd ed.). Morgan Kaufmann."},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0355-0"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313276.3316352"}],"container-title":["ACM Transactions on Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524060","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524060","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:04Z","timestamp":1750188664000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524060"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,31]]},"references-count":64,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,7,31]]}},"alternative-id":["10.1145\/3524060"],"URL":"https:\/\/doi.org\/10.1145\/3524060","relation":{},"ISSN":["1549-6325","1549-6333"],"issn-type":[{"value":"1549-6325","type":"print"},{"value":"1549-6333","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,31]]},"assertion":[{"value":"2021-03-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-03","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}