{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T12:20:16Z","timestamp":1767183616451,"version":"build-2065373602"},"reference-count":83,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"crossref","award":["807\/20"],"award-info":[{"award-number":["807\/20"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>In the realm of information retrieval, the need to maintain reliable term-indexing has grown more acute in recent years, with vast amounts of ever-growing online data used for data mining and natural language processing, and searched by a large number of search-engine users. At the same time, an increasing portion of primary storage systems employ data deduplication, where duplicate logical data chunks are replaced with references to a unique physical copy.<\/jats:p>\n                  <jats:p>We show that indexing deduplicated data with deduplication-oblivious mechanisms might result in extreme inefficiencies: the index size would increase in proportion to the logical data size, regardless of its duplication ratio, consuming excessive storage and memory and slowing down lookups. In addition, the logically sequential accesses during index creation would be transformed into random and redundant accesses to the physical chunks. Indeed, to the best of our knowledge, term indexing is not supported by any deduplicating storage system.<\/jats:p>\n                  <jats:p>\n                    In this article, we propose the design of a deduplication-aware term-index that addresses these challenges.\n                    <jats:italic toggle=\"yes\">IDEA<\/jats:italic>\n                    maps terms to the unique chunks that contain them, and maps each chunk to the files in which it is contained. This basic design concept improves the index performance and can support advanced functionalities such as inline indexing, result ranking, and proximity search. Our prototype implementation based on Lucene (the search engine at the core of Elasticsearch) shows that IDEA can reduce the index size and indexing time by up to 73% and 94%, respectively, and reduce term-lookup latency by up to 82% and 59% for single and multi-term queries, respectively.\n                  <\/jats:p>","DOI":"10.1145\/3729426","type":"journal-article","created":{"date-parts":[[2025,5,21]],"date-time":"2025-05-21T06:55:32Z","timestamp":1747810532000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Physical vs. Logical Indexing with IDEA: Inverted Deduplication-Aware Index"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-5854-1957","authenticated-orcid":false,"given":"Asaf","family":"Levi","sequence":"first","affiliation":[{"name":"Computer Science, Technion Israel Institute of Technology","place":["Haifa, Israel"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1235-0502","authenticated-orcid":false,"given":"Philip","family":"Shilane","sequence":"additional","affiliation":[{"name":"Dell Technologies","place":["Newtown, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0524-7390","authenticated-orcid":false,"given":"Sarai","family":"Sheinvald","sequence":"additional","affiliation":[{"name":"Software Engineering, Ort Braude College of Engineering","place":["Karmiel, Israel"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2701-0260","authenticated-orcid":false,"given":"Gala","family":"Yadgar","sequence":"additional","affiliation":[{"name":"Computer Science, Technion Israel Institute of Technology","place":["Haifa, Israel"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,11,3]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"2019. Microsoft Search Server 2010 Express. Retrieved 1 October 2024 from https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id=18914"},{"key":"e_1_3_3_3_2","unstructured":"2022. Commvault Documentation PDF: PROTECT. ACCESS. COMPLY. SHARE. Retrieved 1 October 2024 from https:\/\/documentation.commvault.com\/commvault\/index.html"},{"key":"e_1_3_3_4_2","unstructured":"2022. Data Domain Boost: What is DDBoost plug-in and what it does? Retrieved 1 October 2024 from https:\/\/www.dell.com\/support\/kbdoc\/en-il\/000061966\/what-is-ddboost-plug-in"},{"key":"e_1_3_3_5_2","unstructured":"2022. Deduplication: btrfs Wiki. Retrieved 1 October 2024 from https:\/\/btrfs.wiki.kernel.org\/index.php\/Deduplication"},{"key":"e_1_3_3_6_2","unstructured":"2022. Dell EMC Data Protection Search 19.6.1 Deployment and Administration Guide. Retrieved 1 October 2024 from https:\/\/www.dell.com\/support\/home\/en-il\/product-support\/product\/data-protection-search\/docs"},{"key":"e_1_3_3_7_2","unstructured":"2022. Linux Kernel Archives. Retrieved 1 October 2024 from https:\/\/mirrors.edge.kernel.org\/pub\/linux\/kernel\/"},{"key":"e_1_3_3_8_2","unstructured":"2022. Solaris ZFS Administration Guide: The dedup Property. Retrieved 1 October 2024 from https:\/\/docs.oracle.com\/cd\/E19120-01\/open.solaris\/817-2271\/gjhav\/index.html"},{"key":"e_1_3_3_9_2","unstructured":"2022. Veeam Backup and Replication 11: User Guide for VMware vSphere. Retrieved 1 October 2024 from https:\/\/helpcenter.veeam.com\/docs\/backup\/vsphere\/overview.html?ver=110"},{"key":"e_1_3_3_10_2","unstructured":"2022. Wikimedia Data dump torrents. Retrieved 1 October 2024 from https:\/\/meta.wikimedia.org\/wiki\/Data_dump_torrents"},{"key":"e_1_3_3_11_2","unstructured":"2022. Wikimedia Downloads. Retrieved 1 October 2024 from https:\/\/dumps.wikimedia.org\/enwiki\/"},{"key":"e_1_3_3_12_2","unstructured":"2024. Windows Search Overview. Retrieved 1 October 2024 from https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/search\/-search-3x-wds-overview"},{"key":"e_1_3_3_13_2","unstructured":"2025. Amazon OpenSearch\u2122. Retrieved 1 October 2024 from https:\/\/aws.amazon.com\/what-is\/opensearch\/"},{"key":"e_1_3_3_14_2","unstructured":"2025. Apache Lucene. Retrieved 1 October 2024 from https:\/\/lucene.apache.org\/"},{"key":"e_1_3_3_15_2","unstructured":"2025. Elasticsearch: The heart of the free and open Elastic Stack. Retrieved 1 October 2024 from https:\/\/www.elastic.co\/elasticsearch\/"},{"key":"e_1_3_3_16_2","unstructured":"2025. IBM Watson. Retrieved 1 October 2024 from https:\/\/www.ibm.com\/watson"},{"key":"e_1_3_3_17_2","unstructured":"2025. Indri. Retrieved from http:\/\/www.lemurproject.org\/indri\/"},{"key":"e_1_3_3_18_2","unstructured":"2025. LMDB. Retrieved 1 October 2024 from http:\/\/www.lmdb.tech\/doc\/"},{"key":"e_1_3_3_19_2","unstructured":"2025. LucenePlusPlus. Retrieved 1 October 2024 from https:\/\/github.com\/luceneplusplus\/LucenePlusPlus"},{"key":"e_1_3_3_20_2","unstructured":"2025. Meilisearch. Retrieved 1 October 2024 from https:\/\/www.meilisearch.com\/"},{"key":"e_1_3_3_21_2","unstructured":"2025. Oracle Berkeley DB. Retrieved 1 October 2024 from https:\/\/www.oracle.com\/database\/technologies\/related\/berkeleydb.html"},{"key":"e_1_3_3_22_2","unstructured":"2025. RocksDB. Retrieved 1 October 2024 from http:\/\/rocksdb.org\/"},{"key":"e_1_3_3_23_2","unstructured":"2025. Solr. Retrieved 1 October 2024 from https:\/\/solr.apache.org\/"},{"key":"e_1_3_3_24_2","unstructured":"2025. TypeSense. Retrieved 1 October 2024 from https:\/\/typesense.org\/"},{"key":"e_1_3_3_25_2","volume-title":"Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC 18)","author":"Allu Yamini","year":"2018","unstructured":"Yamini Allu, Fred Douglis, Mahesh Kamat, Ramya Prabhakar, Philip Shilane, and Rahul Ugale. 2018. Can\u2019t we all get along? Redesigning protection storage for modern workloads. In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC 18)."},{"key":"e_1_3_3_26_2","volume-title":"Nearest Neighbor Search: the Old, the New, and the Impossible","author":"Andoni Alexandr","year":"2009","unstructured":"Alexandr Andoni. 2009. Nearest Neighbor Search: the Old, the New, and the Impossible. Ph.D. Thesis. Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science."},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOT.2009.5366623"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994521"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7552(98)00110-X"},{"key":"e_1_3_3_30_2","article-title":"Efficient Search in NetApp Data Storage Solutions","author":"Brunner Manuel","year":"2023","unstructured":"Manuel Brunner. 2023. Efficient Search in NetApp Data Storage Solutions. Retrieved 1 October, 2024 from https:\/\/intrafind.com\/en\/blog\/efficient-search-in-netapp-data-storage-solutions","journal-title":"Retrieved 1 October, 2024 from"},{"key":"e_1_3_3_31_2","volume-title":"Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST 18)","author":"Cao Zhichao","year":"2018","unstructured":"Zhichao Cao, Hao Wen, Fenggang Wu, and David H. C. Du. 2018. ALACC: Accelerating restore performance of data deduplication systems using adaptive look-ahead window assisted chunk caching. In Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST 18)."},{"key":"e_1_3_3_32_2","volume-title":"Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST 11)","author":"Chen Feng","year":"2011","unstructured":"Feng Chen, Tian Luo, and Xiaodong Zhang. 2011. CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST 11)."},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742798"},{"key":"e_1_3_3_34_2","article-title":"How many shards should I have in my Elasticsearch cluster?","author":"Dahlqvist Christian","year":"2022","unstructured":"Christian Dahlqvist. 2022. How many shards should I have in my Elasticsearch cluster? Retrieved 1 October, 2024 from https:\/\/www.elastic.co\/blog\/how-many-shards-should-i-have-in-my-elasticsearch-cluster","journal-title":"Retrieved 1 October, 2024 from"},{"key":"e_1_3_3_35_2","volume-title":"Proceedings of the Symposium on Operating System Design and Implementation (OSDI 04)","author":"Dean Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the Symposium on Operating System Design and Implementation (OSDI 04)."},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.5555\/1960475.1960477"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/2208488.2208501"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.5555\/3129633.3129637"},{"key":"e_1_3_3_39_2","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Duggal Abhinav","year":"2019","unstructured":"Abhinav Duggal, Fani Jenkins, Philip Shilane, Ramprasad Chinthekindi, Ritesh Shah, and Mahesh Kamat. 2019. Data domain cloud tier: Backup here, backup there, deduplicated everywhere!. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC 19)."},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.5555\/2342821.2342847"},{"key":"e_1_3_3_41_2","article-title":"DedupSearch implementation","author":"Elias Nadav","year":"2025","unstructured":"Nadav Elias. 2025. DedupSearch implementation. Retrieved 1 October, 2024 from https:\/\/github.com\/NadavElias\/DedupSearch","journal-title":"R"},{"key":"e_1_3_3_42_2","volume-title":"Proceedings of the 20th USENIX Conference on File and Storage Technologies (FAST 22)","author":"Elias Nadav","year":"2022","unstructured":"Nadav Elias, Philip Shilane, Sarai Sheinvald, and Gala Yadgar. 2022. DedupSearch: Two-phase deduplication aware keyword search. In Proceedings of the 20th USENIX Conference on File and Storage Technologies (FAST 22). Retrieved from https:\/\/www.usenix.org\/conference\/fast22\/presentation\/elias"},{"volume-title":"Introduction to the EMC XtremIO Storage Array (Ver. 4.0) (rev. 08 ed.)","year":"2015","key":"e_1_3_3_43_2","unstructured":"EMC Corporation 2015. Introduction to the EMC XtremIO Storage Array (Ver. 4.0) (rev. 08 ed.). EMC Corporation."},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2013.6558437"},{"key":"e_1_3_3_45_2","volume-title":"Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 14)","author":"Fu Min","year":"2014","unstructured":"Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Wen Xia, Fangting Huang, and Qing Liu. 2014. Accelerating restore and garbage collection in deduplication-based backup systems via exploiting historical information. In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 14)."},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.5555\/2750482.2750507"},{"key":"e_1_3_3_47_2","volume-title":"Elasticsearch: The Definitive Guide: A Distributed Real-time Search and Analytics Engine","author":"Gormley Clinton","year":"2015","unstructured":"Clinton Gormley and Zachary Tong. 2015. Elasticsearch: The Definitive Guide: A Distributed Real-time Search and Analytics Engine. O\u2019Reilly Media, Inc., USA."},{"key":"e_1_3_3_48_2","first-page":"3929","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Guu Kelvin","year":"2020","unstructured":"Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In Proceedings of the International Conference on Machine Learning. PMLR, 3929\u20133938."},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.5555\/3323298.3323309"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1147\/rd.312.0249"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3565025"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2017.8258016"},{"key":"e_1_3_3_53_2","article-title":"How does a computer virus scan work?","author":"Kuenning Geoff","year":"2002","unstructured":"Geoff Kuenning. 2002. How does a computer virus scan work? Scientific American 286, 2 (2002).","journal-title":"Scientific American"},{"key":"e_1_3_3_54_2","volume-title":"Proceedings of the 22nd USENIX Conference on File and Storage Technologies (FAST)","author":"Levi Asaf","year":"2024","unstructured":"Asaf Levi, Philip Shilane, Sarai Sheinvald, and Gala Yadgar. 2024. Physical vs. logical indexing with IDEA: Inverted deduplication-aware index. In Proceedings of the 22nd USENIX Conference on File and Storage Technologies (FAST). Retrieved from https:\/\/www.usenix.org\/conference\/fast24\/presentation\/levi"},{"key":"e_1_3_3_55_2","first-page":"18470","article-title":"Pre-training via paraphrasing","volume":"33","author":"Lewis Mike","year":"2020","unstructured":"Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, and Luke Zettlemoyer. 2020. Pre-training via paraphrasing. Advances in Neural Information Processing Systems 33 (2020), 18470\u201318481.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_56_2","volume-title":"Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 14)","author":"Li Cheng","year":"2014","unstructured":"Cheng Li, Philip Shilane, Fred Douglis, Hyong Shim, Stephen Smaldone, and Grant Wallace. 2014. Nitro: A capacity-optimized SSD cache for primary storage. In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 14)."},{"key":"e_1_3_3_57_2","volume-title":"Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST 16)","author":"Li Wenji","year":"2016","unstructured":"Wenji Li, Gregory Jean-Baptise, Juan Riveros, Giri Narasimhan, Tony Zhang, and Ming Zhao. 2016. CacheDedup: In-line deduplication for flash caching. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST 16)."},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.5555\/2591272.2591292"},{"key":"e_1_3_3_59_2","article-title":"Reducing Logging Cost by Two Orders of Magnitude using CLP","author":"Luo Jack (Yu)","year":"2022","unstructured":"Jack (Yu) Luo and Devesh Agrawal. 2022. Reducing Logging Cost by Two Orders of Magnitude using CLP. Retrieved from https:\/\/www.uber.com\/en-US\/blog\/reducing-logging-cost-by-two-orders-of-magnitude-using-clp","journal-title":"Retrieved from"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.5555\/1960475.1960476"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/237496.237497"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139940023"},{"key":"e_1_3_3_64_2","volume-title":"Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST 20)","author":"Nachman Aviv","year":"2020","unstructured":"Aviv Nachman, Gala Yadgar, and Sarai Sheinvald. 2020. GoSeed: Generating an optimal seeding plan for deduplicated storage. In Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST 20)."},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC.2011.82"},{"key":"e_1_3_3_66_2","article-title":"Searching 1.5TB\/sec: Systems Engineering Before Algorithms","author":"Newman Steve","year":"2022","unstructured":"Steve Newman. 2022. Searching 1.5TB\/sec: Systems Engineering Before Algorithms. Retrieved 1 October, 2024 from https:\/\/www.dataset.com\/blog\/systems-engineering-before-algorithms\/","journal-title":"Retrieved 1 October, 2024 from"},{"key":"e_1_3_3_67_2","unstructured":"Enno Ohlebusch. 2013. Bioinformatics Algorithms: Sequence Analysis Genome Rearrangements and Phylogenetic Reconstruction. Oldenbusch Verlag. Retrieved from http:\/\/www.oldenbusch-verlag.de\/"},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824078"},{"key":"e_1_3_3_69_2","first-page":"29","volume-title":"Proceedings of the 1st Instructional Conference on Machine Learning","author":"Ramos Juan","year":"2003","unstructured":"Juan Ramos. 2003. Using TF-IDF to determine word relevance in document queries. In Proceedings of the 1st Instructional Conference on Machine Learning. 29\u201348."},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.2307\/1373202"},{"key":"e_1_3_3_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564416"},{"key":"e_1_3_3_72_2","doi-asserted-by":"publisher","DOI":"10.5555\/3026852.3026870"},{"key":"e_1_3_3_73_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000013"},{"key":"e_1_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2007.914237"},{"key":"e_1_3_3_75_2","volume-title":"Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST 12)","author":"Srinivasan Kiran","year":"2012","unstructured":"Kiran Srinivasan, Tim Bisson, Garth Goodson, and Kaladhar Voruganti. 2012. iDedup: Latency-aware, inline data deduplication for primary storage. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST 12)."},{"key":"e_1_3_3_76_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2009.12.003"},{"key":"e_1_3_3_77_2","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556574"},{"key":"e_1_3_3_78_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31753-8_11"},{"key":"e_1_3_3_79_2","article-title":"Fast and Reliable Schema-Agnostic Log Analytics Platform","author":"Wang Chao","year":"2021","unstructured":"Chao Wang and Xiaobing Li. 2021. Fast and Reliable Schema-Agnostic Log Analytics Platform. Retrieved 1 October, 2024 from https:\/\/www.uber.com\/en-CA\/blog\/logging\/","journal-title":"Retrieved 1 October, 2024 from"},{"key":"e_1_3_3_80_2","volume-title":"Proceedings of the IEEE 37th Annual 2003 International Carnahan Conference on Security Technology.","author":"Wang Jau-Hwang","year":"2003","unstructured":"Jau-Hwang Wang, Peter S Deng, Yi-Shen Fan, Li-Jing Jaw, and Yu-Ching Liu. 2003. Virus detection using data mining techinques. In Proceedings of the IEEE 37th Annual 2003 International Carnahan Conference on Security Technology. IEEE."},{"key":"e_1_3_3_81_2","first-page":"65","article-title":"Computer-based discovery in federal civil litigation","volume":"1","author":"Withers Kenneth J.","year":"2006","unstructured":"Kenneth J. Withers. 2006. Computer-based discovery in federal civil litigation. Federal Courts Law Review 1, 65 (2006), 65.","journal-title":"Federal Courts Law Review"},{"key":"e_1_3_3_82_2","volume-title":"Managing Gigabytes: Compressing and Indexing Documents and Images","author":"Witten Ian H.","year":"1999","unstructured":"Ian H. Witten, Ian H. Witten, Alistair Moffat, Timothy C. Bell, Timothy C. Bell, Ed Fox, and Timothy C. Bell. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann."},{"key":"e_1_3_3_83_2","volume-title":"Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC 16)","author":"Xia Wen","year":"2016","unstructured":"Wen Xia, Yukun Zhou, Hong Jiang, Dan Feng, Yu Hua, Yuchong Hu, Qing Liu, and Yucheng Zhang. 2016. FastCDC: A fast and efficient content-defined chunking approach for data deduplication. In Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC 16)."},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.5555\/1364813.1364831"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729426","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T13:31:32Z","timestamp":1762176692000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729426"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,3]]},"references-count":83,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3729426"],"URL":"https:\/\/doi.org\/10.1145\/3729426","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2025,11,3]]},"assertion":[{"value":"2025-02-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}