{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:21:21Z","timestamp":1750306881256,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2012,12,18]],"date-time":"2012-12-18T00:00:00Z","timestamp":1355788800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGOPS Oper. Syst. Rev."],"published-print":{"date-parts":[[2012,12,18]]},"abstract":"<jats:p>Explosion of data growth and duplication of data in enterprises has led to the deployment of a variety of deduplication technologies. However not all deduplication technologies serve the needs of every workload. Most prior research in deduplication concentrates on fixed block size (or variable block size at a fixed block boundary) deduplication which provides sub-optimal space efficiency in workloads where the duplicate data is not block aligned. Workloads also differ in the nature of operations and their priorities thereby affecting the choice of the right flavor of deduplication. Object workloads for instance, hold multiple versions of archived documents that have a high degree of duplicate data. They are also write-once read-many in nature and follow a whole object GET, PUT and DELETE model and would be better served by a deduplication strategy that takes care of nonblock aligned changes to data.<\/jats:p>\n          <jats:p>In this paper, we describe and evaluate a hybrid of a variable length and block based deduplication that is hierarchical in nature. 
We are motivated by the following insights from real world data: (a) object workload applications do not do in-place modification of data and hence new versions of objects are written again as a whole (b) significant amount of data among different versions of the same object is shareable but the changes are usually not block aligned. While the second point is the basis for variable length technique, both the above insights motivate our hierarchical deduplication strategy.<\/jats:p>\n          <jats:p>We show through experiments with production data-sets from enterprise environments that this provides up to twice the space savings compared to a fixed block deduplication.<\/jats:p>","DOI":"10.1145\/2421648.2421657","type":"journal-article","created":{"date-parts":[[2013,1,2]],"date-time":"2013-01-02T13:23:15Z","timestamp":1357132995000},"page":"57-64","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Space savings and design considerations in variable length deduplication"],"prefix":"10.1145","volume":"46","author":[{"given":"Giridhar","family":"Appaji Nag Yasa","sequence":"first","affiliation":[{"name":"NetApp Inc."}]},{"given":"P. C.","family":"Nagesh","sequence":"additional","affiliation":[{"name":"NetApp Inc."}]}],"member":"320","published-online":{"date-parts":[[2012,12,18]]},"reference":[{"volume-title":"IDC","year":"2010","author":"DuBois Laura","key":"e_1_2_1_1_1"},{"volume-title":"IDC","year":"2011","author":"Gantz John","key":"e_1_2_1_2_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_3_1","DOI":"10.5555\/1267102.1267104"},{"volume-title":"Proceedings of the 1st USENIX conference on file and storage technologies","year":"2002","author":"Quinlan Sean","key":"e_1_2_1_4_1"},{"unstructured":"Jeff Bonwick. ZFS deduplication. 
https:\/\/blogs.oracle.com\/bonwick\/entry\/zfs%20dedup 2009.","key":"e_1_2_1_5_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.1145\/223784.223855"},{"volume-title":"Proceedings of the 7th USENIX conference on file and storage technologies","year":"2008","author":"Zhu Benjamin","key":"e_1_2_1_7_1"},{"unstructured":"Quantum. Data deduplication background: A technical white paper. http:\/\/www.quantum.com\/iqdoc\/doc.aspx?id=5959.","key":"e_1_2_1_8_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1145\/502034.502052"},{"volume-title":"Proceedings of the 20th IEEE\/11th NASA Goddard Conference on Mass Storage Systems and Technologies","year":"2003","key":"e_1_2_1_10_1"},{"volume-title":"White Paper","year":"2009","author":"McClure Terri","key":"e_1_2_1_11_1"},{"volume-title":"Proceedings of the USENIX","year":"1994","author":"Manber Udi","key":"e_1_2_1_12_1"},{"unstructured":"Hewlett Packard. Understanding the HP deduplication strategy. http:\/\/www.usdatavault.com\/library\/understandingdeduplication.pdf.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of the 7th USENIX conference on file and storage technologies","year":"2009","author":"Dubnicki Cezary","key":"e_1_2_1_14_1"},{"unstructured":"George Crump. Lab report: Deduplication of primary storage. 
http:\/\/www.ocarinanetworks.com\/products\/productsoverview 2009.","key":"e_1_2_1_15_1"},{"volume-title":"Proceedings of the 7th USENIX conference on File and storage technologies","year":"2009","author":"Lillibridge Mark","key":"e_1_2_1_16_1"},{"volume-title":"Proceedings of the 2011 USENIX conference on USENIX annual technical conference","year":"2011","author":"Xia Wen","key":"e_1_2_1_17_1"},{"volume-title":"Proceedings of the 10th USENIX conference on File and Storage Technologies","year":"2012","author":"Srinivasan Kiran","key":"e_1_2_1_18_1"},{"unstructured":"Carlos Alvarez. NetApp deduplication for FAS and V-Series. deployment and implementation guide. NetApp TR-3505 2009.","key":"e_1_2_1_19_1"},{"volume-title":"NetApp Inc.","year":"2009","author":"Brown Keith","key":"e_1_2_1_20_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_21_1","DOI":"10.5555\/646752.704751"},{"unstructured":"Arvid Norberg. Merkle hash torrent extension. www.bittorrent.org\/beps\/bep 0030.html 2009.","key":"e_1_2_1_22_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.1145\/1534530.1534541"},{"volume-title":"SNIA","year":"2012","author":"Technical Position SNIA","key":"e_1_2_1_24_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.1145\/256163.256168"},{"unstructured":"Michael O. Rabin. Fingerprinting by random polynomials. TR-15-81 1981.","key":"e_1_2_1_26_1"},{"key":"e_1_2_1_27_1","first-page":"1321","author":"Rivest Ronald L.","year":"1992","journal-title":"IETF RFC"},{"unstructured":"The Debian Project. Weekly builds. 
http:\/\/cdimage.debian.org\/cdimage\/weekly-builds\/.","key":"e_1_2_1_28_1"},{"unstructured":"Symantec. Netbackup. http:\/\/www.symantec.com\/netbackup.","key":"e_1_2_1_29_1"},{"unstructured":"IBM. Tivoli storage manager. http:\/\/www.ibm.com\/software\/tivoli\/products\/storagemgr\/.","key":"e_1_2_1_30_1"},{"unstructured":"Oracle Corporation. Btrfs checksum tree. http:\/\/en.wikipedia.org\/wiki\/Btrfs.","key":"e_1_2_1_31_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_32_1","DOI":"10.1109\/MSST.2010.5496992"}],"container-title":["ACM SIGOPS Operating Systems Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2421648.2421657","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2421648.2421657","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:18:34Z","timestamp":1750234714000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2421648.2421657"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,12,18]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,12,18]]}},"alternative-id":["10.1145\/2421648.2421657"],"URL":"https:\/\/doi.org\/10.1145\/2421648.2421657","relation":{},"ISSN":["0163-5980"],"issn-type":[{"type":"print","value":"0163-5980"}],"subject":[],"published":{"date-parts":[[2012,12,18]]},"assertion":[{"value":"2012-12-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}