{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T14:39:34Z","timestamp":1774449574436,"version":"3.50.1"},"reference-count":84,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2014,6,1]],"date-time":"2014-06-01T00:00:00Z","timestamp":1401580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Regional Development Fund (EDRF) through the COMPETE Programme"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2014,7]]},"abstract":"<jats:p>The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid-state drives, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development.<\/jats:p>\n          <jats:p>The first contribution of this article is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. 
As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.<\/jats:p>","DOI":"10.1145\/2611778","type":"journal-article","created":{"date-parts":[[2014,7,1]],"date-time":"2014-07-01T14:23:02Z","timestamp":1404224582000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":110,"title":["A Survey and Classification of Storage Deduplication Systems"],"prefix":"10.1145","volume":"47","author":[{"given":"Jo\u00e3o","family":"Paulo","sequence":"first","affiliation":[{"name":"High-Assurance Software Lab (HASLab), INESC TEC & University of Minho, Braga, Portugal"}]},{"given":"Jos\u00e9","family":"Pereira","sequence":"additional","affiliation":[{"name":"High-Assurance Software Lab (HASLab), INESC TEC & University of Minho, Braga, Portugal"}]}],"member":"320","published-online":{"date-parts":[[2014,6]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI). USENIX","author":"Anand Ashok","year":"2010","unstructured":"Ashok Anand, Chitra Muthukrishnan, Steven Kappes, Aditya Akella, and Suman Nath. 2010. Cheap and large CAMs for high performance data-intensive networked systems. In Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI). USENIX, Berkeley, CA, 433--449."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the Linux Symposium. 
19--28","author":"Arcangeli Andrea","year":"2009","unstructured":"Andrea Arcangeli, Izik Eidus, and Chris Wright. 2009. Increasing memory density by using KSM. In Proceedings of the Linux Symposium. 19--28."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1534530.1534539"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of USENIX Winter Technical Conference. USENIX","author":"Berliner Brian","year":"1990","unstructured":"Brian Berliner. 1990. CVS II: Parallelizing software development. In Proceedings of USENIX Winter Technical Conference. USENIX, Berkeley, CA, 341--352."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOT.2009.5366623"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2006.42"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1210596.1210599"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the USENIX Windows System Symposium (WSS). USENIX","author":"Bolosky William J.","unstructured":"William J. Bolosky, Scott Corbin, David Goebel, and John R. Douceur. 2000. Single instance storage in Windows 2000. In Proceedings of the USENIX Windows System Symposium (WSS). USENIX, Berkeley, CA, 1--12."},
{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the Compression and Complexity of Sequences. IEEE Computer Society","author":"Broder Andrei","year":"1997","unstructured":"Andrei Broder. 1997. On the resemblance and containment of documents. In Proceedings of the Compression and Complexity of Sequences. IEEE Computer Society, Washington, DC, 21--30."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Andrei Z. Broder. 1993. Some applications of Rabin\u2019s fingerprinting method. In Sequences II: Methods in Communications Security and Computer Science. 143--152.","DOI":"10.1007\/978-1-4613-9323-8_11"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/265924.265930"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/266220.266223"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Chen Feng","year":"2011","unstructured":"Feng Chen, Tian Luo, and Xiaodong Zhang. 2011. CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 77--90."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2151007"},
{"key":"e_1_2_1_15_1","volume-title":"Retrieved","author":"Chute Christopher","year":"2008","unstructured":"Christopher Chute, Alex Manfrediz, Stephen Minton, David Reinsel, Wolfgang Schlichting, and Anna Toncheva. 2008. The diverse and exploding digital universe: An updated forecast of worldwide information growth through 2011. IDC white paper, sponsored by EMC. Retrieved September 12, 2013, from http:\/\/www.emc.com\/collateral\/analyst-reports\/diverse-exploding-digital-universe.pdf."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Clements Austin T.","year":"2009","unstructured":"Austin T. Clements, Irfan Ahmad, Murali Vilayannur, and Jinyuan Li. 2009. Decentralized deduplication in SAN cluster file systems. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 1--14."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Collberg Christian","unstructured":"Christian Collberg, John H. Hartman, Sridivya Babu, and Sharath K. Udupa. 2005. Slinky: Static linking reloaded. In Proceedings of the USENIX Annual Technical Conference (ATC). 
USENIX, Berkeley, CA, 309--322."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/DCC.2011.46"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1060289.1060316"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Debnath Biplob","year":"2010","unstructured":"Biplob Debnath, Sudipta Sengupta, and Jin Li. 2010. ChunkStash: Speeding up inline storage deduplication using flash memory. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 1--16."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989327"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Dong Wei","year":"2011","unstructured":"Wei Dong, Fred Douglis, Kai Li, Hugo Patterson, Sazzala Reddy, and Philip Shilane. 2011. Tradeoffs in scalable data routing for deduplication clusters. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 15--29."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Douglis Fred","year":"2003","unstructured":"Fred Douglis and Arun Iyengar. 
2003. Application-specific delta-encoding via resemblance detection. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 113--126."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Douglis Fred","year":"2004","unstructured":"Fred Douglis, Jason Lavoie, John M. Tracey, Purushottam Kulkarni, and Purushottam Kulkarni. 2004. Redundancy elimination within large collections of files. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 1--5."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Dubnicki Cezary","year":"2009","unstructured":"Cezary Dubnicki, Leszek Gryz, Lukasz Heldt, Michal Kaczmarczyk, Wojciech Kilian, Przemyslaw Strzelczak, Jerzy Szczepkowski, Cristian Ungureanu, and Michal Welnicki. 2009. HYDRAstor: A scalable secondary storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 197--210."},
{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"El-Shimi Ahmed","year":"2012","unstructured":"Ahmed El-Shimi, Ran Kalach, Ankit Kumar, Adi Oltean, Jin Li, and Sudipta Sengupta. 2012. Primary data deduplication large scale study and system design. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 1--14."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Eshghi Kave","year":"2007","unstructured":"Kave Eshghi, Mark Lillibridge, Lawrence Wilcock, Guillaume Belrose, and Rycharde Hawkes. 2007. Jumbo Store: Providing efficient incremental upload and versioning for a utility rendering service. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 123--138."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391246"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/2442626.2442649"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Guo Fanglu","year":"2011","unstructured":"Fanglu Guo and Petros Efstathopoulos. 2011. Building a high-performance deduplication system. In Proceedings of the USENIX Annual Technical Conference (ATC). 
USENIX, Berkeley, CA, 1--14."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Gupta Aayush","year":"2011","unstructured":"Aayush Gupta, Raghav Pisolkar, Bhuvan Urgaonkar, and Anand Sivasubramaniam. 2011. Leveraging value locality in optimizing NAND flash-based SSDs. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 91--103."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1831407.1831429"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2010.187"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the Conference on Mass Storage Systems (MSST). IEEE Computer Society","author":"Hong Bo","unstructured":"Bo Hong and Darrell D. E. Long. 2004. Duplicate data elimination in a SAN file system. In Proceedings of the Conference on Mass Storage Systems (MSST). 
IEEE Computer Society, Washington, DC, 301--314."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/279310.279321"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1534530.1534540"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2367589.2367600"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2012.6232380"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2012.6232379"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/1855511.1855527"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Kruus Erik","year":"2010","unstructured":"Erik Kruus, Cristian Ungureanu, and Cezary Dubnicki. 2010. Bimodal content defined chunking for backup streams. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 239--252."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the USENIX Workshop on I\/O Virtualization (WIOV). USENIX","author":"Liguori Anthony","year":"2008","unstructured":"Anthony Liguori and Eric Van Hensbergen. 2008. Experiences with content addressable storage and virtual disks. In Proceedings of the USENIX Workshop on I\/O Virtualization (WIOV). USENIX, Berkeley, CA, 1--5."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). 
USENIX","author":"Lillibridge Mark","year":"2009","unstructured":"Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble. 2009. Sparse indexing: Large scale, inline deduplication using sampling and locality. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 111--123."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043556.2043558"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2010.37"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the Conference on Mass Storage Systems (MSST). IEEE Computer Society","author":"Lu Guanlin","unstructured":"Guanlin Lu, Youngjin Nam, and David H. C. Du. 2012. BloomStore: Bloom-filter based memory-efficient key-value store for indexing of data deduplication on flash. In Proceedings of the Conference on Mass Storage Systems (MSST). IEEE Computer Society, Washington, DC, 1--11."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the USENIX Winter Technical Conference. USENIX","author":"Manber Udi","year":"1994","unstructured":"Udi Manber. 1994. Finding similar files in a large file system. In Proceedings of the USENIX Winter Technical Conference. 
USENIX, Berkeley, CA, 1--10."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1462735.1462739"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1534530.1534541"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496992"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1352592.1352598"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Dutch","unstructured":"Dutch T. Meyer and William J. Bolosky. 2011. A study of practical deduplication. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 1--13."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Milos Grzegorz","unstructured":"Grzegorz Milos, Derek G. Murray, Steven Hand, and Michael A. Fetterman. 2009. Satori: Enlightened page sharing. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 1--14."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/502034.502052"},
{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Nath Partho","year":"2006","unstructured":"Partho Nath, Michael A. Kozuch, David R. O\u2019Hallaron, Jan Harkes, M. Satyanarayanan, Niraj Tolia, and Matt Toups. 2006. Design tradeoffs in applying content addressable storage to enterprise-scale systems based on virtual machines. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 71--84."},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the ACM\/IFIP\/USENIX International Middleware Conference. ACM","author":"Ng Chun-Ho","unstructured":"Chun-Ho Ng, Mingcao Ma, Tsz-Yeung Wong, Patrick P. C. Lee, and John C. S. Lui. 2011. Live deduplication storage of virtual machine images in an open-source cloud. In Proceedings of the ACM\/IFIP\/USENIX International Middleware Conference. ACM, New York, 1--20."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.5555\/645962.674230"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2011.71"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33615-7_9"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.5555\/1247415.1247421"},{"key":"e_1_2_1_63_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Quinlan Sean","year":"2002","unstructured":"Sean Quinlan and Sean Dorward. 2002. Venti: A new approach to archival storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). 
USENIX, Berkeley, CA, 1--13."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/PST.2012.6297923"},{"key":"e_1_2_1_66_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). USENIX","author":"Rhea Sean","year":"2008","unstructured":"Sean Rhea, Russ Cox, and Alex Pesterev. 2008. Fast, inexpensive content-addressed storage in foundation. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 143--156."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/SRDS.2011.18"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/6314.6315"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/2287076.2287081"},{"key":"e_1_2_1_70_1","volume-title":"Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage). USENIX","author":"Shilane Philip","year":"2012","unstructured":"Philip Shilane, Grant Wallace, Mark Huang, and Windsor Hsu. 2012. Delta compressed and deduplicated storage using stream-informed locality. In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage). USENIX, Berkeley, CA, 1--10."},
{"key":"e_1_2_1_71_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Srinivasan Kiran","year":"2012","unstructured":"Kiran Srinivasan, Tim Bisson, Garth Goodson, and Kaladhar Voruganti. 2012. iDedup: Latency-aware, inline data deduplication for primary storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 1--14."},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/1456469.1456471"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/1972551.1972552"},{"key":"e_1_2_1_74_1","volume-title":"Proceedings of the Workshop on Hot Topics in Security (HotSec). USENIX","author":"Suzaki Kuniyasu","year":"2010","unstructured":"Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Nguyen Anh Quynh, Cyrille Artho, and Yoshihito Watanebe. 2010. Moving from logical sharing of guest OS to physical sharing of deduplication on virtual machine. In Proceedings of the Workshop on Hot Topics in Security (HotSec). USENIX, Berkeley, CA, 1--7."},{"key":"e_1_2_1_75_1","volume-title":"Poster Session of the USENIX Annual Technical Conference (ATC). USENIX","author":"Tarasov Vasily","year":"2012","unstructured":"Vasily Tarasov, Amar Mudrankit, Will Buik, Philip Shilane, Geoff Kuenning, and Erez Zadok. 2012. Generating realistic datasets for deduplication analysis. In Poster Session of the USENIX Annual Technical Conference (ATC). 
USENIX, Berkeley, CA, 1--2."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2011.5937237"},{"key":"e_1_2_1_77_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Ungureanu Cristian","year":"2010","unstructured":"Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Stephen Rago, Grzegorz Calkowski, Cezary Dubnicki, and Aniruddha Bohra. 2010. HydraFS: A high-throughput file system for the HYDRAstor content-addressable storage system. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 225--238."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/844128.844146"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496987"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/1508293.1508299"},{"key":"e_1_2_1_81_1","volume-title":"Retrieved","author":"Wright Jeff","year":"2011","unstructured":"Jeff Wright. 2011. Sun ZFS Storage Appliance Deduplication Design and Implementation Guidelines. Retrieved September 12, 2013, from http:\/\/www.oracle.com\/technetwork\/articles\/servers-storage-admin\/zfs-storage-deduplication-335298.html."},{"key":"e_1_2_1_82_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). 
USENIX","author":"Xia Wen","year":"2011","unstructured":"Wen Xia, Hong Jiang, Dan Feng, and Yu Hua. 2011. SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput. In Proceedings of the USENIX Annual Technical Conference (ATC). USENIX, Berkeley, CA, 26--30."},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1631\/jzus.C0910445"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470468"},{"key":"e_1_2_1_85_1","volume-title":"Proceedings of the Conference on Mass Storage Systems (MSST). IEEE Computer Society","author":"You Lawrence","year":"2004","unstructured":"Lawrence You and Christos Karamanolis. 2004. Evaluation of efficient archival storage techniques. In Proceedings of the Conference on Mass Storage Systems (MSST). IEEE Computer Society, Washington, DC, 227--232."},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2005.47"},{"key":"e_1_2_1_87_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX","author":"Zhu Benjamin","year":"2008","unstructured":"Benjamin Zhu, Kai Li, and Hugo Patterson. 2008. Avoiding the disk bottleneck in the data domain deduplication file system. 
In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). USENIX, Berkeley, CA, 1--14."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2611778","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2611778","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:01:34Z","timestamp":1750230094000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2611778"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6]]},"references-count":84,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,7]]}},"alternative-id":["10.1145\/2611778"],"URL":"https:\/\/doi.org\/10.1145\/2611778","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,6]]},"assertion":[{"value":"2012-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}