{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T14:47:23Z","timestamp":1773154043858,"version":"3.50.1"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,11,30]],"date-time":"2019-11-30T00:00:00Z","timestamp":1575072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2019,11,30]]},"abstract":"<jats:p>\n            The adoption of deduplication in storage systems has introduced significant new challenges for storage management. Specifically, the physical capacities associated with volumes are no longer readily available. In this work, we introduce a new approach to analyzing capacities in deduplicated storage environments. We provide sketch-based estimations of fundamental capacity measures required for managing a storage system: How much physical space would be reclaimed if a volume or group of volumes were to be removed from a system (the\n            <jats:italic>reclaimable<\/jats:italic>\n            capacity) and how much of the physical space should be attributed to each of the volumes in the system (the\n            <jats:italic>attributed<\/jats:italic>\n            capacity). Our methods also support capacity queries for volume groups across multiple storage systems, e.g., how much capacity would a volume group consume after being migrated to another storage system? We provide analytical accuracy guarantees for our estimations as well as empirical evaluations. Our technology is integrated into a prominent all-flash storage array and exhibits high performance even for very large systems. We also demonstrate how this method opens the door for performing placement decisions at the data-center level and obtaining insights on deduplication in the field.\n          <\/jats:p>","DOI":"10.1145\/3369737","type":"journal-article","created":{"date-parts":[[2019,12,18]],"date-time":"2019-12-18T13:21:11Z","timestamp":1576675271000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Sketching Volume Capacities in Deduplicated Storage"],"prefix":"10.1145","volume":"15","author":[{"given":"Danny","family":"Harnik","sequence":"first","affiliation":[{"name":"IBM Research, Givatayim, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Moshik","family":"Hershcovitch","sequence":"additional","affiliation":[{"name":"IBM Research, Givatayim, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yosef","family":"Shatsky","sequence":"additional","affiliation":[{"name":"IBM Systems, Givatayim, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Amir","family":"Epstein","sequence":"additional","affiliation":[{"name":"Citi Innovation Lab TLV, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ronen","family":"Kat","sequence":"additional","affiliation":[{"name":"IBM Research, Givatayim, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,12,18]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"VDBench Users Guide. 2012. Retrieved from https:\/\/www.oracle.com\/technetwork\/server-storage\/vdbench-1901683.pdf.  VDBench Users Guide. 2012. Retrieved from https:\/\/www.oracle.com\/technetwork\/server-storage\/vdbench-1901683.pdf."},{"key":"e_1_2_1_2_1","unstructured":"HPE StoreOnce Data Protection Backup Appliances. 2018. Retrieved from https:\/\/www.hpe.com\/us\/en\/storage\/storeonce.html.  HPE StoreOnce Data Protection Backup Appliances. 2018. Retrieved from https:\/\/www.hpe.com\/us\/en\/storage\/storeonce.html."},{"key":"e_1_2_1_3_1","unstructured":"IBM FlashSystem 9100. 2018. Retrieved from https:\/\/www.ibm.com\/us-en\/marketplace\/flashsystem-9100.  IBM FlashSystem 9100. 2018. Retrieved from https:\/\/www.ibm.com\/us-en\/marketplace\/flashsystem-9100."},{"key":"e_1_2_1_4_1","unstructured":"IBM FlashSystem A9000. 2018. Retrieved from https:\/\/www.ibm.com\/il-en\/marketplace\/small-cloud-storage\/specifications.  IBM FlashSystem A9000. 2018. Retrieved from https:\/\/www.ibm.com\/il-en\/marketplace\/small-cloud-storage\/specifications."},{"key":"e_1_2_1_5_1","volume-title":"purity-reduce","author":"Storage Pure","year":"2018","unstructured":"Pure Storage : purity-reduce . 2018 . Retrieved September 2018 from https:\/\/www.purestorage.com\/products\/purity\/purity-reduce.html. Pure Storage: purity-reduce. 2018. Retrieved September 2018 from https:\/\/www.purestorage.com\/products\/purity\/purity-reduce.html."},{"key":"e_1_2_1_6_1","volume-title":"IOTTA Repository Home","author":"SNIA","year":"2018","unstructured":"SNIA : IOTTA Repository Home . 2018 . Retrieved from http:\/\/iotta.snia.org\/. SNIA: IOTTA Repository Home. 2018. Retrieved from http:\/\/iotta.snia.org\/."},{"key":"e_1_2_1_7_1","volume-title":"Using Deduplication and Compression","author":"Mware","year":"2018","unstructured":"V Mware vSAN : Using Deduplication and Compression . 2018 . Retrieved from https:\/\/docs.vmware.com\/en\/VMware-vSphere\/. VMware vSAN: Using Deduplication and Compression. 2018. Retrieved from https:\/\/docs.vmware.com\/en\/VMware-vSphere\/."},{"key":"e_1_2_1_8_1","unstructured":"XIOS 6.1 Data Reduction (DRR) Reporting per a Volume. 2018. Retrieved from https:\/\/xtremio.me\/.  XIOS 6.1 Data Reduction (DRR) Reporting per a Volume. 2018. Retrieved from https:\/\/xtremio.me\/."},{"key":"e_1_2_1_9_1","volume-title":"Retrieved","author":"Integrated Data Reduction IO","year":"2018","unstructured":"Xtrem IO Integrated Data Reduction . 2018. Retrieved September 2018 from https:\/\/www.emc.com\/collateral\/solution-overview\/h12453-xtremio-integrated-data-reduction-so.pdf. XtremIO Integrated Data Reduction. 2018. Retrieved September 2018 from https:\/\/www.emc.com\/collateral\/solution-overview\/h12453-xtremio-integrated-data-reduction-so.pdf."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the ACM International Systems and Storage Conference (SYSTOR\u201909)","author":"Aronovich Lior","unstructured":"Lior Aronovich , Ron Asher , Eitan Bachmat , Haim Bitner , Michael Hirsch , and Shmuel T. Klein . 2009. The design of a similarity based deduplication system . In Proceedings of the ACM International Systems and Storage Conference (SYSTOR\u201909) . ACM. Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, and Shmuel T. Klein. 2009. The design of a similarity based deduplication system. In Proceedings of the ACM International Systems and Storage Conference (SYSTOR\u201909). ACM."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45726-7_1"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOT.2009.5366623"},{"key":"e_1_2_1_13_1","volume-title":"Technical Report RFC","author":"Deutsch P.","year":"1950","unstructured":"P. Deutsch and J. L. Gailly . 1996. Zlib Compressed Data Format Specification version 3.3 . Technical Report RFC 1950 . Network Working Group. P. Deutsch and J. L. Gailly. 1996. Zlib Compressed Data Format Specification version 3.3. Technical Report RFC 1950. Network Working Group."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201911)","author":"Dong Wei","year":"2011","unstructured":"Wei Dong , Fred Douglis , Kai Li , R. Hugo Patterson , Sazzala Reddy , and Philip Shilane . 2011 . Tradeoffs in scalable data routing for deduplication clusters . In Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201911) . 15--29. Wei Dong, Fred Douglis, Kai Li, R. Hugo Patterson, Sazzala Reddy, and Philip Shilane. 2011. Tradeoffs in scalable data routing for deduplication clusters. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201911). 15--29."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/2208488.2208501"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-0000(85)90041-8"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1496909.1496926"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391246"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35170-9_18"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA\u201901)","author":"Phillip","unstructured":"Phillip B. Gibbons and Srikanta Tirthapura. 2001. Estimating simple functions on the union of data streams . In Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA\u201901) . 281--291. Phillip B. Gibbons and Srikanta Tirthapura. 2001. Estimating simple functions on the union of data streams. In Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA\u201901). 281--291."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 31st Annual ACM Southeast Conference. 127--135","author":"Greene William","year":"1993","unstructured":"William Greene . 1993 . k-way merging and k-ary sorts . In Proceedings of the 31st Annual ACM Southeast Conference. 127--135 . William Greene. 1993. k-way merging and k-ary sorts. In Proceedings of the 31st Annual ACM Southeast Conference. 127--135."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201913)","author":"Harnik Danny","year":"2013","unstructured":"Danny Harnik , Ronen Kat , Dmitry Sotnikov , Avishay Traeger , and Oded Margalit . 2013 . To zip or not to zip: Effective resource usage for real-time compression . In Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201913) . Danny Harnik, Ronen Kat, Dmitry Sotnikov, Avishay Traeger, and Oded Margalit. 2013. To zip or not to zip: Effective resource usage for real-time compression. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201913)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/2930583.2930604"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2012.6232381"},{"key":"e_1_2_1_25_1","first-page":"9","article-title":"A method for the construction of minimum-redundancy codes","volume":"40","author":"Huffman D. A.","year":"1952","unstructured":"D. A. Huffman . 1952 . A method for the construction of minimum-redundancy codes . Proc. Inst. Radio Eng. 40 , 9 (Sep. 1952), 1098--1101. D. A. Huffman. 1952. A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Eng. 40, 9 (Sep. 1952), 1098--1101.","journal-title":"Proc. Inst. Radio Eng."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909)","author":"Lillibridge Mark","year":"2009","unstructured":"Mark Lillibridge , Kave Eshghi , Deepavali Bhagwat , Vinay Deolalikar , Greg Trezise , and Peter Camble . 2009 . Sparse indexing: Large scale, inline deduplication using sampling and locality . In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909) . Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble. 2009. Sparse indexing: Large scale, inline deduplication using sampling and locality. In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909)."},{"key":"e_1_2_1_27_1","volume-title":"Content sharing graphs for deduplication-enabled storage systems. Algorithms 5, 2","author":"Lu Maohua","year":"2012","unstructured":"Maohua Lu , Cornel Constantinescu , and Prasenjit Sarkar . 2012. Content sharing graphs for deduplication-enabled storage systems. Algorithms 5, 2 ( 2012 ). Maohua Lu, Cornel Constantinescu, and Prasenjit Sarkar. 2012. Content sharing graphs for deduplication-enabled storage systems. Algorithms 5, 2 (2012)."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST\u201911)","author":"Dutch","unstructured":"Dutch T. Meyer and William J. Bolosky. 2011. A study of practical deduplication . In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST\u201911) . 1--13. Dutch T. Meyer and William J. Bolosky. 2011. A study of practical deduplication. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST\u201911). 1--13."},{"key":"e_1_2_1_29_1","volume-title":"Randomized Algorithms","author":"Motwani Rajeev","unstructured":"Rajeev Motwani and Prabhakar Raghavan . 1995. Randomized Algorithms . Cambridge University Press , New York, NY . Rajeev Motwani and Prabhakar Raghavan. 1995. Randomized Algorithms. Cambridge University Press, New York, NY."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485732.2485744"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/3026852.3026870"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/2750482.2750490"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3127479.3132021"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/2535461.2535483"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1977.1055714"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3369737","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3369737","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:08Z","timestamp":1750200068000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3369737"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,30]]},"references-count":35,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,11,30]]}},"alternative-id":["10.1145\/3369737"],"URL":"https:\/\/doi.org\/10.1145\/3369737","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"value":"1553-3077","type":"print"},{"value":"1553-3093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,30]]},"assertion":[{"value":"2019-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}