{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T14:51:27Z","timestamp":1776783087803,"version":"3.51.2"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2012,1,1]],"date-time":"2012-01-01T00:00:00Z","timestamp":1325376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2012,1]]},"abstract":"<jats:p>We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication, particularly considering whole-file versus block-level elimination of redundancy. We found that whole-file deduplication achieves about three quarters of the space savings of the most aggressive block-level deduplication for storage of live file systems, and 87% of the savings for backup images. 
We also studied file fragmentation, finding that it is not prevalent, and updated prior file system metadata studies, finding that the distribution of file sizes continues to skew toward very large unstructured files.<\/jats:p>","DOI":"10.1145\/2078861.2078864","type":"journal-article","created":{"date-parts":[[2012,1,31]],"date-time":"2012-01-31T14:49:20Z","timestamp":1328021360000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":303,"title":["A study of practical deduplication"],"prefix":"10.1145","volume":"7","author":[{"given":"Dutch T.","family":"Meyer","sequence":"first","affiliation":[{"name":"The University of British Columbia, Microsoft Research"}]},{"given":"William J.","family":"Bolosky","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]}],"member":"320","published-online":{"date-parts":[[2012,2,2]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 5th USENIX Conference on File and Storage Technologies.","author":"Agrawal N.","unstructured":"Agrawal , N. , Bolosky , W. , Douceur , J. , and Lorch , J . 2007. A five-year study of file-system metadata . In Proceedings of the 5th USENIX Conference on File and Storage Technologies. Agrawal, N., Bolosky, W., Douceur, J., and Lorch, J. 2007. A five-year study of file-system metadata. In Proceedings of the 5th USENIX Conference on File and Storage Technologies."},{"key":"e_1_2_1_2_1","unstructured":"BackupRead. 2010. Microsoft Corp. BackupRead function. MSDN. http:\/\/msdn.microsoft.com\/en-us\/library\/aa362509(VS.85).aspx  BackupRead. 2010. Microsoft Corp. BackupRead function. MSDN. http:\/\/msdn.microsoft.com\/en-us\/library\/aa362509(VS.85).aspx"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies.","author":"Bhadkamkar M.","unstructured":"Bhadkamkar , M. , Guerra , J. , Useche , L. , Burnett , S. , Liptak , J. , Rangaswami , R. 
, and Hristidis , V . 2009. Borg: Block-reorganization for self-optimizing storage systems . In Proceedings of the 7th USENIX Conference on File and Storage Technologies. Bhadkamkar, M., Guerra, J., Useche, L., Burnett, S., Liptak, J., Rangaswami, R., and Hristidis, V. 2009. Borg: Block-reorganization for self-optimizing storage systems. In Proceedings of the 7th USENIX Conference on File and Storage Technologies."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. IEEE","author":"Bhagwat D.","unstructured":"Bhagwat , D. , Eshghi , K. , Long , D. , and Lillibridge , M . 2009. Extreme binning: Scalable, parallel deduplication for chunk-based file backup , In Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. IEEE , Los Alamitos, CA. Bhagwat, D., Eshghi, K., Long, D., and Lillibridge, M. 2009. Extreme binning: Scalable, parallel deduplication for chunk-based file backup, In Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. IEEE, Los Alamitos, CA."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 4th USENIX Windows Systems Symposium.","author":"Bolosky W.","unstructured":"Bolosky , W. , Corbin , S. , Goebel , D. , and Douceur , J . 2000. Single instance storage in Windows 2000 . In Proceedings of the 4th USENIX Windows Systems Symposium. Bolosky, W., Corbin, S., Goebel, D., and Douceur, J. 2000. Single instance storage in Windows 2000. In Proceedings of the 4th USENIX Windows Systems Symposium."},{"key":"e_1_2_1_7_1","volume-title":"InProceedings of the USENIX Annual Technical Conference.","author":"Clements A.","unstructured":"Clements , A. , Ahmad , I. , Vilayannur , M. , and Li , J . 2009. 
Decentralized deduplication in SAN cluster file systems . InProceedings of the USENIX Annual Technical Conference. Clements, A., Ahmad, I., Vilayannur, M., and Li, J. 2009. Decentralized deduplication in SAN cluster file systems. InProceedings of the USENIX Annual Technical Conference."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 9th USENIX Conference on File and Storage Technology.","author":"Dong W.","unstructured":"Dong , W. , Douglis , F. , Li , K. , Patterson , H. , Reddy , S. , and Shilane , P . 2011. Tradeoffs in scalable data routing for deduplication clusters . In Proceedings of the 9th USENIX Conference on File and Storage Technology. Dong, W., Douglis, F., Li, K., Patterson, H., Reddy, S., and Shilane, P. 2011. Tradeoffs in scalable data routing for deduplication clusters. In Proceedings of the 9th USENIX Conference on File and Storage Technology."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 1st USENIX Conference on File and Storage Technologies.","author":"Dorward S.","unstructured":"Dorward , S. and Quinlan , S . 2002. Venti: A new approach to archival data storage . In Proceedings of the 1st USENIX Conference on File and Storage Technologies. Dorward, S. and Quinlan, S. 2002. Venti: A new approach to archival data storage. In Proceedings of the 1st USENIX Conference on File and Storage Technologies."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/301453.301480"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies.","author":"Dubnicki C.","unstructured":"Dubnicki , C. , Gryz , L. , Heldt , L. , Kaczmarczyk , M. , Kilian , W. , Strzelczak , P. , Szczepkowski , J. , Ungureanu , C. , and Welnicki , M . 2009. Hydrastor: A scalable secondary storage . In Proceedings of the 7th USENIX Conference on File and Storage Technologies. 
Dubnicki, C., Gryz, L., Heldt, L., Kaczmarczyk, M., Kilian, W., Strzelczak, P., Szczepkowski, J., Ungureanu, C., and Welnicki, M. 2009. Hydrastor: A scalable secondary storage. In Proceedings of the 7th USENIX Conference on File and Storage Technologies."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095810.1095836"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the USENIX Annual Technical Conference.","author":"Kulkarni P.","unstructured":"Kulkarni , P. , Douglis , F. , Lavoie , J. , and Tracey , J . 2004. Redundancy elimination within large collections of files . In Proceedings of the USENIX Annual Technical Conference. Kulkarni, P., Douglis, F., Lavoie, J., and Tracey, J. 2004. Redundancy elimination within large collections of files. In Proceedings of the USENIX Annual Technical Conference."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1534530.1534540"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies.","author":"Lillibridge M.","unstructured":"Lillibridge , M. , Eshghi , K. , Bhagwat , D. , Deola-Likar , V. , Trezise , G. , and Camble , P . 2009. Sparse indexing: Large scale, inline deduplication using sampling and locality . In Proceedings of the 7th USENIX Conference on File and Storage Technologies. Lillibridge, M., Eshghi, K., Bhagwat, D., Deola-Likar, V., Trezise, G., and Camble, P. 2009. Sparse indexing: Large scale, inline deduplication using sampling and locality. In Proceedings of the 7th USENIX Conference on File and Storage Technologies."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the Linux Symposium","author":"Mathur A.","unstructured":"Mathur , A. , Cao , M. , Bhattacharya , S. , Dilger , A. , Tomas , A. , and Vivier , L . 2007. The new ext4 filesystem: Current status and future plans . In Proceedings of the Linux Symposium Mathur, A., Cao, M., Bhattacharya, S., Dilger, A., Tomas, A., and Vivier, L. 2007. 
The new ext4 filesystem: Current status and future plans. In Proceedings of the Linux Symposium"},{"key":"e_1_2_1_17_1","unstructured":"MS Atime. 2010. Microsoft Corp. Disabling last access time in Windows Vista to improve NTFS performance. The Storage Team Blog. http:\/\/blogs.technet.com\/b\/filecab\/archive\/2006\/11\/07\/disabling-last-access-time-in-windows-vista-to-improve-ntfs-performance.aspx.  MS Atime. 2010. Microsoft Corp. Disabling last access time in Windows Vista to improve NTFS performance. The Storage Team Blog. http:\/\/blogs.technet.com\/b\/filecab\/archive\/2006\/11\/07\/disabling-last-access-time-in-windows-vista-to-improve-ntfs-performance.aspx."},{"key":"e_1_2_1_18_1","unstructured":"MS Filesystem. 2010. Microsoft Corp. File systems. Microsoft TechNet. http:\/\/technet.microsoft.com\/en-us\/library\/cc938929.aspx.  MS Filesystem. 2010. Microsoft Corp. File systems. Microsoft TechNet. http:\/\/technet.microsoft.com\/en-us\/library\/cc938929.aspx."},{"key":"e_1_2_1_19_1","unstructured":"VSS. 2010. Microsoft Corp. Volume shadow copy service. MSDN. http:\/\/msdn.microsoft.com\/en-us\/library\/bb968832(VS.85).aspx.  VSS. 2010. Microsoft Corp. Volume shadow copy service. MSDN. http:\/\/msdn.microsoft.com\/en-us\/library\/bb968832(VS.85).aspx."},{"key":"e_1_2_1_20_1","unstructured":"Miller D. R. 2009. Storage economics: Four principles for reducing total cost of ownership. Hitachi Corporate Web Site. http:\/\/www.hds.com\/assets\/pdf\/four-principles-for-reducing-total-cost-of-ownership.pdf.  Miller D. R. 2009. Storage economics: Four principles for reducing total cost of ownership. Hitachi Corporate Web Site. http:\/\/www.hds.com\/assets\/pdf\/four-principles-for-reducing-total-cost-of-ownership.pdf."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 12th Workshop on Hot Topics in Operating Systems.","author":"Murphy N.","unstructured":"Murphy , N. and Seltzer , M . 2009. Hierarchical file systems are dead . 
In Proceedings of the 12th Workshop on Hot Topics in Operating Systems. Murphy, N. and Seltzer, M. 2009. Hierarchical file systems are dead. In Proceedings of the 12th Workshop on Hot Topics in Operating Systems."},{"key":"e_1_2_1_22_1","unstructured":"Nagar R. 1997. Windows NT File System Internals. O'Reilly.   Nagar R. 1997. Windows NT File System Internals. O'Reilly."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the USENIX Annual Technical Conference.","author":"Policroniades C.","unstructured":"Policroniades , C. and Pratt , I . 2004. Alternatives for detecting redundancy in storage systems . In Proceedings of the USENIX Annual Technical Conference. Policroniades, C. and Pratt, I. 2004. Alternatives for detecting redundancy in storage systems. In Proceedings of the USENIX Annual Technical Conference."},{"key":"e_1_2_1_24_1","volume-title":"Fingerprinting by random polynomials. Tech. rep. TR-CSE-03-01","author":"Rabin M.","unstructured":"Rabin , M. 1981. Fingerprinting by random polynomials. Tech. rep. TR-CSE-03-01 . Harvard University Center for Research in Computing Technology. Rabin, M. 1981. Fingerprinting by random polynomials. Tech. rep. TR-CSE-03-01. Harvard University Center for Research in Computing Technology."},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Rivest R. 1992. The MD5 message-digest algorithm. http:\/\/tools.ietf.org\/rfc\/rfc1321.txt.   Rivest R. 1992. The MD5 message-digest algorithm. http:\/\/tools.ietf.org\/rfc\/rfc1321.txt.","DOI":"10.17487\/rfc1321"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/800216.806597"},{"key":"e_1_2_1_27_1","unstructured":"Scheduled Tasks. 2010. Microsoft Corp. description of the scheduled tasks in Windows Vista. Microsoft support. http:\/\/support.microsoft.com\/kb\/939039.  Scheduled Tasks. 2010. Microsoft Corp. description of the scheduled tasks in Windows Vista. Microsoft support. 
http:\/\/support.microsoft.com\/kb\/939039."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/258612.258689"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the USENIX Annual Technical Conference.","author":"Sweeney A.","unstructured":"Sweeney , A. , Doucette , D. , Hu , W. , Anderson , C. , Nishimoto , M. , and Peck , G . 1996. Scalability in the XFS file system . In Proceedings of the USENIX Annual Technical Conference. Sweeney, A., Doucette, D., Hu, W., Anderson, C., Nishimoto, M., and Peck, G. 1996. Scalability in the XFS file system. In Proceedings of the USENIX Annual Technical Conference."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/319151.319158"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 8th USENIX Conference on File and Storage Technologies.","author":"Ungureanu C.","unstructured":"Ungureanu , C. , Atkin , B. , Aranya , A. , Gokhale , S. , Rago , S. , Cakowski , G. , Dubnicki , C. , and Bohra , A . 2010. Hydrafs: A high-throughput file system for the Hydrastor content-addressable storage system . In Proceedings of the 8th USENIX Conference on File and Storage Technologies. Ungureanu, C., Atkin, B., Aranya, A., Gokhale, S., Rago, S., Cakowski, G., Dubnicki, C., and Bohra, A. 2010. Hydrafs: A high-throughput file system for the Hydrastor content-addressable storage system. In Proceedings of the 8th USENIX Conference on File and Storage Technologies."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 8th USENIX Conference on File and Storage Technologies.","author":"Ungureanu E.","unstructured":"Ungureanu , E. and Kruus , C . 2010. Bimodal content defined chunking for backup streams . In Proceedings of the 8th USENIX Conference on File and Storage Technologies. Ungureanu, E. and Kruus, C. 2010. Bimodal content defined chunking for backup streams. 
In Proceedings of the 8th USENIX Conference on File and Storage Technologies."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies, 1--14","author":"Zhu B.","unstructured":"Zhu , B. , Li , K. , and Patterson , H . 2008 Avoiding the disk bottleneck in the data domain deduplication file system . In Proceedings of the 6th USENIX Conference on File and Storage Technologies, 1--14 . Zhu, B., Li, K., and Patterson, H. 2008 Avoiding the disk bottleneck in the data domain deduplication file system. In Proceedings of the 6th USENIX Conference on File and Storage Technologies, 1--14."}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2078861.2078864","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2078861.2078864","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:16Z","timestamp":1750241176000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2078861.2078864"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,1]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,1]]}},"alternative-id":["10.1145\/2078861.2078864"],"URL":"https:\/\/doi.org\/10.1145\/2078861.2078864","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"value":"1553-3077","type":"print"},{"value":"1553-3093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,1]]},"assertion":[{"value":"2011-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2012-02-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}