{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T13:49:45Z","timestamp":1765806585717,"version":"3.41.0"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2006,11,1]],"date-time":"2006-11-01T00:00:00Z","timestamp":1162339200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2006,11]]},"abstract":"<jats:p>\n            Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source by not writing data which has already been stored not only reduces storage overheads, but can also improve bandwidth utilization. For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems. Intelligent object partitioning techniques identify data that is\n            <jats:italic>new<\/jats:italic>\n            when objects are updated, and transfer only these chunks to a storage server. In this article, we propose a new object partitioning technique, called\n            <jats:italic>fingerdiff<\/jats:italic>\n            , that improves upon existing schemes in several important respects. Most notably,\n            <jats:italic>fingerdiff<\/jats:italic>\n            dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects in order to improve storage and bandwidth utilization. 
We present a detailed evaluation of\n            <jats:italic>fingerdiff<\/jats:italic>\n            , and other existing object partitioning schemes, using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by\n            <jats:italic>fingerdiff<\/jats:italic>\n            improve storage utilization on average by 25%, and bandwidth utilization on average by 40% over comparable techniques.\n          <\/jats:p>","DOI":"10.1145\/1210596.1210599","type":"journal-article","created":{"date-parts":[[2007,4,5]],"date-time":"2007-04-05T19:20:08Z","timestamp":1175800808000},"page":"424-448","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":105,"title":["Improving duplicate elimination in storage systems"],"prefix":"10.1145","volume":"2","author":[{"given":"Deepak R.","family":"Bobbarjung","sequence":"first","affiliation":[{"name":"Purdue University, West Lafayette, IN"}]},{"given":"Suresh","family":"Jagannathan","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, IN"}]},{"given":"Cezary","family":"Dubnicki","sequence":"additional","affiliation":[{"name":"NEC Laboratories America, Princeton, NJ"}]}],"member":"320","published-online":{"date-parts":[[2006,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/567112.567116"},{"volume-title":"Algebraic Coding Theory","author":"Berlekamp E. R.","key":"e_1_2_1_2_1","unstructured":"Berlekamp , E. R. 1968. Algebraic Coding Theory . McGraw-Hill , New York .]] Berlekamp, E. R. 1968. Algebraic Coding Theory. McGraw-Hill, New York.]]"},{"key":"e_1_2_1_3_1","unstructured":"Blomer J. Kalfane M. Karp R. Karpinski M. Luby M. and Zuckerman D. 1995. An xor-based erasure-resilient coding scheme. Tech. Rep. International Computer Science Institute Berkeley California.]]  Blomer J. Kalfane M. Karp R. Karpinski M. Luby M. and Zuckerman D. 1995. 
An xor-based erasure-resilient coding scheme. Tech. Rep. International Computer Science Institute Berkeley California.]]"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the Compression and Complexity of Sequences Conference. IEEE Computer Society, 21","author":"Broder A.","year":"1997","unstructured":"Broder , A. 1997 . On the resemblance and containment of documents . In Proceedings of the Compression and Complexity of Sequences Conference. IEEE Computer Society, 21 .]] Broder, A. 1997. On the resemblance and containment of documents. In Proceedings of the Compression and Complexity of Sequences Conference. IEEE Computer Society, 21.]]"},{"volume-title":"Proceedings of the 6th International WWW Conference. 391--404","author":"Broder A.","key":"e_1_2_1_5_1","unstructured":"Broder , A. , Glassman , S. , Manasse , M. , and Zweig , G . 1997. Syntactic clustering of the web . In Proceedings of the 6th International WWW Conference. 391--404 .]] Broder, A., Glassman, S., Manasse, M., and Zweig, G. 1997. Syntactic clustering of the web. In Proceedings of the 6th International WWW Conference. 391--404.]]"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/647819.736184"},{"key":"e_1_2_1_7_1","unstructured":"Cederqvist P. 1992. Version management with cvs. http:\/\/www.cvshome.org\/docs\/manual\/.]]  Cederqvist P. 1992. Version management with cvs. http:\/\/www.cvshome.org\/docs\/manual\/.]]"},{"volume-title":"Proceedings of the 5th USENIX Symposium on Operating Systems Design and Implementation","author":"Cox L.","key":"e_1_2_1_8_1","unstructured":"Cox , L. , Murray , C. , and Noble , B . 2002. Pastiche: Making backup cheap and easy . In Proceedings of the 5th USENIX Symposium on Operating Systems Design and Implementation . Boston.]] Cox, L., Murray, C., and Noble, B. 2002. Pastiche: Making backup cheap and easy. In Proceedings of the 5th USENIX Symposium on Operating Systems Design and Implementation. 
Boston.]]"},{"volume-title":"Usenix Annual Technical Conference. 59--72","author":"Douglis F.","key":"e_1_2_1_9_1","unstructured":"Douglis , F. and Iyengar , A . 2003. Application-Specific deltaencoding via resemblance detection . In Usenix Annual Technical Conference. 59--72 .]] Douglis, F. and Iyengar, A. 2003. Application-Specific deltaencoding via resemblance detection. In Usenix Annual Technical Conference. 59--72.]]"},{"volume-title":"Usenix Annual Technical Conference. 59--72","author":"Douglis P. K. F.","key":"e_1_2_1_10_1","unstructured":"Douglis , P. K. F. , LaVoie , J. , and Tracey , J. M . 2004. Redundancy elimination within large collections of files . In Usenix Annual Technical Conference. 59--72 .]] Douglis, P. K. F., LaVoie, J., and Tracey, J. M. 2004. Redundancy elimination within large collections of files. In Usenix Annual Technical Conference. 59--72.]]"},{"volume-title":"Proceedings of the Advances in Digital Libraries Conference.]]","author":"Goldberg A. V.","key":"e_1_2_1_11_1","unstructured":"Goldberg , A. V. and Yianilos , P. N . 1998. Towards an archival intermemory . In Proceedings of the Advances in Digital Libraries Conference.]] Goldberg, A. V. and Yianilos, P. N. 1998. Towards an archival intermemory. In Proceedings of the Advances in Digital Libraries Conference.]]"},{"volume-title":"Proceedings of the 21st IEEE\/12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST). 301--314","author":"Hong B.","key":"e_1_2_1_12_1","unstructured":"Hong , B. , Plantenberg , D. , Long , D. D. E. , and Sivan-Zimet , M . 2004. Duplicate data elimination in a san file system . In Proceedings of the 21st IEEE\/12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST). 301--314 .]] Hong, B., Plantenberg, D., Long, D. D. E., and Sivan-Zimet, M. 2004. Duplicate data elimination in a san file system. In Proceedings of the 21st IEEE\/12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST). 
301--314.]]"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/279310.279321"},{"volume-title":"Proceedings of the 4th Usenix Conference on File and Storage Technologies (FAST).]]","author":"Jain N.","key":"e_1_2_1_14_1","unstructured":"Jain , N. , Dahlin , M. , and Tewari , R . 2005. Taper: Tiered approach for eliminating redundancy in replica sychronization . In Proceedings of the 4th Usenix Conference on File and Storage Technologies (FAST).]] Jain, N., Dahlin, M., and Tewari, R. 2005. Taper: Tiered approach for eliminating redundancy in replica sychronization. In Proceedings of the 4th Usenix Conference on File and Storage Technologies (FAST).]]"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/378993.379239"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/45072.45074"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/514191.514206"},{"key":"e_1_2_1_18_1","volume-title":"Usenix Winter Conference. 1--10","author":"Manber U.","year":"1994","unstructured":"Manber , U. 1994 . Finding similar files in a large file system . In Usenix Winter Conference. 1--10 .]] Manber, U. 1994. Finding similar files in a large file system. In Usenix Winter Conference. 1--10.]]"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/502034.502052"},{"key":"e_1_2_1_20_1","unstructured":"National Institute of Standards and Technology FIPS PUB 180-1. 1995. Secure hash standard.]]  National Institute of Standards and Technology FIPS PUB 180-1. 1995. Secure hash standard.]]"},{"volume-title":"International Conference on Web Information Systems Engineering (WISE).]]","author":"Ouyang Z.","key":"e_1_2_1_21_1","unstructured":"Ouyang , Z. , Memon , N. , Suel , T. , and Trendafilov , D . 2006. Cluster-Based delta compression of a collection of files . In International Conference on Web Information Systems Engineering (WISE).]] Ouyang, Z., Memon, N., Suel, T., and Trendafilov, D. 2006. 
Cluster-Based delta compression of a collection of files. In International Conference on Web Information Systems Engineering (WISE).]]"},{"volume-title":"Usenix Annual Technical Conference. 73--86","author":"Policroniades C.","key":"e_1_2_1_22_1","unstructured":"Policroniades , C. and Pratt , I . 2004. Alternatives for detecting redundancy in storage systems data . In Usenix Annual Technical Conference. 73--86 .]] Policroniades, C. and Pratt, I. 2004. Alternatives for detecting redundancy in storage systems data. In Usenix Annual Technical Conference. 73--86.]]"},{"volume-title":"Usenix Conference on File and Storage Technologies.]]","author":"Quinlan S.","key":"e_1_2_1_23_1","unstructured":"Quinlan , S. and Dorwards , S . 2002. Venti: A new approach to archival storage . In Usenix Conference on File and Storage Technologies.]] Quinlan, S. and Dorwards, S. 2002. Venti: A new approach to archival storage. In Usenix Conference on File and Storage Technologies.]]"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1975.6312866"},{"volume-title":"Proceedings of the 2nd Annual Conference on the Theory and Practice of Digital Libraries.]]","author":"Shivakumar N.","key":"e_1_2_1_26_1","unstructured":"Shivakumar , N. and Garc\u00eda-Molina , H . 1995. SCAM: A copy detection mechanism for digital documents . In Proceedings of the 2nd Annual Conference on the Theory and Practice of Digital Libraries.]] Shivakumar, N. and Garc\u00eda-Molina, H. 1995. SCAM: A copy detection mechanism for digital documents. In Proceedings of the 2nd Annual Conference on the Theory and Practice of Digital Libraries.]]"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/357401.357404"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380150703"},{"volume-title":"Usenix Annual Technical Conference.]]","author":"Bolosky S.","key":"e_1_2_1_29_1","unstructured":"W. J. Bolosky , S. Corbin , D. G. and Douceur , J. R . 
Single instance storage in windows 2000 . In Usenix Annual Technical Conference.]] W. J. Bolosky, S. Corbin, D. G. and Douceur, J. R. Single instance storage in windows 2000. In Usenix Annual Technical Conference.]]"},{"volume-title":"1st International Workshop on Peer-to-Peer Systems","author":"Weatherspoon H.","key":"e_1_2_1_30_1","unstructured":"Weatherspoon , H. and Kubiatowicz , J . 2002. Erasure coding vs. replication: A quantitative comparison . In 1st International Workshop on Peer-to-Peer Systems . Cambridge, MA.]] Weatherspoon, H. and Kubiatowicz, J. 2002. Erasure coding vs. replication: A quantitative comparison. In 1st International Workshop on Peer-to-Peer Systems. Cambridge, MA.]]"},{"volume-title":"Proceedings of the 21st IEEE Symposium on Mass Storage Systems and Technologies (MSST).]]","author":"You L. L.","key":"e_1_2_1_31_1","unstructured":"You , L. L. and Karamanolis , C . 2004. Evaluation of efficient archival storage techniques . In Proceedings of the 21st IEEE Symposium on Mass Storage Systems and Technologies (MSST).]] You, L. L. and Karamanolis, C. 2004. Evaluation of efficient archival storage techniques. 
In Proceedings of the 21st IEEE Symposium on Mass Storage Systems and Technologies (MSST).]]"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1977.1055714"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1210596.1210599","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1210596.1210599","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:22:22Z","timestamp":1750278142000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1210596.1210599"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,11]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2006,11]]}},"alternative-id":["10.1145\/1210596.1210599"],"URL":"https:\/\/doi.org\/10.1145\/1210596.1210599","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2006,11]]},"assertion":[{"value":"2006-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}