{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T21:48:43Z","timestamp":1766267323218,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2017,5,31]],"date-time":"2017-05-31T00:00:00Z","timestamp":1496188800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"NSF of China","doi-asserted-by":"crossref","award":["61373018, 61602266 and 11550110491"],"award-info":[{"award-number":["61373018, 61602266 and 11550110491"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100006606","name":"Natural Science Foundation of Tianjin","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006606","id-type":"DOI","asserted-by":"crossref"}]},{"name":"PhD Candidate Research Innovation Fund of Nankai University"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2017,5,31]]},"abstract":"<jats:p>Deduplication aims to reduce duplicate data in storage systems by removing redundant copies of data blocks, which are compared to one another using fingerprints. However, repeated on-disk fingerprint lookups lead to high disk traffic, which results in a bottleneck.<\/jats:p>\n          <jats:p>In this article, we propose a \u201clazy\u201d data deduplication method, which buffers incoming fingerprints that are used to perform on-disk lookups in batches, with the aim of improving subsequent prefetching. In deduplication in general, prefetching is used to improve the cache hit rate by exploiting locality within the incoming fingerprint stream. 
For lazy deduplication, we design a buffering strategy that preserves locality in order to facilitate prefetching. Furthermore, as the proportion of deduplication time spent on I\/O decreases, the proportion spent on fingerprint calculation and chunking increases. Thus, we also utilize parallel approaches (utilizing multiple CPU cores and a graphics processing unit) to further improve the overall performance.<\/jats:p>\n          <jats:p>Experimental results indicate that the lazy method improves fingerprint identification performance by over 50% compared with an \u201ceager\u201d method with the same data layout. The GPU improves the hash calculation by a factor of 4.6 and multithreaded chunking by a factor of 4.16. Deduplication performance can be improved by over 45% on SSD and 80% on HDD in the last round on the real datasets.<\/jats:p>","DOI":"10.1145\/3078837","type":"journal-article","created":{"date-parts":[[2017,6,13]],"date-time":"2017-06-13T12:18:36Z","timestamp":1497356316000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Lazy Exact Deduplication"],"prefix":"10.1145","volume":"13","author":[{"given":"Jingwei","family":"Ma","sequence":"first","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Jinnan District, Tianjin, CN"}]},{"given":"Rebecca J.","family":"Stones","sequence":"additional","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Jinnan District, Tianjin, CN"}]},{"given":"Yuxiang","family":"Ma","sequence":"additional","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Jinnan District, Tianjin, CN"}]},{"given":"Jingui","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Jinnan District, Tianjin, 
CN"}]},{"given":"Junjie","family":"Ren","sequence":"additional","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Jinnan District, Tianjin, CN"}]},{"given":"Gang","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Jinnan District, Tianjin, CN"}]},{"given":"Xiaoguang","family":"Liu","sequence":"additional","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Jinnan District, Tianjin, CN"}]}],"member":"320","published-online":{"date-parts":[[2017,6,10]]},"reference":[{"volume-title":"Proceedings of IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS\u201909)","author":"Bhagwat D.","key":"e_1_2_1_1_1"},{"volume-title":"Proceedings of USENIX Conference on File and Storage Technologies (FAST\u201912)","author":"Bhatotia P.","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2008.05.018"},{"volume-title":"Proceedings of USENIX Conference on File and Storage Technologies (FAST\u201913)","author":"Botelho F. C.","key":"e_1_2_1_5_1"},{"volume-title":"Proceedings of USENIX Annual Technical Conference (ATC\u201909)","author":"Clements A. 
T.","key":"e_1_2_1_6_1"},{"volume-title":"Proceedings of USENIX Annual Technical Conference (ATC\u201910)","author":"Debnath B.","key":"e_1_2_1_7_1"},{"volume-title":"Proceedings of ACM International Conference on Management of Data (SIGMOD\u201911)","author":"Debnath B.","key":"e_1_2_1_8_1"},{"volume-title":"Proceedings of USENIX Annual Technical Conference (ATC\u201914)","author":"Fu M.","key":"e_1_2_1_9_1"},{"volume-title":"Proceedings of USENIX Annual Technical Conference (ATC\u201911)","author":"Guo F.","key":"e_1_2_1_10_1"},{"volume":"7","volume-title":"Proceedings of ACM Israeli Experimental Systems Conference (SYSTOR\u201909)","author":"Jin K.","key":"e_1_2_1_11_1"},{"volume-title":"Proceedings of IEEE International Conference on Networked Computing and Advanced Information Management (NCM\u201911)","author":"Kim C.","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2012.6232379"},{"volume-title":"Proceedings of IEEE Symposium on Mass Storage Systems and Technologies (MSST\u201912)","author":"Kim J.","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837915.1837921"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-013-0912-0"},{"volume-title":"Proceedings of IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS\u201909)","author":"Li X.","key":"e_1_2_1_17_1"},{"volume-title":"Proceedings of USENIX Conference on File and Storage Technologies (FAST\u201909)","author":"Lillibridge M.","key":"e_1_2_1_18_1"},{"volume-title":"Proceedings of USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201915)","author":"Lin X.","key":"e_1_2_1_19_1"},{"key":"e_1_2_1_20_1","first-page":"e1","article-title":"Using deduplicating storage for efficient disk image deployment","volume":"2","author":"Lin X.","year":"2015","journal-title":"EAI Endorsed Trans. 
Scalable Information Systems"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2012.6232390"},{"volume-title":"Proceedings of IEEE 32nd Symposium on Mass Storage Systems and Technologies (MSST\u201916)","author":"Ma J.","key":"e_1_2_1_22_1"},{"volume-title":"Proceedings of IEEE International Conference on Networking, Architecture and Storage (NAS\u201910)","author":"Ma L.","key":"e_1_2_1_23_1"},{"key":"e_1_2_1_24_1","first-page":"1","article-title":"Finding similar files in a large file system","volume":"94","author":"Manber U.","year":"1994","journal-title":"Usenix Winter"},{"key":"e_1_2_1_25_1","article-title":"Read-performance optimization for deduplication-based storage systems in the cloud","volume":"10","author":"Mao B.","year":"2014","journal-title":"ACM Trans. Storage (TOS)"},{"volume-title":"Proceedings of IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST\u201910)","author":"Meister D.","key":"e_1_2_1_26_1"},{"volume-title":"Proceedings of USENIX Conference on File and Storage Technologies (FAST\u201911)","year":"2011","author":"Meyer D. T.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","article-title":"A study of practical deduplication","volume":"7","author":"Meyer D. T.","year":"2012","journal-title":"ACM Trans. Storage (TOS)"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2010.263"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500727.2500731"},{"key":"e_1_2_1_31_1","unstructured":"NVIDIA. 2013. NVIDIA CUDA. Retrieved from https:\/\/developer.nvidia.com\/cuda-downloads (July 2013)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"R. Pagh and F. F. Rodler. 2001. Cuckoo Hashing. 
Springer.","DOI":"10.1007\/3-540-44676-1_10"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"J. Paulo and J. Pereira. 2014. A survey and classification of storage deduplication systems. ACM Comput. Surv. (CSUR) 47 1 (2014) 11:1--11:30.","DOI":"10.1145\/2611778"},{"volume-title":"Proceedings of USENIX Annual Technical Conference (ATC\u201904)","author":"Policroniades C.","key":"e_1_2_1_34_1"},{"volume":"4","volume-title":"Proceedings of USENIX Conference on File and Storage Technologies (FAST\u201902)","author":"Quinlan S.","key":"e_1_2_1_35_1"},{"volume-title":"Fingerprinting by Random Polynomials","author":"Rabin M. O.","key":"e_1_2_1_36_1"},{"key":"e_1_2_1_37_1","article-title":"WAN-optimized replication of backup datasets using stream-informed delta compression","volume":"8","author":"Shilane P.","year":"2012","journal-title":"ACM Trans. Storage (TOS)"},{"volume-title":"Proceedings of USENIX Conference on File and Storage Technologies (FAST\u201912)","author":"Srinivasan K.","key":"e_1_2_1_38_1"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-014-0397-5"},{"volume-title":"Proceedings of USENIX Annual Technical Conference (ATC\u201912)","author":"Tarasov V.","key":"e_1_2_1_40_1"},{"key":"e_1_2_1_41_1","unstructured":"VIA Technologies. 2008. VIA nano processor. 
Retrieved from http:\/\/www.viatech.com.cn\/cn\/downloads\/whitepapers\/processors\/WP080529VIA_Nano.pdf."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1038\/530144a"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2014.2308181"},{"volume-title":"Proceedings of USENIX Annual Technical Conference (ATC\u201911)","author":"Xia W.","key":"e_1_2_1_44_1"},{"volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201908)","author":"Zhu B.","key":"e_1_2_1_45_1"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3078837","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3078837","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:03:08Z","timestamp":1750215788000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3078837"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,5,31]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2017,5,31]]}},"alternative-id":["10.1145\/3078837"],"URL":"https:\/\/doi.org\/10.1145\/3078837","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2017,5,31]]},"assertion":[{"value":"2016-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-06-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication 
History"}}]}}