{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,4,27]],"date-time":"2023-04-27T09:04:43Z","timestamp":1682586283427},"reference-count":63,"publisher":"IGI Global","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,7]]},"abstract":"<jats:p>Exponential growth of the amount of data stored worldwide together with high level of data redundancy motivates the active development of data deduplication techniques. The overall increasing popularity of solid-state drives (SSDs) as primary storage devices forces the adaptation of deduplication techniques to technical peculiarities of this type of storage (such as write amplification and wearout), implying active research in SSD-equipped storage data deduplication subdomain. In this survey paper the authors summarize the recent results on deduplication in SSD-enhanced storage, providing a novel taxonomy of the techniques. They classify the techniques on the basis of storage device complexity, starting from a sub-device level up to the storage network. Linux deduplication implementations are discussed, and the results of experimental comparison of several widely used tools are presented. 
Finally, the authors briefly outline open problems in the field and possible points of future research.<\/jats:p>","DOI":"10.4018\/ijertcs.2019070103","type":"journal-article","created":{"date-parts":[[2019,6,6]],"date-time":"2019-06-06T15:45:04Z","timestamp":1559835904000},"page":"32-48","source":"Crossref","is-referenced-by-count":3,"title":["Flash-Based Storage Deduplication Techniques"],"prefix":"10.4018","volume":"10","author":[{"given":"Ilya A.","family":"Chernov","sequence":"first","affiliation":[{"name":"Institute of Applied Mathematical Research, KRC of RAS, Petrozavodsk State University, Petrozavodsk, Russia"}]},{"given":"Evgeny","family":"Ivashko","sequence":"additional","affiliation":[{"name":"Institute of Applied Mathematical Research, KRC of RAS, Petrozavodsk State University, Petrozavodsk, Russia"}]},{"given":"Dmitry","family":"Kositsyn","sequence":"additional","affiliation":[{"name":"Petrozavodsk State University, Petrozavodsk, Russia"}]},{"given":"Vadim","family":"Ponomarev","sequence":"additional","affiliation":[{"name":"Petrozavodsk State University, Petrozavodsk, Russia"}]},{"given":"Alexander","family":"Rumyantsev","sequence":"additional","affiliation":[{"name":"Institute of Applied Mathematical Research, KRC of RAS, Petrozavodsk State University, Petrozavodsk, Russia"}]},{"given":"Anton","family":"Shabaev","sequence":"additional","affiliation":[{"name":"Petrozavodsk State University, Petrozavodsk, Russia"}]}],"member":"2432","reference":[{"key":"IJERTCS.2019070103-0","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2017.2753258"},{"key":"IJERTCS.2019070103-1","unstructured":"Albireo virtual data optimizer (vdo) on drbd. (n.d.). Linbit. Retrieved from https:\/\/www.linbit.com\/en\/albireo-virtual-data-optimizer-vdo-on-drbd\/"},{"key":"IJERTCS.2019070103-2","unstructured":"Bowling, J. (2013). Opendedup: open-source deduplication put to the test. 
Linux Journal, (228), 2."},{"key":"IJERTCS.2019070103-3","doi-asserted-by":"publisher","DOI":"10.1109\/JSYST.2015.2494377"},{"key":"IJERTCS.2019070103-4","doi-asserted-by":"publisher","DOI":"10.1109\/ISCC.2015.7405578"},{"key":"IJERTCS.2019070103-5","doi-asserted-by":"publisher","DOI":"10.1109\/ICCCRI.2015.11"},{"key":"IJERTCS.2019070103-6","unstructured":"Data deduplication and compression with vdo. (n.d.). Redhat. Retrieved from https:\/\/access.redhat.com\/documentation\/en-us\/red_hat_enterprise_linux\/7\/html-single\/storage_administration_guide\/vdo"},{"key":"IJERTCS.2019070103-7","doi-asserted-by":"publisher","DOI":"10.1145\/1555815.1555790"},{"key":"IJERTCS.2019070103-8","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2014.20"},{"key":"IJERTCS.2019070103-9","doi-asserted-by":"publisher","DOI":"10.1142\/S0218126618500196"},{"key":"IJERTCS.2019070103-10","doi-asserted-by":"publisher","DOI":"10.1109\/ICCE-Berlin.2015.7391216"},{"key":"IJERTCS.2019070103-11","first-page":"1982","author":"J.-Y.Ha","year":"2013","journal-title":"Deduplication with block-level content-aware chunking for solid state drives (SSDs). 
In 2013 IEEE 15TH international conference on high performance computing and communications & 2013 IEEE international conference on embedded and ubiquitous computing (HPCC EUC)"},{"issue":"5","key":"IJERTCS.2019070103-12","doi-asserted-by":"crossref","first-page":"1384","DOI":"10.1587\/transinf.2016EDL8006","article-title":"Parity Data De-Duplication in All Flash Array-Based OpenStack Cloud Block Storage.","volume":"99","author":"H.Heo","year":"2016","journal-title":"IEICE Transactions on Information and Systems"},{"key":"IJERTCS.2019070103-13","doi-asserted-by":"publisher","DOI":"10.1145\/2534169.2491714"},{"key":"IJERTCS.2019070103-14","doi-asserted-by":"publisher","DOI":"10.1109\/ICITA.2005.5"},{"key":"IJERTCS.2019070103-15","doi-asserted-by":"publisher","DOI":"10.1109\/MUE.2007.206"},{"key":"IJERTCS.2019070103-16","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496985"},{"key":"IJERTCS.2019070103-17","doi-asserted-by":"publisher","DOI":"10.1109\/PATMOS.2016.7833432"},{"key":"IJERTCS.2019070103-18","unstructured":"Kilvansky, M. (2004). A thorough introduction to flexclone volumes. NetApp."},{"key":"IJERTCS.2019070103-19","doi-asserted-by":"crossref","unstructured":"Kim, J., Lee, C., Lee, S., Son, I., Choi, J., Yoon, S., . . . Cha, J. (2012). Deduplication in SSDs: Model and quantitative analysis. In 2012 IEEE 28th symposium on mass storage systems and technologies (MSST). New York: IEEE.","DOI":"10.1109\/MSST.2012.6232379"},{"key":"IJERTCS.2019070103-20","doi-asserted-by":"crossref","unstructured":"Kim, K., Jung, S., & Song, Y. H. (2011). Compression ratio based hot\/cold data identification for flash memory. In IEEE International conference on consumer electronics (ICCE 2011) (pp. 33-34) New York, USA. IEEE.","DOI":"10.1109\/ICCE.2011.5722616"},{"key":"IJERTCS.2019070103-21","doi-asserted-by":"crossref","unstructured":"Kim, T., Lee, S., and Kim, J. (2017). FineDedup: A fine-grained deduplication technique for extending lifetime of flash-based SSDs. 
Journal of semiconductor technology and science, 17(5):648-659.","DOI":"10.5573\/JSTS.2017.17.5.648"},{"key":"IJERTCS.2019070103-22","doi-asserted-by":"crossref","unstructured":"Kim, T., Lee, S., Park, J., & Kim, J. (2016). Efficient lifetime management of SSD-based RAIDs using dedup-assisted partial stripe writes. In 2016 5TH Non-volatile memory systems and applications symposium (NVMSA). New York: IEEE.","DOI":"10.1109\/NVMSA.2016.7547184"},{"key":"IJERTCS.2019070103-23","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-60368-9_36"},{"key":"IJERTCS.2019070103-24","doi-asserted-by":"publisher","DOI":"10.7873\/DATE.2013.309"},{"key":"IJERTCS.2019070103-25","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2011.6131148"},{"key":"IJERTCS.2019070103-26","doi-asserted-by":"publisher","DOI":"10.1109\/NAS.2014.21"},{"key":"IJERTCS.2019070103-27","unstructured":"Li, W., Jean-Baptise, G., Riveros, J., Narasimhan, G., Zhang, T., & Zhao, M. (2016). Cachededup: In-line deduplication for flash caching. In 14th Usenix conference on file and storage technologies (FAST\u201816) (pp. 301-314). Berkeley, CA: USENIX ASSOC."},{"key":"IJERTCS.2019070103-28","doi-asserted-by":"crossref","unstructured":"Li, Y., Wang, Y., Jiang, A. A., & Bruck, J. (2012). Content-assisted file decoding for nonvolatile memories. In M. Matthews (Ed.), 2012 conference record of the forty sixth asilomar conference on signals, systems and computers (ASILOMAR) (pp. 937-941). New York: IEEE.","DOI":"10.1109\/ACSSC.2012.6489154"},{"key":"IJERTCS.2019070103-29","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2014.04.002"},{"key":"IJERTCS.2019070103-30","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2016.0087"},{"key":"IJERTCS.2019070103-31","doi-asserted-by":"crossref","unstructured":"Liu, J., Chai, Y., Qin, X., & Xiao, Y. (2014). PLC-cache: Endurable SSD cache for deduplication-based primary storage. In 2014 30th symposium on mass storage systems and technologies (MSST). New York. 
IEEE.","DOI":"10.1109\/MSST.2014.6855536"},{"key":"IJERTCS.2019070103-32","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2509060"},{"key":"IJERTCS.2019070103-33","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-018-1808-5"},{"key":"IJERTCS.2019070103-34","doi-asserted-by":"publisher","DOI":"10.1145\/3078837"},{"key":"IJERTCS.2019070103-35","doi-asserted-by":"publisher","DOI":"10.1109\/TrustCom.2016.0177"},{"key":"IJERTCS.2019070103-36","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1145\/1462735.1462739","article-title":"Demystifying data deduplication.","volume":"8","author":"N.Mandagere","year":"2008","journal-title":"Companion (Gloucester)"},{"key":"IJERTCS.2019070103-37","doi-asserted-by":"publisher","DOI":"10.1109\/ICoAC.2014.7229702"},{"key":"IJERTCS.2019070103-38","doi-asserted-by":"crossref","unstructured":"Mao, B., Jiang, H., Wu, S., Fu, Y., & Tian, L. (2012). SAR: SSD assisted restore optimization for deduplication-based storage systems in the cloud. In IEEE 7th International Conference on Networking, Architecture and Storage (NAS) (pp. 328-337). IEEE.","DOI":"10.1109\/NAS.2012.48"},{"key":"IJERTCS.2019070103-39","doi-asserted-by":"publisher","DOI":"10.1145\/2512348"},{"key":"IJERTCS.2019070103-40","doi-asserted-by":"crossref","unstructured":"Mao, B., Jiang, H., Wu, S., & Tian, L. (2014b). POD: performance oriented I\/O deduplication for primary storage systems in the cloud. In IEEE 28th International Parallel and Distributed Processing Symposium (pp. 767-776). IEEE.","DOI":"10.1109\/IPDPS.2014.84"},{"key":"IJERTCS.2019070103-41","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496992"},{"key":"IJERTCS.2019070103-42","article-title":"A study of practical deduplication.","author":"D. 
T.Meyer","year":"2011","journal-title":"Proceedings of the 9th USENIX Conference on File and Storage Technologies, FAST\u201911"},{"key":"IJERTCS.2019070103-43","doi-asserted-by":"publisher","DOI":"10.1007\/s10617-014-9142-9"},{"key":"IJERTCS.2019070103-44","doi-asserted-by":"crossref","unstructured":"Park, E. and Shin, D. (2015). Offline deduplication for solid state disk using a lightweight hash algorithm. JSTS: journal of semiconductor technology and science, 15(5), 539-545.","DOI":"10.5573\/JSTS.2015.15.5.539"},{"key":"IJERTCS.2019070103-45","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2017.7927181"},{"key":"IJERTCS.2019070103-46","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2011.6018868"},{"key":"IJERTCS.2019070103-47","doi-asserted-by":"publisher","DOI":"10.1145\/2611778"},{"key":"IJERTCS.2019070103-48","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2015.29"},{"key":"IJERTCS.2019070103-49","unstructured":"Seagate. (2017). Data age 2025: The evolution of data to life-critical."},{"key":"IJERTCS.2019070103-50","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2014.2350984"},{"key":"IJERTCS.2019070103-51","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2010.5606289"},{"key":"IJERTCS.2019070103-52","unstructured":"Shiming, W., Zhiyong, X., Yao, Z., & Chengyu, F. (2015). PCIE interface design for high-speed image storage system based on SSD. In C. Tang, S. Chen, and X. Tang (Eds.), 20th international symposium on high-power laser systems and applications 2014, Bellingham, WA USA. SPIE-Int Soc Optical Engineering."},{"key":"IJERTCS.2019070103-53","doi-asserted-by":"publisher","DOI":"10.1145\/3017428"},{"key":"IJERTCS.2019070103-54","unstructured":"Wei, D., Gong, Y., Qiao, L., & Deng, L. (2014). A Hardware-Software Co-design Experiments Platform for NAND Flash Based on Zynq. In 2014 IEEE 20th international conference on embedded and real-time computing systems and applications (RTCSA). 
New York: IEEE."},{"key":"IJERTCS.2019070103-55","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2016.2571298"},{"key":"IJERTCS.2019070103-56","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2010.150"},{"key":"IJERTCS.2019070103-57","unstructured":"Yim, K., Koh, K., & Bahn, H. (2003). A compressed page management scheme for NAND-type flash memory. In H. Arabnia & L. Yang (Eds.), VLSI\u201903: Proceedings of the international conference on VLSI, Athens, GA (pp. 266-271). CSREA Press."},{"key":"IJERTCS.2019070103-58","doi-asserted-by":"publisher","DOI":"10.1007\/s10723-018-9429-3"},{"key":"IJERTCS.2019070103-59","unstructured":"Zhang, B., Wang, C., Zhou, B. B., & Zomaya, A. Y. (2015). Inline data deduplication for SSD-based distributed storage. In IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS) (pp. 593-600). IEEE."},{"key":"IJERTCS.2019070103-60","unstructured":"Zhang, X., Li, J., Wang, H., Zhao, K., & Zhang, T. (2016). Reducing solid-state storage device write stress through opportunistic in-place delta compression. In 14TH USENIX Conference on file and storage technologies (FAST \u201816) (pp. 111-124). 
Berkeley, CA: USENIX ASSOC."},{"key":"IJERTCS.2019070103-61","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.173"},{"key":"IJERTCS.2019070103-62","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2013.6704674"}],"container-title":["International Journal of Embedded and Real-Time Communication Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=231459","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T00:09:50Z","timestamp":1651795790000},"score":1,"resource":{"primary":{"URL":"http:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/IJERTCS.2019070103"}},"subtitle":["A Survey"],"short-title":[],"issued":{"date-parts":[[2019,7]]},"references-count":63,"journal-issue":{"issue":"3"},"URL":"https:\/\/doi.org\/10.4018\/ijertcs.2019070103","relation":{},"ISSN":["1947-3176","1947-3184"],"issn-type":[{"value":"1947-3176","type":"print"},{"value":"1947-3184","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7]]}}}