{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T18:36:15Z","timestamp":1764700575373,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2016,5,12]],"date-time":"2016-05-12T00:00:00Z","timestamp":1463011200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2016,6,27]]},"abstract":"<jats:p>Storage workload identification is the task of characterizing a workload in a storage system (more specifically, network storage system\u2014NAS or SAN) and matching it with the previously known workloads. We refer to storage workload identification as \u201cworkload identification\u201d in the rest of this article. Workload identification is an important problem for cloud providers to solve because (1) providers can leverage this information to colocate similar workloads to make the system more predictable and (2) providers can identify workloads and subsequently give guidance to the subscribers as to associated best practices (with respect to configuration) for provisioning those workloads.<\/jats:p>\n          <jats:p>\n            Historically, people have identified workloads by looking at their read\/write ratios, random\/sequential ratios, block size, and interarrival frequency. Researchers are well aware that workload characteristics change over time and that one cannot just take a point in time view of a workload, as that will incorrectly characterize workload behavior. 
Increasingly, manual detection of workload signature is becoming harder because (1) it is difficult for a human to detect a pattern and (2) representing a workload signature by a tuple consisting of\n            <jats:italic>average<\/jats:italic>\n            values for each of the signature components leads to a large error.\n          <\/jats:p>\n          <jats:p>In this article, we present workload signature detection and a matching algorithm that is able to correctly identify workload signatures and match them with other similar workload signatures. We have tested our algorithm on nine different workloads generated using publicly available traces and on real customer workloads running in the field to show the robustness of our approach.<\/jats:p>","DOI":"10.1145\/2818716","type":"journal-article","created":{"date-parts":[[2016,5,13]],"date-time":"2016-05-13T14:30:58Z","timestamp":1463149858000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Storage Workload Identification"],"prefix":"10.1145","volume":"12","author":[{"given":"Jayanta","family":"Basak","sequence":"first","affiliation":[{"name":"NetApp, Inc., Bangalore, India"}]},{"given":"Kushal","family":"Wadhwani","sequence":"additional","affiliation":[{"name":"NetApp, Inc."}]},{"given":"Kaladhar","family":"Voruganti","sequence":"additional","affiliation":[{"name":"NetApp, Inc."}]}],"member":"320","published-online":{"date-parts":[[2016,5,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2012.6402909"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2421648.2421654"},{"key":"e_1_2_1_3_1","unstructured":"L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. 1983. Classification and Regression Trees. 
Chapman & Hall, New York, NY."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009715923555"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043556.2043562"},{"volume-title":"Proceedings of the 2nd Workshop on Exascale Evaluation and Research Techniques (EXERT\u201911)","author":"Delimitrou C.","key":"e_1_2_1_6_1","unstructured":"C. Delimitrou, S. Sankar, K. Vaid, and C. Kozyrakis. 2011. Accurate modeling and generation of storage I\/O for datacenter workloads. In Proceedings of the 2nd Workshop on Exascale Evaluation and Research Techniques (EXERT\u201911)."},{"key":"e_1_2_1_7_1","unstructured":"R. O. Duda, P. E. Hart, and D. G. Stork. 2001. Pattern Classification (2nd ed.). Wiley, New York, NY."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/584792.584898"},{"volume-title":"Proceedings of the International Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT\u201909)","author":"Gulati A.","key":"e_1_2_1_9_1","unstructured":"A. Gulati, C. Kumar, and I. Ahmad. 2009. Storage workload characterization and consolidation in virtualized environments. 
In Proceedings of the International Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT\u201909)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2038916.2038935"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"T. Hastie, R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning (2nd ed.). Springer, New York, NY.","DOI":"10.1007\/978-0-387-84858-7"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909)","volume":"9","author":"Jiang W.","unstructured":"W. Jiang, C. Hu, S. Pasupathy, A. Kanevsky, Z. Li, and Y. Zhou. 2009. Understanding customer problem troubleshooting from storage system logs. In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909), Vol. 9. 43--56."},{"volume-title":"Proceedings of the IEEE International Symposium on Workload Characterization. 119--128","author":"Kavalanekar S.","key":"e_1_2_1_13_1","unstructured":"S. Kavalanekar, B. Worthington, Q. Zhang, and V. Sharda. 2008. Characterization of storage workload traces from production Windows servers. In Proceedings of the IEEE International Symposium on Workload Characterization. 119--128."},{"key":"e_1_2_1_14_1","volume-title":"PMC Based Performance Measurement in FreeBSD. 
Retrieved","author":"Koshy J.","year":"2016","unstructured":"J. Koshy. 2007. PMC Based Performance Measurement in FreeBSD. Retrieved April 2, 2016, from http:\/\/people.freebsd.org\/~jkoshy\/projects\/perf-measurement."},{"key":"e_1_2_1_15_1","first-page":"1609","article-title":"A unifying probabilistic perspective for spectral dimensionality reduction: Insights and new models","volume":"13","author":"Lawrence N. D.","year":"2012","unstructured":"N. D. Lawrence. 2012. A unifying probabilistic perspective for spectral dimensionality reduction: Insights and new models. Journal of Machine Learning Research 13, 1609--1638.","journal-title":"Journal of Machine Learning Research"},{"volume-title":"Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST\u201914)","author":"Liu Y.","key":"e_1_2_1_16_1","unstructured":"Y. Liu, R. Gunasekaran, X. Ma, and S. S. Vazhkudai. 2014. Automatic identification of application I\/O signatures from noisy server-side traces. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST\u201914). 213--228."},{"volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST\u201908)","author":"Narayanan D.","key":"e_1_2_1_17_1","unstructured":"D. Narayanan, A. Donnelly, and A. Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. 
In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST\u201908). 253--267."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICHIT.2008.216"},{"volume-title":"Proceedings of the 5th NASA Goddard Conference on Mass Storage Systems and Technologies.","author":"Pentakalos O. I.","key":"e_1_2_1_19_1","unstructured":"O. I. Pentakalos, D. A. Menasce, and Y. Yesha. 1996. Automated clustering-based workload characterization. In Proceedings of the 5th NASA Goddard Conference on Mass Storage Systems and Technologies."},{"volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906)","author":"Perelman E.","key":"e_1_2_1_20_1","unstructured":"E. Perelman, M. Polito, J. Bouguet, J. Sampson, B. Calder, and C. Dulong. 2006. Detecting phases in parallel applications on shared memory architectures. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906)."},{"volume-title":"Proceedings of the 4th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201912)","author":"Pipada P.","key":"e_1_2_1_21_1","unstructured":"P. Pipada, A. Kundu, K. Gopinath, C. Bhattacharyya, S. Susarla, and P. C. Nagesh. 2012. 
LoadIQ: Learning to identify workload phases from a live storage trace. In Proceedings of the 4th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201912)."},{"volume-title":"Proceedings of the USENIX Annual Technical Conference. 97--102","author":"Riska A.","key":"e_1_2_1_22_1","unstructured":"A. Riska and E. Riedel. 2006. Disk drive level workload characterization. In Proceedings of the USENIX Annual Technical Conference. 97--102."},{"volume-title":"Proceedings of the 1st USENIX Workshop on the Analysis of System Logs.","author":"Sandeep S. R.","key":"e_1_2_1_23_1","unstructured":"S. R. Sandeep, M. Swapna, T. Niranjan, S. Susarla, and S. Nandi. 2008. CLUEBOX: A performance log analyzer for automated troubleshooting. In Proceedings of the 1st USENIX Workshop on the Analysis of System Logs."},{"key":"e_1_2_1_25_1","volume-title":"SNIA IOTTA Repository: I\/O Trace Data Files. Retrieved","author":"SNIA","year":"2016","unstructured":"SNIA. 2011. SNIA IOTTA Repository: I\/O Trace Data Files. Retrieved April 2, 2016, from http:\/\/iotta.snia.org\/traces."},{"key":"e_1_2_1_26_1","volume-title":"Storage Performance Council: SPC Trace File Format Specification. 
Retrieved","author":"SPC","year":"2016","unstructured":"SPC. 2002. Storage Performance Council: SPC Trace File Format Specification. Retrieved April 2, 2016, from http:\/\/skuld.cs.umass.edu\/traces\/storage\/SPC-Traces.pdf."},{"key":"e_1_2_1_27_1","unstructured":"P.-N. Tan, M. Steinbach, and V. Kumar. 2005. Introduction to Data Mining. Addison Wesley Longman, Boston, MA."},{"key":"e_1_2_1_28_1","volume-title":"Technical Report HPL-SSP-2003-13. HP Laboratories, SSP.","author":"Veitch A.","year":"2003","unstructured":"A. Veitch and K. Keeton. 2003. The Rubicon Workload Characterization Tool. Technical Report HPL-SSP-2003-13. HP Laboratories, SSP."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1005686.1005743"},{"key":"e_1_2_1_30_1","volume-title":"Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann","author":"Witten I. H.","year":"2011","unstructured":"I. H. Witten, E. Frank, and M. A. Hall. 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, CA."},{"volume-title":"Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST\u201910)","author":"Yadwadkar N. J.","key":"e_1_2_1_31_1","unstructured":"N. J. Yadwadkar, C. Bhattacharya, K. Gopinath, T. Niranjan, and S. Susarla. 2010. Discovery of application workloads from network file traces. 
In Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST\u201910). 183--196."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2013.47"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0019-9958(65)90241-X"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2818716","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2818716","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:42:49Z","timestamp":1750225369000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2818716"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,5,12]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2016,6,27]]}},"alternative-id":["10.1145\/2818716"],"URL":"https:\/\/doi.org\/10.1145\/2818716","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2016,5,12]]},"assertion":[{"value":"2013-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-05-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}