{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:16:25Z","timestamp":1750306585658,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2016,2,1]],"date-time":"2016-02-01T00:00:00Z","timestamp":1454284800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Center for Research in Intelligent Storage"},{"name":"Department of Energy's Petascale Data Storage Institute","award":["DE-FC02-06ER25768"],"award-info":[{"award-number":["DE-FC02-06ER25768"]}]},{"name":"National Science Foundation","award":["CNS-0917396 (part of the American Recovery and Reinvestment Act of 2009 [Public Law 111-5]) and IIP-0934401"],"award-info":[{"award-number":["CNS-0917396 (part of the American Recovery and Reinvestment Act of 2009 [Public Law 111-5]) and IIP-0934401"]}]},{"name":"Sandia National Laboratories and the industrial sponsors of the Storage Systems Research Center"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2016,3,8]]},"abstract":"<jats:p>Storing large amounts of data for different users has become the new normal in a modern distributed cloud storage environment. Storing data successfully requires a balance of availability, reliability, cost, and performance. Typically, systems design for this balance with minimal information about the data that will pass through them. 
We propose a series of methods to derive groupings from data that have predictive value, informing layout decisions for data on disk.<\/jats:p>\n          <jats:p>Unlike previous grouping work, we focus on dynamically identifying groupings in data that can be gathered from active systems in real time with minimal impact using spatiotemporal locality. We outline several techniques we have developed and discuss how we select particular techniques for particular workloads and application domains. Our statistical and machine-learning-based grouping algorithms answer questions such as \u201cWhat can a grouping be based on?\u201d and \u201cIs a given grouping meaningful for a given application?\u201d We design our models to be flexible and require minimal domain information so that our results are as broadly applicable as possible. We intend for this work to provide a launchpad for future specialized system design using groupings in combination with caching policies and architectural distinctions such as tiered storage to create the next generation of scalable storage systems.<\/jats:p>","DOI":"10.1145\/2738042","type":"journal-article","created":{"date-parts":[[2016,2,1]],"date-time":"2016-02-01T20:37:54Z","timestamp":1454359074000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Can We Group Storage? 
Statistical Techniques to Identify Predictive Groupings in Storage System Accesses"],"prefix":"10.1145","volume":"12","author":[{"given":"Avani","family":"Wildani","sequence":"first","affiliation":[{"name":"The Salk Institute for Biological Studies, Atlanta, GA"}]},{"given":"Ethan L.","family":"Miller","sequence":"additional","affiliation":[{"name":"University of California, MS, Santa Cruz"}]}],"member":"320","published-online":{"date-parts":[[2016,2]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2180905.2180907"},{"volume-title":"IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE, 293--301","year":"2002","author":"Amer A.","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPCCC.2002.995144"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings in Informatics","volume":"14","author":"Ari I.","year":"2002"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1138085.1138093"},{"key":"e_1_2_1_6_1","unstructured":"M. Barbaro and T. Zeller Jr. 2006. A face is exposed for aol searcher no. 4417749. (August 2006)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.52.0078"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/361268.361280"},{"volume-title":"Proceedings of the 2002 ACM\/IEEE Conference on Supercomputing. IEEE Computer Society Press, 11","author":"Colarelli D.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/DCC.2011.46"},{"key":"e_1_2_1_11_1","unstructured":"T. H. Cormen C. E. Leiserson and R. L. Rivest. 1990. Algorithms. MIT Press Cambridge MA."},{"key":"e_1_2_1_12_1","unstructured":"X. Ding S. Jiang F. Chen K. Davis and X. Zhang. 2007. 
DiskSeen: Exploiting disk layout and access history to enhance I\/O prefetch. In 2007 USENIX ATC. USENIX Association 1--14."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1383422.1383429"},{"key":"e_1_2_1_14_1","unstructured":"R. O. Duda P. E. Hart and D. G. Stork. 2001. Pattern Classification. Vol. 2. Citeseer."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1353452.1353454"},{"key":"e_1_2_1_16_1","unstructured":"Bert Dufrasne Roger Eriksson Lisa Martinez and Wenzel Kalabza. 2012. IBM XIV Storage System Gen3 Architecture Implementation and Usage. IBM International Technical Support Organization. 426 pages."},{"key":"e_1_2_1_17_1","unstructured":"P. Jaccard. 1901. Distribution de la flore alpine dans le bassin des Dranses et dans quelques r\u00e9gions voisines. Bulletin de la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles 37 (1901) 241--272."},{"volume-title":"USENIX Conference on File and Storage Technologies (FAST). USENIX Association, 8.","author":"Jiang S.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837915.1837921"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/1268299.1268325"},{"volume-title":"USENIX Annual Technical Conference, General Track. 
105--118","year":"2001","author":"Kroeger T. M.","key":"e_1_2_1_21_1"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.80.056117"},{"key":"e_1_2_1_23_1","unstructured":"W. Li. 2008. An Efficient Query System for High-Dimensional Spatio-Temporal Data. Ph.D. Dissertation. University of Massachusetts Lowell."},{"volume-title":"Proceedings of the 3rd USENIX Conference on File and Storage Technologies. USENIX Association, 173--186","author":"Li Z.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2554850.2554871"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/989.990"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.2989\/16085910409503825"},{"key":"e_1_2_1_28_1","unstructured":"J. Metz. 2012. Working document of the new technologies file system (NTFS). 0.0.3 (2012)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1416944.1416949"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/514191.514214"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006220"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1971.10482356"},{"volume-title":"Proceedings of the USENIX Annual Technical Conference. 
97--103","author":"Riska A.","key":"e_1_2_1_33_1"},{"volume-title":"Conference on File and Storage Technologies.","author":"Schindler J.","key":"e_1_2_1_34_1"},{"volume-title":"Proceedings of the 2002 Conference on File and Storage Technologies (FAST\u201902)","author":"Schmuck F.","key":"e_1_2_1_35_1"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1063786.1063787"},{"volume-title":"Proceedings of the 2nd USENIX Conference on File and Storage Technologies. 73--88","author":"Sivathanu M.","key":"e_1_2_1_37_1"},{"volume-title":"Proceedings of the National Academy of Science 1021 (Dec. 2005)","author":"Slonim N.","key":"e_1_2_1_38_1"},{"key":"e_1_2_1_39_1","unstructured":"T. S\u00f8renson. 1948. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content. Biologiske Skrifter\/Kongelige Danske Videnskabernes Selskab (1948) 1--34."},{"volume-title":"Tech. Rep. CS--TR--298--90","year":"1990","author":"Staelin C.","key":"e_1_2_1_40_1"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1113361.1113364"},{"volume-title":"IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). 
Published by the IEEE Computer Society, 0285","author":"Wang J.","key":"e_1_2_1_42_1"},{"volume-title":"2010 5th Petascale Data Storage Workshop (PDSW\u201910)","author":"Wildani A.","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544846"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1987816.1987823"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2014.17"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168836.2168862"},{"volume-title":"Proceedings of the 8th USENIX Conference on File and Storage Technologies. USENIX Association, 14","author":"Yadwadkar N. J.","key":"e_1_2_1_48_1"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"S. Zaman S. I. Lippman L. Schneper N. Slonim and J. R. Broach. 2009. Glucose regulates transcription in yeast through a network of signaling pathways. 
Molecular Systems Biology 5 1 (2009).","DOI":"10.1038\/msb.2009.20"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/1191544.1191706"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2738042","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2738042","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:16:23Z","timestamp":1750227383000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2738042"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,2]]},"references-count":50,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,3,8]]}},"alternative-id":["10.1145\/2738042"],"URL":"https:\/\/doi.org\/10.1145\/2738042","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2016,2]]},"assertion":[{"value":"2014-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-02-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}