{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T12:14:21Z","timestamp":1771330461600,"version":"3.50.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2012,5,1]],"date-time":"2012-05-01T00:00:00Z","timestamp":1335830400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000151","name":"Division of Industrial Innovation and Partnerships","doi-asserted-by":"publisher","award":["CNS-0917396IIP-0934401"],"award-info":[{"award-number":["CNS-0917396IIP-0934401"]}],"id":[{"id":"10.13039\/100000151","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","award":["CNS-0917396IIP-0934401"],"award-info":[{"award-number":["CNS-0917396IIP-0934401"]}],"id":[{"id":"10.13039\/100000144","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2012,5]]},"abstract":"<jats:p>The scope of archival systems is expanding beyond cheap tertiary storage: scientific and medical data is increasingly digital, and the public has a growing desire to digitally record their personal histories. Driven by the increase in cost efficiency of hard drives, and the rise of the Internet, content archives have become a means of providing the public with fast, cheap access to long-term data. 
Unfortunately, designers of purpose-built archival systems are either forced to rely on workload behavior obtained from a narrow, anachronistic view of archives as simply cheap tertiary storage, or extrapolate from marginally related enterprise workload data and traditional library access patterns.<\/jats:p><jats:p>To close this knowledge gap and provide relevant input for the design of effective long-term data storage systems, we studied the workload behavior of several systems within this expanded archival storage space. Our study examined several scientific and historical archives, covering a mixture of purposes, media types, and access models---that is, public versus private. Our findings show that, for more traditional private scientific archival storage, files have become larger, but update rates have remained largely unchanged. However, in the public content archives we observed, we saw behavior that diverges from the traditional \u201cwrite-once, read-maybe\u201d behavior of tertiary storage. Our study shows that the majority of such data is modified---sometimes unnecessarily---relatively frequently, and that indexing services such as Google and internal data management processes may routinely access large portions of an archive, accounting for most of the accesses. 
Based on these observations, we identify areas for improving the efficiency and performance of archival storage systems.<\/jats:p>","DOI":"10.1145\/2180905.2180907","type":"journal-article","created":{"date-parts":[[2012,6,5]],"date-time":"2012-06-05T17:34:28Z","timestamp":1338917668000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":32,"title":["Analysis of Workload Behavior in Scientific and Historical Long-Term Data Repositories"],"prefix":"10.1145","volume":"8","author":[{"given":"Ian F.","family":"Adams","sequence":"first","affiliation":[{"name":"University of California, Santa Cruz"}]},{"given":"Mark W.","family":"Storer","sequence":"additional","affiliation":[{"name":"NetApp"}]},{"given":"Ethan L.","family":"Miller","sequence":"additional","affiliation":[{"name":"University of California, Santa Cruz"}]}],"member":"320","published-online":{"date-parts":[[2012,5]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST). 31--45","author":"Agrawal N."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST). 125--138","author":"Agrawal N."},{"key":"e_1_2_1_3_1","unstructured":"Alaska State. 2010. Alaska\u2019s digital archives. vilda.alaska.edu."},{"key":"e_1_2_1_4_1","unstructured":"Amazon. 2011. Amazon\u2019s simple storage service. 
http:\/\/aws.amazon.com\/s3\/."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies.","author":"Anderson E.","year":"2009"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1496909.1496923"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254882.1254917"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST). 223--238","author":"Bairavasundaram L. N."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 1st IEEE Workshop on Hot Topics in System Dependability.","author":"Baker M."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1217935.1217957"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654081"},{"key":"e_1_2_1_12_1","unstructured":"California DWR. 2010. California Department of Water Resources water reports. http:\/\/www.water.ca.gov\/waterdatalibrary\/docs\/Hydstra\/index.cfm."},{"key":"e_1_2_1_13_1","unstructured":"Chronicles. 2011. Chronicles of life: Save your memories forever. http:\/\/www.chronicleoflife.com\/."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201902)","author":"Colarelli D."},{"key":"e_1_2_1_15_1","unstructured":"Cornell University Library. 2010. Cornell University Library arXiv. http:\/\/arxiv.org\/."},{"key":"e_1_2_1_16_1","volume-title":"Opinion: Tape backup is WORN (write once, read never)","author":"Damoulakis J.","year":"2007"},{"key":"e_1_2_1_17_1","unstructured":"Dayal S. 2008. Characterizing HEC Storage Systems at Rest. Tech. 
rep. CMU-PDL-08-109 Carnegie Mellon University."},{"key":"e_1_2_1_18_1","unstructured":"Dropbox. 2011. Dropbox. http:\/\/www.dropbox.com\/."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 24th International Conference for the Resource Management and Performance Evaluation of Enterprise Computing Systems (CMG\u201998)","author":"Gibson T."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 6th Goddard Conference on Mass Storage Systems and Technologies\/15th IEEE Symposium on Mass Storage Systems. 355--372","author":"Gibson T. J."},{"key":"e_1_2_1_21_1","unstructured":"HIPAA. 1996. Health Insurance Portability and Accountability Act."},{"key":"e_1_2_1_22_1","unstructured":"IBM. 2010. IBM 3380 direct access storage device. http:\/\/www-03.ibm.com\/ibm\/history\/exhibits\/storage\/storage_3380e.html."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1534530.1534545"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/165939.166018"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the USENIX Annual Technical Conference.","author":"Leung A. W."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the USENIX Annual Technical Conference. 29--42","author":"Lillibridge M."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1047915.1047917"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the Winter USENIX Technical Conference. 421--433","author":"Miller E."},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Moore R. L. D\u2019Aoust J. McDonald R. H. and Minor D. 2007. 
Disk and tape storage cost models. In Archiving 2007.","DOI":"10.2352\/issn.2168-3204.2007.4.1.art00008"},{"key":"e_1_2_1_30_1","unstructured":"New York State. 2010. New York State digital archives. http:\/\/www.archives.nysed.gov\/aindex.shtml."},{"key":"e_1_2_1_31_1","unstructured":"NOAA. 2010. National Climatic Data Center. http:\/\/www.ncdc.noaa.gov\/oa\/ncdc.html."},{"key":"e_1_2_1_32_1","unstructured":"ORNL. 2010. Distributed Active Archive Center. http:\/\/daac.ornl.gov\/."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006220"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST).","author":"Pinheiro E."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the Conference on File and Storage Technologies (FAST). USENIX, 89--101","author":"Quinlan S."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the USENIX Annual Technical Conference. USENIX Association, 41--54","author":"Roselli D."},{"key":"e_1_2_1_37_1","unstructured":"Sarbanes-Oxley. 2002. Sarbanes-Oxley act 2002. www.soxlaw.com."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST). 1--16","author":"Schroeder B."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1981.230843"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/358722.358737"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the USENIX Annual Technical Conference. 
143--156","author":"Storer M. W."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST).","author":"Storer M. W."},{"key":"e_1_2_1_43_1","unstructured":"Strange S. 1992. Analysis of long-term UNIX file access patterns for application to automatic file migration strategies. Tech. rep. UCB\/CSD 92\/700 University of California Berkeley."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1140277.1140280"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367829.1367831"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/319151.319158"},{"key":"e_1_2_1_47_1","unstructured":"Washington State. 2010. Washington State digital archives. 
http:\/\/www.digitalarchives.wa.gov\/."},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the 5th International Workshop on Petascale Data Storage (PDSW10)","author":"Wildani A.","year":"2010"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).","author":"Wildani A."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2005.47"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1243418.1243423"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST).","author":"Zhu B."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095810.1095828"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2180905.2180907","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2180905.2180907","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:14Z","timestamp":1750241174000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2180905.2180907"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,5]]},"references-count":53,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,5]]}},"alternative-id":["10.1145\/2180905.2180907"],"URL":"https:\/\/doi.org\/10.1145\/2180905.2180907","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"value":"1553-3077","type":"print"},{"value":"1553-3093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,5]]},"assertion":[{"value":"2011-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication
 History"}},{"value":"2011-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-05-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}