{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,27]],"date-time":"2025-04-27T20:21:26Z","timestamp":1745785286138,"version":"3.37.3"},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,12,22]],"date-time":"2021-12-22T00:00:00Z","timestamp":1640131200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,22]],"date-time":"2021-12-22T00:00:00Z","timestamp":1640131200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100012325","name":"Bergische Universit\u00e4t Wuppertal","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100012325","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Softw Big Sci"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>A common task in scientific computing is the data reduction. This workflow extracts the most important information from large input data and stores it in smaller derived data objects. The derived data objects can then be used for further analysis. Typically, these workflows use distributed storage and computing resources. A straightforward setup of storage media would be low-cost tape storage and higher-cost disk storage. The large, infrequently accessed input data are stored on tape storage. The smaller, frequently accessed derived data is stored on disk storage. In a best-case scenario, the large input data is only accessed very infrequently and in a well-planned pattern. However, practice shows that often the data has to be processed continuously and unpredictably. This can significantly reduce tape storage performance. A common approach to counter this is storing copies of the large input data on disk storage. This contribution evaluates an approach that uses cloud storage resources to serve as a flexible cache or buffer, depending on the computational workflow. The proposed model is explored for the case of continuously processed data. For the evaluation, a simulation tool was developed, which can be used to analyse models related to storage and network resources. We show that using commercial cloud storage can reduce on-premises disk storage requirements, while maintaining an equal throughput of jobs. Moreover, the key metrics of the model are discussed, and an approach is described, which uses the simulation to assist with the decision process of using commercial cloud storage. The goal is to investigate approaches and propose new evaluation methods to overcome future data challenges.<\/jats:p>","DOI":"10.1007\/s41781-021-00076-w","type":"journal-article","created":{"date-parts":[[2022,2,14]],"date-time":"2022-02-14T08:05:53Z","timestamp":1644825953000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Simulation and Evaluation of Cloud Storage Caching for Data Intensive Science"],"prefix":"10.1007","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3691-021X","authenticated-orcid":false,"given":"Tobias","family":"Wegner","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mario","family":"Lassnig","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peer","family":"Ueberholz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christian","family":"Zeitnitz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,12,22]]},"reference":[{"key":"76_CR1","unstructured":"Collaboration ATLAS (2008) The ATLAS Experiment at the CERN Large Hadron Collider. JINST 3:S08003"},{"key":"76_CR2","first-page":"S08004","volume":"3","author":"CMS Collaboration","year":"2008","unstructured":"Collaboration CMS (2008) The CMS Experiment at the CERN LHC. JINST 3:S08004","journal-title":"JINST"},{"issue":"2","key":"76_CR3","doi-asserted-by":"publisher","first-page":"111","DOI":"10.3847\/1538-4357\/ab042c","volume":"873","author":"Z Ivezi\u0107","year":"2019","unstructured":"Ivezi\u0107 Z, Kahn SM, Tyson JA et al (2019) LSST: From science drivers to reference design and anticipated data products. Astrophys J 873(2):111","journal-title":"Astrophys J"},{"key":"76_CR4","unstructured":"Weltman A, Bull P, Camera S et al (2020) Fundamental Physics with the Square Kilometer Array. Publications of the Astronomical Society of Australia 37"},{"issue":"1","key":"76_CR5","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1063\/1.1807311","volume":"722","author":"P Avery","year":"2004","unstructured":"Avery P (2004) Grid Computing in High Energy Physics. AIP Conf Proc 722(1):131\u2013140","journal-title":"AIP Conf Proc"},{"issue":"3","key":"76_CR6","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/s10723-005-9010-8","volume":"3","author":"J Yu","year":"2005","unstructured":"Yu J, Buyya R (2005) A Taxonomy of Workflow Management Systems for Grid Computing. J Grid Comput 3(3):171\u2013200","journal-title":"J Grid Comput"},{"key":"76_CR7","doi-asserted-by":"crossref","unstructured":"Bonacorsi D and Ferrari T (2007). WLCG Service Challenges and Tiered architecture in the LHC era. In IFAE 2006, pages 365\u2013368. Springer Milan","DOI":"10.1007\/978-88-470-0530-3_68"},{"key":"76_CR8","doi-asserted-by":"crossref","unstructured":"ATLAS Collaboration (2017). ATLAS Distributed Computing experience and performance during the LHC Run-2. J. Phys. Conf. Ser., 898(5):052015","DOI":"10.1088\/1742-6596\/898\/5\/052015"},{"key":"76_CR9","doi-asserted-by":"crossref","unstructured":"ATLAS Collaboration (2017). The ATLAS Production System Evolution: New Data Processing and Analysis Paradigm for the LHC Run2 and High-Luminosity. J. Phys. Conf. Ser., 898(5):052016","DOI":"10.1088\/1742-6596\/898\/5\/052016"},{"key":"76_CR10","doi-asserted-by":"crossref","unstructured":"ATLAS Collaboration (2017). Experiences with the new ATLAS Distributed Data Management System. J. Phys. Conf. Ser., 898(6):062019","DOI":"10.1088\/1742-6596\/898\/6\/062019"},{"key":"76_CR11","doi-asserted-by":"crossref","unstructured":"Zhang X, He D, Du D Hc, et al (2006). Object Placement in Parallel Tape Storage Systems. In 2006 International Conference on Parallel Processing (ICPP\u201906), pages 101\u2013108","DOI":"10.1109\/ICPP.2006.55"},{"key":"76_CR12","doi-asserted-by":"crossref","unstructured":"Moore RL, D\u2019Aoust J, McDonald RH et al (2007) Disk and Tape Storage Cost Models. Archiving Conference 2007(1):29\u201332","DOI":"10.2352\/issn.2168-3204.2007.4.1.art00008"},{"issue":"4","key":"76_CR13","doi-asserted-by":"publisher","first-page":"042045","DOI":"10.1088\/1742-6596\/331\/4\/042045","volume":"331","author":"D Yu","year":"2011","unstructured":"Yu D, Lauret J (2011) Tape Storage Optimization at BNL. J Phys Conf Ser 331(4):042045","journal-title":"J Phys Conf Ser"},{"key":"76_CR14","unstructured":"Collaboration ATLAS (2020) ATLAS Data Carousel. Technical Report ATL-SOFT-PROC-2020-014. CERN, Geneva"},{"key":"76_CR15","doi-asserted-by":"publisher","first-page":"052025","DOI":"10.1088\/1742-6596\/664\/5\/052025","volume":"664","author":"E Martelli","year":"2015","unstructured":"Martelli E, Stancu S (2015) LHCOPN and LHCONE: Status and Future Evolution. J Phys Conf Ser 664:052025","journal-title":"J Phys Conf Ser"},{"key":"76_CR16","unstructured":"Lim K DMTN-125: Google Cloud Engagement Results. https:\/\/dmtn-125.lsst.io\/. Accessed: 2020-11-13"},{"key":"76_CR17","doi-asserted-by":"publisher","first-page":"04020","DOI":"10.1051\/epjconf\/201921404020","volume":"214","author":"ATLAS Collaboration","year":"2019","unstructured":"Collaboration ATLAS (2019) The Data Ocean project: An ATLAS and Google R&D collaboration. EPJ Web Conf 214:04020","journal-title":"EPJ Web Conf"},{"key":"76_CR18","unstructured":"Google. Google Cloud Platform. https:\/\/cloud.google.com\/. Accessed: 2020-09-14"},{"key":"76_CR19","doi-asserted-by":"publisher","first-page":"012015","DOI":"10.1088\/1742-6596\/608\/1\/012015","volume":"608","author":"ATLAS Collaboration","year":"2015","unstructured":"Collaboration ATLAS (2015) Multilevel Workflow System in the ATLAS Experiment. J Phys Conf Ser 608:012015","journal-title":"J Phys Conf Ser"},{"key":"76_CR20","doi-asserted-by":"publisher","first-page":"072024","DOI":"10.1088\/1742-6596\/331\/7\/072024","volume":"331","author":"ATLAS Collaboration","year":"2011","unstructured":"Collaboration ATLAS (2011) Overview of ATLAS PanDA workload management. J Phys Conf Ser 331:072024","journal-title":"J Phys Conf Ser"},{"issue":"2","key":"76_CR21","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s10723-018-9436-4","volume":"16","author":"M Meoni","year":"2018","unstructured":"Meoni M, Perego R, Tonellotto N (2018) Dataset popularity prediction for caching of cms big data. J Grid Comput 16(2):211\u2013228","journal-title":"J Grid Comput"},{"key":"76_CR22","doi-asserted-by":"publisher","first-page":"032106","DOI":"10.1088\/1742-6596\/396\/3\/032106","volume":"396","author":"M Titov","year":"2012","unstructured":"Titov M, Zaruba G et al (2012) A probabilistic analysis of data popularity in ATLAS data caching. J Phys Conf Ser 396:032106","journal-title":"J Phys Conf Ser"}],"container-title":["Computing and Software for Big Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41781-021-00076-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41781-021-00076-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41781-021-00076-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T12:36:24Z","timestamp":1700224584000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41781-021-00076-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,22]]},"references-count":22,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["76"],"URL":"https:\/\/doi.org\/10.1007\/s41781-021-00076-w","relation":{},"ISSN":["2510-2036","2510-2044"],"issn-type":[{"type":"print","value":"2510-2036"},{"type":"electronic","value":"2510-2044"}],"subject":[],"published":{"date-parts":[[2021,12,22]]},"assertion":[{"value":"11 December 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 December 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"5"}}