{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T05:31:08Z","timestamp":1739338268193,"version":"3.37.0"},"reference-count":43,"publisher":"Wiley","issue":"11","license":[{"start":{"date-parts":[[2009,8,21]],"date-time":"2009-08-21T00:00:00Z","timestamp":1250812800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2010,8,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this work we address the management of very large data sets, which need to be stored and processed across many computing sites. The motivation for our work is the ATLAS experiment for the Large Hadron Collider (LHC), where the authors have been involved in the development of the data management middleware. This middleware, called DQ2, has been used for the last several years by the ATLAS experiment for shipping petabytes of data to research centres and universities worldwide. We describe our experience in developing and deploying DQ2 on the Worldwide LHC computing Grid, a production Grid infrastructure formed of hundreds of computing sites. From this operational experience, we have identified an important degree of uncertainty that underlies the behaviour of large Grid infrastructures. This uncertainty is subjected to a detailed analysis, leading us to present novel modelling and simulation techniques for Data Grids. In addition, we discuss what we perceive as practical limits to the development of data distribution algorithms for Data Grids given the underlying infrastructure uncertainty, and propose future research directions. Copyright \u00a9 2009 John Wiley &amp; Sons, Ltd.<\/jats:p>","DOI":"10.1002\/cpe.1489","type":"journal-article","created":{"date-parts":[[2009,8,21]],"date-time":"2009-08-21T10:25:28Z","timestamp":1250850328000},"page":"1338-1364","source":"Crossref","is-referenced-by-count":8,"title":["Managing very large distributed data sets on a data grid"],"prefix":"10.1002","volume":"22","author":[{"given":"Miguel","family":"Branco","sequence":"first","affiliation":[]},{"given":"Ed","family":"Zaluska","sequence":"additional","affiliation":[]},{"given":"David","family":"de Roure","sequence":"additional","affiliation":[]},{"given":"Mario","family":"Lassnig","sequence":"additional","affiliation":[]},{"given":"Vincent","family":"Garonne","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2010,7,8]]},"reference":[{"key":"e_1_2_9_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1107499.1107503"},{"key":"e_1_2_9_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/1374780.1374789"},{"issue":"3","key":"e_1_2_9_4_2","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1006\/jnca.2000.0110","article-title":"The data grid: Towards an architecture for the distributed management and analysis of large scientific data sets","volume":"23","author":"Chervenak A","year":"2000","journal-title":"Journal of Network and Computer Applications"},{"key":"e_1_2_9_5_2","unstructured":"The ATLAS Collaboration. Available at:http:\/\/atlasexperiment.org\/[Retrieved on 4 August2009]."},{"key":"e_1_2_9_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88871-0_54"},{"key":"e_1_2_9_7_2","unstructured":"SandbergR GolgbergD KleimanS WalshD LyonB.Design and implementation of the sun network filesystem. Proceedings of the Summer 1985 USENIX Conference Phoenix AZ U.S.A. 1985;119\u2013130."},{"key":"e_1_2_9_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/35037.35059"},{"key":"e_1_2_9_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.54838"},{"key":"e_1_2_9_10_2","unstructured":"SchmuckF HaskinR.GPFS: A shared\u2010disk file system for large computing clusters. Proceedings of the 2002 Conference on File and Storage Technologies Monterey CA U.S.A. 2002;231\u2013244."},{"key":"e_1_2_9_11_2","unstructured":"AndrewsP KovatchP JordanC.Massive high\u2010performance global file systems for grid computing. Proceedings of the ACM\/IEEE SC 2005 Conference Seattle WA U.S.A. 2005."},{"key":"e_1_2_9_12_2","unstructured":"SchwanP.Lustre: Building a file system for 1000\u2010node clusters. Proceedings of the 2003 Linux Symposium Ottawa Canada 2003."},{"key":"e_1_2_9_13_2","unstructured":"WeilS BrandtS MillerE LongD MaltzahnC.Ceph: A scalable high\u2010performance distributed file system. Proceedings of the 7th Conference on Operating Systems Design and Implementation Seattle WA U.S.A. 2006;307\u2013320."},{"key":"e_1_2_9_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/1165389.945450"},{"key":"e_1_2_9_15_2","unstructured":"Apache Hadoop Project. Available at:http:\/\/hadoop.apache.org\/core\/[Retrieved on 4 August2009]."},{"key":"e_1_2_9_16_2","doi-asserted-by":"crossref","unstructured":"WhiteB WalkerM HumphreyM GrimshawA.LegionFS: A secure and scalable file system supporting cross\u2010domain high\u2010performance applications. Proceedings of the ACM\/IEEE SC 2001 Conference Denver CO U.S.A. 2001.","DOI":"10.1145\/582034.582093"},{"key":"e_1_2_9_17_2","unstructured":"CohenB.Incentives build robustness in BitTorrent. Workshop on Economics of Peer\u2010to\u2010Peer Systems Berkeley CA U.S.A. 2003."},{"key":"e_1_2_9_18_2","doi-asserted-by":"crossref","unstructured":"LvQ CaoP CohenE LiK ShenkerS.Search and replication in unstructured peer\u2010to\u2010peer networks. Proceedings of the 16th International Conference on Supercomputing New York NY U.S.A. 2002;84\u201395.","DOI":"10.1145\/514191.514206"},{"key":"e_1_2_9_19_2","doi-asserted-by":"crossref","unstructured":"CohenE ShenkerS.Replication strategies in unstructured peer\u2010to\u2010peer networks. Proceedings of the 2002 Conference on Applications Technologies Architectures and Protocols for Computer Communications Pittsburgh PA U.S.A. 2002;177\u2013190.","DOI":"10.1145\/633025.633043"},{"key":"e_1_2_9_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/844128.844132"},{"key":"e_1_2_9_21_2","unstructured":"ChunBG DabekF HaeberlenA SitE WeatherspoonH KaashoekM KubiatowiczJ MorrisR.Efficient replica maintenance for distributed storage systems. Proceedings of the 3rd Conference on Networked Systems Design and Implementation San Jose CA U.S.A. vol. 3 2006;45\u201358."},{"key":"e_1_2_9_22_2","unstructured":"BhagwanR TatiK ChengY SavageS VoelkerG.Total recall: System support for automated availability management. Proceedings of the 1st Conference on Symposium on Networked Systems Design and Implementation San Francisco CA U.S.A. 2004."},{"key":"e_1_2_9_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2006.02.001"},{"key":"e_1_2_9_24_2","doi-asserted-by":"crossref","unstructured":"BesterJ FosterI KesselmanC TedescoJ TueckeS.GASS: A data movement and access service for wide\u2010area computing systems. Proceedings of the 6th Workshop on I\/O in Parallel and Distributed Systems Atlanta GA U.S.A. 1999;78\u201388.","DOI":"10.1145\/301816.301839"},{"key":"e_1_2_9_25_2","unstructured":"SamarA StockingerH.Grid data management pilot (GDMP): A tool for wide area replication. IASTED International Conference on Applied Informatics Innsbruck Austria 2001."},{"key":"e_1_2_9_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(02)00094-7"},{"key":"e_1_2_9_27_2","doi-asserted-by":"crossref","unstructured":"ChervenakA DeelmanE FosterI GuyL HoschekW IamnitchiA KesselmanC KunsztP RipeanuM SchwartzkopfR StockingerH StockingerK TierneyB.Giggle: A framework for constructing scalable replica location services. Proceedings of the ACM\/IEEE SC 2002 Conference Baltimore MD U.S.A. 2002;1\u201317.","DOI":"10.1109\/SC.2002.10024"},{"key":"e_1_2_9_28_2","doi-asserted-by":"crossref","unstructured":"LamehamediH SzymanskiB ShentuZ DeelmanE.Data replication strategies in grid environments. Proceedings of the 5th International Conference on Algorithms and Architectures for Parallel Processing Beijing China 2002;378\u2013383.","DOI":"10.1109\/ICAPP.2002.1173605"},{"key":"e_1_2_9_29_2","unstructured":"BaruC MooreR RajasekarA WanM.The SDSC storage resource broker. Proceedings of the 1998 Conference of the Centre for Advanced Studies on Collaborative Research Toronto Ontario Canada 1998."},{"key":"e_1_2_9_30_2","unstructured":"TatebeO SekiguchiS MoritaY. Gfarm v2: A grid file system that supports high\u2010performance distributed and parallel data computing. Available at:http:\/\/datafarm.apgrid.org\/pdf\/CHEP04\u2010gfarmv2.pdf[24 June2009]."},{"key":"e_1_2_9_31_2","doi-asserted-by":"publisher","DOI":"10.1504\/IJHPCN.2008.020857"},{"key":"e_1_2_9_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36133-2_5"},{"issue":"3","key":"e_1_2_9_33_2","first-page":"126","article-title":"SimGrid: A generic framework for large\u2010scale distributed experiments","volume":"1","author":"Casanova H","year":"2008","journal-title":"Computer Modeling and Simulation"},{"key":"e_1_2_9_34_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1307"},{"key":"e_1_2_9_35_2","unstructured":"ShoshaniA SimA GuJ.Storage resource managers: Middleware components for grid storage. Proceedings of the 19th IEEE Symposium on Mass Storage Systems College Park MD U.S.A. 2002."},{"key":"e_1_2_9_36_2","unstructured":"FieldingR.Architectural styles and the design of network\u2010based software architectures. PhD Thesis University of California 2000."},{"key":"e_1_2_9_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/1435417.1435432"},{"key":"e_1_2_9_38_2","doi-asserted-by":"crossref","unstructured":"KunsztP BadinoP FrohnerA McCanceG NienartowiczK RochaR RodriguesD. Data storage access and catalogs in gLite. Local to Global Data Interoperability\u2014Challenges and Technologies Sardinia Italy 2005; 166\u2013170. Available at:http:\/\/ieeexplore.ieee.org\/xpls\/abs_all.jsp?tp=&arnumber=1612487&isnumber=33856.","DOI":"10.1109\/LGDI.2005.1612487"},{"key":"e_1_2_9_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.28"},{"key":"e_1_2_9_40_2","unstructured":"AdamsDet al. The ATLAS Computing Model. Available at:http:\/\/cdsweb.cern.ch\/record\/811058[2004]."},{"volume-title":"The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling","year":"1991","author":"Jain R","key":"e_1_2_9_41_2"},{"volume-title":"Computational Statistics","year":"2005","author":"Givens G","key":"e_1_2_9_42_2"},{"issue":"1","key":"e_1_2_9_43_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster A","year":"1977","journal-title":"Journal of the Royal Statistical Society"},{"key":"e_1_2_9_44_2","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1176344136"}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.1489","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.1489","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T20:26:43Z","timestamp":1739305603000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.1489"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,7,8]]},"references-count":43,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2010,8,10]]}},"alternative-id":["10.1002\/cpe.1489"],"URL":"https:\/\/doi.org\/10.1002\/cpe.1489","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"type":"print","value":"1532-0626"},{"type":"electronic","value":"1532-0634"}],"subject":[],"published":{"date-parts":[[2010,7,8]]}}}