{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:36:35Z","timestamp":1750307795965,"version":"3.41.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2008,2,1]],"date-time":"2008-02-01T00:00:00Z","timestamp":1201824000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2008,2]]},"abstract":"<jats:p>We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, both incoming data and intermediate results may need to be stored to enable analyses at unknown future times. The quantity of data of potential use would dominate even the largest storage system. Thus, a mechanism is needed to keep the data most likely to be used. One recently introduced approach is to employ retention value functions, which effectively assign each data object a value that changes over time in a prespecified way [Douglis et al. 2004]. Storage space for data entering the system is reclaimed automatically by deleting data of the lowest current value. In such large systems, there will naturally be multiple file systems available, each with different properties. Choosing the right file system for a given incoming stream of data presents a challenge. In this article we provide a novel and effective scheme for optimizing the placement of data within a distributed storage subsystem employing retention value functions. The goal is to keep the data of highest overall value, while simultaneously balancing the read load to the file system. 
The key aspects of such a scheme are quite different from those that arise in traditional file assignment problems. We further motivate this optimization problem and describe a solution, comparing its performance to other reasonable schemes via simulation experiments.<\/jats:p>","DOI":"10.1145\/1326542.1326547","type":"journal-article","created":{"date-parts":[[2008,2,28]],"date-time":"2008-02-28T14:02:33Z","timestamp":1204207353000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Storage optimization for large-scale distributed stream-processing systems"],"prefix":"10.1145","volume":"3","author":[{"given":"Kirsten","family":"Hildrum","sequence":"first","affiliation":[{"name":"IBM T. J. Watson Research Center, Yorktown Heights, NY"}]},{"given":"Fred","family":"Douglis","sequence":"additional","affiliation":[{"name":"IBM T. J. Watson Research Center, Yorktown Heights, NY"}]},{"given":"Joel L.","family":"Wolf","sequence":"additional","affiliation":[{"name":"IBM T. J. Watson Research Center, Yorktown Heights, NY"}]},{"given":"Philip S.","family":"Yu","sequence":"additional","affiliation":[{"name":"IBM T. J. Watson Research Center, Yorktown Heights, NY"}]},{"given":"Lisa","family":"Fleischer","sequence":"additional","affiliation":[{"name":"Dartmouth College, Hanover, NH"}]},{"given":"Akshay","family":"Katta","sequence":"additional","affiliation":[{"name":"Amazon Corporation, Seattle, WA"}]}],"member":"320","published-online":{"date-parts":[[2008,2,25]]},"reference":[{"volume-title":"Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR).","author":"Abadi D. J.","key":"e_1_2_1_1_1"},{"volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST).","author":"Abd-El-Malek M., II, W. V. C.","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","unstructured":"Ahuja R. Magnanti T. and Orlin J. 1993. Network Flows. Prentice Hall.  Ahuja R. 
Magnanti T. and Orlin J. 1993. Network Flows. Prentice Hall."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/502912.502915"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2006.13"},{"volume-title":"Proceedings of the ACM\/USENIX Symposium on Networked System Design and Implementation (NSDI). 365--378","author":"Bent J.","key":"e_1_2_1_6_1"},{"key":"e_1_2_1_7_1","unstructured":"Bertsimas D. and Tsitsiklis J. 1997. Introduction to Linear Optimization. Athena Scientific.   Bertsimas D. and Tsitsiklis J. 1997. Introduction to Linear Optimization. Athena Scientific."},{"volume-title":"1st Workshop on Hot Topics in System Dependability.","author":"Bhagwan R.","key":"e_1_2_1_8_1"},{"volume-title":"Proceedings of the 2nd Workshop on Hot Topics in Autonomic Computing. To appear.","author":"Branson M.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872857"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/502034.502054"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1151374.1151383"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1133572.1133593"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/356876.356883"},{"key":"e_1_2_1_15_1","unstructured":"Forrest J. 2006. CLP- COIN-OR linear program solver. http:\/\/www.coin-or.org\/Clp\/index.html.  Forrest J. 2006. CLP- COIN-OR linear program solver. http:\/\/www.coin-or.org\/Clp\/index.html."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1137\/0117039"},{"volume-title":"Workshop on System Management Tools for Large-Scale Parallel Systems.","author":"Hildrum K.","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","unstructured":"Hunter D. 1980. Modeling real dasd configurations. IBM Res. Rep. RC 8606.  Hunter D. 1980. Modeling real dasd configurations. IBM Res. Rep. 
RC 8606."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICAC.2007.40"},{"key":"e_1_2_1_20_1","unstructured":"Knuth D. E. 1973. The Art of Computer Programming Volume 3. Addison-Wesley.   Knuth D. E. 1973. The Art of Computer Programming Volume 3. Addison-Wesley."},{"key":"e_1_2_1_21_1","unstructured":"Lavenberg S. Ed. 1983. Computer Performance Modeling Handbook. Academic Press.   Lavenberg S. Ed. 1983. Computer Performance Modeling Handbook. Academic Press."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.833109"},{"key":"e_1_2_1_23_1","first-page":"20","article-title":"The COIN-OR initiative: Open-Source software accelerates operations research progress","volume":"28","author":"Lougee-Heimer R.","year":"2001","journal-title":"ORMS Today"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.382299"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/98457.98522"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/800264.809310"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.105"},{"volume-title":"Proceedings of the ACM\/IFIP\/USENIX 7th International Middleware Conference, 322--341","author":"Repantis","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/502034.502053"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1107499.1107504"},{"key":"e_1_2_1_31_1","unstructured":"Streambase Systems. 2007. Streambase. http:\/\/www.streambase.com\/.  Streambase Systems. 2007. Streambase. http:\/\/www.streambase.com\/."},{"key":"e_1_2_1_32_1","first-page":"1","article-title":"STREAM: The Stanford stream data manager","volume":"26","author":"The STREAM Group","year":"2003","journal-title":"IEEE Data Eng. 
Bull."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1984.1658928"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/75108.75373"},{"key":"e_1_2_1_35_1","first-page":"1","article-title":"The Aurora and Medusa projects","volume":"26","author":"Zdonik S.","year":"2003","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_36_1","unstructured":"Zipf G. K. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley.  Zipf G. K. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley."}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1326542.1326547","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1326542.1326547","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:56:25Z","timestamp":1750254985000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1326542.1326547"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,2]]},"references-count":36,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2008,2]]}},"alternative-id":["10.1145\/1326542.1326547"],"URL":"https:\/\/doi.org\/10.1145\/1326542.1326547","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2008,2]]},"assertion":[{"value":"2007-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2007-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2008-02-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}