{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T10:08:19Z","timestamp":1742378899480},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2008,8]]},"abstract":"<jats:p>\n            Recent advances in flash media have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDA's, laptops, and even servers. However, flash media has many unique characteristics that make existing data management\/analytics algorithms designed for magnetic disks perform poorly with flash storage. For example, while random (page) reads are as fast as sequential reads, random (page) writes and in-place data updates are orders of magnitude slower than sequential writes. In this paper, we consider an important fundamental problem that would seem to be particularly challenging for flash storage: efficiently maintaining a very large (100 MBs or more) random sample of a data stream (e.g., of sensor readings). First, we show that previous algorithms such as reservoir sampling and geometric file are not readily adapted to flash. Second, we propose B-FILE, an energy-efficient abstraction for flash media to store self-expiring items, and show how a B-FILE can be used to efficiently maintain a large sample in flash. Our solution is simple, has a small (RAM) memory footprint, and is designed to cope with flash constraints in order to reduce latency and energy consumption. Third, we provide techniques to maintain biased samples with a B-FILE and to query the large sample stored in a B-FILE for a subsample of an arbitrary size. 
Finally, we present an evaluation with flash media that shows our techniques are\n            <jats:italic>several orders of magnitude faster and more energy-efficient<\/jats:italic>\n            than (flash-friendly versions of) reservoir sampling and geometric file. A key finding of our study, of potential use to many flash algorithms beyond sampling, is that \"semi-random\" writes (as defined in the paper) on flash cards are over two orders of magnitude faster and more energy-efficient than random writes.\n          <\/jats:p>","DOI":"10.14778\/1453856.1453961","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"970-983","source":"Crossref","is-referenced-by-count":29,"title":["Online maintenance of very large random samples on flash storage"],"prefix":"10.14778","volume":"1","author":[{"given":"Suman","family":"Nath","sequence":"first","affiliation":[{"name":"Microsoft Research"}]},{"given":"Phillip B.","family":"Gibbons","sequence":"additional","affiliation":[{"name":"Intel Research Pittsburgh"}]}],"member":"320","published-online":{"date-parts":[[2008,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Usenix Annual Technical Conference","author":"Agrawal N.","year":"2008"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872822"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1243418.1243429"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/645484.656545"},{"key":"e_1_2_1_5_1","volume-title":"CIDR","author":"Diao Y.","year":"2007"},{"key":"e_1_2_1_6_1","volume-title":"USENIX OSDI","author":"Douglis F.","year":"1994"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/378580.378687"},{"key":"e_1_2_1_8_1","unstructured":"M. Hachman. 
New Samsung notebook replaces hard drive with flash. http:\/\/www.extremetech.com\/article2\/0 1558 1966644 00.asp May 2006."},{"key":"e_1_2_1_9_1","unstructured":"Intel-Corporation. Understanding the Flash Translation Layer (FTL) specification. www.embeddedfreebsd.org\/Documents\/Intel-FTL.pdf 1998."},{"key":"e_1_2_1_10_1","volume-title":"VLDB","author":"Jermaine C.","year":"1999"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007603"},{"key":"e_1_2_1_12_1","volume-title":"Usenix FAST","author":"Kim H.","year":"2008"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1289927.1289956"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247488"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1182807.1182827"},{"key":"e_1_2_1_16_1","unstructured":"P. Miller. SimpleTech announces 512GB and 256GB 3.5-inch SSD drives. http:\/\/www.engadget.com\/2007\/04\/18\/ April 2007."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1236360.1236412"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/93605.98746"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/78973.78977"},{"key":"e_1_2_1_21_1","unstructured":"D. Reinsel and J. Janukowicz. Datacenter SSDs: Solid footing for growth. Samsung white paper. 
www.samsung.com\/global\/business\/semiconductor\/products\/flash\/ssd\/pdf\/datacenter_ssds.pdf January 2008."},{"key":"e_1_2_1_22_1","unstructured":"SyCard. CF extend 180 CompactFlash Flexible Extender Card. http:\/\/www.sycard.com\/cfextl 80.html 2008."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3147.3165"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/23002.23003"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/384192.384193"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/956676.956679"},{"key":"e_1_2_1_27_1","unstructured":"Yahoo!-Finance. Zeus-IOPS solid state drives surge to 512GB. http:\/\/biz.yahoo.com\/pz\/070418\/117663.html April 2007."},{"key":"e_1_2_1_28_1","volume-title":"USENIX FAST","author":"Zeinalipour-Yazti D.","year":"2005"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1453856.1453961","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:00:52Z","timestamp":1672225252000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1453856.1453961"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,8]]}},"alternative-id":["10.14778\/1453856.1453961"],"URL":"https:\/\/doi.org\/10.14778\/1453856.1453961","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2008,8]]}}}