{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T22:16:04Z","timestamp":1765232164169},"reference-count":21,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2012,8]]},"abstract":"<jats:p>MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write such applications and execute them over a cluster of machines, to achieve low latency and high scalability?<\/jats:p>\n          <jats:p>In this paper we report on our investigation of this question, as carried out at Kosmix and WalmartLabs. We describe MapUpdate, a framework like MapReduce, but specifically developed for fast data. We describe Muppet, our implementation of MapUpdate. Throughout the description we highlight the key challenges, argue why MapReduce is not well suited to address them, and briefly describe our current solutions. Finally, we describe our experience and lessons learned with Muppet, which has been used extensively at Kosmix and WalmartLabs to power a broad range of applications in social media and e-commerce.<\/jats:p>","DOI":"10.14778\/2367502.2367520","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1814-1825","source":"Crossref","is-referenced-by-count":91,"title":["Muppet"],"prefix":"10.14778","volume":"5","author":[{"given":"Wang","family":"Lam","sequence":"first","affiliation":[{"name":"WalmartLabs"}]},{"given":"Lu","family":"Liu","sequence":"additional","affiliation":[{"name":"WalmartLabs"}]},{"given":"Sts","family":"Prasad","sequence":"additional","affiliation":[{"name":"WalmartLabs"}]},{"given":"Anand","family":"Rajaraman","sequence":"additional","affiliation":[{"name":"WalmartLabs"}]},{"given":"Zoheb","family":"Vacheri","sequence":"additional","affiliation":[{"name":"WalmartLabs"}]},{"given":"AnHai","family":"Doan","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]}],"member":"320","published-online":{"date-parts":[[2012,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"277","volume-title":"CIDR","author":"Abadi D. J.","year":"2005","unstructured":"D. J. Abadi , Y. Ahmad , M. Balazinska , U. \u00c7etintemel , M. Cherniack , J.-H. Hwang , W. Lindner , A. S. Maskey , A. Rasin , E. Ryvkina , N. Tatbul , Y. Xing , and S. Zdonik . The Design of the Borealis Stream Processing Engine . In CIDR , pages 277 -- 289 , 2005 . D. J. Abadi, Y. Ahmad, M. Balazinska, U. \u00c7etintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In CIDR, pages 277--289, 2005."},{"key":"e_1_2_1_2_1","unstructured":"AWS Case Study: foursquare. http:\/\/aws.amazon.com\/solutions\/case-studies\/foursquare\/.  AWS Case Study: foursquare. http:\/\/aws.amazon.com\/solutions\/case-studies\/foursquare\/."},{"key":"e_1_2_1_3_1","first-page":"1","volume-title":"PODS","author":"Babcock B.","year":"2002","unstructured":"B. Babcock , S. Babu , M. Datar , R. Motwani , and J. Widom . Models and Issues in Data Stream Systems . In PODS , pages 1 -- 16 , 2002 . 10.1145\/543613.543615 B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. In PODS, pages 1--16, 2002. 10.1145\/543613.543615"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2038916.2038923"},{"key":"e_1_2_1_5_1","first-page":"269","volume-title":"CIDR","author":"Chandrasekaran S.","year":"2003","unstructured":"S. Chandrasekaran , O. Cooper , A. Deshpande , M. J. Franklin , J. M. Hellerstein , W. Hong , S. Krishnamurthy , S. Madden , V. Raman , F. Reiss , and M. Shah . TelegraphCQ: Continuous Dataflow Processing for an Uncertain World . In CIDR , pages 269 -- 280 , 2003 . 10.1145\/872757.872857 S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In CIDR, pages 269--280, 2003. 10.1145\/872757.872857"},{"key":"e_1_2_1_6_1","unstructured":"CloudScale. http:\/\/www.cloudscale.com\/.  CloudScale. http:\/\/www.cloudscale.com\/."},{"key":"e_1_2_1_7_1","first-page":"313","volume-title":"NSDI","author":"Condie T.","year":"2010","unstructured":"T. Condie , N. Conway , P. Alvaro , J. M. Hellerstein , K. Elmeleegy , and R. Sears . MapReduce Online . In NSDI , pages 313 -- 327 , 2010 . T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. In NSDI, pages 313--327, 2010."},{"key":"e_1_2_1_8_1","first-page":"137","volume-title":"OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat . MapReduce: Simplified Data Processing on Large Clusters . In OSDI , pages 137 -- 150 , 2004 . J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, pages 137--150, 2004."},{"key":"e_1_2_1_9_1","unstructured":"Facebook Inc. Amendment No. 8 to Form S-1 Registration Statement Under The Securities Act of 1933. http:\/\/sec.gov\/Archives\/edgar\/data\/1326801\/000119312512235588\/d287954ds1a.htm 2012.  Facebook Inc. Amendment No. 8 to Form S-1 Registration Statement Under The Securities Act of 1933. http:\/\/sec.gov\/Archives\/edgar\/data\/1326801\/000119312512235588\/d287954ds1a.htm 2012."},{"key":"e_1_2_1_10_1","volume-title":"Data Stream Management","author":"Garofalakis M.","year":"2009","unstructured":"M. Garofalakis , J. Gehrke , and R. Rastogi , editors . Data Stream Management . Springer , 2009 . M. Garofalakis, J. Gehrke, and R. Rastogi, editors. Data Stream Management. Springer, 2009."},{"key":"e_1_2_1_11_1","first-page":"1123","volume-title":"SIGMOD","author":"Gedik B.","year":"2008","unstructured":"B. Gedik , H. Andrade , K.-L. Wu , P. S. Yu , and M. Doo . SPADE: The System S Declarative Stream Processing Engine . In SIGMOD , pages 1123 -- 1134 , 2008 . 10.1145\/1376616.1376729 B. Gedik, H. Andrade, K.-L. Wu, P. S. Yu, and M. Doo. SPADE: The System S Declarative Stream Processing Engine. In SIGMOD, pages 1123--1134, 2008. 10.1145\/1376616.1376729"},{"key":"e_1_2_1_12_1","unstructured":"Going Social. http:\/\/www.economist.com\/events-conferences\/americas\/information-2012?bclid=1682222098001&bctid=1684182003001. An interview with Twitter CEO Dick Costolo at Ideas Economy: Information.  Going Social. http:\/\/www.economist.com\/events-conferences\/americas\/information-2012?bclid=1682222098001&bctid=1684182003001. An interview with Twitter CEO Dick Costolo at Ideas Economy: Information."},{"key":"e_1_2_1_13_1","unstructured":"K. Group. OpenCL. http:\/\/www.khronos.org\/opencl\/.  K. Group. OpenCL. http:\/\/www.khronos.org\/opencl\/."},{"key":"e_1_2_1_14_1","first-page":"985","volume-title":"SIGMOD","author":"Li B.","year":"2011","unstructured":"B. Li , E. Mazur , Y. Diao , A. McGregor , and P. J. Shenoy . A Platform for Scalable One-Pass Analytics using MapReduce . In SIGMOD , pages 985 -- 996 , 2011 . 10.1145\/1989323.1989426 B. Li, E. Mazur, Y. Diao, A. McGregor, and P. J. Shenoy. A Platform for Scalable One-Pass Analytics using MapReduce. In SIGMOD, pages 985--996, 2011. 10.1145\/1989323.1989426"},{"key":"e_1_2_1_15_1","unstructured":"Storm. https:\/\/github.com\/nathanmarz\/storm.  Storm. https:\/\/github.com\/nathanmarz\/storm."},{"key":"e_1_2_1_16_1","first-page":"170","volume-title":"ICDMW","author":"Neumeyer L.","year":"2010","unstructured":"L. Neumeyer , B. Robbins , A. Nair , and A. Kesari . S4: Distributed Stream Computing Platform . In ICDMW , pages 170 -- 177 , 2010 . 10.1109\/ICDMW.2010.172 L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed Stream Computing Platform. In ICDMW, pages 170--177, 2010. 10.1109\/ICDMW.2010.172"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_2_1_18_1","first-page":"1081","volume-title":"SIGMOD","author":"Olston C.","year":"2011","unstructured":"C. Olston , G. Chiou , L. Chitnis , F. Liu , Y. Han , M. Larsson , A. Neumann , V. B. N. Rao , V. Sankarasubramanian , S. Seth , C. Tian , T. ZiCornell , and X. Wang . Nova: Continuous Pig\/Hadoop Workflows . In SIGMOD , pages 1081 -- 1090 , 2011 . 10.1145\/1989323.1989439 C. Olston, G. Chiou, L. Chitnis, F. Liu, Y. Han, M. Larsson, A. Neumann, V. B. N. Rao, V. Sankarasubramanian, S. Seth, C. Tian, T. ZiCornell, and X. Wang. Nova: Continuous Pig\/Hadoop Workflows. In SIGMOD, pages 1081--1090, 2011. 10.1145\/1989323.1989439"},{"key":"e_1_2_1_19_1","first-page":"25","volume-title":"ICDE","author":"Shah M. A.","year":"2003","unstructured":"M. A. Shah , J. M. Hellerstein , S. Chandrasekaran , and M. J. Franklin . Flux: An Adaptive Partitioning Operator for Continuous Query Systems . In ICDE , pages 25 -- 36 , 2003 . M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, and M. J. Franklin. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. In ICDE, pages 25--36, 2003."},{"key":"e_1_2_1_20_1","volume-title":"HotCloud","author":"Zaharia M.","year":"2012","unstructured":"M. Zaharia , T. Das , H. Li , S. Shenker , and I. Stoica . Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters . In HotCloud , 2012 . M. Zaharia, T. Das, H. Li, S. Shenker, and I. Stoica. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In HotCloud, 2012."},{"issue":"1","key":"e_1_2_1_21_1","first-page":"3","volume":"26","author":"Zdonik S.","year":"2003","unstructured":"S. Zdonik , M. Stonebraker , M. Cherniack , U. \u00c7etintemel , M. Balazinska , and H. Balakrishnan . The Aurora and Medusa Projects. IEEE Data Engineering Bulletin , 26 ( 1 ): 3 -- 10 , 2003 . S. Zdonik, M. Stonebraker, M. Cherniack, U. \u00c7etintemel, M. Balazinska, and H. Balakrishnan. The Aurora and Medusa Projects. IEEE Data Engineering Bulletin, 26(1):3--10, 2003.","journal-title":"The Aurora and Medusa Projects. IEEE Data Engineering Bulletin"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2367502.2367520","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:55:25Z","timestamp":1672224925000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2367502.2367520"}},"subtitle":["MapReduce-style processing of fast data"],"short-title":[],"issued":{"date-parts":[[2012,8]]},"references-count":21,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2012,8]]}},"alternative-id":["10.14778\/2367502.2367520"],"URL":"https:\/\/doi.org\/10.14778\/2367502.2367520","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2012,8]]}}}