{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T23:37:56Z","timestamp":1768520276466,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":84,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,9,24]],"date-time":"2017-09-24T00:00:00Z","timestamp":1506211200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1336580, CNS-1350499, CNS-1526304, CNS-1405959, CNS-1563956"],"award-info":[{"award-number":["CCF-1336580, CNS-1350499, CNS-1526304, CNS-1405959, CNS-1563956"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,9,24]]},"DOI":"10.1145\/3127479.3131622","type":"proceedings-article","created":{"date-parts":[[2017,9,27]],"date-time":"2017-09-27T12:34:00Z","timestamp":1506515640000},"page":"295-308","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["PBSE"],"prefix":"10.1145","author":[{"given":"Riza O.","family":"Suminto","sequence":"first","affiliation":[{"name":"University of Chicago"}]},{"given":"Cesar A.","family":"Stuardo","sequence":"additional","affiliation":[{"name":"University of Chicago"}]},{"given":"Alexandra","family":"Clark","sequence":"additional","affiliation":[{"name":"University of Chicago"}]},{"given":"Huan","family":"Ke","sequence":"additional","affiliation":[{"name":"University of Chicago"}]},{"given":"Tanakorn","family":"Leesatapornwongsa","sequence":"additional","affiliation":[{"name":"University of Chicago"}]},{"given":"Bo","family":"Fu","sequence":"additional","affiliation":[{"name":"University 
of Chicago"}]},{"given":"Daniar H.","family":"Kurniawan","sequence":"additional","affiliation":[{"name":"Bandung Institute of Technology"}]},{"given":"Vincentius","family":"Martin","sequence":"additional","affiliation":[{"name":"Surya University"}]},{"given":"Maheswara Rao G.","family":"Uma","sequence":"additional","affiliation":[{"name":"Intel Corp."}]},{"given":"Haryadi S.","family":"Gunawi","sequence":"additional","affiliation":[{"name":"University of Chicago"}]}],"member":"320","published-online":{"date-parts":[[2017,9,24]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Personal Communication from datacenter operators of University of Chicago IT Services.  Personal Communication from datacenter operators of University of Chicago IT Services."},{"key":"e_1_3_2_1_2_1","unstructured":"Personal Communication from Kevin Harms of Argonne National Laboratory.  Personal Communication from Kevin Harms of Argonne National Laboratory."},{"key":"e_1_3_2_1_3_1","unstructured":"Personal Communication from Robert Ricci of University of Utah.  Personal Communication from Robert Ricci of University of Utah."},{"key":"e_1_3_2_1_4_1","unstructured":"Personal Communication from Gary Grider and Parks Fields of Los Alamos National Laboratory.  Personal Communication from Gary Grider and Parks Fields of Los Alamos National Laboratory."},{"key":"e_1_3_2_1_5_1","unstructured":"Personal Communication from Xing Lin of NetApp.  Personal Communication from Xing Lin of NetApp."},{"key":"e_1_3_2_1_6_1","unstructured":"Personal Communication from H. Birali Runesha (Director of Research Computing Center University of Chicago).  Personal Communication from H. Birali Runesha (Director of Research Computing Center University of Chicago)."},{"key":"e_1_3_2_1_7_1","unstructured":"Personal Communication from Andree Jacobson (Chief Information Officer at New Mexico Consortium).  
Personal Communication from Andree Jacobson (Chief Information Officer at New Mexico Consortium)."},{"key":"e_1_3_2_1_8_1","unstructured":"Personal Communication from Dhruba Borthakur of Facebook.  Personal Communication from Dhruba Borthakur of Facebook."},{"key":"e_1_3_2_1_9_1","unstructured":"Apache Flume. http:\/\/flume.apache.org\/.  Apache Flume. http:\/\/flume.apache.org\/."},{"key":"e_1_3_2_1_10_1","unstructured":"Apache Giraph. http:\/\/giraph.apache.org\/.  Apache Giraph. http:\/\/giraph.apache.org\/."},{"key":"e_1_3_2_1_11_1","unstructured":"Apache Hadoop. http:\/\/hadoop.apache.org.  Apache Hadoop. http:\/\/hadoop.apache.org."},{"key":"e_1_3_2_1_12_1","unstructured":"Apache S4. http:\/\/incubator.apache.org\/s4\/.  Apache S4. http:\/\/incubator.apache.org\/s4\/."},{"key":"e_1_3_2_1_13_1","unstructured":"Apache Spark. http:\/\/spark.apache.org\/.  Apache Spark. http:\/\/spark.apache.org\/."},{"key":"e_1_3_2_1_14_1","unstructured":"Chameleon. https:\/\/www.chameleoncloud.org.  Chameleon. https:\/\/www.chameleoncloud.org."},{"key":"e_1_3_2_1_15_1","unstructured":"Emulab Network Emulation Testbed. http:\/\/www.emulab.net.  Emulab Network Emulation Testbed. http:\/\/www.emulab.net."},{"key":"e_1_3_2_1_16_1","unstructured":"HDFS-8009: Signal congestion on the DataNode. https:\/\/issues.apache.org\/jira\/browse\/HDFS-8009.  HDFS-8009: Signal congestion on the DataNode. https:\/\/issues.apache.org\/jira\/browse\/HDFS-8009."},{"key":"e_1_3_2_1_17_1","unstructured":"Introduction to HDFS Erasure Coding in Apache Hadoop. http:\/\/blog.cloudera.com\/blog\/2015\/09\/introduction-to-hdfs-erasure-coding-in-apache-hadoop\/.  Introduction to HDFS Erasure Coding in Apache Hadoop. http:\/\/blog.cloudera.com\/blog\/2015\/09\/introduction-to-hdfs-erasure-coding-in-apache-hadoop\/."},{"key":"e_1_3_2_1_18_1","unstructured":"Parallel Reconfigurable Observational Environment (PRObE). http:\/\/www.nmc-probe.org.  Parallel Reconfigurable Observational Environment (PRObE). 
http:\/\/www.nmc-probe.org."},{"key":"e_1_3_2_1_19_1","unstructured":"QFS. https:\/\/quantcast.github.io\/qfs\/.  QFS. https:\/\/quantcast.github.io\/qfs\/."},{"key":"e_1_3_2_1_20_1","unstructured":"Resource Localization in Yarn: Deep dive. http:\/\/hortonworks.com\/blog\/resource-localization-in-yarn-deep-dive\/.  Resource Localization in Yarn: Deep dive. http:\/\/hortonworks.com\/blog\/resource-localization-in-yarn-deep-dive\/."},{"key":"e_1_3_2_1_21_1","unstructured":"RIVER\n  : A Research Infrastructure to Explore Volatility Energy-Efficiency and Resilience. http:\/\/river.cs.uchicago.edu.  RIVER: A Research Infrastructure to Explore Volatility Energy-Efficiency and Resilience. http:\/\/river.cs.uchicago.edu."},{"key":"e_1_3_2_1_22_1","unstructured":"Saving capacity with HDFS RAID. https:\/\/code.facebook.com\/posts\/536638663113101\/saving-capacity-with-hdfs-raid\/.  Saving capacity with HDFS RAID. https:\/\/code.facebook.com\/posts\/536638663113101\/saving-capacity-with-hdfs-raid\/."},{"key":"e_1_3_2_1_23_1","unstructured":"Speculative tasks in Hadoop. http:\/\/stackoverflow.com\/questions\/34342546\/speculative-tasks-in-hadoop.  Speculative tasks in Hadoop. http:\/\/stackoverflow.com\/questions\/34342546\/speculative-tasks-in-hadoop."},{"key":"e_1_3_2_1_24_1","unstructured":"Statistical Workload Injector for MapReduce (SWIM). https:\/\/github.com\/SWIMProjectUCB\/SWIM\/wiki.  Statistical Workload Injector for MapReduce (SWIM). https:\/\/github.com\/SWIMProjectUCB\/SWIM\/wiki."},{"key":"e_1_3_2_1_25_1","unstructured":"Support 'hedged' reads in DFSClient. https:\/\/issues.apache.org\/jira\/browse\/HDFS%2D5776.  Support 'hedged' reads in DFSClient. https:\/\/issues.apache.org\/jira\/browse\/HDFS%2D5776."},{"key":"e_1_3_2_1_26_1","unstructured":"Worlds First 1 000-Processor Chip. https:\/\/www.ucdavis.edu\/news\/worlds-first-1000-processor-chip\/.  Worlds First 1 000-Processor Chip. 
https:\/\/www.ucdavis.edu\/news\/worlds-first-1000-processor-chip\/."},{"key":"e_1_3_2_1_27_1","volume-title":"Xiaoqiang Zheng. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Abadi Martin","year":"2016","unstructured":"Martin Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , , and Xiaoqiang Zheng. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI) , 2016 . Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, , and Xiaoqiang Zheng. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI), 2016."},{"key":"e_1_3_2_1_28_1","volume-title":"Athicha Muthitacharoen. Performance Debugging for Distributed Systems of Black Boxes. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP)","author":"Aguilera Marcos K.","year":"2003","unstructured":"Marcos K. Aguilera , Jeffrey C. Mogul , Janet L. Wiener , Patrick Reynolds , and Athicha Muthitacharoen. Performance Debugging for Distributed Systems of Black Boxes. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) , 2003 . Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener, Patrick Reynolds, and Athicha Muthitacharoen. 
Performance Debugging for Distributed Systems of Black Boxes. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), 2003."},{"key":"e_1_3_2_1_29_1","volume-title":"Ed Harris. Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters. In Proceedings of the 2011 EuroSys Conference (EuroSys)","author":"Ananthanarayanan Ganesh","year":"2011","unstructured":"Ganesh Ananthanarayanan , Sameer Agarwal , Srikanth Kandula , Albert Greenberg , Ion Stoica , Duke Harlan , and Ed Harris. Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters. In Proceedings of the 2011 EuroSys Conference (EuroSys) , 2011 . Ganesh Ananthanarayanan, Sameer Agarwal, Srikanth Kandula, Albert Greenberg, Ion Stoica, Duke Harlan, and Ed Harris. Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters. In Proceedings of the 2011 EuroSys Conference (EuroSys), 2011."},{"key":"e_1_3_2_1_30_1","volume-title":"Ion Stoica. Effective Straggler Mitigation: Attack of the Clones. In Proceedings of the 10th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Ananthanarayanan Ganesh","year":"2013","unstructured":"Ganesh Ananthanarayanan , Ali Ghodsi , Scott Shenker , and Ion Stoica. Effective Straggler Mitigation: Attack of the Clones. In Proceedings of the 10th Symposium on Networked Systems Design and Implementation (NSDI) , 2013 . Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica. Effective Straggler Mitigation: Attack of the Clones. In Proceedings of the 10th Symposium on Networked Systems Design and Implementation (NSDI), 2013."},{"key":"e_1_3_2_1_31_1","volume-title":"Ion Stoica. PACMan: Coordinated Memory Caching for Parallel Jobs. 
In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Ananthanarayanan Ganesh","year":"2012","unstructured":"Ganesh Ananthanarayanan , Ali Ghodsi , Andrew Wang , Dhruba Borthakur , Srikanth Kandula , Scott Shenker , and Ion Stoica. PACMan: Coordinated Memory Caching for Parallel Jobs. In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI) , 2012 . Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, and Ion Stoica. PACMan: Coordinated Memory Caching for Parallel Jobs. In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI), 2012."},{"key":"e_1_3_2_1_32_1","volume-title":"Minlan Yu. GRASS: Trimming Stragglers in Approximation Analytics. In Proceedings of the 11th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Ananthanarayanan Ganesh","year":"2014","unstructured":"Ganesh Ananthanarayanan , Michael Chien-Chun Hung , Xiaoqi Ren , Ion Stoica , Adam Wierman , and Minlan Yu. GRASS: Trimming Stragglers in Approximation Analytics. In Proceedings of the 11th Symposium on Networked Systems Design and Implementation (NSDI) , 2014 . Ganesh Ananthanarayanan, Michael Chien-Chun Hung, Xiaoqi Ren, Ion Stoica, Adam Wierman, and Minlan Yu. GRASS: Trimming Stragglers in Approximation Analytics. In Proceedings of the 11th Symposium on Networked Systems Design and Implementation (NSDI), 2014."},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Ananthanarayanan Ganesh","year":"2010","unstructured":"Ganesh Ananthanarayanan , Srikanth Kandula , Albert Greenberg , Ion Stoica , Yi Lu , Bikas Saha , and Edward Harris . Reining in the Outliers in Map-Reduce Clusters using Mantri . In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI) , 2010 . 
Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha, and Edward Harris. Reining in the Outliers in Map-Reduce Clusters using Mantri. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI), 2010."},{"key":"e_1_3_2_1_34_1","unstructured":"Michael Armbrust Armando Fox Rean Griffith Anthony D. Joseph Randy H. Katz Andrew Konwinski Gunho Lee David A. Patterson Ariel Rabkin Ion Stoica and Matei Zaharia. Above the Clouds: A Berkeley View of Cloud Computing. http:\/\/www.eecs.berkeley.edu\/Pubs\/TechRpts\/2009\/EECS-2009-28.pdf.  Michael Armbrust Armando Fox Rean Griffith Anthony D. Joseph Randy H. Katz Andrew Konwinski Gunho Lee David A. Patterson Ariel Rabkin Ion Stoica and Matei Zaharia. Above the Clouds: A Berkeley View of Cloud Computing. http:\/\/www.eecs.berkeley.edu\/Pubs\/TechRpts\/2009\/EECS-2009-28.pdf."},{"key":"e_1_3_2_1_35_1","volume-title":"River: Making the Fast Case Common. In The 1999 Workshop on Input\/Output in Parallel and Distributed Systems (IOPADS)","author":"Arpaci-Dusseau Remzi H.","year":"1999","unstructured":"Remzi H. Arpaci-Dusseau , Eric Anderson , Noah Treuhaft , David E. Culler , Joseph M. Hellerstein , Dave Patterson , and Kathy Yelick . Cluster I\/O with River: Making the Fast Case Common. In The 1999 Workshop on Input\/Output in Parallel and Distributed Systems (IOPADS) , 1999 . Remzi H. Arpaci-Dusseau, Eric Anderson, Noah Treuhaft, David E. Culler, Joseph M. Hellerstein, Dave Patterson, and Kathy Yelick. Cluster I\/O with River: Making the Fast Case Common. In The 1999 Workshop on Input\/Output in Parallel and Distributed Systems (IOPADS), 1999."},{"key":"e_1_3_2_1_36_1","volume-title":"July","author":"Bailis Peter","year":"2014","unstructured":"Peter Bailis and Kyle Kingsbury . The Network is Reliable. An informal survey of real-world communications failures. ACM Queue, 12(7) , July 2014 . Peter Bailis and Kyle Kingsbury. The Network is Reliable. 
 An informal survey of real-world communications failures. ACM Queue, 12(7), July 2014."},{"key":"e_1_3_2_1_37_1","volume-title":"Jiri Schindler. An Analysis of Latent Sector Errors in Disk Drives. In Proceedings of the 2007 ACM Conference on Measurement and Modeling of Computer Systems (SIGMETRICS)","author":"Bairavasundaram Lakshmi N.","year":"2007","unstructured":"Lakshmi N. Bairavasundaram , Garth R. Goodson , Shankar Pasupathy , and Jiri Schindler. An Analysis of Latent Sector Errors in Disk Drives. In Proceedings of the 2007 ACM Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) , 2007 . Lakshmi N. Bairavasundaram, Garth R. Goodson, Shankar Pasupathy, and Jiri Schindler. An Analysis of Latent Sector Errors in Disk Drives. In Proceedings of the 2007 ACM Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), 2007."},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Barham Paul","year":"2004","unstructured":"Paul Barham , Austin Donnelly , Rebecca Isaacs , and Richard Mortier . Using Magpie for request extraction and workload modelling . In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI) , 2004 . Paul Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier. Using Magpie for request extraction and workload modelling. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004."},{"key":"e_1_3_2_1_39_1","volume-title":"Jingren Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB)","author":"Chaiken Ronnie","year":"2008","unstructured":"Ronnie Chaiken , Bob Jenkins , Paul Larson , Bill Ramsey , Darren Shakib , Simon Weaver , and Jingren Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets.
 In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB) , 2008 . Ronnie Chaiken, Bob Jenkins, Paul Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), 2008."},{"key":"e_1_3_2_1_40_1","volume-title":"Eric Brewer. Path-Based Failure and Evolution Management. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI)","author":"Chen Mike Y.","year":"2004","unstructured":"Mike Y. Chen , Anthony Accardi , Emre Kiciman , Dave Patterson , Armando Fox , and Eric Brewer. Path-Based Failure and Evolution Management. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI) , 2004 . Mike Y. Chen, Anthony Accardi, Emre Kiciman, Dave Patterson, Armando Fox, and Eric Brewer. Path-Based Failure and Evolution Management. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI), 2004."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367519"},{"key":"e_1_3_2_1_42_1","volume-title":"February","author":"Dean Jeffrey","year":"2013","unstructured":"Jeffrey Dean and Luiz André Barroso . The Tail at Scale. Communications of the ACM, 56(2) , February 2013 . Jeffrey Dean and Luiz André Barroso. The Tail at Scale. Communications of the ACM, 56(2), February 2013."},{"key":"e_1_3_2_1_43_1","volume-title":"Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI) , 2004 . Jeffrey Dean and Sanjay Ghemawat.
MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523627"},{"key":"e_1_3_2_1_45_1","volume-title":"Ion Stoica. X-Trace: A Pervasive Network Tracing Framework. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Fonseca Rodrigo","year":"2007","unstructured":"Rodrigo Fonseca , George Porter , Randy H. Katz , Scott Shenker , and Ion Stoica. X-Trace: A Pervasive Network Tracing Framework. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI) , 2007 . Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, and Ion Stoica. X-Trace: A Pervasive Network Tracing Framework. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), 2007."},{"key":"e_1_3_2_1_46_1","volume-title":"Proceedings of the 2013 USENIX Annual Technical Conference (ATC)","author":"Gandhi Rohan","year":"2013","unstructured":"Rohan Gandhi , Di Xie , and Y. Charlie Hu . PIKACHU: How to Rebalance Load in Optimizing MapReduce On Heterogeneous Clusters . In Proceedings of the 2013 USENIX Annual Technical Conference (ATC) , 2013 . Rohan Gandhi, Di Xie, and Y. Charlie Hu. PIKACHU: How to Rebalance Load in Optimizing MapReduce On Heterogeneous Clusters. In Proceedings of the 2013 USENIX Annual Technical Conference (ATC), 2013."},{"key":"e_1_3_2_1_47_1","volume-title":"June","author":"Gibson Garth","year":"2013","unstructured":"Garth Gibson , Gary Grider , Andree Jacobson , and Wyatt Lloyd . Probe: A thousand-node experimental cluster for computer systems research. USENIX;login:, 38(3) , June 2013 . Garth Gibson, Gary Grider, Andree Jacobson, and Wyatt Lloyd. Probe: A thousand-node experimental cluster for computer systems research. 
USENIX;login:, 38(3), June 2013."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2670979.2670986"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987583"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043556.2043582"},{"key":"e_1_3_2_1_51_1","volume-title":"Proceedings of the 14th USENIX Symposium on File and Storage Technologies (FAST)","author":"Hao Mingzhe","year":"2016","unstructured":"Mingzhe Hao , Gokul Soundararajan , Deepak Kenchammana-Hosekote , Andrew A. Chien , and Haryadi S. Gunawi . The Tail at Store: A Revelation from Millions of Hours of Disk and SSD Deployments . In Proceedings of the 14th USENIX Symposium on File and Storage Technologies (FAST) , 2016 . Mingzhe Hao, Gokul Soundararajan, Deepak Kenchammana-Hosekote, Andrew A. Chien, and Haryadi S. Gunawi. The Tail at Store: A Revelation from Millions of Hours of Disk and SSD Deployments. In Proceedings of the 14th USENIX Symposium on File and Storage Technologies (FAST), 2016."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987554"},{"key":"e_1_3_2_1_53_1","volume-title":"Sergey Yekhanin. Erasure Coding in Windows Azure Storage. In Proceedings of the 2012 USENIX Annual Technical Conference (ATC)","author":"Huang Cheng","year":"2012","unstructured":"Cheng Huang , Huseyin Simitci , Yikang Xu , Aaron Ogus , Brad Calder , Parikshit Gopalan , Jin Li , and Sergey Yekhanin. Erasure Coding in Windows Azure Storage. In Proceedings of the 2012 USENIX Annual Technical Conference (ATC) , 2012 . Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin. Erasure Coding in Windows Azure Storage. 
In Proceedings of the 2012 USENIX Annual Technical Conference (ATC), 2012."},{"key":"e_1_3_2_1_54_1","volume-title":"Proceedings of the 2007 EuroSys Conference (EuroSys)","author":"Isard Michael","year":"2007","unstructured":"Michael Isard , Mihai Budiu , Yuan Yu , Andrew Birrell , and Dennis Fetterly . Dryad : distributed data-parallel programs from sequential building blocks . In Proceedings of the 2007 EuroSys Conference (EuroSys) , 2007 . Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In Proceedings of the 2007 EuroSys Conference (EuroSys), 2007."},{"key":"e_1_3_2_1_55_1","volume-title":"Andrew Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP)","author":"Isard Michael","year":"2009","unstructured":"Michael Isard , Vijayan Prabhakaran , Jon Currey , Udi Wieder , Kunal Talwar , and Andrew Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP) , 2009 . Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), 2009."},{"key":"e_1_3_2_1_56_1","volume-title":"Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Leesatapornwongsa Tanakorn","year":"2014","unstructured":"Tanakorn Leesatapornwongsa , Mingzhe Hao , Pallavi Joshi , Jeffrey F. Lukman , and Haryadi S. Gunawi . SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems . In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI) , 2014 . Tanakorn Leesatapornwongsa, Mingzhe Hao, Pallavi Joshi, Jeffrey F. Lukman, and Haryadi S. Gunawi. 
 SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), 2014."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872362.2872374"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3102980.3102985"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2670979.2670988"},{"key":"e_1_3_2_1_60_1","volume-title":"Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-Parallel Systems. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Lion David","year":"2016","unstructured":"David Lion , Adrian Chiu , Hailong Sun , Xin Zhuang , Nikola Grcevski , and Ding Yuan . Don't Get Caught in the Cold , Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-Parallel Systems. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI) , 2016 . David Lion, Adrian Chiu, Hailong Sun, Xin Zhuang, Nikola Grcevski, and Ding Yuan. Don't Get Caught in the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-Parallel Systems. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI), 2016."},{"key":"e_1_3_2_1_61_1","volume-title":"Madanlal Musuvathi. Retro: Targeted Resource Management in Multi-tenant Distributed Systems. In Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Mace Jonathan","year":"2015","unstructured":"Jonathan Mace , Peter Bodik , Rodrigo Fonseca , and Madanlal Musuvathi. Retro: Targeted Resource Management in Multi-tenant Distributed Systems. In Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI) , 2015 . Jonathan Mace, Peter Bodik, Rodrigo Fonseca, and Madanlal Musuvathi. Retro: Targeted Resource Management in Multi-tenant Distributed Systems.
In Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI), 2015."},{"key":"e_1_3_2_1_62_1","volume-title":"Rodrigo Fonseca. Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP)","author":"Mace Jonathan","year":"2015","unstructured":"Jonathan Mace , Ryan Roelke , and Rodrigo Fonseca. Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP) , 2015 . Jonathan Mace, Ryan Roelke, and Rodrigo Fonseca. Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), 2015."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807223"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447919"},{"key":"e_1_3_2_1_65_1","volume-title":"Martin Abadi. Naiad: A Timely Dataflow System. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP)","author":"Murray Derek G.","year":"2013","unstructured":"Derek G. Murray , Frank McSherry , Rebecca Isaacs , Michael Isard , Paul Barham , and Martin Abadi. Naiad: A Timely Dataflow System. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP) , 2013 . Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martin Abadi. Naiad: A Timely Dataflow System. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), 2013."},{"key":"e_1_3_2_1_66_1","volume-title":"Using Burstable Instances in the Public Cloud: When and How?","author":"Nasiriani Neda","year":"2016","unstructured":"Neda Nasiriani , Cheng Wang , George Kesidis , and Bhuvan Urgaonkar . Using Burstable Instances in the Public Cloud: When and How? 2016 . Neda Nasiriani, Cheng Wang, George Kesidis, and Bhuvan Urgaonkar. 
Using Burstable Instances in the Public Cloud: When and How? 2016."},{"key":"e_1_3_2_1_67_1","volume-title":"Jim Kelly. The Quantcast File System. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB)","author":"Ovsiannikov Michael","year":"2013","unstructured":"Michael Ovsiannikov , Silvius Rus , Damian Reeves , Paul Sutter , Sriram Rao , and Jim Kelly. The Quantcast File System. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB) , 2013 . Michael Ovsiannikov, Silvius Rus, Damian Reeves, Paul Sutter, Sriram Rao, and Jim Kelly. The Quantcast File System. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB), 2013."},{"key":"e_1_3_2_1_68_1","volume-title":"Dhruba Borthakur. XORing Elephants: Novel Erasure Codes for Big Data. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB)","author":"Sathiamoorthy Maheswaran","year":"2013","unstructured":"Maheswaran Sathiamoorthy , Megasthenis Asteris , Dimitris Papailiopoulos , Alexandros G. Dimakis , Ramkumar Vadali , Scott Chen , and Dhruba Borthakur. XORing Elephants: Novel Erasure Codes for Big Data. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB) , 2013 . Maheswaran Sathiamoorthy, Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G. Dimakis, Ramkumar Vadali, Scott Chen, and Dhruba Borthakur. XORing Elephants: Novel Erasure Codes for Big Data. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB), 2013."},{"key":"e_1_3_2_1_69_1","volume-title":"Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Suresh Lalith","year":"2015","unstructured":"Lalith Suresh , Marco Canini , Stefan Schmid , and Anja Feldmann . C3 : Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection . In Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI) , 2015 . 
Lalith Suresh, Marco Canini, Stefan Schmid, and Anja Feldmann. C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection. In Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI), 2015."},{"key":"e_1_3_2_1_70_1","volume-title":"Ion Stoica. The Power of Choice in Data-Aware Cluster Scheduling. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Venkataraman Shivaram","year":"2014","unstructured":"Shivaram Venkataraman , Aurojit Panda , Ganesh Ananthanarayanan , Michael J. Franklin , and Ion Stoica. The Power of Choice in Data-Aware Cluster Scheduling. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI) , 2014 . Shivaram Venkataraman, Aurojit Panda, Ganesh Ananthanarayanan, Michael J. Franklin, and Ion Stoica. The Power of Choice in Data-Aware Cluster Scheduling. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), 2014."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.2010.5461931"},{"key":"e_1_3_2_1_72_1","volume-title":"Abhijeet Joglekar. An Integrated Experimental Environment for Distributed Systems and Networks. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI)","author":"White Brian","year":"2002","unstructured":"Brian White , Jay Lepreau , Leigh Stoller , Robert Ricci , Shashi Guruprasad , Mac Newbold , Mike Hibler , Chad Barb , and Abhijeet Joglekar. An Integrated Experimental Environment for Distributed Systems and Networks. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI) , 2002 . Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, Shashi Guruprasad, Mac Newbold, Mike Hibler, Chad Barb, and Abhijeet Joglekar. An Integrated Experimental Environment for Distributed Systems and Networks. 
In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), 2002."},{"key":"e_1_3_2_1_73_1","volume-title":"Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Wu Zhe","year":"2015","unstructured":"Zhe Wu, Curtis Yu, and Harsha V. Madhyastha. CosTLO: Cost-Effective Redundancy for Lower Latency Variance on Cloud Storage Services. In Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI), 2015."},{"key":"e_1_3_2_1_74_1","volume-title":"Xia and Andrew A. Chien. RobuSTore: Robust Performance for Distributed Storage Systems. In Proceedings of the 2007 Conference on High Performance Networking and Computing (SC)","author":"Huaxia","year":"2007","unstructured":"Huaxia Xia and Andrew A. Chien. RobuSTore: Robust Performance for Distributed Storage Systems. In Proceedings of the 2007 Conference on High Performance Networking and Computing (SC), 2007."},{"key":"e_1_3_2_1_75_1","volume-title":"Shankar Pasupathy. Do Not Blame Users for Misconfigurations. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP)","author":"Xu Tianyin","year":"2013","unstructured":"Tianyin Xu, Jiaqi Zhang, Peng Huang, Jing Zheng, Tianwei Sheng, Ding Yuan, Yuanyuan Zhou, and Shankar Pasupathy. 
Do Not Blame Users for Misconfigurations. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), 2013."},{"key":"e_1_3_2_1_76_1","volume-title":"Michael Bailey. Bobtail: Avoiding Long Tails in the Cloud. In Proceedings of the 10th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Xu Yunjing","year":"2013","unstructured":"Yunjing Xu, Zachary Musgrave, Brian Noble, and Michael Bailey. Bobtail: Avoiding Long Tails in the Cloud. In Proceedings of the 10th Symposium on Networked Systems Design and Implementation (NSDI), 2013."},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/2670979.2671005"},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.5555\/3129633.3129636"},{"key":"e_1_3_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735508.2735511"},{"key":"e_1_3_2_1_80_1","unstructured":"Ding Yuan, Yu Luo, Xin Zhuang, Guilherme Renna Rodrigues, Xu Zhao, Yongle Zhang, Pranay U. Jain, and Michael Stumm. Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), 2014."},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/1755913.1755940"},{"key":"e_1_3_2_1_82_1","volume-title":"Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. 
In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI)","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI), 2012."},{"key":"e_1_3_2_1_83_1","volume-title":"Ion Stoica. Improving MapReduce Performance in Heterogeneous Environments. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Zaharia Matei","year":"2008","unstructured":"Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica. Improving MapReduce Performance in Heterogeneous Environments. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI), 2008."},{"key":"e_1_3_2_1_84_1","volume-title":"Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI)","author":"Zhai Ennan","year":"2014","unstructured":"
Ennan Zhai, Ruichuan Chen, David Isaac Wolinsky, and Bryan Ford. Heading Off Correlated Failures through Independence-as-a-Service. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), 2014."}],"event":{"name":"SoCC '17: ACM Symposium on Cloud Computing","location":"Santa Clara California","acronym":"SoCC '17","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the 2017 Symposium on Cloud Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3127479.3131622","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3127479.3131622","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3127479.3131622","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:30:29Z","timestamp":1750217429000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3127479.3131622"}},"subtitle":["a robust path-based speculative execution for degraded-network tail tolerance in data-parallel frameworks"],"short-title":[],"issued":{"date-parts":[[2017,9,24]]},"references-count":84,"alternative-id":["10.1145\/3127479.3131622","10.1145\/3127479"],"URL":"https:\/\/doi.org\/10.1145\/3127479.3131622","relation":{},"subject":[],"published":{"date-parts":[[2017,9,24]]},"assertion":[{"value":"2017-09-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}