{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:10:23Z","timestamp":1759133423145,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":77,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,11,17]],"date-time":"2019-11-17T00:00:00Z","timestamp":1573948800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Amazon Web Services Cloud Credits for Research Award"},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1756013, IIS-1838024, EEC-1801727"],"award-info":[{"award-number":["CCF-1756013, IIS-1838024, EEC-1801727"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,11,17]]},"DOI":"10.1145\/3295500.3356164","type":"proceedings-article","created":{"date-parts":[[2019,11,7]],"date-time":"2019-11-07T19:43:22Z","timestamp":1573155802000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Swift machine learning model serving scheduling"],"prefix":"10.1145","author":[{"given":"Heyang","family":"Qin","sequence":"first","affiliation":[{"name":"University of Nevada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Syed","family":"Zawad","sequence":"additional","affiliation":[{"name":"University of Nevada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanqi","family":"Zhou","sequence":"additional","affiliation":[{"name":"Google Brain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Yang","sequence":"additional","affiliation":[{"name":"University of Nevada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongfang","family":"Zhao","sequence":"additional","affiliation":[{"name":"University of Nevada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Feng","family":"Yan","sequence":"additional","affiliation":[{"name":"University of Nevada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,11,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2019. Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN). https:\/\/github.com\/intel\/mkl-dnn  2019. Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN). https:\/\/github.com\/intel\/mkl-dnn"},{"key":"e_1_3_2_1_2_1","unstructured":"2019. TensorBoard: TensorFlow's Visualization Toolkit. https:\/\/github.com\/tensorflow\/tensorboard  2019. TensorBoard: TensorFlow's Visualization Toolkit. https:\/\/github.com\/tensorflow\/tensorboard"},{"key":"e_1_3_2_1_3_1","unstructured":"2019. TensorFlow XLA. https:\/\/www.tensorflow.org\/performance\/xla\/  2019. TensorFlow XLA. https:\/\/www.tensorflow.org\/performance\/xla\/"},{"key":"e_1_3_2_1_4_1","unstructured":"Emile Aarts and Jan Korst. 1988. Simulated annealing and Boltzmann machines. (1988).  Emile Aarts and Jan Korst. 1988. Simulated annealing and Boltzmann machines. (1988)."},{"volume-title":"Chris Olah","author":"Abadi Mart\u00edn","key":"e_1_3_2_1_5_1","unstructured":"Mart\u00edn Abadi , Ashish Agarwal , Paul Barham , Eugene Brevdo , Zhifeng Chen , Craig Citro , Gregory S. Corrado , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Ian J. Goodfellow , Andrew Harp , Geoffrey Irving , Michael Isard , Yangqing Jia , Rafal J\u00f3zefowicz , Lukasz Kaiser , Manjunath Kudlur , Josh Levenberg , Dan Man\u00e9 , Rajat Monga , Sherry Moore , Derek Gordon Murray , Chris Olah , Mike Schuster , Jonathon Shlens , Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda B. Vi\u00e9gas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR abs\/1603.04467 (2016). arXiv:1603.04467 http:\/\/arxiv.org\/abs\/1603.04467 Mart\u00edn Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian J. Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal J\u00f3zefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Man\u00e9, Rajat Monga, Sherry Moore, Derek Gordon Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda B. Vi\u00e9gas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR abs\/1603.04467 (2016). arXiv:1603.04467 http:\/\/arxiv.org\/abs\/1603.04467"},{"key":"e_1_3_2_1_6_1","unstructured":"Jacob D Abernethy Elad Hazan and Alexander Rakhlin. 2009. Competing in the dark: An efficient algorithm for bandit linear optimization. (2009).  Jacob D Abernethy Elad Hazan and Alexander Rakhlin. 2009. Competing in the dark: An efficient algorithm for bandit linear optimization. (2009)."},{"key":"e_1_3_2_1_7_1","first-page":"4","article-title":"CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics","volume":"2","author":"Alipourfard Omid","year":"2017","unstructured":"Omid Alipourfard , Hongqiang Harry Liu , Jianshu Chen , Shivaram Venkataraman , Minlan Yu , and Ming Zhang . 2017 . CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics .. In NSDI , Vol. 2. 4 -- 2 . Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics.. In NSDI, Vol. 2. 4--2.","journal-title":"NSDI"},{"key":"e_1_3_2_1_8_1","volume-title":"International Conference on Machine Learning. 173--182","author":"Amodei Dario","year":"2016","unstructured":"Dario Amodei , Sundaram Ananthanarayanan , Rishita Anubhai , Jingliang Bai , Eric Battenberg , Carl Case , Jared Casper , Bryan Catanzaro , Qiang Cheng , Guoliang Chen , 2016 . Deep speech 2: End-to-end speech recognition in english and mandarin . In International Conference on Machine Learning. 173--182 . Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al. 2016. Deep speech 2: End-to-end speech recognition in english and mandarin. In International Conference on Machine Learning. 173--182."},{"key":"e_1_3_2_1_9_1","volume-title":"Learning to Compose Neural Networks for Question Answering. CoRR abs\/1601.01705","author":"Andreas Jacob","year":"2016","unstructured":"Jacob Andreas , Marcus Rohrbach , Trevor Darrell , and Dan Klein . 2016. Learning to Compose Neural Networks for Question Answering. CoRR abs\/1601.01705 ( 2016 ). arXiv:1601.01705 http:\/\/arxiv.org\/abs\/1601.01705 Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Learning to Compose Neural Networks for Question Answering. CoRR abs\/1601.01705 (2016). arXiv:1601.01705 http:\/\/arxiv.org\/abs\/1601.01705"},{"key":"e_1_3_2_1_10_1","unstructured":"AWSLABS. 2019. Mxnet model server. https:\/\/github.com\/awslabs\/mxnet-model-server.  AWSLABS. 2019. Mxnet model server. https:\/\/github.com\/awslabs\/mxnet-model-server."},{"key":"e_1_3_2_1_11_1","unstructured":"Mohammad Gheshlaghi Azar Remi Munos Mohammad Ghavamzadeh and Hilbert Kappen. 2011. Speedy Q-learning. In Advances in neural information processing systems.  Mohammad Gheshlaghi Azar Remi Munos Mohammad Ghavamzadeh and Hilbert Kappen. 2011. Speedy Q-learning. In Advances in neural information processing systems."},{"volume-title":"Proceedings of the 22Nd Annual International Conference on Supercomputing (ICS '08)","author":"Baskaran Muthu Manikandan","key":"e_1_3_2_1_12_1","unstructured":"Muthu Manikandan Baskaran , Uday Bondhugula , Sriram Krishnamoorthy , J. Ramanujam , Atanas Rountev , and P. Sadayappan . 2008. A Compiler Framework for Optimization of Affine Loop Nests for Gpgpus . In Proceedings of the 22Nd Annual International Conference on Supercomputing (ICS '08) . ACM, New York, NY, USA, 225--234. Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, and P. Sadayappan. 2008. A Compiler Framework for Optimization of Affine Loop Nests for Gpgpus. In Proceedings of the 22Nd Annual International Conference on Supercomputing (ICS '08). ACM, New York, NY, USA, 225--234."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"L\u00e9on Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In COMPSTAT.  L\u00e9on Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In COMPSTAT.","DOI":"10.1007\/978-3-7908-2604-3_16"},{"key":"e_1_3_2_1_14_1","unstructured":"Tianshi Chen Zidong Du Ninghui Sun Jia Wang Chengyong Wu Yunji Chen and Olivier Temam. 2014. DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning. In ASPLOS.  Tianshi Chen Zidong Du Ninghui Sun Jia Wang Chengyong Wu Yunji Chen and Olivier Temam. 2014. DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning. In ASPLOS."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2017.8057206"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988450.2988454"},{"key":"e_1_3_2_1_17_1","volume-title":"Project Adam: Building an Efficient and Scalable Deep Learning Training System. In OSDI.","author":"Chilimbi Trishul","year":"2014","unstructured":"Trishul Chilimbi , Johnson Apacible , Karthik Kalyanaraman , and Yutaka Suzue . 2014 . Project Adam: Building an Efficient and Scalable Deep Learning Training System. In OSDI. Trishul Chilimbi, Johnson Apacible, Karthik Kalyanaraman, and Yutaka Suzue. 2014. Project Adam: Building an Efficient and Scalable Deep Learning Training System. In OSDI."},{"key":"e_1_3_2_1_18_1","volume-title":"Clipper: A Low-Latency Online Prediction Serving System. In 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2017","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw , Xin Wang , Giulio Zhou , Michael J. Franklin , Joseph E. Gonzalez , and Ion Stoica . 2017 . Clipper: A Low-Latency Online Prediction Serving System. In 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2017 , Boston, MA, USA, March 27--29 , 2017. 613--627. Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. 2017. Clipper: A Low-Latency Online Prediction Serving System. In 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2017, Boston, MA, USA, March 27--29, 2017. 613--627."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"G. E. Dahl Dong Yu Li Deng and A. Acero. 2012. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. TASLP (2012).  G. E. Dahl Dong Yu Li Deng and A. Acero. 2012. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. TASLP (2012).","DOI":"10.1109\/TASL.2011.2134090"},{"key":"e_1_3_2_1_20_1","volume-title":"Ng","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean , Greg Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Quoc V. Le , Mark Z. Mao , Marc'Aurelio Ranzato , Andrew W. Senior , Paul A. Tucker , Ke Yang , and Andrew Y . Ng . 2012 . Large Scale Distributed Deep Networks.. In NIPS. Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew W. Senior, Paul A. Tucker, Ke Yang, and Andrew Y. Ng. 2012. Large Scale Distributed Deep Networks.. In NIPS."},{"key":"e_1_3_2_1_21_1","volume-title":"Fastest Convergence for Q-learning. arXiv preprint arXiv:1707.03770","author":"Devraj Adithya M","year":"2017","unstructured":"Adithya M Devraj and Sean P Meyn . 2017. Fastest Convergence for Q-learning. arXiv preprint arXiv:1707.03770 ( 2017 ). Adithya M Devraj and Sean P Meyn. 2017. Fastest Convergence for Q-learning. arXiv preprint arXiv:1707.03770 (2017)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007355226281"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3135974.3135993"},{"key":"e_1_3_2_1_24_1","volume-title":"Ng","author":"Hannun Awni Y.","year":"2014","unstructured":"Awni Y. Hannun , Carl Case , Jared Casper , Bryan Catanzaro , Greg Diamos , Erich Elsen , Ryan Prenger , Sanjeev Satheesh , Shubho Sengupta , Adam Coates , and Andrew Y . Ng . 2014 . Deep Speech : Scaling up end-to-end speech recognition. CoRR abs\/1412.5567 (2014). arXiv:1412.5567 http:\/\/arxiv.org\/abs\/1412.5567 Awni Y. Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng. 2014. Deep Speech: Scaling up end-to-end speech recognition. CoRR abs\/1412.5567 (2014). arXiv:1412.5567 http:\/\/arxiv.org\/abs\/1412.5567"},{"key":"e_1_3_2_1_25_1","volume-title":"Yuxiong He, Sameh Elnikety, Ricardo Bianchini, and Kathryn S. McKinley.","author":"Haque Md. E.","year":"2015","unstructured":"Md. E. Haque , Yong Hun Eom , Yuxiong He, Sameh Elnikety, Ricardo Bianchini, and Kathryn S. McKinley. 2015 . Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services. In ASPLOS. Md. E. Haque, Yong Hun Eom, Yuxiong He, Sameh Elnikety, Ricardo Bianchini, and Kathryn S. McKinley. 2015. Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services. In ASPLOS."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_27_1","volume-title":"Squeezenet: Alexnet-level accuracy with 50x fewer parameters and &lt","author":"Iandola Forrest N","year":"2016","unstructured":"Forrest N Iandola , Song Han , Matthew W Moskewicz , Khalid Ashraf , William J Dally , and Kurt Keutzer . 2016 . Squeezenet: Alexnet-level accuracy with 50x fewer parameters and &lt ; 0.5 mb model size. arXiv preprint arXiv:1602.07360 (2016). Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and &lt; 0.5 mb model size. arXiv preprint arXiv:1602.07360 (2016)."},{"key":"e_1_3_2_1_28_1","unstructured":"Google Inc. 2019. Google Cloud Vision API. https:\/\/cloud.google.com\/vision\/.  Google Inc. 2019. Google Cloud Vision API. https:\/\/cloud.google.com\/vision\/."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006219"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Myeongjae Jeon Yuxiong He Sameh Elnikety Alan L. Cox and Scott Rixner. 2013. Adaptive Parallelism for Web Search. In EuroSys.  Myeongjae Jeon Yuxiong He Sameh Elnikety Alan L. Cox and Scott Rixner. 2013. Adaptive Parallelism for Web Search. In EuroSys.","DOI":"10.1145\/2465351.2465367"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465367"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2011.199"},{"key":"e_1_3_2_1_33_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba . 2015 . Adam : A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds .). http:\/\/arxiv.org\/abs\/1412.6980 Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the ACM\/IEEE SC2006 Conference on High Performance Networking and Computing, November 11--17","author":"Krishnamoorthy Sriram","year":"2006","unstructured":"Sriram Krishnamoorthy , \u00dcmit V. \u00c7ataly\u00fcrek , Jarek Nieplocha , Atanas Rountev , and P. Sadayappan . 2006. Data management and query - Hypergraph partitioning for automatic memory hierarchy management . In Proceedings of the ACM\/IEEE SC2006 Conference on High Performance Networking and Computing, November 11--17 , 2006 , Tampa, FL, USA. ACM Press, 98. Sriram Krishnamoorthy, \u00dcmit V. \u00c7ataly\u00fcrek, Jarek Nieplocha, Atanas Rountev, and P. Sadayappan. 2006. Data management and query - Hypergraph partitioning for automatic memory hierarchy management. In Proceedings of the ACM\/IEEE SC2006 Conference on High Performance Networking and Computing, November 11--17, 2006, Tampa, FL, USA. ACM Press, 98."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600229"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126951"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3005745.3005750"},{"key":"e_1_3_2_1_39_1","unstructured":"Robert Mcmillan. 2019. How Skype Used AI to Build Its Amazing New Language Translator. http:\/\/www.wired.com\/2014\/12\/skype-used-ai-build-amazing-new-language-translator\/.  Robert Mcmillan. 2019. How Skype Used AI to Build Its Amazing New Language Translator. http:\/\/www.wired.com\/2014\/12\/skype-used-ai-build-amazing-new-language-translator\/."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2014.2304771"},{"key":"e_1_3_2_1_41_1","volume-title":"Device placement optimization with reinforcement learning. arXiv preprint arXiv:1706.04972","author":"Mirhoseini Azalia","year":"2017","unstructured":"Azalia Mirhoseini , Hieu Pham , Quoc V Le , Benoit Steiner , Rasmus Larsen , Yuefeng Zhou , Naveen Kumar , Mohammad Norouzi , Samy Bengio , and Jeff Dean . 2017. Device placement optimization with reinforcement learning. arXiv preprint arXiv:1706.04972 ( 2017 ). Azalia Mirhoseini, Hieu Pham, Quoc V Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. 2017. Device placement optimization with reinforcement learning. arXiv preprint arXiv:1706.04972 (2017)."},{"key":"e_1_3_2_1_42_1","volume-title":"9th Americas Conference on Information Systems, AMCIS 2003","author":"Fui-Hoon Nah Fiona","year":"2003","unstructured":"Fiona Fui-Hoon Nah . 2003 . A Study on Tolerable Waiting Time: How Long Are Web Users Willing to Wait? . In 9th Americas Conference on Information Systems, AMCIS 2003 , Tampa, FL, USA, August 4--6 , 2003. Association for Information Systems, 285. http:\/\/aisel.aisnet.org\/amcis2003\/285 Fiona Fui-Hoon Nah. 2003. A Study on Tolerable Waiting Time: How Long Are Web Users Willing to Wait?. In 9th Americas Conference on Information Systems, AMCIS 2003, Tampa, FL, USA, August 4--6, 2003. Association for Information Systems, 285. http:\/\/aisel.aisnet.org\/amcis2003\/285"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104425"},{"key":"e_1_3_2_1_44_1","volume-title":"High-Performance ML Serving. arXiv preprint arXiv:1712.06139","author":"Olston Christopher","year":"2017","unstructured":"Christopher Olston , Noah Fiedel , Kiril Gorovoy , Jeremiah Harmsen , Li Lao , Fangwei Li , Vinu Rajashekhar , Sukriti Ramesh , and Jordan Soyke . 2017. TensorFlow-Serving : Flexible , High-Performance ML Serving. arXiv preprint arXiv:1712.06139 ( 2017 ). Christopher Olston, Noah Fiedel, Kiril Gorovoy, Jeremiah Harmsen, Li Lao, Fangwei Li, Vinu Rajashekhar, Sukriti Ramesh, and Jordan Soyke. 2017. TensorFlow-Serving: Flexible, High-Performance ML Serving. arXiv preprint arXiv:1712.06139 (2017)."},{"key":"e_1_3_2_1_45_1","volume-title":"Proceedings of the 16th international conference on Supercomputing, ICS 2002","author":"Oly James","year":"2002","unstructured":"James Oly and Daniel A. Reed . 2002. Markov model prediction of I\/O requests for scientific applications . In Proceedings of the 16th international conference on Supercomputing, ICS 2002 , New York City, NY, USA, June 22--26 , 2002 , Kemal Ebcioglu, Keshav Pingali, and Alex Nicolau (Eds.). ACM, 147--155. James Oly and Daniel A. Reed. 2002. Markov model prediction of I\/O requests for scientific applications. In Proceedings of the 16th international conference on Supercomputing, ICS 2002, New York City, NY, USA, June 22--26, 2002, Kemal Ebcioglu, Keshav Pingali, and Alex Nicolau (Eds.). ACM, 147--155."},{"key":"e_1_3_2_1_46_1","volume-title":"Chung","author":"Ovtcharov Kalin","year":"2015","unstructured":"Kalin Ovtcharov , Olatunji Ruwase , Joo-Young Kim , Jeremy Fowers , Karin Strauss , and Eric S . Chung . 2015 . Accelerating Deep Convolutional Neural Networks Using Specialized Hardware . http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=240715 Kalin Ovtcharov, Olatunji Ruwase, Joo-Young Kim, Jeremy Fowers, Karin Strauss, and Eric S. Chung. 2015. Accelerating Deep Convolutional Neural Networks Using Specialized Hardware. http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=240715"},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011","author":"Raman Arun","year":"2011","unstructured":"Arun Raman , Hanjun Kim , Taewook Oh , Jae W. Lee , and David I. August . 2011. Parallelism orchestration using DoPE: the degree of parallelism executive . In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011 , San Jose, CA, USA, June 4--8 , 2011 . 26--37. Arun Raman, Hanjun Kim, Taewook Oh, Jae W. Lee, and David I. August. 2011. Parallelism orchestration using DoPE: the degree of parallelism executive. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4--8, 2011. 26--37."},{"key":"e_1_3_2_1_48_1","volume-title":"Massively Multitask Networks for Drug Discovery. arXiv preprint arXiv:1502.02072","author":"Ramsundar Bharath","year":"2015","unstructured":"Bharath Ramsundar , Steven Kearnes , Patrick Riley , Dale Webster , David Konerding , and Vijay Pande . 2015. Massively Multitask Networks for Drug Discovery. arXiv preprint arXiv:1502.02072 ( 2015 ). Bharath Ramsundar, Steven Kearnes, Patrick Riley, Dale Webster, David Konerding, and Vijay Pande. 2015. Massively Multitask Networks for Drug Discovery. arXiv preprint arXiv:1502.02072 (2015)."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA.2015.152"},{"key":"e_1_3_2_1_50_1","unstructured":"Chuck Rosenberg. 2019. Improving Photo Search: A Step Across the Semantic Gap. http:\/\/googleresearch.blogspot.com\/2013\/06\/improving-photo-search-step-across.html.  Chuck Rosenberg. 2019. Improving Photo Search: A Step Across the Semantic Gap. http:\/\/googleresearch.blogspot.com\/2013\/06\/improving-photo-search-step-across.html."},{"key":"e_1_3_2_1_51_1","unstructured":"Anthony Rousseau Paul Del\u00e9glise and Yannick Esteve. 2014. Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks.. In LREC. 3935--3939.  Anthony Rousseau Paul Del\u00e9glise and Yannick Esteve. 2014. Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks.. In LREC. 3935--3939."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.5555\/762761.762776"},{"key":"e_1_3_2_1_54_1","volume-title":"High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438","author":"Schulman John","year":"2015","unstructured":"John Schulman , Philipp Moritz , Sergey Levine , Michael Jordan , and Pieter Abbeel . 2015. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 ( 2015 ). John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. 2015. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)."},{"key":"e_1_3_2_1_55_1","volume-title":"Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al.","author":"Silver David","year":"2016","unstructured":"David Silver , Aja Huang , Chris J Maddison , Arthur Guez , Laurent Sifre , George Van Den Driessche , Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016 . Mastering the game of Go with deep neural networks and tree search. nature 529, 7587 (2016), 484. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. nature 529, 7587 (2016), 484."},{"key":"e_1_3_2_1_56_1","unstructured":"Satinder P Singh Tommi Jaakkola and Michael I Jordan. 1995. Reinforcement learning with soft state aggregation. In Advances in neural information processing systems. 361--368.  Satinder P Singh Tommi Jaakkola and Michael I Jordan. 1995. Reinforcement learning with soft state aggregation. In Advances in neural information processing systems. 361--368."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007678930559"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1162\/106365602320169811"},{"key":"e_1_3_2_1_59_1","unstructured":"Richard S Sutton David A McAllester Satinder P Singh and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems. 1057--1063.  Richard S Sutton David A McAllester Satinder P Singh and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems. 1057--1063."},{"key":"e_1_3_2_1_60_1","first-page":"12","article-title":"Inception-v4, inception-resnet and the impact of residual connections on learning","volume":"4","author":"Szegedy Christian","year":"2017","unstructured":"Christian Szegedy , Sergey Ioffe , Vincent Vanhoucke , and Alexander A Alemi . 2017 . Inception-v4, inception-resnet and the impact of residual connections on learning .. In AAAI , Vol. 4. 12 . Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. 2017. Inception-v4, inception-resnet and the impact of residual connections on learning.. In AAAI, Vol. 4. 12.","journal-title":"AAAI"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_2_1_62_1","unstructured":"Csaba Szepesv\u00e1ri. 1998. The asymptotic convergence-rate of Q-learning. In Advances in Neural Information Processing Systems. 1064--1070.  Csaba Szepesv\u00e1ri. 1998. The asymptotic convergence-rate of Q-learning. In Advances in Neural Information Processing Systems. 1064--1070."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/1687399.1687486"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2014.6853208"},{"key":"e_1_3_2_1_65_1","volume-title":"Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4, 2","author":"Tieleman Tijmen","year":"2012","unstructured":"Tijmen Tieleman and Geoffrey Hinton . 2012. Lecture 6.5-rmsprop : Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4, 2 ( 2012 ), 26--31. Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4, 2 (2012), 26--31."},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2009.02.019"},{"key":"e_1_3_2_1_67_1","unstructured":"Oriol Vinyals Alexander Toshev Samy Bengio and Dumitru Erhan. 2019. A picture is worth a thousand (coherent) words: building a natural description of images. http:\/\/googleresearch.blogspot.com\/2014\/11\/a-picture-is-worth-thousand-coherent.html.  Oriol Vinyals Alexander Toshev Samy Bengio and Dumitru Erhan. 2019. A picture is worth a thousand (coherent) words: building a natural description of images. http:\/\/googleresearch.blogspot.com\/2014\/11\/a-picture-is-worth-thousand-coherent.html."},{"key":"e_1_3_2_1_68_1","volume-title":"Machine learning 8, 3--4","author":"Watkins Christopher JCH","year":"1992","unstructured":"Christopher JCH Watkins and Peter Dayan . 1992. Q-learning. Machine learning 8, 3--4 ( 1992 ), 279--292. Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning 8, 3--4 (1992), 279--292."},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.25"},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2018.2808352"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2006.302739"},{"key":"e_1_3_2_1_72_1","volume-title":"ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701","author":"Zeiler Matthew D","year":"2012","unstructured":"Matthew D Zeiler . 2012. ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 ( 2012 ). Matthew D Zeiler. 2012. ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)."},{"key":"e_1_3_2_1_73_1","volume-title":"Stay Fresh: Speculative Synchronization for Fast Distributed Machine Learning. In The 38th IEEE International Conference on Distributed Computing Systems (ICDCS)","author":"Zhang Chengliang","year":"2018","unstructured":"Chengliang Zhang , Huangshi Tian , Wei Wang , and Feng Yan . 2018 . Stay Fresh: Speculative Synchronization for Fast Distributed Machine Learning. In The 38th IEEE International Conference on Distributed Computing Systems (ICDCS) , Vienna, Austria , July, 2018. Chengliang Zhang, Huangshi Tian, Wei Wang, and Feng Yan. 2018. Stay Fresh: Speculative Synchronization for Fast Distributed Machine Learning. In The 38th IEEE International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, July, 2018."},{"key":"e_1_3_2_1_74_1","unstructured":"Chengliang Zhang Minchen Yu Wei Wang and Feng Yan. 2019. MArk: Exploiting Cloud Services for Cost-Effective SLO-Aware Machine Learning Inference Serving. In 2019 {USENIX} Annual Technical Conference ({USENIX}{ATC} 19).  Chengliang Zhang Minchen Yu Wei Wang and Feng Yan. 2019. MArk: Exploiting Cloud Services for Cost-Effective SLO-Aware Machine Learning Inference Serving. In 2019 {USENIX} Annual Technical Conference ({USENIX}{ATC} 19)."},{"key":"e_1_3_2_1_75_1","volume-title":"2018 USENIX Annual Technical Conference, USENIX ATC 2018","author":"Zhang Minjia","year":"2018","unstructured":"Minjia Zhang , Samyam Rajbhandari , Wenhan Wang , and Yuxiong He . 2018 . DeepCPU: Serving RNN-based Deep Learning Models 10x Faster . In 2018 USENIX Annual Technical Conference, USENIX ATC 2018 , Boston, MA, USA, July 11--13 , 2018. 951--965. Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, and Yuxiong He. 2018. DeepCPU: Serving RNN-based Deep Learning Models 10x Faster. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11--13, 2018. 951--965."},{"key":"e_1_3_2_1_76_1","volume-title":"2019 USENIX Conference on Operational Machine Learning, OpML 2019","author":"Zhang Minjia","year":"2019","unstructured":"Minjia Zhang , Samyam Rajbhandari , Wenhan Wang , Elton Zheng , Olatunji Ruwase , Jeff Rasley , Jason Li , Junhua Wang , and Yuxiong He . 2019 . Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft . In 2019 USENIX Conference on Operational Machine Learning, OpML 2019 , Santa Clara, CA, USA , May 20, 2019., Bharath Ramsundar and Nisha Talagala (Eds.). USENIX Association, 5--7. https:\/\/www.usenix.org\/conference\/opml19\/presentation\/zhang-minjia Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, Elton Zheng, Olatunji Ruwase, Jeff Rasley, Jason Li, Junhua Wang, and Yuxiong He. 2019. Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft. In 2019 USENIX Conference on Operational Machine Learning, OpML 2019, Santa Clara, CA, USA, May 20, 2019., Bharath Ramsundar and Nisha Talagala (Eds.). USENIX Association, 5--7. https:\/\/www.usenix.org\/conference\/opml19\/presentation\/zhang-minjia"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/NOMS.2014.6838231"}],"event":{"name":"SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing","IEEE CS"],"location":"Denver Colorado","acronym":"SC '19"},"container-title":["Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3295500.3356164","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3295500.3356164","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3295500.3356164","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:02:13Z","timestamp":1750208533000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3295500.3356164"}},"subtitle":["a region based reinforcement learning approach"],"short-title":[],"issued":{"date-parts":[[2019,11,17]]},"references-count":77,"alternative-id":["10.1145\/3295500.3356164","10.1145\/3295500"],"URL":"https:\/\/doi.org\/10.1145\/3295500.3356164","relation":{},"subject":[],"published":{"date-parts":[[2019,11,17]]},"assertion":[{"value":"2019-11-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}