{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T09:40:26Z","timestamp":1775122826607,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":88,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,3,25]],"date-time":"2019-03-25T00:00:00Z","timestamp":1553472000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-1816717"],"award-info":[{"award-number":["CNS-1816717"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,3,25]]},"DOI":"10.1145\/3302424.3303953","type":"proceedings-article","created":{"date-parts":[[2019,3,22]],"date-time":"2019-03-22T13:10:03Z","timestamp":1553260203000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":105,"title":["Supporting Very Large Models using Automatic Dataflow Graph Partitioning"],"prefix":"10.1145","author":[{"given":"Minjie","family":"Wang","sequence":"first","affiliation":[{"name":"New York University"}]},{"given":"Chien-chin","family":"Huang","sequence":"additional","affiliation":[{"name":"New York University"}]},{"given":"Jinyang","family":"Li","sequence":"additional","affiliation":[{"name":"New York University"}]}],"member":"320","published-online":{"date-parts":[[2019,3,25]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Wide residual networks. In arXiv:1605.07146","author":"Zagoruyko Sergey","year":"2016","unstructured":"Sergey Zagoruyko and Nikos Komodakis . Wide residual networks. In arXiv:1605.07146 , 2016 . Sergey Zagoruyko and Nikos Komodakis. 
Wide residual networks. In arXiv:1605.07146, 2016."},{"key":"e_1_3_2_1_2_1","volume-title":"Google's neural machine translation system: Bridging the gap between human and machine translation. In arxiv.org:1609.08144","author":"Wu Yonghui","year":"2016","unstructured":"Yonghui Wu , Mike Schuster , Zhifeng Chen , Quoc V. Le , and Mohammad Norouzi . Google's neural machine translation system: Bridging the gap between human and machine translation. In arxiv.org:1609.08144 , 2016 . Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, and Mohammad Norouzi. Google's neural machine translation system: Bridging the gap between human and machine translation. In arxiv.org:1609.08144, 2016."},{"key":"e_1_3_2_1_3_1","volume-title":"Deep Learning","author":"Goodfellow Ian","year":"2016","unstructured":"Ian Goodfellow , Yoshua Bengio , and Aaron Courville . Deep Learning . MIT Press , 2016 . http:\/\/www.deeplearningbook.org. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http:\/\/www.deeplearningbook.org."},{"key":"e_1_3_2_1_4_1","volume-title":"Proc. of ML Systems Workshop in NIPS","author":"Meng Chen","year":"2017","unstructured":"Chen Meng , Minmin Sun , Jun Yang , Minghui Qiu , and Yang Gu . Training deeper models by gpu memory optimization on tensorflow . In Proc. of ML Systems Workshop in NIPS , 2017 . Chen Meng, Minmin Sun, Jun Yang, Minghui Qiu, and Yang Gu. Training deeper models by gpu memory optimization on tensorflow. In Proc. of ML Systems Workshop in NIPS, 2017."},{"key":"e_1_3_2_1_5_1","first-page":"4125","volume-title":"Advances in Neural Information Processing Systems","author":"Gruslys Audrunas","year":"2016","unstructured":"Audrunas Gruslys , R\u00e9mi Munos , Ivo Danihelka , Marc Lanctot , and Alex Graves . Memory-efficient backpropagation through time . In Advances in Neural Information Processing Systems , pages 4125 -- 4133 , 2016 . Audrunas Gruslys, R\u00e9mi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. 
Memory-efficient backpropagation through time. In Advances in Neural Information Processing Systems, pages 4125--4133, 2016."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1007\/978-3-642-35289-8_27","volume-title":"Neural networks: Tricks of the trade","author":"Martens James","year":"2012","unstructured":"James Martens and Ilya Sutskever . Training deep and recurrent networks with hessian-free optimization . In Neural networks: Tricks of the trade , pages 479 -- 535 . Springer , 2012 . James Martens and Ilya Sutskever. Training deep and recurrent networks with hessian-free optimization. In Neural networks: Tricks of the trade, pages 479--535. Springer, 2012."},{"key":"e_1_3_2_1_7_1","volume-title":"Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen , Bing Xu , Chiyuan Zhang , and Carlos Guestrin . Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174 , 2016 . Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016."},{"key":"e_1_3_2_1_8_1","volume-title":"Neural Information Processing Systems (NIPS)","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean , Greg S. Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Quoc V. Le , Mark Z. Mao , Marc'Aurelio Ranzato , Andrew Senior , Paul Tucker , Ke Yang , and Andrew Y. Ng . Large scale distributed deep networks . In Neural Information Processing Systems (NIPS) , 2012 . Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. Large scale distributed deep networks. 
In Neural Information Processing Systems (NIPS), 2012."},{"key":"e_1_3_2_1_9_1","first-page":"1337","volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML-13)","author":"Coates Adam","year":"2013","unstructured":"Adam Coates , Brody Huval , Tao Wang , David Wu , Bryan Catanzaro , and Ng Andrew . Deep learning with COTS HPC systems . In Proceedings of the 30th International Conference on Machine Learning (ICML-13) , pages 1337 -- 1345 , 2013 . Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, and Ng Andrew. Deep learning with COTS HPC systems. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1337--1345, 2013."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI'14","author":"Chilimbi Trishul","year":"2014","unstructured":"Trishul Chilimbi , Yutaka Suzue , Johnson Apacible , and Karthik Kalyanaraman . Project adam : Building an efficient and scalable deep learning training system . In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI'14 , 2014 . Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. Project adam: Building an efficient and scalable deep learning training system. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI'14, 2014."},{"key":"e_1_3_2_1_11_1","unstructured":"Xuan Yang Jing Pu Blaine Burton Rister Nikhil Bhagdikar Stephen Richardson Shahar Kvatinsky Jonathan Ragan-Kelley Ardavan Pedram and Mark Horowitz. A systematic approach to blocking convolutional neural networks. arXiv preprint arXiv:1606.04209 2016.  Xuan Yang Jing Pu Blaine Burton Rister Nikhil Bhagdikar Stephen Richardson Shahar Kvatinsky Jonathan Ragan-Kelley Ardavan Pedram and Mark Horowitz. A systematic approach to blocking convolutional neural networks. 
arXiv preprint arXiv:1606.04209 2016."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.41"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037702"},{"key":"e_1_3_2_1_14_1","first-page":"2279","volume-title":"Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan","author":"Jia Zhihao","year":"2018","unstructured":"Zhihao Jia , Sina Lin , Charles R. Qi , and Alex Aiken . Exploring hidden dimensions in parallelizing convolutional neural networks . In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan , Stockholm, Sweden , July 10-15, 2018 , pages 2279 -- 2288 , 2018. Zhihao Jia, Sina Lin, Charles R. Qi, and Alex Aiken. Exploring hidden dimensions in parallelizing convolutional neural networks. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10-15, 2018, pages 2279--2288, 2018."},{"key":"e_1_3_2_1_15_1","volume-title":"Beyond data and model parallelism for deep neural networks. arXiv preprint arXiv:1807.05358","author":"Jia Zhihao","year":"2018","unstructured":"Zhihao Jia , Matei Zaharia , and Alex Aiken . Beyond data and model parallelism for deep neural networks. arXiv preprint arXiv:1807.05358 , 2018 . Zhihao Jia, Matei Zaharia, and Alex Aiken. Beyond data and model parallelism for deep neural networks. arXiv preprint arXiv:1807.05358, 2018."},{"key":"e_1_3_2_1_16_1","volume-title":"12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. 
Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , and Xiaoqiang Zheng . Tensorflow : A system for large-scale machine learning . In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) , 2016 . Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016."},{"key":"e_1_3_2_1_17_1","volume-title":"Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 , 2015 . Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274, 2015."},{"key":"e_1_3_2_1_18_1","unstructured":"PyTorch. http:\/\/pytorch.org.  PyTorch. http:\/\/pytorch.org."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_2_1_20_1","volume-title":"Exploring the limits of language modeling. CoRR, abs\/1602.02410","author":"J\u00f3zefowicz Rafal","year":"2016","unstructured":"Rafal J\u00f3zefowicz , Oriol Vinyals , Mike Schuster , Noam Shazeer , and Yonghui Wu . Exploring the limits of language modeling. 
CoRR, abs\/1602.02410 , 2016 . Rafal J\u00f3zefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the limits of language modeling. CoRR, abs\/1602.02410, 2016."},{"key":"e_1_3_2_1_21_1","unstructured":"Google Cloud. Tpu: System architecture.  Google Cloud. Tpu: System architecture."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168836.2168857"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/291891.291901"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the Fourth Workshop on Compilers for Parallel Computers","author":"Kremer Ulrich","year":"1993","unstructured":"Ulrich Kremer . Np-completeness of dynamic remapping . In Proceedings of the Fourth Workshop on Compilers for Parallel Computers , Delft, The Netherlands , 1993 . Ulrich Kremer. Np-completeness of dynamic remapping. In Proceedings of the Fourth Workshop on Compilers for Parallel Computers, Delft, The Netherlands, 1993."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/FMPC.1990.89493"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(91)90090-V"},{"key":"e_1_3_2_1_28_1","volume-title":"Device placement optimization with reinforcement learning. arXiv preprint arXiv:1706.04972","author":"Mirhoseini Azalia","year":"2017","unstructured":"Azalia Mirhoseini , Hieu Pham , Quoc V Le , Benoit Steiner , Rasmus Larsen , Yuefeng Zhou , Naveen Kumar , Mohammad Norouzi , Samy Bengio , and Jeff Dean . Device placement optimization with reinforcement learning. arXiv preprint arXiv:1706.04972 , 2017 . Azalia Mirhoseini, Hieu Pham, Quoc V Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. Device placement optimization with reinforcement learning. 
arXiv preprint arXiv:1706.04972, 2017."},{"key":"e_1_3_2_1_29_1","volume-title":"ICLR","author":"Mirhoseini Azalia","year":"2018","unstructured":"Azalia Mirhoseini , Anna Goldie , Hieu Pham , Benoit Steiner , Quoc V. Le , and Jeff Dean . A hierarchical model for device placement . In ICLR , 2018 . Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, and Jeff Dean. A hierarchical model for device placement. In ICLR, 2018."},{"key":"e_1_3_2_1_30_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen , Thierry Moreau , Ziheng Jiang , Lianmin Zheng , Eddie Yan , Haichen Shen , Meghan Cowan , Leyuan Wang , Yuwei Hu , Luis Ceze , Carlos Guestrin , and Arvind Krishnamurthy . TVM : An automated end-to-end optimizing compiler for deep learning . In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) , Carlsbad, CA , 2018 . USENIX Association. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, 2018. USENIX Association."},{"key":"e_1_3_2_1_31_1","volume-title":"Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. In arXiv:1802.04730v2","author":"Vasilache Nicolas","year":"2018","unstructured":"Nicolas Vasilache , Oleksandr Zinenko , Theodoros Theodoridis , Priya Goyal , Zachary DeVito , William S. Moses , Sven Verdoolaege , Andrew Adams , and Albert Cohen . Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. In arXiv:1802.04730v2 , 2018 . Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. 
Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. In arXiv:1802.04730v2, 2018."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31424-7_15"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/358438.349325"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.entcs.2014.08.004"},{"key":"e_1_3_2_1_35_1","volume-title":"USENIX Annual Technical Conference","author":"Huang Chien-Chin","year":"2015","unstructured":"Chien-Chin Huang , Qi Chen , Zhaoguo Wang , Russell Power , Jorge Ortiz , Jinyang Li , and Zhen Xiao . Spartan : A distributed array framework with smart tiling . In USENIX Annual Technical Conference , 2015 . Chien-Chin Huang, Qi Chen, Zhaoguo Wang, Russell Power, Jorge Ortiz, Jinyang Li, and Zhen Xiao. Spartan: A distributed array framework with smart tiling. In USENIX Annual Technical Conference, 2015."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-349-03521-2","volume-title":"Graph Theory with Applications","author":"Bondy J.A","year":"1976","unstructured":"J.A Bondy and U.S.R. Murty . Graph Theory with Applications . Elsevier Science Publishing , 1976 . J.A Bondy and U.S.R. Murty. Graph Theory with Applications. Elsevier Science Publishing, 1976."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303953"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_40_1","volume-title":"Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs\/1412.6980","author":"Diederik","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs\/1412.6980 , 2014 . Diederik P. Kingma and Jimmy Ba. 
Adam: A method for stochastic optimization. CoRR, abs\/1412.6980, 2014."},{"key":"e_1_3_2_1_41_1","volume-title":"Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121--2159","author":"Duchi John","year":"2011","unstructured":"John Duchi , Elad Hazan , and Yoram Singer . Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121--2159 , 2011 . John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121--2159, 2011."},{"key":"e_1_3_2_1_42_1","volume-title":"Profile-guided memory optimization for deep neural networks. arXiv preprint arXiv:1804.10001","author":"Sekiyama Taro","year":"2018","unstructured":"Taro Sekiyama , Takashi Imamichi , Haruki Imai , and Rudy Raymond . Profile-guided memory optimization for deep neural networks. arXiv preprint arXiv:1804.10001 , 2018 . Taro Sekiyama, Takashi Imamichi, Haruki Imai, and Rudy Raymond. Profile-guided memory optimization for deep neural networks. arXiv preprint arXiv:1804.10001, 2018."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195660"},{"key":"e_1_3_2_1_44_1","first-page":"3104","volume-title":"Advances in neural information processing systems","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever , Oriol Vinyals , and Quoc V Le . Sequence to sequence learning with neural networks . In Advances in neural information processing systems , pages 3104 -- 3112 , 2014 . Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104--3112, 2014."},{"key":"e_1_3_2_1_45_1","volume-title":"Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. 
arXiv preprint arXiv:1701.06538","author":"Shazeer Noam","year":"2017","unstructured":"Noam Shazeer , Azalia Mirhoseini , Krzysztof Maziarz , Andy Davis , Quoc Le , Geoffrey Hinton , and Jeff Dean . Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538 , 2017 . Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017."},{"key":"e_1_3_2_1_46_1","volume-title":"One weird trick for parallelizing convolutional neural networks. In arXiv:1404.5997","author":"Krizhevsky Alex","year":"2014","unstructured":"Alex Krizhevsky . One weird trick for parallelizing convolutional neural networks. In arXiv:1404.5997 , 2014 . Alex Krizhevsky. One weird trick for parallelizing convolutional neural networks. In arXiv:1404.5997, 2014."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685095"},{"key":"e_1_3_2_1_48_1","volume-title":"USENIX Annual Technical Conference","author":"Cui H.","year":"2014","unstructured":"H. Cui , J. Cipar , Q. Ho , J.K. Kim , S. Lee , A. Kumar , J. Wei , W. Dai , G. R. Ganger , P.B. Gibbons , G. A. Gibson , and E. P. Xing . Exploiting bounded staleness to speed up big data analytics . In USENIX Annual Technical Conference , 2014 . H. Cui, J. Cipar, Q. Ho, J.K. Kim, S. Lee, A. Kumar, J.Wei, W. Dai, G. R. Ganger, P.B. Gibbons, G. A. Gibson, and E. P. Xing. Exploiting bounded staleness to speed up big data analytics. 
In USENIX Annual Technical Conference, 2014."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2806777.2806778"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2901318.2901323"},{"key":"e_1_3_2_1_51_1","volume-title":"NIPS Workshop, Distributed Machine Learning and Matrix Computations","author":"Wang Minjie","year":"2014","unstructured":"Minjie Wang , Tianjun Xiao , Jianpeng Li , Jiaxing Zhang , Chuntao Hong , and Zheng Zhang . Minerva : A scalable and highly efficient training platform for deep learning . In NIPS Workshop, Distributed Machine Learning and Matrix Computations , 2014 . Minjie Wang, Tianjun Xiao, Jianpeng Li, Jiaxing Zhang, Chuntao Hong, and Zheng Zhang. Minerva: A scalable and highly efficient training platform for deep learning. In NIPS Workshop, Distributed Machine Learning and Matrix Computations, 2014."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2901318.2901331"},{"key":"e_1_3_2_1_53_1","first-page":"1135","volume-title":"Advances in neural information processing systems","author":"Han Song","year":"2015","unstructured":"Song Han , Jeff Pool , John Tran , and William Dally . Learning both weights and connections for efficient neural network . In Advances in neural information processing systems , pages 1135 -- 1143 , 2015 . Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems, pages 1135--1143, 2015."},{"key":"e_1_3_2_1_54_1","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149","author":"Han Song","year":"2015","unstructured":"Song Han , Huizi Mao , and William J Dally . Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 , 2015 . Song Han, Huizi Mao, and William J Dally. 
Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015."},{"key":"e_1_3_2_1_55_1","volume-title":"Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115","author":"Gong Yunchao","year":"2014","unstructured":"Yunchao Gong , Liu Liu , Ming Yang , and Lubomir Bourdev . Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 , 2014 . Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115, 2014."},{"key":"e_1_3_2_1_56_1","first-page":"4107","volume-title":"Advances in neural information processing systems","author":"Hubara Itay","year":"2016","unstructured":"Itay Hubara , Matthieu Courbariaux , Daniel Soudry , Ran El-Yaniv , and Yoshua Bengio . Binarized neural networks . In Advances in neural information processing systems , pages 4107 -- 4115 , 2016 . Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Advances in neural information processing systems, pages 4107--4115, 2016."},{"key":"e_1_3_2_1_57_1","first-page":"2","volume-title":"Proceedings of the 1990 ACM\/IEEE conference on Supercomputing","author":"Anderson Edward","year":"1990","unstructured":"Edward Anderson , Zhaojun Bai , J Dongarra , A Greenbaum , A McKenney , Jeremy Du Croz , S Hammerling , J Demmel , C Bischof , and Danny Sorensen . LAPACK : A portable linear algebra library for high-performance computers . In Proceedings of the 1990 ACM\/IEEE conference on Supercomputing , pages 2 -- 11 . IEEE Computer Society Press , 1990 . Edward Anderson, Zhaojun Bai, J Dongarra, A Greenbaum, A McKenney, Jeremy Du Croz, S Hammerling, J Demmel, C Bischof, and Danny Sorensen. LAPACK: A portable linear algebra library for high-performance computers. 
In Proceedings of the 1990 ACM\/IEEE conference on Supercomputing, pages 2--11. IEEE Computer Society Press, 1990."},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/FMPC.1992.234898"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2427023.2427030"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.5555\/243179.243182"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.5555\/266469.266486"},{"key":"e_1_3_2_1_62_1","volume-title":"USA","author":"Robert","year":"1995","unstructured":"Robert A. van de Geijn and Jerrell Watts. Summa: Scalable universal matrix multiplication algorithm. Technical report, Austin, TX , USA , 1995 . Robert A. van de Geijn and Jerrell Watts. Summa: Scalable universal matrix multiplication algorithm. Technical report, Austin, TX, USA, 1995."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.06.002"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.5555\/645671.665387"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342007078442"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.2172\/862127"},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063473"},{"key":"e_1_3_2_1_68_1","volume-title":"JSM Proceedings, Section on Physical and Engineering Sciences","author":"Stokely Murray","year":"2011","unstructured":"Murray Stokely , Farzan Rohani , and Eric Tassone . Large-scale parallel statistical forecasting computations in r . In JSM Proceedings, Section on Physical and Engineering Sciences , Alexandria, VA , 2011 . Murray Stokely, Farzan Rohani, and Eric Tassone. Large-scale parallel statistical forecasting computations in r. In JSM Proceedings, Section on Physical and Engineering Sciences, Alexandria, VA, 2011."},{"key":"e_1_3_2_1_69_1","unstructured":"SparkR: R frontend for Spark. http:\/\/amplab-extras.github.io\/SparkR-pkg.  SparkR: R frontend for Spark. 
http:\/\/amplab-extras.github.io\/SparkR-pkg."},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994511"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/2854038.2854042"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465371"},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851157"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.112"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1021\/jp034596z"},{"key":"e_1_3_2_1_76_1","first-page":"645","volume-title":"11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)","author":"M\u00fcller Stefan C.","year":"2014","unstructured":"Stefan C. M\u00fcller , Gustavo Alonso , Adam Amara , and Andr\u00e9 Csillaghy . Pydron : Semi-automatic parallelization for multi-core and the cloud . In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14) , pages 645 -- 659 , Broomfield, CO , October 2014 . USENIX Association. Stefan C. M\u00fcller, Gustavo Alonso, Adam Amara, and Andr\u00e9 Csillaghy. Pydron: Semi-automatic parallelization for multi-core and the cloud. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 645--659, Broomfield, CO, October 2014. 
USENIX Association."},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/255129.255156"},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(90)90086-5"},{"key":"e_1_3_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/209937.209953"},{"key":"e_1_3_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.5555\/645608.661829"},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.97903"},{"key":"e_1_3_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/76263.76335"},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.5555\/645672.665556"},{"key":"e_1_3_2_1_84_1","volume-title":"1989 International Conference on Parallel Processing","author":"ERIKH","year":"1989","unstructured":"ERIKH D'HOLLANDER. Partitioning and labeling of index sets in do loops with constant dependence vectors . In 1989 International Conference on Parallel Processing , University Park, PA , 1989 . ERIKH D'HOLLANDER. Partitioning and labeling of index sets in do loops with constant dependence vectors. In 1989 International Conference on Parallel Processing, University Park, PA, 1989."},{"key":"e_1_3_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1993.1094"},{"key":"e_1_3_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.5555\/645669.665200"},{"key":"e_1_3_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.36"},{"key":"e_1_3_2_1_88_1","volume-title":"OSDI","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez , Yucheng Low , Haijie Gu , Danny Bickson , and Carlos Guestrin . Powergraph : Distributed graph-parallel computation on natural graphs . In OSDI , 2012 . Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. 
In OSDI, 2012."},{"key":"e_1_3_2_1_89_1","volume-title":"Symposium on Operating System Design and Implementation (OSDI)","author":"Dean Jeff","year":"2004","unstructured":"Jeff Dean and Sanjay Ghemawat . Mapreduce : Simplified data processing on large clusters . In Symposium on Operating System Design and Implementation (OSDI) , 2004 . Jeff Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In Symposium on Operating System Design and Implementation (OSDI), 2004."}],"event":{"name":"EuroSys '19: Fourteenth EuroSys Conference 2019","location":"Dresden Germany","acronym":"EuroSys '19","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the Fourteenth EuroSys Conference 2019"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3302424.3303953","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3302424.3303953","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3302424.3303953","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:01:48Z","timestamp":1750208508000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3302424.3303953"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,25]]},"references-count":88,"alternative-id":["10.1145\/3302424.3303953","10.1145\/3302424"],"URL":"https:\/\/doi.org\/10.1145\/3302424.3303953","relation":{},"subject":[],"published":{"date-parts":[[2019,3,25]]},"assertion":[{"value":"2019-03-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}