{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T16:13:22Z","timestamp":1762272802733,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":16,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,7,6]],"date-time":"2021-07-06T00:00:00Z","timestamp":1625529600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1919122, 1946752"],"award-info":[{"award-number":["1919122, 1946752"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,7,6]]},"DOI":"10.1145\/3409964.3461828","type":"proceedings-article","created":{"date-parts":[[2021,6,30]],"date-time":"2021-06-30T23:07:02Z","timestamp":1625094422000},"page":"439-442","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Efficient Distributed Algorithms for Convolutional Neural Networks"],"prefix":"10.1145","author":[{"given":"Rui","family":"Li","sequence":"first","affiliation":[{"name":"University of Utah, Salt Lake City, UT, USA"}]},{"given":"Yufan","family":"Xu","sequence":"additional","affiliation":[{"name":"University of Utah, Salt Lake City, UT, USA"}]},{"given":"Aravind","family":"Sukumaran-Rajam","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA, USA"}]},{"given":"Atanas","family":"Rountev","sequence":"additional","affiliation":[{"name":"Ohio State University, Columbus, OH, USA"}]},{"given":"P.","family":"Sadayappan","sequence":"additional","affiliation":[{"name":"University of Utah, Salt Lake City, UT, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2021,7,6]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.395.0575"},{"key":"e_1_3_2_1_3_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation. 578--594","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation. 578--594."},{"key":"e_1_3_2_1_4_1","volume-title":"cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"
Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1137\/0210049"},{"key":"e_1_3_2_1_6_1","volume-title":"Beyond data and model parallelism for deep neural networks. arXiv preprint arXiv:1807.05358","author":"Jia Zhihao","year":"2018","unstructured":"Zhihao Jia, Matei Zaharia, and Alex Aiken. 2018. Beyond data and model parallelism for deep neural networks. arXiv preprint arXiv:1807.05358 (2018)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(93)90029-K"},{"volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--13","author":"Li Rui","key":"e_1_3_2_1_8_1","unstructured":"Rui Li, Aravind Sukumaran-Rajam, Richard Veras, Tze Meng Low, Fabrice Rastello, Atanas Rountev, and P. Sadayappan. 2019. Analytical cache modeling and tilesize optimization for tensor contractions. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--13."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446759"},{"key":"e_1_3_2_1_10_1","unstructured":"Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, et al. 2020. 
PyTorch distributed: Experiences on accelerating data parallel training. arXiv preprint arXiv:2006.15704 (2020)."},{"key":"e_1_3_2_1_11_1","volume-title":"2019 USENIX Annual Technical Conference. 1025--1040","author":"Liu Yizhi","year":"2019","unstructured":"Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, and Yida Wang. 2019. Optimizing CNN model inference on CPUs. In 2019 USENIX Annual Technical Conference. 1025--1040."},{"key":"e_1_3_2_1_12_1","volume-title":"Horovod: fast and easy distributed deep learning in TensorFlow. arXiv preprint arXiv:1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso. 2018. Horovod: fast and easy distributed deep learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/2033408.2033420"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1096-9128(199704)9:4<255::AID-CPE250>3.0.CO;2-2"},{"key":"e_1_3_2_1_15_1","volume-title":"Tuna: A static analysis approach to optimizing deep neural networks. arXiv preprint arXiv:2104.14641","author":"Wang Yao","year":"2021","unstructured":"
Yao Wang, Xingyu Zhou, Yanming Wang, Rui Li, Yong Wu, and Vin Sharma. 2021. Tuna: A static analysis approach to optimizing deep neural networks. arXiv preprint arXiv:2104.14641 (2021)."},{"key":"e_1_3_2_1_16_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation. 863--879","author":"Zheng Lianmin","year":"2020","unstructured":"Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, et al. 2020. Ansor: Generating high-performance tensor programs for deep learning. In 14th USENIX Symposium on Operating Systems Design and Implementation. 863--879."}],"event":{"name":"SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures","sponsor":["SIGACT ACM Special Interest Group on Algorithms and Computation Theory","SIGARCH ACM Special Interest Group on Computer Architecture","EATCS European Association for Theoretical Computer Science"],"location":"Virtual Event USA","acronym":"SPAA '21"},"container-title":["Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and 
Architectures"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3409964.3461828","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3409964.3461828","content-type":"text\/html","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3409964.3461828","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3409964.3461828","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:08Z","timestamp":1750191428000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3409964.3461828"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,6]]},"references-count":16,"alternative-id":["10.1145\/3409964.3461828","10.1145\/3409964"],"URL":"https:\/\/doi.org\/10.1145\/3409964.3461828","relation":{},"subject":[],"published":{"date-parts":[[2021,7,6]]},"assertion":[{"value":"2021-07-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}