{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T20:57:12Z","timestamp":1765486632058,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,6,9]],"date-time":"2021-06-09T00:00:00Z","timestamp":1623196800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH (National Institutes of Health)","doi-asserted-by":"publisher","award":["UL1TR003167"],"award-info":[{"award-number":["UL1TR003167"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["1918651; 1910803; 2008240"],"award-info":[{"award-number":["1918651; 1910803; 2008240"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,9]]},"DOI":"10.1145\/3448016.3457317","type":"proceedings-article","created":{"date-parts":[[2021,6,18]],"date-time":"2021-06-18T17:22:30Z","timestamp":1624036950000},"page":"1222-1234","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra"],"prefix":"10.1145","author":[{"given":"Shangyu","family":"Luo","sequence":"first","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"Dimitrije","family":"Jankov","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]},{"given":"Binhang","family":"Yuan","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, 
USA"}]},{"given":"Chris","family":"Jermaine","sequence":"additional","affiliation":[{"name":"Rice University, Houston, TX, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,6,18]]},"reference":[{"volume-title":"http:\/\/pytorch.org. Accessed","year":"2018","key":"e_1_3_2_2_1_1","unstructured":"2017. PyTorch. http:\/\/pytorch.org. Accessed Sep 1, 2018 . 2017. PyTorch. http:\/\/pytorch.org. Accessed Sep 1, 2018."},{"key":"e_1_3_2_2_2_1","volume-title":"https:\/\/systemds.apache.org\/. Accessed","author":"DS.","year":"2021","unstructured":"2021. System DS. https:\/\/systemds.apache.org\/. Accessed Feb 1, 2021 . 2021. SystemDS. https:\/\/systemds.apache.org\/. Accessed Feb 1, 2021."},{"volume-title":"OSDI 16","author":"Abadi Martin","key":"e_1_3_2_2_3_1","unstructured":"Martin Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , and Xiaoqiang Zheng . 2016. TensorFlow: A System for Large-Scale Machine Learning . In OSDI 16 . USENIX Association , GA , 265--283. Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI 16. USENIX Association, GA, 265--283."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1543135.1542481"},{"key":"e_1_3_2_2_5_1","volume-title":"Proactive Re-optimization. 
In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data","author":"Babu Shivnath","year":"2005","unstructured":"Shivnath Babu , Pedro Bizarro , and David DeWitt . 2005 . Proactive Re-optimization. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data ( Baltimore, Maryland) (SIGMOD '05). ACM, New York, NY, USA, 107--118. https:\/\/doi.org\/10.1145\/1066157.1066171 10.1145\/1066157.1066171 Shivnath Babu, Pedro Bizarro, and David DeWitt. 2005. Proactive Re-optimization. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (Baltimore, Maryland) (SIGMOD '05). ACM, New York, NY, USA, 107--118. https:\/\/doi.org\/10.1145\/1066157.1066171"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263662"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567807"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007279"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465283"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/335168.335230"},{"key":"e_1_3_2_2_11_1","unstructured":"T. Chen M. Li Y. Li M. Lin N. Wang M. Wang T. Xiao B. Xu C. Zhang and Z. Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274 (2015).  T. Chen M. Li Y. Li M. Lin N. Wang M. Wang T. Xiao B. Xu C. Zhang and Z. Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274 (2015)."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Yufei Ding Jason Ansel Kalyan Veeramachaneni Xipeng Shen Una-May O\u2019Reilly and Saman Amarasinghe. 2015. Autotuning Algorithmic Choice for Input Sensitivity. 379\u2013390.  Yufei Ding Jason Ansel Kalyan Veeramachaneni Xipeng Shen Una-May O\u2019Reilly and Saman Amarasinghe. 
2015. Autotuning Algorithmic Choice for Input Sensitivity. 379\u2013390.","DOI":"10.1145\/2813885.2737969"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994515"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342013494428"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01734359"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1998.681704"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"crossref","unstructured":"Amol Ghoting Rajasekar Krishnamurthy Edwin Pednault Berthold Reinwald Vikas Sindhwani Shirish Tatikonda Yuanyuan Tian and Shivakumar Vaithyanathan. 2011. SystemML: Declarative machine learning on MapReduce. In ICDE. 231--242.  Amol Ghoting Rajasekar Krishnamurthy Edwin Pednault Berthold Reinwald Vikas Sindhwani Shirish Tatikonda Yuanyuan Tian and Shivakumar Vaithyanathan. 2011. SystemML: Declarative machine learning on MapReduce. In ICDE. 231--242.","DOI":"10.1109\/ICDE.2011.5767930"},{"key":"e_1_3_2_2_18_1","volume-title":"Matrices with applications in statistics","author":"Graybill F.A.","year":"2008","unstructured":"F.A. Graybill . 1983. Matrices with applications in statistics . Wadsworth International Group . 8 2008 485 F.A. Graybill. 1983. Matrices with applications in statistics .Wadsworth International Group. 82008485"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/7902.7903"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465273"},{"key":"e_1_3_2_2_21_1","volume-title":"Jorge Ortiz, Jinyang Li, and Zhen Xiao.","author":"Huang Chien-Chin","year":"2015","unstructured":"Chien-Chin Huang , Qi Chen , Zhaoguo Wang , Russell Power , Jorge Ortiz, Jinyang Li, and Zhen Xiao. 2015 . Spartan : A distributed array framework with smart tiling. In 2015 USENIX Annual Technical Conference ( USENIX ATC 15). 1--15. 
Chien-Chin Huang, Qi Chen, Zhaoguo Wang, Russell Power, Jorge Ortiz, Jinyang Li, and Zhen Xiao. 2015. Spartan: A distributed array framework with smart tiling. In 2015 USENIX Annual Technical Conference (USENIX ATC 15). 1--15."},{"volume-title":"Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data","author":"Yannis","key":"e_1_3_2_2_22_1","unstructured":"Yannis E. Ioannidis and Stavros Christodoulakis. 1991. On the Propagation of Errors in the Size of Join Results . In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data ( Denver, Colorado, USA) (SIGMOD '91). ACM, New York, NY, USA, 268--277. https:\/\/doi.org\/10.1145\/115790.115835 10.1145\/115790.115835 Yannis E. Ioannidis and Stavros Christodoulakis. 1991. On the Propagation of Errors in the Size of Join Results. In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data (Denver, Colorado, USA) (SIGMOD '91). ACM, New York, NY, USA, 268--277. https:\/\/doi.org\/10.1145\/115790.115835"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3317315.3317323"},{"key":"e_1_3_2_2_24_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan","author":"Jia Zhihao","year":"2018","unstructured":"Zhihao Jia , Sina Lin , Charles R. Qi , and Alex Aiken . 2018 . Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks . In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan , Stockholm, Sweden, July 10--15 , 2018 (Proceedings of Machine Learning Research, Vol. 80), Jennifer G. Dy and Andreas Krause (Eds.). PMLR, 2279--2288. Zhihao Jia, Sina Lin, Charles R. Qi, and Alex Aiken. 2018. Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks. 
In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10--15, 2018 (Proceedings of Machine Learning Research, Vol. 80), Jennifer G. Dy and Andreas Krause (Eds.). PMLR, 2279--2288."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/276305.276315"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498293"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2017.108"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783381"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767755"},{"key":"e_1_3_2_2_30_1","volume-title":"Python for High Performance and Scientific Computing","volume":"14","author":"Wes","year":"2011","unstructured":"Wes McKinney et al. 2011. pandas: a foundational Python library for data analysis and statistics . Python for High Performance and Scientific Computing , Vol. 14 , 9 ( 2011 ). Wes McKinney et al. 2011. pandas: a foundational Python library for data analysis and statistics. Python for High Performance and Scientific Computing, Vol. 14, 9 (2011)."},{"key":"e_1_3_2_2_31_1","volume-title":"Scikit-learn: Machine learning in Python. the Journal of machine Learning research","author":"Pedregosa Fabian","year":"2011","unstructured":"Fabian Pedregosa , Ga\u00ebl Varoquaux , Alexandre Gramfort , Vincent Michel , Bertrand Thirion , Olivier Grisel , Mathieu Blondel , Peter Prettenhofer , Ron Weiss , Vincent Dubourg , 2011 . Scikit-learn: Machine learning in Python. the Journal of machine Learning research , Vol. 12 (2011), 2825--2830. Fabian Pedregosa, Ga\u00ebl Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. the Journal of machine Learning research, Vol. 
12 (2011), 2825--2830."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840306"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319854"},{"key":"e_1_3_2_2_34_1","volume-title":"AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 342--355","author":"Song Linghao","year":"2020","unstructured":"Linghao Song , Fan Chen , Youwei Zhuo , Xuehai Qian , Hai Li , and Yiran Chen . 2020 . AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 342--355 . Linghao Song, Fan Chen, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2020. AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 342--355."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/379539.379583"},{"volume-title":"High-Performance Computing on the Intel\u00ae Xeon Phi\u2122","author":"Wang Endong","key":"e_1_3_2_2_36_1","unstructured":"Endong Wang , Qing Zhang , Bo Shen , Guangyong Zhang , Xiaowei Lu , Qing Wu , and Yajuan Wang . 2014. Intel math kernel library . In High-Performance Computing on the Intel\u00ae Xeon Phi\u2122 . Springer , 167--188. Endong Wang, Qing Zhang, Bo Shen, Guangyong Zhang, Xiaowei Lu, Qing Wu, and Yajuan Wang. 2014. Intel math kernel library. In High-Performance Computing on the Intel\u00ae Xeon Phi\u2122. Springer, 167--188."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303953"},{"key":"e_1_3_2_2_38_1","volume-title":"SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38","author":"Clinton Whaley R","year":"1998","unstructured":"R Clinton Whaley and Jack J Dongarra . 1998 . 
Automatically tuned linear algebra software . In SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38 . R Clinton Whaley and Jack J Dongarra. 1998. Automatically tuned linear algebra software. In SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38."},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723712"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2017.150"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196933"}],"event":{"name":"SIGMOD\/PODS '21: International Conference on Management of Data","sponsor":["SIGMOD ACM Special Interest Group on Management of Data"],"location":"Virtual Event China","acronym":"SIGMOD\/PODS '21"},"container-title":["Proceedings of the 2021 International Conference on Management of Data"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3448016.3457317","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3448016.3457317","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3448016.3457317","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:25:04Z","timestamp":1750195504000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3448016.3457317"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,9]]},"references-count":41,"alternative-id":["10.1145\/3448016.3457317","10.1145\/3448016"],"URL":"https:\/\/doi.org\/10.1145\/3448016.3457317","relation":{},"subject":[],"published":{"date-parts":[[2021,6,9]]},"assertion":[{"value":"2021-06-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publ
ication History"}}]}}