{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T15:41:41Z","timestamp":1781797301799,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,17]],"date-time":"2023-06-17T00:00:00Z","timestamp":1686960000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["CCF-1919044"],"award-info":[{"award-number":["CCF-1919044"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["CNS-2144796"],"award-info":[{"award-number":["CNS-2144796"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,17]]},"DOI":"10.1145\/3579371.3589059","type":"proceedings-article","created":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T20:25:28Z","timestamp":1686947128000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["V10: Hardware-Assisted NPU Multi-tenancy for Improved Resource Utilization and Fairness"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-0363-9486","authenticated-orcid":false,"given":"Yuqi","family":"Xue","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8171-4970","authenticated-orcid":false,"given":"Yiqi","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8801-9384","authenticated-orcid":false,"given":"Lifeng","family":"Nai","sequence":"additional","affiliation":[{"name":"Google, Mountain View, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1125-671X","authenticated-orcid":false,"given":"Jian","family":"Huang","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,6,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2014. FreePDK15. https:\/\/eda.ncsu.edu\/freepdk15\/"},{"key":"e_1_3_2_1_2_1","unstructured":"2020. Hardware Accelerated GPU Scheduling. https:\/\/devblogs.microsoft.com\/directx\/hardware-accelerated-gpu-scheduling\/"},{"key":"e_1_3_2_1_3_1","unstructured":"2020. Understanding 8 types of Cross-Validation: A Deep dive explanation of cross-validation and its types. https:\/\/towardsdatascience.com\/understanding-8-types-of-cross-validation-80c935a4976d"},{"key":"e_1_3_2_1_4_1","unstructured":"2022. 
Profile your model with Cloud TPU tools. https:\/\/cloud.google.com\/tpu\/docs\/profile-tpu-vm"},{"key":"e_1_3_2_1_5_1","unstructured":"2022. Supported reference models. https:\/\/cloud.google.com\/tpu\/docs\/tutorials\/supported-models"},{"key":"e_1_3_2_1_6_1","unstructured":"2022. XLA: Optimizing Compiler for Machine Learning. https:\/\/www.tensorflow.org\/xla"},{"key":"e_1_3_2_1_7_1","unstructured":"Altexsoft. 2021. Comparing Machine Learning as a Service: Amazon Microsoft Azure Google Cloud AI IBM Watson. https:\/\/www.altexsoft.com\/blog\/datascience\/comparing-machine-learning-as-a-service-amazon-microsoft-azure-google-cloud-ai-ibm-watson\/"},{"key":"e_1_3_2_1_8_1","unstructured":"AWS. 2022. Amazon EC2 F1 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/f1\/"},{"key":"e_1_3_2_1_9_1","unstructured":"Amazon AWS. 2022. Machine Learning on AWS Innovate faster with the most comprehensive set of AI and ML services. 
https:\/\/aws.amazon.com\/machine-learning\/"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00081"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037700"},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'16)","author":"Chen Quan","year":"2016","unstructured":"Quan Chen, Hailong Yang, Jason Mars, and Lingjia Tang. 2016. Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'16). Atlanta, Georgia, USA."},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'14)","author":"Chen Tianshi","year":"2014","unstructured":"Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning. 
In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'14). Salt Lake City, UT."},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI'18)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI'18). Carlsbad, CA."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00027"},{"key":"e_1_3_2_1_17_1","volume-title":"Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'14)","author":"Delimitrou Christina","unstructured":"Christina Delimitrou and Christos Kozyrakis. 2014. Quasar: Resource-Efficient and QoS-Aware Cluster Management. 
In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'14). Salt Lake City, Utah, USA."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015408"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750389"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.44"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00012"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00062"},{"key":"e_1_3_2_1_23_1","unstructured":"Google. 2022. System Architecture - Cloud TPU. https:\/\/cloud.google.com\/tpu\/docs\/system-architecture-tpu-vm"},{"key":"e_1_3_2_1_24_1","unstructured":"Graphcore. 2022. V-IPU User Guide. https:\/\/docs.graphcore.ai\/projects\/vipu-user\/en\/latest\/index.html"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)","author":"Han Mingcong","year":"2022","unstructured":"Mingcong Han, Hanze Zhang, Rong Chen, and Haibo Chen. 2022. Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences. In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22). 
Carlsbad, CA."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.30"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360307"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1704.04760"},{"key":"e_1_3_2_1_29_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Khawaja Ahmed","unstructured":"Ahmed Khawaja, Joshua Landgraf, Rohith Prakash, Michael Wei, Eric Schkufza, and Christopher J. Rossbach. 2018. Sharing, Protection, and Compatibility for Reconfigurable Fabric with AmorphOS. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). Carlsbad, CA."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446755"},{"key":"e_1_3_2_1_31_1","unstructured":"Rasmus Munk Larsen and Tatiana Shpeisman. 2019. TensorFlow Graph Optimizations."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2749475"},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)","author":"Lv Chengfei","year":"2022","unstructured":"Chengfei Lv, Chaoyue Niu, Renjie Gu, Xiaotang Jiang, Zhaode Wang, Bin Liu, Ziqi Wu, Qiulin Yao, Congyu Huang, Panos Huang, Tao Huang, Hui Shu, Jinde Song, Bin Zou, Peng Lan, Guohuan Xu, Fei Wu, Shaojie Tang, Fan Wu, and Guihai Chen. 2022. Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning. 
In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22). Carlsbad, CA."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378482"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155650"},{"key":"e_1_3_2_1_36_1","volume-title":"Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)","author":"Mohan Jayashree","year":"2022","unstructured":"Jayashree Mohan, Amar Phanishayee, Janardhan Kulkarni, and Vijay Chidambaram. 2022. Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters. In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22). Carlsbad, CA."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3058217"},{"key":"e_1_3_2_1_38_1","unstructured":"NVIDIA. 2022. NVIDIA T4: Flexible Design Breakthrough Performance. 
https:\/\/www.nvidia.com\/en-us\/data-center\/tesla-t4\/"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00056"},{"key":"e_1_3_2_1_40_1","unstructured":"Ejiro Onose. 2022. Machine Learning as a Service: What It Is When to Use It and What Are the Best Tools Out There. https:\/\/neptune.ai\/blog\/machine-learning-as-a-service-what-it-is-when-to-use-it-and-what-are-the-best-tools-out-there"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123979"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359642"},{"key":"e_1_3_2_1_43_1","volume-title":"Proceedings of the 43rd International Symposium on Computer Architecture (ISCA'16)","author":"Reagen Brandon","year":"2016","unstructured":"Brandon Reagen, Paul Whatmough, Robert Adolf, Saketh Rama, Hyunkwang Lee, Sae Kyu Lee, Jos\u00e9 Miguel Hern\u00e1ndez-Lobato, Gu-Yeon Wei, and David Brooks. 2016. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA'16). 
Seoul, Republic of Korea."},{"key":"e_1_3_2_1_44_1","volume-title":"Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou.","author":"Reddi Vijay Janapa","year":"2019","unstructured":"Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. 2019. MLPerf Inference Benchmarks. arXiv 1911.02549 (2019)."},{"key":"e_1_3_2_1_45_1","unstructured":"RUN:AI. 2022. Google TPU Architecture and Performance Best Practices. 
https:\/\/www.run.ai\/guides\/cloud-deep-learning\/google-tpu"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00032"},{"key":"e_1_3_2_1_47_1","unstructured":"Alexander Spiridonov. 2021. New Cloud TPU VMs make training your ML models on TPUs easier than ever. https:\/\/cloud.google.com\/blog\/products\/compute\/introducing-cloud-tpu-vms"},{"key":"e_1_3_2_1_48_1","unstructured":"Google TensorFlow. 2021. TensorFlow Model Optimization Toolkit --- Collaborative Optimization API. https:\/\/blog.tensorflow.org\/2021\/10\/Collaborative-Optimizations.html"},{"key":"e_1_3_2_1_49_1","unstructured":"Google TensorFlow. 2022. Create production-grade machine learning models with TensorFlow. https:\/\/www.tensorflow.org\/"},{"key":"e_1_3_2_1_50_1","volume-title":"Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Unger Colin","year":"2022","unstructured":"Colin Unger, Zhihao Jia, Wei Wu, Sina Lin, Mandeep Baines, Carlos Efrain Quintero Narvaez, Vinay Ramakrishnaiah, Nirmal Prajapati, Pat McCormick, Jamaludin Mohd-Yusof, Xi Luo, Dheevatsa Mudigere, Jongsoo Park, Misha Smelyanskiy, and Alex Aiken. 2022. Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization. 
In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). Carlsbad, CA."},{"key":"e_1_3_2_1_51_1","volume-title":"Proceedings of the 44th International Symposium on Computer Architecture (ISCA'17)","author":"Venkataramani Swagath","year":"2017","unstructured":"Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, and Anand Raghunathan. 2017. ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks. In Proceedings of the 44th International Symposium on Computer Architecture (ISCA'17). Toronto, Canada."},{"key":"e_1_3_2_1_52_1","unstructured":"Kyle Wiggers. 2022. Microsoft and NVIDIA team up to build new Azure-hosted AI supercomputer. 
https:\/\/techcrunch.com\/2022\/11\/16\/microsoft-and-nvidia-team-up-to-build-new-azure-hosted-ai-supercomputer\/"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378491"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446699"},{"key":"e_1_3_2_1_55_1","volume-title":"Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)","author":"Zheng Lianmin","year":"2022","unstructured":"Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, and Ion Stoica. 2022. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22). 
Carlsbad, CA."}],"event":{"name":"ISCA '23: 50th Annual International Symposium on Computer Architecture","location":"Orlando FL USA","acronym":"ISCA '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IEEE"]},"container-title":["Proceedings of the 50th Annual International Symposium on Computer Architecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579371.3589059","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:38Z","timestamp":1750178798000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579371.3589059"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,17]]},"references-count":55,"alternative-id":["10.1145\/3579371.3589059","10.1145\/3579371"],"URL":"https:\/\/doi.org\/10.1145\/3579371.3589059","relation":{},"subject":[],"published":{"date-parts":[[2023,6,17]]},"assertion":[{"value":"2023-06-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}