{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T16:58:19Z","timestamp":1768064299611,"version":"3.49.0"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T00:00:00Z","timestamp":1609977600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,3,31]]},"abstract":"<jats:p>Deep Neural Networks (DNNs) have revolutionized many aspects of our lives. Their use is becoming ubiquitous, in software for image recognition, speech recognition, speech synthesis, and language translation, to name a few applications. Training a DNN architecture, however, is computationally expensive. Once a model is trained, its use in the intended application (the inference task) is computationally heavy as well, and inference must be fast for real-time use. To obtain high performance today, the norm is to rely on libraries of Deep Learning (DL) primitives whose code has been optimized for specific architectures by expert programmers. However, given the constant emergence of new DNN architectures, creating such hand-optimized code is expensive, slow, and not scalable.<\/jats:p>\n          <jats:p>To address this performance-productivity challenge, in this article we present compiler algorithms that automatically generate high-performance implementations of DL primitives, closely matching the performance of hand-optimized libraries. We develop novel data reuse analysis algorithms based on the polyhedral model to derive efficient execution schedules automatically. 
In addition, because most DL primitives use some variant of matrix multiplication at their core, we develop a flexible framework in which library implementations of matrix multiplication can be plugged in, in lieu of a subset of the loops. We show that such a hybrid approach, combining the compiler with minimal library use, yields state-of-the-art performance. We also develop compiler algorithms to perform operator fusion, which reduces data movement through the memory hierarchy of the computer system. Using Convolutional Neural Network (CNN) models and matrix multiplication operations, we demonstrate that our approach automatically creates high-performing DNN building blocks whose performance matches that of the hand-crafted kernels in Intel\u2019s oneDNN library on high-end CPUs. At the same time, our techniques take only a fraction of the time (1\/20 or less) that AutoTVM, a deep learning auto-tuner, needs to create optimized implementations.<\/jats:p>","DOI":"10.1145\/3433103","type":"journal-article","created":{"date-parts":[[2021,1,8]],"date-time":"2021-01-08T05:12:19Z","timestamp":1610082739000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["PolyDL"],"prefix":"10.1145","volume":"18","author":[{"given":"Sanket","family":"Tavarageri","sequence":"first","affiliation":[{"name":"Intel Labs, Bengaluru, Karnataka, India"}]},{"given":"Alexander","family":"Heinecke","sequence":"additional","affiliation":[{"name":"Intel Labs, Bengaluru, Karnataka, India"}]},{"given":"Sasikanth","family":"Avancha","sequence":"additional","affiliation":[{"name":"Intel Labs, Bengaluru, Karnataka, India"}]},{"given":"Bharat","family":"Kaul","sequence":"additional","affiliation":[{"name":"Intel Labs, Bengaluru, Karnataka, India"}]},{"given":"Gagandeep","family":"Goyal","sequence":"additional","affiliation":[{"name":"IIT 
Hyderabad"}]},{"given":"Ramakrishna","family":"Upadrasta","sequence":"additional","affiliation":[{"name":"IIT Hyderabad"}]}],"member":"320","published-online":{"date-parts":[[2021,1,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2020. Google is AI first: 12 AI projects powering Google products. Retrieved from https:\/\/blog.aimultiple.com\/ai-is-already-at-the-heart-of-google."},{"key":"e_1_2_1_2_1","unstructured":"2020. How to optimize GEMM on CPU. Retrieved from https:\/\/tvm.apache.org\/docs\/tutorials\/optimize\/opt_gemm.html."},{"key":"e_1_2_1_3_1","unstructured":"2020. Library targeting Intel Architecture for specialized dense and sparse matrix operations and deep learning primitives. Retrieved from https:\/\/github.com\/hfp\/libxsmm."},{"key":"e_1_2_1_4_1","unstructured":"2020. oneAPI Deep Neural Network Library (oneDNN). Retrieved from https:\/\/github.com\/oneapi-src\/oneDNN."},{"key":"e_1_2_1_5_1","unstructured":"2015. Why GEMM is at the heart of deep learning. Retrieved from https:\/\/petewarden.com\/2015\/04\/20\/why-gemm-is-at-the-heart-of-deep-learning\/."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322967"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661197"},{"key":"e_1_2_1_8_1","first-page":"32","article-title":"Analytical modeling of cache behavior for affine programs","volume":"2","author":"Bao Wenlei","year":"2017","unstructured":"Wenlei Bao, Sriram Krishnamoorthy, Louis-Noel Pouchet, and P. Sadayappan. 2017. Analytical modeling of cache behavior for affine programs. Proc. ACM Program. Lang. 2 (2017), 32.","journal-title":"Proc. ACM Program. Lang."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2007.22"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772983"},{"key":"e_1_2_1_11_1","unstructured":"Gaurav Batra, Zach Jacobson, Siddarth Madhav, Andrea Queirolo, and Nick Santhanam. 2018. Artificial-intelligence hardware: New opportunities for semiconductor companies. Retrieved from https:\/\/www.mckinsey.com\/industries\/semiconductors\/our-insights."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854317"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201908)","author":"Bondhugula Uday","unstructured":"Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A practical automatic polyhedral program optimization system. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201908)."},{"key":"e_1_2_1_14_1","volume-title":"Model-guided Empirical Optimization for Memory Hierarchy","author":"Chen Chun","unstructured":"Chun Chen. 2007. Model-guided Empirical Optimization for Memory Hierarchy. University of Southern California."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze et\u00a0al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). 578--594."},{"key":"e_1_2_1_16_1","volume-title":"Advances in Neural Information Processing Systems","author":"Chen Tianqi","unstructured":"Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. Learning to optimize tensor programs. In Advances in Neural Information Processing Systems. MIT Press, 3389--3400."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the ACM\/IEEE conference on Supercomputing. IEEE Computer Society, 30","author":"Chung Hsin","year":"2004","unstructured":"I.-Hsin Chung, Jeffrey K. Hollingsworth, et\u00a0al. 2004. Using information from prior runs to improve automated tuning systems. In Proceedings of the ACM\/IEEE conference on Supercomputing. IEEE Computer Society, 30."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. Springer, 172--186","author":"Cornwall Jay L. T.","year":"2007","unstructured":"Jay L. T. Cornwall, Paul H. J. Kelly, Phil Parsonage, and Bruno Nicoletti. 2007. Explicit dependence metadata in an active visual effects library. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. Springer, 172--186."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the International Microsystems, Packaging, Assembly and Circuits Technology Conference (IMPACT\u201914)","author":"Darte Alain","year":"2014","unstructured":"Alain Darte, Alexandre Isoard et\u00a0al. 2014. Parametric tiling with inter-tile data reuse. In Proceedings of the International Microsystems, Packaging, Assembly and Circuits Technology Conference (IMPACT\u201914)."},{"key":"e_1_2_1_20_1","unstructured":"AutoTVM developers. 2020. AutoTVM\u2019s X86 specific code. Retrieved from https:\/\/github.com\/apache\/incubator-tvm\/blob\/master\/topi\/python\/topi\/x86."},{"key":"e_1_2_1_21_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding.","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. Retrieved from arXiv:1810.04805."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211346.3211354"},{"key":"e_1_2_1_23_1","volume-title":"The Data Parallel Programming Model","author":"Feautrier Paul","unstructured":"Paul Feautrier. 1996. Automatic parallelization in the polytope model. In The Data Parallel Programming Model. Springer, 79--103."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS47924.2020.00032"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263657"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Ross Girshick. 2015. Fast R-CNN. Retrieved from arXiv:cs.CV\/1504.08083.","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356052.1356053"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314606"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542301"},{"key":"e_1_2_1_30_1","volume-title":"Girshick","author":"He Kaiming","year":"2017","unstructured":"Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross B. Girshick. 2017. Mask R-CNN. Retrieved from http:\/\/arxiv.org\/abs\/1703.06870."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.83"},{"key":"e_1_2_1_33_1","volume-title":"et\u00a0al","author":"Hinton Geoffrey","year":"2012","unstructured":"Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et\u00a0al. 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29 (2012)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314653"},{"key":"e_1_2_1_36_1","unstructured":"Alex Krizhevsky and Geoff Hinton. 2010. Convolutional deep belief networks on CIFAR-10. Unpublished manuscript. 1--9."},{"key":"e_1_2_1_37_1","volume-title":"Hinton","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. MIT Press, 1097--1105."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1379022.1375594"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.14"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00015"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185528"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250734.1250780"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3104988"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the 15th Workshop on Compilers for Parallel Computers.","author":"Tavarageri Sanket","unstructured":"Sanket Tavarageri, Albert Hartono, Muthu Baskaran, Louis-No\u00ebl Pouchet, J. Ramanujam, and P. Sadayappan. 2010. Parametric tiling of affine loop nests. In Proceedings of the 15th Workshop on Compilers for Parallel Computers."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342013493939"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the IEEE International Symposium on Parallel and Distributed Processing. IEEE, 1--12","author":"Tiwari Ananta","unstructured":"Ananta Tiwari, Chun Chen, Jacqueline Chame, Mary Hall, and Jeffrey K. Hollingsworth. 2009. A scalable auto-tuning framework for compiler optimization. In Proceedings of the IEEE International Symposium on Parallel and Distributed Processing. IEEE, 1--12."},{"key":"e_1_2_1_47_1","unstructured":"Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. Retrieved from arXiv:1802.04730."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342019866247"},{"key":"e_1_2_1_49_1","volume-title":"Tyler Michael Smith, Robert van de Geijn, and Franz Franchetti.","author":"Veras Richard Michael","year":"2016","unstructured":"Richard Michael Veras, Tze Meng Low, Tyler Michael Smith, Robert van de Geijn, and Franz Franchetti. 2016. Automating the last-mile for high performance dense linear algebra. Retrieved from arXiv:1611.08035."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15582-6_49"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 2nd International Workshop on Polyhedral Compilation Techniques (IMPACT\u201912)","author":"Verdoolaege Sven","year":"2012","unstructured":"Sven Verdoolaege and Tobias Grosser. 2012. Polyhedral extraction tool. In Proceedings of the 2nd International Workshop on Polyhedral Compilation Techniques (IMPACT\u201912)."},{"key":"e_1_2_1_52_1","unstructured":"Yao Wang and Eddie Yan. 2020. Auto-tuning a convolutional network for x86 CPU. Retrieved from https:\/\/docs.tvm.ai\/tutorials\/autotvm\/tune_relay_x86.html."},{"key":"e_1_2_1_53_1","volume-title":"Klaus Macherey et\u00a0al","author":"Wu Yonghui","year":"2016","unstructured":"Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey et\u00a0al. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. Retrieved from arXiv:1609.08144."},{"key":"e_1_2_1_54_1","first-page":"905","article-title":"Mapping the intel last-level cache","volume":"2015","author":"Yarom Yuval","year":"2015","unstructured":"Yuval Yarom, Qian Ge, Fangfei Liu, Ruby B. Lee, and Gernot Heiser. 2015. Mapping the Intel last-level cache. IACR Cryptol. ePrint Arch. 2015 (2015), 905.","journal-title":"IACR Cryptol. ePrint Arch."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2755561"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3433103","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3433103","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:10Z","timestamp":1750193290000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3433103"}},"subtitle":["Polyhedral Optimizations for Creation of High-performance DL Primitives"],"short-title":[],"issued":{"date-parts":[[2021,1,7]]},"references-count":55,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,3,31]]}},"alternative-id":["10.1145\/3433103"],"URL":"https:\/\/doi.org\/10.1145\/3433103","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,7]]},"assertion":[{"value":"2020-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2020-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}