{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T08:19:38Z","timestamp":1780561178056,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,4,17]],"date-time":"2021-04-17T00:00:00Z","timestamp":1618617600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["1946752,2018016"],"award-info":[{"award-number":["1946752,2018016"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,4,19]]},"DOI":"10.1145\/3445814.3446759","type":"proceedings-article","created":{"date-parts":[[2021,4,11]],"date-time":"2021-04-11T17:06:26Z","timestamp":1618160786000},"page":"928-942","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":51,"title":["Analytical characterization and design space exploration for optimization of CNNs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9847-5642","authenticated-orcid":false,"given":"Rui","family":"Li","sequence":"first","affiliation":[{"name":"University of Utah, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7787-6460","authenticated-orcid":false,"given":"Yufan","family":"Xu","sequence":"additional","affiliation":[{"name":"University of Utah, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4062-0293","authenticated-orcid":false,"given":"Aravind","family":"Sukumaran-Rajam","sequence":"additional","affiliation":[{"name":"Washington State University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Atanas","family":"Rountev","sequence":"additional","affiliation":[{"name":"Ohio State University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"P.","family":"Sadayappan","sequence":"additional","affiliation":[{"name":"University of Utah, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,4,17]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 265-283","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jefrey Dean , Matthieu Devin , Sanjay Ghemawat , Geofrey Irving , Michael Isard , 2016 . Tensorflow: A system for large-scale machine learning . In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 265-283 . Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jefrey Dean, Matthieu Devin, Sanjay Ghemawat, Geofrey Irving, Michael Isard, et al. 2016. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 265-283."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/3314872.3314896"},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the ACM on Programming Languages 2, POPL ( 2017 ), 1-26","author":"Bao Wenlei","year":"2017","unstructured":"Wenlei Bao , Sriram Krishnamoorthy , Louis-Noel Pouchet , and Ponnuswamy Sadayappan . 2017 . Analytical modeling of cache behavior for afine programs . Proceedings of the ACM on Programming Languages 2, POPL ( 2017 ), 1-26 . Wenlei Bao, Sriram Krishnamoorthy, Louis-Noel Pouchet, and Ponnuswamy Sadayappan. 2017. Analytical modeling of cache behavior for afine programs. Proceedings of the ACM on Programming Languages 2, POPL ( 2017 ), 1-26."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2004.1342537"},{"key":"e_1_3_2_1_5_1","volume-title":"Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 101-113","author":"Bondhugula U.","unstructured":"U. Bondhugula , A. Hartono , J. Ramanujam , and P. Sadayappan . 2008. A practical automatic polyhedral parallelizer and locality optimizer . In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 101-113 . U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. 2008. A practical automatic polyhedral parallelizer and locality optimizer. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 101-113."},{"key":"e_1_3_2_1_6_1","volume-title":"Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen , Thierry Moreau , Ziheng Jiang , Lianmin Zheng , Eddie Yan , Meghan Cowan , Haichen Shen , Leyuan Wang , Yuwei Hu , Luis Ceze , Carlos Guestrin , and Arvind Krishnamurthy . 2018 . TVM: An Automated End-to-End Optimizing Compiler for Deep Learning . In Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI). Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2003.1213121"},{"key":"e_1_3_2_1_8_1","article-title":"Some eficient solutions to the afine scheduling problem. I. One-dimensional time","volume":"21","author":"Feautrier Paul","year":"1992","unstructured":"Paul Feautrier . 1992 . Some eficient solutions to the afine scheduling problem. I. One-dimensional time . International Journal of Parallel Programming 21 , 5 ( 1992 ), 313-347. Paul Feautrier. 1992. Some eficient solutions to the afine scheduling problem. I. One-dimensional time. International Journal of Parallel Programming 21, 5 ( 1992 ), 313-347.","journal-title":"International Journal of Parallel Programming"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Robert Fourer David M Gay and Brian W Kernighan. 1990. A modeling language for mathematical programming. Management Science 36 5 ( 1990 ) 519-554.  Robert Fourer David M Gay and Brian W Kernighan. 1990. A modeling language for mathematical programming. Management Science 36 5 ( 1990 ) 519-554.","DOI":"10.1287\/mnsc.36.5.519"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1297027.1297033"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626412500107"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314606"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"e_1_3_2_1_14_1","volume-title":"Mobilenets: Eficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 ( 2017 ).","author":"Howard Andrew G","year":"2017","unstructured":"Andrew G Howard , Menglong Zhu , Bo Chen , Dmitry Kalenichenko , Weijun Wang , Tobias Weyand , Marco Andreetto , and Hartwig Adam . 2017 . Mobilenets: Eficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 ( 2017 ). Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Eficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 ( 2017 )."},{"key":"e_1_3_2_1_15_1","volume-title":"Deep learning with Python","author":"Ketkar Nikhil","unstructured":"Nikhil Ketkar . 2017. Introduction to keras . In Deep learning with Python . Springer , 97-111. Nikhil Ketkar. 2017. Introduction to keras. In Deep learning with Python. Springer, 97-111."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.773"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2005.06.017"},{"key":"e_1_3_2_1_18_1","volume-title":"MLIR: A Compiler Infrastructure for the End of Moore's Law. arXiv","author":"Lattner Chris","year":"2020","unstructured":"Chris Lattner , Mehdi Amini , Uday Bondhugula , Albert Cohen , Andy Davis , Jacques Pienaar , River Riddle , Tatiana Shpeisman , Nicolas Vasilache , and Oleksandr Zinenko . 2020 . MLIR: A Compiler Infrastructure for the End of Moore's Law. arXiv : 2002. 11054 [cs.PL] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2020. MLIR: A Compiler Infrastructure for the End of Moore's Law. arXiv: 2002. 11054 [cs.PL]"},{"key":"e_1_3_2_1_19_1","volume-title":"XLA: TensorFlow, compiled. TensorFlow Dev Summit ( 2017 ).","author":"Leary Chris","year":"2017","unstructured":"Chris Leary and Todd Wang . 2017 . XLA: TensorFlow, compiled. TensorFlow Dev Summit ( 2017 ). Chris Leary and Todd Wang. 2017. XLA: TensorFlow, compiled. TensorFlow Dev Summit ( 2017 )."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/3014904.3014977"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356218"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.53"},{"key":"e_1_3_2_1_23_1","first-page":"1025","volume-title":"Optimizing CNN Model Inference on CPUs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Liu Yizhi","year":"2019","unstructured":"Yizhi Liu , Yao Wang , Ruofei Yu , Mu Li , Vin Sharma , and Yida Wang . 2019 . Optimizing CNN Model Inference on CPUs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19) . 1025 - 1040 . Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, and Yida Wang. 2019. Optimizing CNN Model Inference on CPUs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 1025-1040."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925987"},{"key":"e_1_3_2_1_25_1","unstructured":"oneDNN 2020. Intel oneAPI Deep Neural Network Library (oneDNN). https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/documentation\/ oneapi-programming-guide\/top\/api-based-programming\/intel-oneapi-deepneural-network-library-onednn.html.  oneDNN 2020. Intel oneAPI Deep Neural Network Library (oneDNN). https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/documentation\/ oneapi-programming-guide\/top\/api-based-programming\/intel-oneapi-deepneural-network-library-onednn.html."},{"key":"e_1_3_2_1_26_1","first-page":"8024","article-title":"PyTorch: An imperative style, high-performance deep learning library","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , 2019 . PyTorch: An imperative style, high-performance deep learning library . In Advances in Neural Information Processing Systems. 8024 - 8035 . Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems. 8024-8035.","journal-title":"Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_1_27_1","unstructured":"PlaidML 2017. PlaidML. https:\/\/www.intel.ai\/plaidML.  PlaidML 2017. PlaidML. https:\/\/www.intel.ai\/plaidML."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Jonathan Ragan-Kelley Connelly Barnes Andrew Adams Sylvain Paris Fr\u00e9do Durand and Saman Amarasinghe. 2013. Halide: a language and compiler for optimizing parallelism locality and recomputation in image processing pipelines. Acm Sigplan Notices 48 6 ( 2013 ) 519-530.  Jonathan Ragan-Kelley Connelly Barnes Andrew Adams Sylvain Paris Fr\u00e9do Durand and Saman Amarasinghe. 2013. Halide: a language and compiler for optimizing parallelism locality and recomputation in image processing pipelines. Acm Sigplan Notices 48 6 ( 2013 ) 519-530.","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.690"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1413370.1413426"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2000.842294"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28652-0_6"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2010.38"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/MLHPC.2016.005"},{"key":"e_1_3_2_1_35_1","volume-title":"Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv","author":"Vasilache Nicolas","year":"1802","unstructured":"Nicolas Vasilache , Oleksandr Zinenko , Theodoros Theodoridis , Priya Goyal , Zachary DeVito , William S Moses , Sven Verdoolaege , Andrew Adams , and Albert Cohen . 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv : 1802 . 04730 ( 2018 ). Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv: 1802. 04730 ( 2018 )."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Andreas W\u00e4chter and Lorenz T Biegler. 2006. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical programming 106 1 ( 2006 ) 25-57.  Andreas W\u00e4chter and Lorenz T Biegler. 2006. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical programming 106 1 ( 2006 ) 25-57.","DOI":"10.1007\/s10107-004-0559-y"},{"key":"e_1_3_2_1_38_1","unstructured":"Yao Wang and Animesh Jain. 2019. TVM CNN Tuning Script. https:\/\/github.com\/ apache\/incubator-tvm\/blob\/v0.6\/topi\/python\/topi\/x86\/conv2d.py.  Yao Wang and Animesh Jain. 2019. TVM CNN Tuning Script. https:\/\/github.com\/ apache\/incubator-tvm\/blob\/v0.6\/topi\/python\/topi\/x86\/conv2d.py."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772982"},{"key":"e_1_3_2_1_40_1","volume-title":"Developer Documentation: Automatic Kernel Optimization for Deep Learning on All Hardware Platforms. https:\/\/tvm.apache.org\/ 2018 \/10\/03\/auto-opt-all.","author":"Zheng Lianmin","year":"2018","unstructured":"Lianmin Zheng , Eddie Yan , and Tianqi Chen . 2018 . Developer Documentation: Automatic Kernel Optimization for Deep Learning on All Hardware Platforms. https:\/\/tvm.apache.org\/ 2018 \/10\/03\/auto-opt-all. Lianmin Zheng, Eddie Yan, and Tianqi Chen. 2018. Developer Documentation: Automatic Kernel Optimization for Deep Learning on All Hardware Platforms. https:\/\/tvm.apache.org\/ 2018 \/10\/03\/auto-opt-all."}],"event":{"name":"ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems","location":"Virtual USA","acronym":"ASPLOS '21","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3445814.3446759","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3445814.3446759","content-type":"text\/html","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3445814.3446759","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3445814.3446759","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:33Z","timestamp":1750195473000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3445814.3446759"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,17]]},"references-count":40,"alternative-id":["10.1145\/3445814.3446759","10.1145\/3445814"],"URL":"https:\/\/doi.org\/10.1145\/3445814.3446759","relation":{},"subject":[],"published":{"date-parts":[[2021,4,17]]},"assertion":[{"value":"2021-04-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}