{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:31:20Z","timestamp":1750221080264,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":62,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,11,1]],"date-time":"2018-11-01T00:00:00Z","timestamp":1541030400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,11]]},"DOI":"10.1145\/3243176.3243177","type":"proceedings-article","created":{"date-parts":[[2018,10,10]],"date-time":"2018-10-10T13:32:32Z","timestamp":1539178352000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Architectural support for convolutional neural networks on modern CPUs"],"prefix":"10.1145","author":[{"given":"Animesh","family":"Jain","sequence":"first","affiliation":[{"name":"University of Michigan"}]},{"given":"Michael A.","family":"Laurenzano","sequence":"additional","affiliation":[{"name":"University of Michigan"}]},{"given":"Gilles A.","family":"Pokam","sequence":"additional","affiliation":[{"name":"Intel Labs"}]},{"given":"Jason","family":"Mars","sequence":"additional","affiliation":[{"name":"University of Michigan"}]},{"given":"Lingjia","family":"Tang","sequence":"additional","affiliation":[{"name":"University of Michigan"}]}],"member":"320","published-online":{"date-parts":[[2018,11]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Intel Math Kernel Library. In http:\/\/software.intel.com\/en-us\/articles\/intel-mkl\/.  Intel Math Kernel Library. In http:\/\/software.intel.com\/en-us\/articles\/intel-mkl\/ ."},{"key":"e_1_3_2_1_2_1","unstructured":"NervanaGPU library. In https:\/\/github.com\/NervanaSystems\/nervanagpu.  NervanaGPU library. In https:\/\/github.com\/NervanaSystems\/nervanagpu ."},{"key":"e_1_3_2_1_3_1","volume-title":"http:\/\/www.intel.com\/content\/www\/us\/en\/io\/quickpath-technology\/quickpath-technology-general.html","author":"An","year":"2009","unstructured":"An introduction to the intel quickpath interconnect. In http:\/\/www.intel.com\/content\/www\/us\/en\/io\/quickpath-technology\/quickpath-technology-general.html , 2009 . An introduction to the intel quickpath interconnect. In http:\/\/www.intel.com\/content\/www\/us\/en\/io\/quickpath-technology\/quickpath-technology-general.html, 2009."},{"key":"e_1_3_2_1_4_1","volume-title":"https:\/\/www.arm.com\/files\/pdf\/System-MMU-Whitepaper-v8.0.pdf","author":"Virtualization","year":"2011","unstructured":"Virtualization is coming to a platform near you. In https:\/\/www.arm.com\/files\/pdf\/System-MMU-Whitepaper-v8.0.pdf , 2011 . Virtualization is coming to a platform near you. In https:\/\/www.arm.com\/files\/pdf\/System-MMU-Whitepaper-v8.0.pdf, 2011."},{"key":"e_1_3_2_1_5_1","unstructured":"Intel advanced encryption standard (aes) new instructions set. 2012.  Intel advanced encryption standard (aes) new instructions set. 2012."},{"key":"e_1_3_2_1_6_1","unstructured":"AMD64 architecture programmer's manual. 2013.  AMD64 architecture programmer's manual. 2013."},{"key":"e_1_3_2_1_7_1","volume-title":"http:\/\/www.nvidia.com\/object\/nvlink.html","author":"Nvidia","year":"2016","unstructured":"Nvidia nvlink high-speed interconnect. In http:\/\/www.nvidia.com\/object\/nvlink.html , 2016 . Nvidia nvlink high-speed interconnect. In http:\/\/www.nvidia.com\/object\/nvlink.html, 2016."},{"key":"e_1_3_2_1_8_1","volume-title":"http:\/\/www.nvidia.com\/object\/nvlink.html","author":"Nvidia","year":"2016","unstructured":"Nvidia nvlink high-speed interconnect. In http:\/\/www.nvidia.com\/object\/nvlink.html , 2016 . Nvidia nvlink high-speed interconnect. In http:\/\/www.nvidia.com\/object\/nvlink.html, 2016."},{"key":"e_1_3_2_1_9_1","unstructured":"ARM architecture reference manual. 2017.  ARM architecture reference manual. 2017."},{"key":"e_1_3_2_1_10_1","volume-title":"Volume 3","author":"Intel","year":"2017","unstructured":"Intel 64 and ia-32 architectures software developer's manual. In Volume 3 , 2017 . Intel 64 and ia-32 architectures software developer's manual. In Volume 3, 2017."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.11"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2908080.2908111"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3015146"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941487.1941507"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063454"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.40"},{"key":"e_1_3_2_1_19_1","volume-title":"cuDNN: Efficient primitives for deep learning. In arXiV:1410.0759","author":"Chetlur S.","year":"2014","unstructured":"S. Chetlur , C. Woolley , P. Vandermersch , J. Cohen , J. Tran , B. Catanzaro , and E. Shelhamer . cuDNN: Efficient primitives for deep learning. In arXiV:1410.0759 , 2014 . S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer. cuDNN: Efficient primitives for deep learning. In arXiV:1410.0759, 2014."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","DOI":"10.1561\/9781601988157","volume-title":"Deep learning: Methods and applications. Technical report","author":"Deng L.","year":"2014","unstructured":"L. Deng and D. Yu . Deep learning: Methods and applications. Technical report , 2014 . L. Deng and D. Yu. Deep learning: Methods and applications. Technical report, 2014."},{"key":"e_1_3_2_1_21_1","volume-title":"NNPACK: Acceleration package for neural networks on multi-core cpus. In https:\/\/github.com\/Maratyszcza\/NNPACK","author":"Dukhan M.","year":"2016","unstructured":"M. Dukhan . NNPACK: Acceleration package for neural networks on multi-core cpus. In https:\/\/github.com\/Maratyszcza\/NNPACK , 2016 . M. Dukhan. NNPACK: Acceleration package for neural networks on multi-core cpus. In https:\/\/github.com\/Maratyszcza\/NNPACK, 2016."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840491"},{"key":"e_1_3_2_1_23_1","volume-title":"A deep architecture for semantic parsing. In arXiV:1404.7296","author":"Grefenstette E.","year":"2014","unstructured":"E. Grefenstette , P. Blunsom , N. de Freitas , and K. M. Hermann . A deep architecture for semantic parsing. In arXiV:1404.7296 , 2014 . E. Grefenstette, P. Blunsom, N. de Freitas, and K. M. Hermann. A deep architecture for semantic parsing. In arXiV:1404.7296, 2014."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.30"},{"key":"e_1_3_2_1_25_1","volume-title":"Neural Information Processing Systems (NIPS)","author":"Han S.","year":"2015","unstructured":"S. Han , J. Pool , J. Tran , and W. J. Dally . Learning both weights and connections for efficient neural networks . In Neural Information Processing Systems (NIPS) , 2015 . S. Han, J. Pool, J. Tran, and W. J. Dally. Learning both weights and connections for efficient neural networks. In Neural Information Processing Systems (NIPS), 2015."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2749472"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00059"},{"key":"e_1_3_2_1_28_1","volume-title":"Deep residual learning for image recognition","author":"He K.","year":"2015","unstructured":"K. He , X. Zhang , S. Ren , and J. Sun . Deep residual learning for image recognition . 2015 . K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. 2015."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123970"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2254064.2254108"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195688"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00070"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_2_1_34_1","volume-title":"A convolutional neural network for modelling sentences. In arXiV:1404.2188","author":"Kalchbrenner N.","year":"2014","unstructured":"N. Kalchbrenner , E. Grefenstette , and P. Blunsom . A convolutional neural network for modelling sentences. In arXiV:1404.2188 , 2014 . N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A convolutional neural network for modelling sentences. In arXiV:1404.2188, 2014."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_1_36_1","volume-title":"Convolutional neural networks for sentence classification. In arXiV:1408.5882","author":"Kim Y.","year":"2014","unstructured":"Y. Kim . Convolutional neural networks for sentence classification. In arXiV:1408.5882 , 2014 . Y. Kim. Convolutional neural networks for sentence classification. In arXiV:1408.5882, 2014."},{"key":"e_1_3_2_1_37_1","volume-title":"Neural Information Processing Systems (NIPS)","author":"Krizhevsky A.","year":"2012","unstructured":"A. Krizhevsky , I. Sutskever , and G. E. Hinton . Imagenet classification with deep convolutional neural networks . In Neural Information Processing Systems (NIPS) , 2012 . A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Neural Information Processing Systems (NIPS), 2012."},{"key":"e_1_3_2_1_38_1","volume-title":"Fast algorithms for convolutional neural networks","author":"Lavin A.","year":"2015","unstructured":"A. Lavin . Fast algorithms for convolutional neural networks . 2015 . A. Lavin. Fast algorithms for convolutional neural networks. 2015."},{"key":"e_1_3_2_1_39_1","volume-title":"When are tree structures necessary for deep learning of representations? In arXiV:1503.00185","author":"Li J.","year":"2015","unstructured":"J. Li , D. Jurafsky , and E. H. Hovy . When are tree structures necessary for deep learning of representations? In arXiV:1503.00185 , 2015 . J. Li, D. Jurafsky, and E. H. Hovy. When are tree structures necessary for deep learning of representations? In arXiV:1503.00185, 2015."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669172"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.31"},{"key":"e_1_3_2_1_42_1","volume-title":"Network in network. In arXiV:1312.4400","author":"Lin M.","year":"2013","unstructured":"M. Lin , Q. Chen , and S. Yan . Network in network. In arXiV:1312.4400 , 2013 . M. Lin, Q. Chen, and S. Yan. Network in network. In arXiV:1312.4400, 2013."},{"key":"e_1_3_2_1_43_1","volume-title":"Intel White Paper","author":"Lomont C.","year":"2011","unstructured":"C. Lomont . Introduction to intel advanced vector extensions . In Intel White Paper , 2011 . C. Lomont. Introduction to intel advanced vector extensions. In Intel White Paper, 2011."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925987"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1133981.1133997"},{"key":"e_1_3_2_1_46_1","volume-title":"HotChips","author":"Ovtcharov K.","year":"2015","unstructured":"K. Ovtcharov , O. Ruwase , J.-Y. Kim , J. Fowers , K. Strauss , and E. Chung . Accelerating deep convolutional neural networks using specialized hardware . In HotChips , 2015 . K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. Chung. Accelerating deep convolutional neural networks using specialized hardware. In HotChips, 2015."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.32"},{"key":"e_1_3_2_1_48_1","volume-title":"Virtualizing deep neural networks for memory-efficient neural network design","author":"Rhu M.","year":"2016","unstructured":"M. Rhu , N. Gimelshein , J. Clemons , A. Zulfiqar , and S. W. Keckler . Virtualizing deep neural networks for memory-efficient neural network design . 2016 . M. Rhu, N. Gimelshein, J. Clemons, A. Zulfiqar, and S. W. Keckler. Virtualizing deep neural networks for memory-efficient neural network design. 2016."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366231.2337210"},{"key":"e_1_3_2_1_50_1","volume-title":"LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. In arXiv:1312.6229","author":"Sermanet P.","year":"2014","unstructured":"P. Sermanet , D. Eigen , X. Zhang , M. Mathieu , R. Fergus , and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. In arXiv:1312.6229 , 2014 . P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. In arXiv:1312.6229, 2014."},{"key":"e_1_3_2_1_51_1","volume-title":"Two-stream convolutional networks for action recognition in videos. In arXiV:1406.2199","author":"Simonyan K.","year":"2014","unstructured":"K. Simonyan and A. Zisserman . Two-stream convolutional networks for action recognition in videos. In arXiV:1406.2199 , 2014 . K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In arXiV:1406.2199, 2014."},{"key":"e_1_3_2_1_52_1","volume-title":"Very deep convolutional networks for large-scale image recognition. In arXiV:1409.1556","author":"Simonyan K.","year":"2014","unstructured":"K. Simonyan and A. Zisserman . Very deep convolutional networks for large-scale image recognition. In arXiV:1409.1556 , 2014 . K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In arXiV:1409.1556, 2014."},{"key":"e_1_3_2_1_53_1","volume-title":"Sequence to sequence learning with neural networks. In arXiV:1409.3215","author":"Sutskever I.","year":"2014","unstructured":"I. Sutskever , O. Vinyals , and Q. V. Le . Sequence to sequence learning with neural networks. In arXiV:1409.3215 , 2014 . I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In arXiV:1409.3215, 2014."},{"key":"e_1_3_2_1_54_1","volume-title":"Efficient processing of deep neural networks: A tutorial and survey. In arXiV:1703.09039","author":"Sze V.","year":"2017","unstructured":"V. Sze , Y.-H. Chen , T.-J. Yang , and J. Emer . Efficient processing of deep neural networks: A tutorial and survey. In arXiV:1703.09039 , 2017 . V. Sze, Y.-H. Chen, T.-J. Yang, and J. Emer. Efficient processing of deep neural networks: A tutorial and survey. In arXiV:1703.09039, 2017."},{"key":"e_1_3_2_1_55_1","volume-title":"Going deeper with convolutions","author":"Szegedy C.","year":"2014","unstructured":"C. Szegedy , W. Liu , Y. Jia , P. Sermanet , S. E. Reed , D. Anguelov , D. Erhan , V. Vanhoucke , and A. Rabinovich . Going deeper with convolutions . 2014 . C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. 2014."},{"key":"e_1_3_2_1_56_1","volume-title":"Fast convolutional nets with fbfft: A GPU performance evaluation. In arXiV:1412.7580","author":"Vasilache N.","year":"2014","unstructured":"N. Vasilache , J. Johnson , M. Mathieu , S. Chintala , S. Piantino , and Y. LeCun . Fast convolutional nets with fbfft: A GPU performance evaluation. In arXiV:1412.7580 , 2014 . N. Vasilache, J. Johnson, M. Mathieu, S. Chintala, S. Piantino, and Y. LeCun. Fast convolutional nets with fbfft: A GPU performance evaluation. In arXiV:1412.7580, 2014."},{"key":"e_1_3_2_1_57_1","volume-title":"Automating the last-mile for high performance dense linear algebra. CoRR, abs\/1611.08035","author":"Veras R. M.","year":"2016","unstructured":"R. M. Veras , T. M. Low , T. M. Smith , R. A. van de Geijn , and F. Franchetti . Automating the last-mile for high performance dense linear algebra. CoRR, abs\/1611.08035 , 2016 . R. M. Veras, T. M. Low, T. M. Smith, R. A. van de Geijn, and F. Franchetti. Automating the last-mile for high performance dense linear algebra. CoRR, abs\/1611.08035, 2016."},{"key":"e_1_3_2_1_58_1","volume-title":"Show and tell: A neural image caption generator. In arXiV:1411.4555","author":"Vinyals O.","year":"2014","unstructured":"O. Vinyals , A. Toshev , S. Bengio , and D. Erhan . Show and tell: A neural image caption generator. In arXiV:1411.4555 , 2014 . O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In arXiV:1411.4555, 2014."},{"key":"e_1_3_2_1_59_1","volume-title":"Atomnet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery","author":"Wallach I.","year":"2015","unstructured":"I. Wallach , M. Dzamba , and A. Heifets . Atomnet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery . 2015 . I. Wallach, M. Dzamba, and A. Heifets. Atomnet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery. 2015."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056064"},{"key":"e_1_3_2_1_61_1","volume-title":"Understanding neural networks through deep visualization. In arXiV:1506.06579","author":"Yosinski J.","year":"2015","unstructured":"J. Yosinski , J. Clune , A. M. Nguyen , T. J. Fuchs , and H. Lipson . Understanding neural networks through deep visualization. In arXiV:1506.06579 , 2015 . J. Yosinski, J. Clune, A. M. Nguyen, T. J. Fuchs, and H. Lipson. Understanding neural networks through deep visualization. In arXiV:1506.06579, 2015."},{"key":"e_1_3_2_1_62_1","volume-title":"Recurrent neural network regularization. In arXiV:1409.2329","author":"Zaremba W.","year":"2014","unstructured":"W. Zaremba , I. Sutskever , and O. Vinyals . Recurrent neural network regularization. In arXiV:1409.2329 , 2014 . W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. In arXiV:1409.2329, 2014."}],"event":{"name":"PACT '18: International conference on Parallel Architectures and Compilation Techniques","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IFIP WG 10.3 IFIP WG 10.3","IEEE CS"],"location":"Limassol Cyprus","acronym":"PACT '18"},"container-title":["Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3243176.3243177","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3243176.3243177","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:57:39Z","timestamp":1750208259000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3243176.3243177"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,11]]},"references-count":62,"alternative-id":["10.1145\/3243176.3243177","10.1145\/3243176"],"URL":"https:\/\/doi.org\/10.1145\/3243176.3243177","relation":{},"subject":[],"published":{"date-parts":[[2018,11]]},"assertion":[{"value":"2018-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}