{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T03:48:12Z","timestamp":1772164092119,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,4,4]],"date-time":"2017-04-04T00:00:00Z","timestamp":1491264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,4,4]]},"DOI":"10.1145\/3037697.3037745","type":"proceedings-article","created":{"date-parts":[[2017,4,5]],"date-time":"2017-04-05T08:47:40Z","timestamp":1491382060000},"page":"267-280","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Optimizing CNNs on Multicores for Scalability, Performance and Goodput"],"prefix":"10.1145","author":[{"given":"Samyam","family":"Rajbhandari","sequence":"first","affiliation":[{"name":"The Ohio State University, Columbus, OH, USA"}]},{"given":"Yuxiong","family":"He","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, WA, USA"}]},{"given":"Olatunji","family":"Ruwase","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, WA, USA"}]},{"given":"Michael","family":"Carbin","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, WA, USA"}]},{"given":"Trishul","family":"Chilimbi","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, WA, USA"}]}],"member":"320","published-online":{"date-parts":[[2017,4,4]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"http:\/\/chainer.org.  http:\/\/chainer.org."},{"key":"e_1_3_2_1_2_1","unstructured":"http:\/\/www.cntk.ai.  
http:\/\/www.cntk.ai."},{"key":"e_1_3_2_1_3_1","unstructured":"https:\/\/developers.google.com\/protocol-buffers\/.  https:\/\/developers.google.com\/protocol-buffers\/."},{"key":"e_1_3_2_1_4_1","unstructured":"https:\/\/01.org\/intel-deep-learning-framework.  https:\/\/01.org\/intel-deep-learning-framework."},{"key":"e_1_3_2_1_5_1","volume-title":"et al. Tensorflow: Large-scale machine learning on heterogeneous systems","author":"Abadi M.","year":"2015","unstructured":"M. Abadi , A. Agarwal , P. Barham , E. Brevdo , Z. Chen , C. Citro , G. S. Corrado , A. Davis , J. Dean , M. Devin , et al. Tensorflow: Large-scale machine learning on heterogeneous systems , 2015 . Software available from tensorflow.org. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. Tensorflow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org."},{"key":"e_1_3_2_1_6_1","volume-title":"Caffe con troll: Shallow ideas to speed up deep learning. arXiv preprint arXiv:1504.04343","author":"Abuzaid F.","year":"2015","unstructured":"F. Abuzaid , S. Hadjis , C. Zhang , and C. R\u00e9 . Caffe con troll: Shallow ideas to speed up deep learning. arXiv preprint arXiv:1504.04343 , 2015 . F. Abuzaid, S. Hadjis, C. Zhang, and C. R\u00e9. Caffe con troll: Shallow ideas to speed up deep learning. arXiv preprint arXiv:1504.04343, 2015."},{"key":"e_1_3_2_1_7_1","volume-title":"Theano: new features and speed improvements","author":"Bastien F.","year":"2012","unstructured":"F. Bastien , P. Lamblin , R. Pascanu , J. Bergstra , I. J. Goodfellow , A. Bergeron , N. Bouchard , and Y. Bengio . Theano: new features and speed improvements , 2012 . F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio. 
Theano: new features and speed improvements, 2012."},{"key":"e_1_3_2_1_8_1","volume-title":"Suvisoft Tenth International Workshop on Frontiers in Handwriting Recognition","author":"Chellapilla K.","year":"2006","unstructured":"K. Chellapilla , S. Puri , and P. Simard . High performance convolutional neural networks for document processing . In Suvisoft Tenth International Workshop on Frontiers in Handwriting Recognition , 2006 . K. Chellapilla, S. Puri, and P. Simard. High performance convolutional neural networks for document processing. In Suvisoft Tenth International Workshop on Frontiers in Handwriting Recognition, 2006."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"key":"e_1_3_2_1_10_1","volume-title":"IEEE\/ACM International Symposium on Microarchitecture","author":"Chen Y.","unstructured":"Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, et al. Dadiannao: A machine-learning supercomputer. In IEEE\/ACM International Symposium on Microarchitecture, 2014."},{"key":"e_1_3_2_1_11_1","volume-title":"cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759","author":"Chetlur S.","year":"2014","unstructured":"S. Chetlur , C. Woolley , P. Vandermersch , J. Cohen , J. Tran , B. Catanzaro , and E. Shelhamer . cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 , 2014 . S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer. cudnn: Efficient primitives for deep learning. 
arXiv preprint arXiv:1410.0759, 2014."},{"key":"e_1_3_2_1_12_1","volume-title":"11th USENIX Symposium on Operating Systems Design and Implementation","author":"Chilimbi T.","year":"2014","unstructured":"T. Chilimbi , Y. Suzue , J. Apacible , and K. Kalyanaraman . Project adam: Building an efficient and scalable deep learning training system . In 11th USENIX Symposium on Operating Systems Design and Implementation , 2014 . T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project adam: Building an efficient and scalable deep learning training system. In 11th USENIX Symposium on Operating Systems Design and Implementation, 2014."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248110"},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning","author":"Coates A.","year":"2013","unstructured":"A. Coates , B. Huval , T. Wang , D. Wu , B. Catanzaro , and N. Andrew . Deep learning with cots hpc systems . In Proceedings of the 30th International Conference on Machine Learning , 2013 . A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, and N. Andrew. Deep learning with cots hpc systems. In Proceedings of the 30th International Conference on Machine Learning, 2013."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390177"},{"key":"e_1_3_2_1_16_1","volume-title":"Neural Information Processing Systems Workshop, number EPFL-CONF-192376","author":"Collobert R.","year":"2011","unstructured":"R. Collobert , K. Kavukcuoglu , and C. Farabet . Torch7: A matlab-like environment for machine learning. In BigLearn , Neural Information Processing Systems Workshop, number EPFL-CONF-192376 , 2011 R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A matlab-like environment for machine learning. 
In BigLearn, Neural Information Processing Systems Workshop, number EPFL-CONF-192376, 2011"},{"key":"e_1_3_2_1_17_1","author":"Collobert R.","year":"2011","unstructured":"R. Collobert , J. Weston , L. Bottou , M. Karlen , K. Kavukcuoglu , and P. Kuksa . Natural language processing (almost) from scratch. The Journal of Machine Learning Research , 2011 . R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 2011.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-11179-7_36"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5222004"},{"key":"e_1_3_2_1_20_1","volume-title":"Advances in Neural Information Processing Systems","author":"Dean J.","year":"2012","unstructured":"J. Dean , G. Corrado , R. Monga , K. Chen , M. Devin , M. Mao , A. Senior , P. Tucker , K. Yang , Q. V. Le , Large scale distributed deep networks . In Advances in Neural Information Processing Systems , 2012 . J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, 2012."},{"key":"e_1_3_2_1_21_1","volume-title":"Advances in Neural Information Processing Systems","author":"Denil M.","year":"2013","unstructured":"M. Denil , B. Shakibi , L. Dinh , N. de Freitas, et al. Predicting parameters in deep learning . In Advances in Neural Information Processing Systems , 2013 . M. Denil, B. Shakibi, L. Dinh, N. de Freitas, et al. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems, 2013."},{"key":"e_1_3_2_1_22_1","volume-title":"Advances in Neural Information Processing Systems","author":"Denton E. 
L.","year":"2014","unstructured":"E. L. Denton , W. Zaremba , J. Bruna , Y. LeCun , and R. Fergus . Exploiting linear structure within convolutional networks for efficient evaluation . In Advances in Neural Information Processing Systems , 2014 . E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems, 2014."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2009.5272559"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2011.5981829"},{"key":"e_1_3_2_1_25_1","volume-title":"An efficient sparse matrix multiplication for deep neural network-based applications","author":"Gao Y.","year":"2014","unstructured":"Y. Gao , Y. Liu , R. Zhao , and S. Chiu . An efficient sparse matrix multiplication for deep neural network-based applications . 2014 . Y. Gao, Y. Liu, R. Zhao, and S. Chiu. An efficient sparse matrix multiplication for deep neural network-based applications. 2014."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356052.1356053"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2749472"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19861-8_13"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304619"},{"key":"e_1_3_2_1_30_1","volume-title":"Intel math kernel library","author":"Intel M.","year":"2007","unstructured":"M. Intel . Intel math kernel library , 2007 . M. Intel. Intel math kernel library, 2007."},{"key":"e_1_3_2_1_31_1","volume-title":"Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866","author":"Jaderberg M.","year":"2014","unstructured":"M. Jaderberg , A. Vedaldi , and A. Zisserman . Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866 , 2014 . M. 
Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.59"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250734.1250761"},{"key":"e_1_3_2_1_36_1","unstructured":"A. Krizhevsky. Cuda-convnet 2014.  A. Krizhevsky. Cuda-convnet 2014."},{"key":"e_1_3_2_1_37_1","volume-title":"arXiv preprint arXiv:1404.5997","author":"Krizhevsky A.","year":"2014","unstructured":"A. Krizhevsky . One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997 , 2014 . A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014."},{"key":"e_1_3_2_1_38_1","volume-title":"Advances in neural information processing systems","author":"Krizhevsky A.","year":"2012","unstructured":"A. Krizhevsky , I. Sutskever , and G. E. Hinton . Imagenet classification with deep convolutional neural networks . In Advances in neural information processing systems , 2012 . A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, 2012."},{"key":"e_1_3_2_1_39_1","volume-title":"Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis","author":"Le Q. V.","year":"2011","unstructured":"Q. V. Le , W. Y. Zou , S. Y. Yeung , and A. Y. Ng . Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis . 2011 . Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng. 
Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. 2011."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553453"},{"key":"e_1_3_2_1_42_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Liu B.","year":"2015","unstructured":"B. Liu , M. Wang , H. Foroosh , M. Tappen , and M. Pensky . Sparse convolutional neural networks . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2015 . B. Liu, M. Wang, H. Foroosh, M. Tappen, and M. Pensky. Sparse convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015."},{"key":"e_1_3_2_1_43_1","volume-title":"A framework for general sparse matrix-matrix multiplication on gpus and heterogeneous processors. arXiv preprint arXiv:1504.05022","author":"Liu W.","year":"2015","unstructured":"W. Liu and B. Vinter . A framework for general sparse matrix-matrix multiplication on gpus and heterogeneous processors. arXiv preprint arXiv:1504.05022 , 2015 . W. Liu and B. Vinter. A framework for general sparse matrix-matrix multiplication on gpus and heterogeneous processors. arXiv preprint arXiv:1504.05022, 2015."},{"key":"e_1_3_2_1_44_1","volume-title":"Fast training of convolutional networks through ffts. arXiv preprint arXiv:1312.5851","author":"Mathieu M.","year":"2013","unstructured":"M. Mathieu , M. Henaff , and Y. LeCun . Fast training of convolutional networks through ffts. arXiv preprint arXiv:1312.5851 , 2013 . M. Mathieu, M. Henaff, and Y. LeCun. Fast training of convolutional networks through ffts. 
arXiv preprint arXiv:1312.5851, 2013."},{"key":"e_1_3_2_1_45_1","volume-title":"Accelerating deep convolutional neural networks using specialized hardware","author":"Ovtcharov K.","year":"2015","unstructured":"K. Ovtcharov , O. Ruwase , J.-Y. Kim , J. Fowers , K. Strauss , and E. S. Chung . Accelerating deep convolutional neural networks using specialized hardware , 2015 . K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. S. Chung. Accelerating deep convolutional neural networks using specialized hardware, 2015."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161011"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2003.1227801"},{"key":"e_1_3_2_1_49_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan K.","year":"2014","unstructured":"K. Simonyan and A. Zisserman . Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 , 2014 . K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014."},{"key":"e_1_3_2_1_50_1","author":"Srivastava N.","year":"2014","unstructured":"N. Srivastava , G. Hinton , A. Krizhevsky , I. Sutskever , and R. Salakhutdinov . Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research , 2014 . N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_51_1","volume-title":"An adaptive and fully sparse training approach for multilayer perceptrons","author":"Wang F.","year":"1996","unstructured":"F. Wang and Zhang. 
An adaptive and fully sparse training approach for multilayer perceptrons , 1996 . F. Wang and Zhang. An adaptive and fully sparse training approach for multilayer perceptrons, 1996."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICNN.1995.488164"},{"key":"e_1_3_2_1_53_1","volume-title":"Springer Encyclopedia of Parallel Computing.","author":"Whaley R. C.","year":"2011","unstructured":"R. C. Whaley . Atlas (automatically tuned linear algebra software). In Springer Encyclopedia of Parallel Computing. 2011 . R. C. Whaley. Atlas (automatically tuned linear algebra software). In Springer Encyclopedia of Parallel Computing. 2011."},{"key":"e_1_3_2_1_54_1","volume-title":"URL: http:\/\/xianyi.github.io\/OpenBLAS","author":"Xianyi Z.","year":"2012","unstructured":"Z. Xianyi , W. Qian , and Z. Chothia . Openblas . URL: http:\/\/xianyi.github.io\/OpenBLAS , 2012 . Z. Xianyi, W. Qian, and Z. Chothia. Openblas. URL: http:\/\/xianyi.github.io\/OpenBLAS, 2012."},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"}],"event":{"name":"ASPLOS '17: Architectural Support for Programming Languages and Operating Systems","location":"Xi'an China","acronym":"ASPLOS '17","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages","SIGOPS ACM Special Interest Group on Operating Systems","SIGARCH ACM Special Interest Group on Computer Architecture","SIGBED ACM Special Interest Group on Embedded Systems"]},"container-title":["Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating 
Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3037697.3037745","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3037697.3037745","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:03:11Z","timestamp":1750201391000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3037697.3037745"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,4,4]]},"references-count":55,"alternative-id":["10.1145\/3037697.3037745","10.1145\/3037697"],"URL":"https:\/\/doi.org\/10.1145\/3037697.3037745","relation":{"is-identical-to":[{"id-type":"doi","id":"10.1145\/3093337.3037745","asserted-by":"object"},{"id-type":"doi","id":"10.1145\/3093336.3037745","asserted-by":"object"}]},"subject":[],"published":{"date-parts":[[2017,4,4]]},"assertion":[{"value":"2017-04-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}