{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T06:24:48Z","timestamp":1774333488704,"version":"3.50.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T00:00:00Z","timestamp":1570752000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012659","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61672250, 61672251"],"award-info":[{"award-number":["61672250, 61672251"]}],"id":[{"id":"10.13039\/501100012659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"<jats:p>\n            Although GPUs have emerged as the mainstream for the acceleration of\n            <jats:italic>convolutional neural network<\/jats:italic>\n            (CNN) training processes, they usually have limited physical memory, meaning that it is hard to train large-scale CNN models. Many methods for memory optimization have been proposed to decrease the memory consumption of CNNs and to mitigate the increasing scale of these networks; however, this optimization comes at the cost of an obvious drop in time performance. We propose a new memory optimization strategy named Layup that realizes both better memory efficiency and better time performance. First, a fast layer-type-specific method for memory optimization is presented, based on the new finding that a single memory optimization often shows dramatic differences in time performance for different types of layers. Second, a new memory reuse method is presented in which greater attention is paid to multi-type intermediate data such as convolutional workspaces and cuDNN handle data. Experiments show that Layup can significantly increase the scale of extra-deep network models on a single GPU with lower performance loss. It even can train ResNet with 2,504 layers using 12GB memory, outperforming the state-of-the-art work of SuperNeurons with 1,920 layers (batch size = 16).\n          <\/jats:p>","DOI":"10.1145\/3357238","type":"journal-article","created":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T14:53:33Z","timestamp":1570805613000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Layup"],"prefix":"10.1145","volume":"16","author":[{"given":"Wenbin","family":"Jiang","sequence":"first","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3719-3195","authenticated-orcid":false,"given":"Yang","family":"Ma","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bo","family":"Liu","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haikun","family":"Liu","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bing Bing","family":"Zhou","sequence":"additional","affiliation":[{"name":"The University of Sydney, NSW, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jian","family":"Zhu","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Song","family":"Wu","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hai","family":"Jin","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,10,11]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201916)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , and Xiaoqiang Zheng . 2016 . TensorFlow: A system for large-scale machine learning . In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201916) . USENIX Association, 265--283. Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201916). USENIX Association, 265--283."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-92bf1922-003"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the Workshop on Machine Learning Systems at the 28th Conference on Neural Information Processing Systems (LearningSys\u201915)","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . 2015 . MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems . In Proceedings of the Workshop on Machine Learning Systems at the 28th Conference on Neural Information Processing Systems (LearningSys\u201915) . Curran Associates, Inc., 1--6. Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. In Proceedings of the Workshop on Machine Learning Systems at the 28th Conference on Neural Information Processing Systems (LearningSys\u201915). Curran Associates, Inc., 1--6."},{"key":"e_1_2_1_4_1","volume-title":"Training deep nets with sublinear memory cost. Retrieved from: arXiv preprint arXiv:1604.06174","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen , Bing Xu , Chiyuan Zhang , and Carlos Guestrin . 2016. Training deep nets with sublinear memory cost. Retrieved from: arXiv preprint arXiv:1604.06174 ( 2016 ). Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. Retrieved from: arXiv preprint arXiv:1604.06174 (2016)."},{"key":"e_1_2_1_5_1","volume-title":"cuDNN: Efficient primitives for deep learning. Retrieved from: arXiv preprint arXiv:1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur , Cliff Woolley , Philippe Vandermersch , Jonathan Cohen , John Tran , Bryan Catanzaro , and Evan Shelhamer . 2014. cuDNN: Efficient primitives for deep learning. Retrieved from: arXiv preprint arXiv:1410.0759 ( 2014 ). Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. Retrieved from: arXiv preprint arXiv:1410.0759 (2014)."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201914)","author":"Chilimbi Trishul","year":"2014","unstructured":"Trishul Chilimbi , Yutaka Suzue , Johnson Apacible , and Karthik Kalyanaraman . 2014 . Project Adam: Building an efficient and scalable deep learning training system . In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201914) . USENIX Association, 571--582. Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. 2014. Project Adam: Building an efficient and scalable deep learning training system. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201914). USENIX Association, 571--582."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 30th International Conference on International Conference on Machine Learning (ICML\u201913)","volume":"28","author":"Coates Adam","year":"2013","unstructured":"Adam Coates , Brody Huval , Tao Wang , David J. Wu , Andrew Y. Ng , and Bryan Catanzaro . 2013 . Deep learning with COTS HPC systems . In Proceedings of the 30th International Conference on International Conference on Machine Learning (ICML\u201913) , Volume 28 . IMLS, 1337--1345. Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Y. Ng, and Bryan Catanzaro. 2013. Deep learning with COTS HPC systems. In Proceedings of the 30th International Conference on International Conference on Machine Learning (ICML\u201913), Volume 28. IMLS, 1337--1345."},{"key":"e_1_2_1_8_1","volume-title":"Torch: A Modular Machine Learning Software Library. Technical Report EPFL-REPORT-82802. Idiap","author":"Collobert Ronan","year":"2002","unstructured":"Ronan Collobert , Samy Bengio , and Johnny Mari\u00e9thoz . 2002 . Torch: A Modular Machine Learning Software Library. Technical Report EPFL-REPORT-82802. Idiap , Martigny , Valais, Switzerland . Ronan Collobert, Samy Bengio, and Johnny Mari\u00e9thoz. 2002. Torch: A Modular Machine Learning Software Library. Technical Report EPFL-REPORT-82802. Idiap, Martigny, Valais, Switzerland."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915)","author":"Courbariaux Matthieu","year":"2015","unstructured":"Matthieu Courbariaux , Yoshua Bengio , and Jean-Pierre David . 2015 . BinaryConnect: Training deep neural networks with binary weights during propagations . In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915) . Curran Associates Inc., 3123--3131. Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. BinaryConnect: Training deep neural networks with binary weights during propagations. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915). Curran Associates Inc., 3123--3131."},{"key":"e_1_2_1_10_1","volume-title":"Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. Retrieved from: arXiv preprint arXiv:1602.02830","author":"Courbariaux Matthieu","year":"2016","unstructured":"Matthieu Courbariaux , Itay Hubara , Daniel Soudry , Ran El-Yaniv , and Yoshua Bengio . 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. Retrieved from: arXiv preprint arXiv:1602.02830 ( 2016 ). Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. Retrieved from: arXiv preprint arXiv:1602.02830 (2016)."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 11th European Conference on Computer Systems (EuroSys\u201916)","author":"Cui Henggang","unstructured":"Henggang Cui , Hao Zhang , Gregory R. Ganger , Phillip B. Gibbons , and Eric P. Xing . 2016. GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server . In Proceedings of the 11th European Conference on Computer Systems (EuroSys\u201916) . ACM, 1--16. Henggang Cui, Hao Zhang, Gregory R. Ganger, Phillip B. Gibbons, and Eric P. Xing. 2016. GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. In Proceedings of the 11th European Conference on Computer Systems (EuroSys\u201916). ACM, 1--16."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 25th Conference on Neural Information Processing Systems (NIPS\u201912)","author":"Dean Jeffrey","unstructured":"Jeffrey Dean , Greg Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Mark Mao , Marc'aurelio Ranzato , Andrew Senior , Paul Tucker , Ke Yang , Quoc V. Le , and Andrew Y. Ng . 2012. Large scale distributed deep networks . In Proceedings of the 25th Conference on Neural Information Processing Systems (NIPS\u201912) . Curran Associates, Inc., 1223--1231. Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, and Andrew Y. Ng. 2012. Large scale distributed deep networks. In Proceedings of the 25th Conference on Neural Information Processing Systems (NIPS\u201912). Curran Associates, Inc., 1223--1231."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_14_1","unstructured":"Facebook. 2017. Caffe2: A new lightweight modular and scalable deep learning framework. Retrieved from: https:\/\/caffe2.ai\/.  Facebook. 2017. Caffe2: A new lightweight modular and scalable deep learning framework. Retrieved from: https:\/\/caffe2.ai\/."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400684"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-017-9189-6"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS\u201916)","author":"Gruslys Audrunas","year":"2016","unstructured":"Audrunas Gruslys , R\u00e9mi Munos , Ivo Danihelka , Marc Lanctot , and Alex Graves . 2016 . Memory-efficient backpropagation through time . In Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS\u201916) . Curran Associates Inc., 4125--4133. Audrunas Gruslys, R\u00e9mi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. 2016. Memory-efficient backpropagation through time. In Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS\u201916). Curran Associates Inc., 4125--4133."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916)","author":"Han Song","unstructured":"Song Han , Huizi Mao , and William J. Dally . 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization, and Huffman coding . In Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916) . IEEE, 1--14. Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization, and Huffman coding. In Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916). IEEE, 1--14."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915)","author":"Han Song","unstructured":"Song Han , Jeff Pool , John Tran , and William J. Dally . 2015. Learning both weights and connections for efficient neural networks . In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915) . Curran Associates Inc., 1135--1143. Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning both weights and connections for efficient neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915). Curran Associates Inc., 1135--1143."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Huang Gao","unstructured":"Gao Huang , Zhuang Liu , Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks . In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917) . IEEE, 2261--2269. Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). IEEE, 2261--2269."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 14th European Conference on Computer Vision (ECCV\u201916)","author":"Huang Gao","unstructured":"Gao Huang , Yu Sun , Zhuang Liu , Daniel Sedra , and Kilian Q. Weinberger . 2016. Deep networks with stochastic depth . In Proceedings of the 14th European Conference on Computer Vision (ECCV\u201916) . Springer, 646--661. Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. 2016. Deep networks with stochastic depth. In Proceedings of the 14th European Conference on Computer Vision (ECCV\u201916). Springer, 646--661."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915)","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy . 2015 . Batch normalization: Accelerating deep network training by reducing internal covariate shift . In Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915) . IMLS, 448--456. Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915). IMLS, 448--456."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243904"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 25th Conference on Neural Information Processing Systems (NIPS\u201912)","author":"Krizhevsky Alex","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E. Hinton . 2012. ImageNet classification with deep convolutional neural networks . In Proceedings of the 25th Conference on Neural Information Processing Systems (NIPS\u201912) . Curran Associates, Inc., 1097--1105. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th Conference on Neural Information Processing Systems (NIPS\u201912). Curran Associates, Inc., 1097--1105."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854622"},{"key":"e_1_2_1_30_1","volume-title":"Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, and Trevor Cohn.","author":"Neubig Graham","year":"2017","unstructured":"Graham Neubig , Chris Dyer , Yoav Goldberg , Austin Matthews , Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, and Trevor Cohn. 2017 . DyNet: The dynamic neural network toolkit. Retrieved from: arXiv preprint arXiv:1701.03980 (2017). Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, and Trevor Cohn. 2017. DyNet: The dynamic neural network toolkit. Retrieved from: arXiv preprint arXiv:1701.03980 (2017)."},{"key":"e_1_2_1_31_1","unstructured":"NVIDIA. 2017. CUDA toolkit 8.0 documentation: Profiler. Retrieved from: http:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/index.html.  NVIDIA. 2017. CUDA toolkit 8.0 documentation: Profiler. Retrieved from: http:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/index.html."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 49th IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Rhu Minsoo","unstructured":"Minsoo Rhu , Natalia Gimelshein , Jason Clemons , Arslan Zulfiqar , and Stephen W. Keckler . 2016. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design . In Proceedings of the 49th IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916) . IEEE, 1--13. Minsoo Rhu, Natalia Gimelshein, Jason Clemons, Arslan Zulfiqar, and Stephen W. Keckler. 2016. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design. In Proceedings of the 49th IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916). IEEE, 1--13."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3199605"},{"key":"e_1_2_1_34_1","volume-title":"CUDA by Example: An Introduction to General-purpose GPU Programming","author":"Sanders Jason","unstructured":"Jason Sanders and Edward Kandrot . 2010. CUDA by Example: An Introduction to General-purpose GPU Programming . Addison-Wesley Professional . Jason Sanders and Edward Kandrot. 2010. CUDA by Example: An Introduction to General-purpose GPU Programming. Addison-Wesley Professional."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2945397"},{"key":"e_1_2_1_36_1","volume-title":"Very deep convolutional networks for large-scale image recognition. Retrieved from: arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman . 2015. Very deep convolutional networks for large-scale image recognition. Retrieved from: arXiv preprint arXiv:1409.1556 ( 2015 ). Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. Retrieved from: arXiv preprint arXiv:1409.1556 (2015)."},{"key":"e_1_2_1_37_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro , Amir Roshan Zamir, and Mubarak Shah . 2012 . UCF101: A dataset of 101 human actions classes from videos in the wild. Retrieved from: arXiv preprint arXiv:1212.0402 (2012). Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. Retrieved from: arXiv preprint arXiv:1212.0402 (2012)."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917)","author":"Szegedy Christian","unstructured":"Christian Szegedy , Sergey Ioffe , Vincent Vanhoucke , and Alexander A. Alemi . 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning . In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917) . AAAI Press, 4278--4284. Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917). AAAI Press, 4278--4284."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178491"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_2_1_44_1","volume-title":"Achieving human parity in conversational speech recognition. Retrieved from: arXiv preprint arXiv:1610.05256","author":"Xiong Wayne","year":"2016","unstructured":"Wayne Xiong , Jasha Droppo , Xuedong Huang , Frank Seide , Mike Seltzer , Andreas Stolcke , Dong Yu , and Geoffrey Zweig . 2016. Achieving human parity in conversational speech recognition. Retrieved from: arXiv preprint arXiv:1610.05256 ( 2016 ). Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. 2016. Achieving human parity in conversational speech recognition. Retrieved from: arXiv preprint arXiv:1610.05256 (2016)."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3177885"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3357238","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3357238","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:27Z","timestamp":1750200087000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3357238"}},"subtitle":["Layer-adaptive and Multi-type Intermediate-oriented Memory Optimization for GPU-based CNNs"],"short-title":[],"issued":{"date-parts":[[2019,10,11]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3357238"],"URL":"https:\/\/doi.org\/10.1145\/3357238","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,11]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}