{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,30]],"date-time":"2026-06-30T15:48:39Z","timestamp":1782834519534,"version":"3.54.5"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,10,26]],"date-time":"2023-10-26T00:00:00Z","timestamp":1698278400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"name":"Major Key Project of PCL","award":["PCL2022A03"],"award-info":[{"award-number":["PCL2022A03"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U22A2036 and 61972441"],"award-info":[{"award-number":["U22A2036 and 61972441"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100017610","name":"Shenzhen Science and Technology Innovation Program","doi-asserted-by":"crossref","award":["RCYX20210609104510007 and JCYJ20200109113427092"],"award-info":[{"award-number":["RCYX20210609104510007 and JCYJ20200109113427092"]}],"id":[{"id":"10.13039\/501100017610","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies","award":["2022B1212010005"],"award-info":[{"award-number":["2022B1212010005"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>\n            Deep Neural Networks (DNNs) have achieved remarkable success in various real-world applications. However, running a Deep Neural Network (DNN) typically requires hundreds of megabytes of memory footprints, making it challenging to deploy on resource-constrained platforms such as mobile devices and IoT. 
Although mainstream DNN compression techniques such as pruning, distillation, and quantization can reduce the memory overhead of model parameters during DNN inference, they suffer from three limitations: (i)\u00a0a low model compression ratio for lightweight DNN structures with little redundancy, (ii)\u00a0potential degradation in model inference accuracy, and (iii)\u00a0an inadequate memory compression ratio, attributable to ignoring the layered nature of DNN inference. To address these issues, we propose a lightweight memory-efficient DNN inference framework called Smart-DNN+, which significantly reduces the memory costs of DNN inference without degrading the model quality. Specifically, \u2460 Smart-DNN+ applies a layerwise\n            <jats:italic>binary-quantizer<\/jats:italic>\n            with a remapping mechanism to greatly reduce the model size by quantizing the typical 32-bit floating-point DNN weights to 1-bit signs layer by layer. To maintain model quality, \u2461 Smart-DNN+ employs a\n            <jats:italic>bucket-encoder<\/jats:italic>\n            to keep the quantization error in compressed form by encoding multiple similar floating-point residuals into the same integer bucket ID. When running the compressed DNN on the user\u2019s device, \u2462 Smart-DNN+ utilizes a\n            <jats:italic>partially decompressing strategy<\/jats:italic>\n            to greatly reduce the required memory overhead by first loading the compressed DNN into memory and then dynamically decompressing the data required for model inference layer by layer.\n          <\/jats:p>\n          <jats:p>Experimental results on popular DNNs and datasets demonstrate that Smart-DNN+ achieves 0.17%\u20130.92% lower memory costs at lower runtime overhead than the state of the art, without degrading inference accuracy. 
Moreover, Smart-DNN+ can potentially reduce the inference runtime by up to 2.04\u00d7 compared with the conventional DNN inference workflow.<\/jats:p>","DOI":"10.1145\/3617688","type":"journal-article","created":{"date-parts":[[2023,8,30]],"date-time":"2023-08-30T09:52:01Z","timestamp":1693389121000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0358-0533","authenticated-orcid":false,"given":"Donglei","family":"Wu","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6337-1768","authenticated-orcid":false,"given":"Weihao","family":"Yang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5104-8301","authenticated-orcid":false,"given":"Xiangyu","family":"Zou","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4093-6391","authenticated-orcid":false,"given":"Wen","family":"Xia","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen; Department of New Networks, Peng Cheng Laboratory, Shenzhen; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8206-6916","authenticated-orcid":false,"given":"Shiyi","family":"Li","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9453-7516","authenticated-orcid":false,"given":"Zhenbo","family":"Hu","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4783-876X","authenticated-orcid":false,"given":"Weizhe","family":"Zhang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen; Department of New Networks, Peng Cheng Laboratory, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0305-2132","authenticated-orcid":false,"given":"Binxing","family":"Fang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen; Department of New Networks, Peng Cheng Laboratory, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,10,26]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, et al. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916). USENIX Association, 265\u2013283."},{"key":"e_1_3_1_3_2","first-page":"3438","volume-title":"Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems","author":"Balan Anoop Korattikara","year":"2015","unstructured":"Anoop Korattikara Balan, Vivek Rathod, Kevin P. Murphy, and Max Welling. 2015. Bayesian dark knowledge. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems. 
3438\u20133446."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3444943"},{"key":"e_1_3_1_5_2","first-page":"279","volume-title":"Proceedings of the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS\u201919)","author":"Cavigelli Lukas","year":"2019","unstructured":"Lukas Cavigelli and Luca Benini. 2019. Extended bit-plane compression for convolutional neural network accelerators. In Proceedings of the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS\u201919). IEEE, 279\u2013283."},{"key":"e_1_3_1_6_2","series-title":"Proceedings of the 37th International Conference on Machine Learning (ICML\u201920),","first-page":"1627","volume":"119","author":"Chen Yu","year":"2020","unstructured":"Yu Chen, Zhenming Liu, Bin Ren, and Xin Jin. 2020. On efficient constructions of checkpoints. In Proceedings of the 37th International Conference on Machine Learning (ICML\u201920),Proceedings of Machine Learning Research, Vol. 119. PMLR, 1627\u20131636."},{"key":"e_1_3_1_7_2","unstructured":"Jungwook Choi Zhuo Wang Swagath Venkataramani Pierce I-Jen Chuang Vijayalakshmi Srinivasan and Kailash Gopalakrishnan. PACT: Parameterized clipping activation for quantized neural networks. CoRR abs\/1805.06085 (2018). http:\/\/arxiv.org\/abs\/1805.06085"},{"key":"e_1_3_1_8_2","first-page":"1800","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Chollet Fran\u00e7ois","year":"2017","unstructured":"Fran\u00e7ois Chollet. 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). 
IEEE Computer Society, 1800\u20131807."},{"key":"e_1_3_1_9_2","first-page":"577","volume-title":"Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems","author":"Chorowski Jan","year":"2015","unstructured":"Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems. 577\u2013585."},{"issue":"3","key":"e_1_3_1_10_2","first-page":"34:1\u201334:26","article-title":"An FPGA overlay for CNN inference with fine-grained flexible parallelism","volume":"19","author":"Choudhury Ziaul","year":"2022","unstructured":"Ziaul Choudhury, Shashwat Shrivastava, Lavanya Ramapantulu, and Suresh Purini. 2022. An FPGA overlay for CNN inference with fine-grained flexible parallelism. ACM Trans. Archit. Code Optim. 19, 3 (2022), 34:1\u201334:26.","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_3_1_11_2","article-title":"BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1","volume":"1602","author":"Courbariaux Matthieu","year":"2016","unstructured":"Matthieu Courbariaux and Yoshua Bengio. 2016. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1. CoRR abs\/1602.02830.","journal-title":"CoRR"},{"key":"e_1_3_1_12_2","first-page":"3123","volume-title":"Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems","author":"Courbariaux Matthieu","year":"2015","unstructured":"Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. BinaryConnect: Training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems. 
3123\u20133131."},{"issue":"3","key":"e_1_3_1_13_2","first-page":"43:1\u201343:22","article-title":"LiteCON: An All-photonic Neuromorphic Accelerator for Energy-efficient Deep Learning","volume":"19","author":"Dang Dharanidhar","year":"2022","unstructured":"Dharanidhar Dang, Bill Lin, and Debashis Sahoo. 2022. LiteCON: An All-photonic Neuromorphic Accelerator for Energy-efficient Deep Learning. ACM Trans. Archit. Code Optim. 19, 3 (2022), 43:1\u201343:22.","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_3_1_14_2","volume-title":"GZIP File Format Specification Version 4.3","author":"Deutsch Peter","year":"1996","unstructured":"Peter Deutsch et\u00a0al. 1996. GZIP File Format Specification Version 4.3. Technical Report. RFC 1952, May."},{"key":"e_1_3_1_15_2","article-title":"RTMobile: Beyond real-time mobile acceleration of RNNs for speech recognition","author":"Dong Peiyan","year":"2020","unstructured":"Peiyan Dong, Siyue Wang, Wei Niu, et\u00a0al. 2020. RTMobile: Beyond real-time mobile acceleration of RNNs for speech recognition. arXiv:2002.11474. Retrieved from https:\/\/arxiv.org\/abs\/2002.11474","journal-title":"arXiv:2002.11474"},{"issue":"4","key":"e_1_3_1_16_2","first-page":"40:1\u201340:25","article-title":"MemSZ: Squeezing memory traffic with lossy compression","volume":"17","author":"Eldst\u00e5l-Ahrens Albin","year":"2020","unstructured":"Albin Eldst\u00e5l-Ahrens and Ioannis Sourdis. 2020. MemSZ: Squeezing memory traffic with lossy compression. ACM Trans. Archit. Code Optim. 17, 4 (2020), 40:1\u201340:25.","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_3_1_17_2","series-title":"Proceedings of the 16th European Conference (ECCV\u201920) Part II,","first-page":"69","volume":"12347","author":"Fang Jun","year":"2020","unstructured":"Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, et al.2020. Post-training piecewise linear quantization for deep neural networks. 
In Proceedings of the 16th European Conference (ECCV\u201920) Part II, Lecture Notes in Computer Science, Vol. 12347. Springer, 69\u201386."},{"key":"e_1_3_1_18_2","article-title":"A survey of quantization methods for efficient neural network inference","volume":"2103","author":"Gholami Amir","year":"2021","unstructured":"Amir Gholami, Sehoon Kim, Zhen Dong, et al. 2021. A survey of quantization methods for efficient neural network inference. CoRR abs\/2103.13630.","journal-title":"CoRR"},{"issue":"6","key":"e_1_3_1_19_2","doi-asserted-by":"crossref","first-page":"1789","DOI":"10.1007\/s11263-021-01453-z","article-title":"Knowledge distillation: A survey","volume":"129","author":"Gou Jianping","year":"2021","unstructured":"Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. 2021. Knowledge distillation: A survey. Int. J. Comput. Vis. 129, 6 (2021), 1789\u20131819.","journal-title":"Int. J. Comput. Vis."},{"key":"e_1_3_1_20_2","volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916)","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding. In Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916), Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_1_21_2","first-page":"770","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 
IEEE Computer Society, 770\u2013778."},{"key":"e_1_3_1_22_2","article-title":"Distilling the knowledge in a neural network","author":"Hinton Geoffrey","year":"2015","unstructured":"Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531. Retrieved from https:\/\/arxiv.org\/abs\/1503.02531","journal-title":"arXiv:1503.02531"},{"key":"e_1_3_1_23_2","article-title":"Mobilenets: Efficient convolutional neural networks for mobile vision applications","author":"Howard Andrew G","year":"2017","unstructured":"Andrew G Howard, Menglong Zhu, Bo Chen, et\u00a0al. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861. Retrieved from https:\/\/arxiv.org\/abs\/1704.04861","journal-title":"arXiv:1704.04861"},{"key":"e_1_3_1_24_2","first-page":"40:1\u201340:12","volume-title":"Proceedings of the 49th International Conference on Parallel Processing (ICPP\u201920)","author":"Hu Zhenbo","year":"2020","unstructured":"Zhenbo Hu, Xiangyu Zou, Wen Xia, Sian Jin, Dingwen Tao, Yang Liu, Weizhe Zhang, and Zheng Zhang. 2020. Delta-DNN: Efficiently compressing deep neural networks via exploiting floats similarity. In Proceedings of the 49th International Conference on Parallel Processing (ICPP\u201920), Jos\u00e9 Nelson Amaral, Lizy Kurian John, and Xipeng Shen (Eds.). ACM, 40:1\u201340:12."},{"key":"e_1_3_1_25_2","first-page":"533","volume-title":"Proceedings of the 39th IEEE International Conference on Computer Design (ICCD\u201921)","author":"Hu Zhenbo","year":"2021","unstructured":"Zhenbo Hu, Xiangyu Zou, Wen Xia, Yuhong Zhao, Weizhe Zhang, and Donglei Wu. 2021. Smart-DNN: Efficiently reducing the memory requirements of running deep neural networks on resource-constrained platforms. In Proceedings of the 39th IEEE International Conference on Computer Design (ICCD\u201921). IEEE, 533\u2013541. 
10.1109\/ICCD53106.2021.00087"},{"issue":"6","key":"e_1_3_1_26_2","doi-asserted-by":"crossref","first-page":"1902","DOI":"10.1109\/TCAD.2021.3093835","article-title":"Acceleration-aware fine-grained channel pruning for deep neural networks via residual gating","volume":"41","author":"Huang Kai","year":"2022","unstructured":"Kai Huang, Siang Chen, Bowen Li, Luc Claesen, Hao Yao, Junjian Chen, Xiaowen Jiang, Zhili Liu, and Dongliang Xiong. 2022. Acceleration-aware fine-grained channel pruning for deep neural networks via residual gating. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41, 6 (2022), 1902\u20131915.","journal-title":"IEEE Trans. Comput. Aided Des. Integr. Circuits Syst."},{"key":"e_1_3_1_27_2","first-page":"4107","volume-title":"Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems","author":"Hubara Itay","year":"2016","unstructured":"Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems. 4107\u20134115."},{"key":"e_1_3_1_28_2","article-title":"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and  \\(\\lt\\) 0.5 MB model size","author":"Iandola Forrest N.","year":"2016","unstructured":"Forrest N. Iandola, Song Han, Matthew W. Moskewicz, et\u00a0al. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and \\(\\lt\\) 0.5 MB model size. arXiv:1602.07360. Retrieved from https:\/\/arxiv.org\/abs\/1602.07360","journal-title":"arXiv:1602.07360"},{"key":"e_1_3_1_29_2","first-page":"2704","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Jacob Benoit","year":"2018","unstructured":"Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018. 
Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). Computer Vision Foundation \/ IEEE Computer Society, 2704\u20132713."},{"key":"e_1_3_1_30_2","first-page":"159","volume-title":"Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201919)","author":"Jin Sian","year":"2019","unstructured":"Sian Jin, Sheng Di, Xin Liang, Jiannan Tian, Dingwen Tao, and Franck Cappello. 2019. DeepSZ: A novel framework to compress deep neural networks by using error-bounded lossy compression. In Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201919). ACM, 159\u2013170."},{"key":"e_1_3_1_31_2","unstructured":"Alex Krizhevsky Vinod Nair and Geoffrey Hinton. 2014. The cifar-10 Dataset. 55 (2014). Retrieved from http:\/\/www.cs.toronto.edu\/kriz\/cifar.html"},{"key":"e_1_3_1_32_2","volume-title":"Proceedings of the Conference and Workshop on Neural Information Processing Systems (NIPS\u201912)","author":"Krizhevsky A.","unstructured":"A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NIPS\u201912)."},{"key":"e_1_3_1_33_2","first-page":"1106","volume-title":"Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems. 
1106\u20131114."},{"key":"e_1_3_1_34_2","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR\u201920)","author":"Lan Zhenzhong","year":"2020","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. In Proceedings of the 8th International Conference on Learning Representations (ICLR\u201920). OpenReview.net."},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1145\/3352460.3358295","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201919)","author":"Lascorz Alberto Delmas","year":"2019","unstructured":"Alberto Delmas Lascorz, Sayeh Sharify, Isak Edo Vivancos, et al.2019. ShapeShifter: Enabling fine-grain data width adaptation in deep learning. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201919). ACM, 28\u201341."},{"key":"e_1_3_1_36_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)","author":"Lepikhin Dmitry","year":"2021","unstructured":"Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2021. GShard: Scaling giant models with conditional computation and automatic sharding. In Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921). OpenReview.net."},{"key":"e_1_3_1_37_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","author":"Li Hao","year":"2017","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2017. Pruning filters for efficient ConvNets. In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917). 
OpenReview.net."},{"issue":"4","key":"e_1_3_1_38_2","first-page":"47:1\u201347:26","article-title":"An application-oblivious memory scheduling system for DNN accelerators","volume":"19","author":"Li Jiansong","year":"2022","unstructured":"Jiansong Li, Xueying Wang, Xiaobing Chen, Guangli Li, Xiao Dong, Peng Zhao, Xianzhi Yu, Yongxin Yang, Wei Cao, Lei Liu, and Xiaobing Feng. 2022. An application-oblivious memory scheduling system for DNN accelerators. ACM Trans. Archit. Code Optim. 19, 4 (2022), 47:1\u201347:26.","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_3_1_39_2","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1016\/j.neucom.2021.07.045","article-title":"Pruning and quantization for deep neural network acceleration: A survey","volume":"461","author":"Liang Tailin","year":"2021","unstructured":"Tailin Liang, John Glossner, Lei Wang, et al. 2021. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing 461 (2021), 370\u2013403.","journal-title":"Neurocomputing"},{"key":"e_1_3_1_40_2","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems (NeurIPS\u201920)","author":"Lin Ji","year":"2020","unstructured":"Ji Lin, Wei-Ming Chen, Yujun Lin, et al. 2020. MCUNet: Tiny deep learning on IoT devices. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems (NeurIPS\u201920)."},{"key":"e_1_3_1_41_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201918)","author":"Lin Y.","year":"2018","unstructured":"Y. Lin, S. Han, H. Mao, et al. 2018. Deep gradient compression: Reducing the communication bandwidth for distributed training. 
In Proceedings of the International Conference on Learning Representations (ICLR\u201918)."},{"key":"e_1_3_1_42_2","volume-title":"Error Distributions of Lossy Floating-point Compressors","author":"Lindstrom Peter","year":"2017","unstructured":"Peter Lindstrom. 2017. Error Distributions of Lossy Floating-point Compressors. Technical Report. Lawrence Livermore National Lab, Livermore, CA."},{"key":"e_1_3_1_43_2","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","volume":"1907","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, et al.2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs\/1907.11692.","journal-title":"CoRR"},{"issue":"5","key":"e_1_3_1_44_2","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1080\/01431160600746456","article-title":"A survey of image classification methods and techniques for improving classification performance","volume":"28","author":"Lu Dengsheng","year":"2007","unstructured":"Dengsheng Lu and Qihao Weng. 2007. A survey of image classification methods and techniques for improving classification performance. Int. J. Remote Sens. 28, 5 (2007), 823\u2013870.","journal-title":"Int. J. Remote Sens."},{"key":"e_1_3_1_45_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","author":"Molchanov Pavlo","year":"2017","unstructured":"Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2017. Pruning convolutional neural networks for resource efficient inference. In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917). OpenReview.net."},{"key":"e_1_3_1_46_2","article-title":"BitPruning: Learning bitlengths for aggressive and accurate quantization","volume":"2002","author":"Nikolic Milos","year":"2020","unstructured":"Milos Nikolic, Ghouthi Boukli Hacene, Ciaran Bannon, et al.2020. 
BitPruning: Learning bitlengths for aggressive and accurate quantization. CoRR abs\/2002.03090.","journal-title":"CoRR"},{"key":"e_1_3_1_47_2","first-page":"8024","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS\u201919)","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, et al.2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS\u201919). 8024\u20138035."},{"key":"e_1_3_1_48_2","unstructured":"Igor Pavlov. 1998. The Algorithm: Lempel-Ziv-Markov Chain."},{"key":"e_1_3_1_49_2","first-page":"140:1\u2013140:67","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21 (2020), 140:1\u2013140:67.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_1_50_2","first-page":"525","volume-title":"Proceedings of the 14th European Conference (ECCV\u201916)","volume":"9908","author":"Rastegari Mohammad","year":"2016","unstructured":"Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. XNOR-net: ImageNet classification using binary convolutional neural networks. In Proceedings of the 14th European Conference (ECCV\u201916), Vol. 9908. Springer, 525\u2013542."},{"key":"e_1_3_1_51_2","first-page":"779","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"Redmon Joseph","year":"2016","unstructured":"Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. 2016. 
You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). IEEE Computer Society, 779\u2013788."},{"key":"e_1_3_1_52_2","first-page":"4510","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Sandler Mark","year":"2018","unstructured":"Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). Computer Vision Foundation\/IEEE Computer Society, 4510\u20134520."},{"issue":"9","key":"e_1_3_1_53_2","doi-asserted-by":"crossref","first-page":"3400","DOI":"10.1109\/TNNLS.2019.2944481","article-title":"Robust and communication-efficient federated learning from non-i.i.d. data","volume":"31","author":"Sattler F.","year":"2020","unstructured":"F. Sattler, S. Wiedemann, K.-R. M\u00fcller, et al.2020. Robust and communication-efficient federated learning from non-i.i.d. data. IEEE Trans. Neural Netw. Learn. Syst. 31, 9 (2020), 3400\u20133413.","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"e_1_3_1_54_2","first-page":"1","volume-title":"Proceedings of the IEEE Workshop on Signal Processing Systems (SiPS\u201915)","author":"Shah Mohit","year":"2015","unstructured":"Mohit Shah, Jingcheng Wang, David Blaauw, et\u00a0al. 2015. A fixed-point neural network for keyword detection on resource constrained hardware. In Proceedings of the IEEE Workshop on Signal Processing Systems (SiPS\u201915). IEEE, 1\u20136."},{"key":"e_1_3_1_55_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. 
In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)."},{"issue":"2","key":"e_1_3_1_56_2","first-page":"76","article-title":"Data compression for the exascale computing era-survey","volume":"1","author":"Son Seung Woo","year":"2014","unstructured":"Seung Woo Son, Zhengzhang Chen, William Hendrix, et\u00a0al. 2014. Data compression for the exascale computing era-survey. Supercomput. Front. Innov. 1, 2 (2014), 76\u201388.","journal-title":"Supercomput. Front. Innov."},{"key":"e_1_3_1_57_2","unstructured":"V. Vanhoucke and M. Z. Mao. 2011. Improving the speed of neural networks on CPUs."},{"key":"e_1_3_1_58_2","volume-title":"Proceedings of Machine Learning and Systems (MLSys\u201921)","author":"Vivancos Isak Edo","year":"2021","unstructured":"Isak Edo Vivancos, Sayeh Sharify, Daniel Ly-Ma, et al.2021. Boveda: Building an on-chip deep learning memory hierarchy brick by brick. In Proceedings of Machine Learning and Systems (MLSys\u201921). mlsys.org."},{"key":"e_1_3_1_59_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201919)","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the International Conference on Learning Representations (ICLR\u201919)."},{"issue":"1","key":"e_1_3_1_60_2","first-page":"9:1\u20139:23","article-title":"Exploiting parallelism opportunities with deep learning frameworks","volume":"18","author":"Wang Yu Emma","year":"2021","unstructured":"Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim M. Hazelwood, and David Brooks. 2021. Exploiting parallelism opportunities with deep learning frameworks. ACM Trans. Archit. Code Optim. 18, 1 (2021), 9:1\u20139:23.","journal-title":"ACM Trans. Archit. 
Code Optim."},{"key":"e_1_3_1_61_2","first-page":"2074","volume-title":"Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems","author":"Wen Wei","year":"2016","unstructured":"Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems. 2074\u20132082."},{"key":"e_1_3_1_62_2","first-page":"4254","volume-title":"Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI\u201922), the 34th Conference on Innovative Applications of Artificial Intelligence (IAAI\u201922), the 12th Symposium on Educational Advances in Artificial Intelligence (EAAI\u201922)","author":"Wu Donglei","year":"2022","unstructured":"Donglei Wu, Xiangyu Zou, Shuyu Zhang, Haoyu Jin, Wen Xia, and Binxing Fang. 2022. SmartIdx: Reducing communication cost in federated learning by exploiting the CNNs structures. In Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI\u201922), the 34th Conference on Innovative Applications of Artificial Intelligence (IAAI\u201922), the 12th Symposium on Educational Advances in Artificial Intelligence (EAAI\u201922). AAAI Press, 4254\u20134262."},{"key":"e_1_3_1_63_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Wu Shuang","year":"2018","unstructured":"Shuang Wu, Guoqi Li, Feng Chen, and Luping Shi. 2018. Training and inference with integers in deep neural networks. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918). 
OpenReview.net."},{"key":"e_1_3_1_64_2","first-page":"811","volume-title":"Proceedings of the 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920)","author":"Zadeh Ali Hadi","year":"2020","unstructured":"Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, and Andreas Moshovos. 2020. GOBO: Quantizing attention-based NLP models for low latency and energy efficient inference. In Proceedings of the 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920). IEEE, 811\u2013824."},{"key":"e_1_3_1_65_2","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1145\/3470496.3527438","volume-title":"Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA\u201922)","author":"Zadeh Ali Hadi","year":"2022","unstructured":"Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, and Andreas Moshovos. 2022. Mokey: Enabling narrow fixed-point inference for out-of-the-box floating-point transformer models. In Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA\u201922). ACM, 888\u2013901."},{"key":"e_1_3_1_66_2","volume-title":"Proceedings of the 39th IEEE International Conference on Computer Design (ICCD\u201921)","author":"Zhang Shuyu","year":"2021","unstructured":"Shuyu Zhang, Donglei Wu, Haoyu, et al. 2021. QD-compressor: A quantization-based delta compression framework for deep neural networks. In Proceedings of the 39th IEEE International Conference on Computer Design (ICCD\u201921). IEEE."},{"key":"e_1_3_1_67_2","first-page":"6848","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Zhang Xiangyu","year":"2018","unstructured":"Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2018. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). 
Computer Vision Foundation\/IEEE Computer Society, 6848\u20136856."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617688","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3617688","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:32Z","timestamp":1750178192000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617688"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,26]]},"references-count":66,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3617688"],"URL":"https:\/\/doi.org\/10.1145\/3617688","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,26]]},"assertion":[{"value":"2023-03-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-14","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}