{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T20:49:45Z","timestamp":1758401385328,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2017,5,12]],"date-time":"2017-05-12T00:00:00Z","timestamp":1494547200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-1527151"],"award-info":[{"award-number":["CNS-1527151"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2017,7,31]]},"abstract":"<jats:p>\n            Deep neural networks have been shown to outperform prior state-of-the-art solutions that often relied heavily on hand-engineered feature extraction techniques coupled with simple classification algorithms. In particular, deep convolutional neural networks have been shown to dominate on several popular public benchmarks such as the ImageNet database. Unfortunately, the benefits of deep networks have yet to be fully exploited in embedded, resource-bound settings that have strict power and area budgets. Graphical processing units (GPUs) have been shown to improve throughput and energy efficiency over central processing units (CPUs) due to their highly parallel architecture, yet they still impose a significant power burden. In a similar fashion, field programmable gate arrays (FPGAs) can be used to improve performance while further allowing more fine-grained control over the implementation to improve efficiency. 
To reduce power and area while still achieving the required throughput, classification-efficient network architectures are needed in addition to optimal deployment on efficient hardware. In this work, we address both of these objectives. For the first objective, we analyze simple, biologically inspired reduction strategies that are applied both before and after training. The central theme of these techniques is the introduction of sparsification to help dissolve away the dense connectivity that is often found at different levels in convolutional neural networks. The sparsification techniques include\n            <jats:italic>feature compression partition<\/jats:italic>\n            ,\n            <jats:italic>structured filter pruning<\/jats:italic>\n            , and\n            <jats:italic>dynamic feature pruning<\/jats:italic>\n            . Additionally, we explore\n            <jats:italic>filter factorization<\/jats:italic>\n            and\n            <jats:italic>filter quantization<\/jats:italic>\n            approximation techniques to further reduce the complexity of convolutional layers. As the second contribution, we propose SPARCNet, a hardware accelerator for efficient deployment of\n            <jats:italic>SPAR<\/jats:italic>\n            se\n            <jats:italic>C<\/jats:italic>\n            onvolutional\n            <jats:italic>NET<\/jats:italic>\n            works. The accelerator enables deploying networks in such resource-bound settings both by exploiting efficient forms of parallelism inherent in convolutional layers and by exploiting the proposed sparsification and approximation techniques. To demonstrate both contributions, modern deep convolutional network architectures containing millions of parameters are explored within the context of the CIFAR computer vision dataset. 
Utilizing the reduction techniques, we demonstrate the ability to reduce computation and memory by 60% and 93%, respectively, with less than 0.03% impact on accuracy when compared to the best baseline network with 93.47% accuracy. The SPARCNet accelerator with different numbers of processing engines is implemented on a low-power Artix-7 FPGA platform. Additionally, the same networks are optimally implemented on a number of embedded commercial-off-the-shelf platforms including NVIDIA's CPU+GPU SoCs TK1 and TX1 and the Intel Edison. Compared to NVIDIA's TK1 and TX1, the FPGA-based accelerator obtains 11.8\u00d7 and 7.5\u00d7 improvements in energy efficiency while maintaining a classification throughput of 72 images\/s. When further compared to a number of recent FPGA-based accelerators, SPARCNet is able to achieve up to 15\u00d7 improvement in energy efficiency while consuming less than 2 W of total board power at 100 MHz. In addition to improving efficiency, the accelerator has built-in support for the sparsification techniques and the ability to perform in-place rectified linear unit (ReLU) activation, max-pooling, and batch normalization.\n          <\/jats:p>","DOI":"10.1145\/3005448","type":"journal-article","created":{"date-parts":[[2017,5,15]],"date-time":"2017-05-15T12:13:58Z","timestamp":1494850438000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":35,"title":["SPARCNet"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8220-2594","authenticated-orcid":false,"given":"Adam","family":"Page","sequence":"first","affiliation":[{"name":"University of Maryland, Baltimore County, MD, USA"}]},{"given":"Ali","family":"Jafari","sequence":"additional","affiliation":[{"name":"University of Maryland, Baltimore County, MD, USA"}]},{"given":"Colin","family":"Shea","sequence":"additional","affiliation":[{"name":"University of Maryland, Baltimore County, MD, 
USA"}]},{"given":"Tinoosh","family":"Mohsenin","sequence":"additional","affiliation":[{"name":"University of Maryland, Baltimore County, MD, USA"}]}],"member":"320","published-online":{"date-parts":[[2017,5,12]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2015. NVIDIA Jetson TX1 Supercomputer-on-Module Drives Next Wave of Autonomous Machines. Retrieved from https:\/\/devblogs.nvidia.com\/parallelforall\/nvidia-jetson-tx1-supercomputer-on-module-drives-next-wave-of-autonomous-machines\/."},{"key":"e_1_2_1_2_1","volume-title":"Provable bounds for learning some deep representations. CoRR abs\/1310.6343","author":"Arora Sanjeev","year":"2013","unstructured":"Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. 2013. Provable bounds for learning some deep representations. CoRR abs\/1310.6343 (2013). Retrieved from http:\/\/arxiv.org\/abs\/1310.6343."},{"key":"e_1_2_1_3_1","volume-title":"12th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing.","author":"Balaji Pavan","year":"2012","unstructured":"Pavan Balaji, Rajkumar Buyya, Shikharesh Majumdar, and Suraj Pandey. 2012. 12th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815993"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2016.7418007"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2016.7418007"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2014.106"},{"key":"e_1_2_1_8_1","volume-title":"Dally","author":"Han Song","year":"2016","unstructured":"Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. CoRR abs\/1602.01528 (2016). Retrieved from http:\/\/arxiv.org\/abs\/1602.01528."},{"key":"e_1_2_1_9_1","volume-title":"Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/BioCAS.2015.7348376"},{"key":"e_1_2_1_11_1","unstructured":"Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. (2009)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2014.2356197"},{"key":"e_1_2_1_13_1","volume-title":"Muller and Giacomo Indiveri","author":"Lorenz","year":"2015","unstructured":"Lorenz K. Muller and Giacomo Indiveri. 2015. Rounding methods for neural networks with low resolution synaptic weights. arXiv preprint arXiv:1504.05767 (2015)."},{"volume-title":"Proceedings of the 28th International Conference on Machine Learning (ICML-11)","author":"Ngiam Jiquan","key":"e_1_2_1_14_1","unstructured":"Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y. Ng. 2011. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). 689--696."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2013.2294137"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/BioCAS.2015.7348372"},{"volume-title":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 1--8.","author":"Page A.","key":"e_1_2_1_18_1","unstructured":"A. Page and T. Mohsenin. 2016. FPGA-based reduction techniques for efficient deep neural network deployment. In 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 1--8."},{"key":"e_1_2_1_19_1","first-page":"109","article-title":"A flexible multichannel EEG feature extractor and classifier for seizure detection","volume":"62","author":"Page A.","year":"2015","unstructured":"A. Page, Chris Sagedy, and others. 2015b. A flexible multichannel EEG feature extractor and classifier for seizure detection. IEEE Trans. Circ. Syst. II: Expr. Briefs 62, 2 (2015), 109--113.","journal-title":"IEEE Trans. Circ. Syst. II: Expr. Briefs"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2016.7527433"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2016.7418003"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"S. W. Park, J. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H. J. Yoo. 2016b. An energy-efficient and scalable deep learning\/inference processor with tetra-parallel MIMD architecture for big data applications. IEEE Trans. Biomed. Circ. Syst. (2016).","DOI":"10.1109\/TBCAS.2015.2504563"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2013.6657019"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847265"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934583.2934599"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2016.7418008"},{"key":"e_1_2_1_27_1","unstructured":"Kihyuk Sohn, Wenling Shang, and Honglak Lee. 2014. Improved multimodal deep learning with variation of information. In Advances in Neural Information Processing Systems. 2141--2149."},{"key":"e_1_2_1_28_1","volume-title":"Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806","author":"Springenberg Jost Tobias","year":"2014","unstructured":"Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 2014. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806 (2014)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_30_1","volume-title":"Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567","author":"Szegedy Christian","year":"2015","unstructured":"Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015b. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567 (2015)."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/BSN.2015.7299406"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"}],"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3005448","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3005448","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3005448","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:23:51Z","timestamp":1750220631000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3005448"}},"subtitle":["A Hardware Accelerator for Efficient Deployment of Sparse Convolutional Networks"],"short-title":[],"issued":{"date-parts":[[2017,5,12]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,7,31]]}},"alternative-id":["10.1145\/3005448"],"URL":"https:\/\/doi.org\/10.1145\/3005448","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"type":"print","value":"1550-4832"},{"type":"electronic","value":"1550-4840"}],"subject":[],"published":{"date-parts":[[2017,5,12]]},"assertion":[{"value":"2016-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-05-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}