{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:26:07Z","timestamp":1750220767077,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,5,31]],"date-time":"2020-05-31T00:00:00Z","timestamp":1590883200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2020,5,31]]},"abstract":"<jats:p>Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety of machine learning tasks and are deployed in increasing numbers of products and services. However, the computational requirements of training and evaluating large-scale DNNs are growing at a much faster pace than the capabilities of the underlying hardware platforms that they are executed upon. To address this challenge, one promising approach is to exploit the error resilient nature of DNNs by skipping or approximating computations that have negligible impact on classification accuracy. Almost all prior efforts in this direction propose static DNN approximations by either pruning network connections, implementing computations at lower precision, or compressing weights.<\/jats:p>\n          <jats:p>\n            In this work, we propose &lt;u&gt;Dy&lt;\/u&gt;namic &lt;u&gt;V&lt;\/u&gt;ariable &lt;u&gt;E&lt;\/u&gt;ffort &lt;u&gt;Deep&lt;\/u&gt; Neural Networks (D\n            <jats:sc>y<\/jats:sc>\n            VED\n            <jats:sc>eep<\/jats:sc>\n            ) to reduce the computational requirements of DNNs during inference. Complementary to the aforementioned static approaches, DyVEDeep is a dynamic approach that exploits heterogeneity in the DNN inputs to improve their compute efficiency with comparable classification accuracy and without requiring any re-training. D\n            <jats:sc>y<\/jats:sc>\n            VED\n            <jats:sc>eep<\/jats:sc>\n            equips DNNs with dynamic effort mechanisms that identify computations critical to classifying a given input and focus computational effort only on the critical computations, while skipping or approximating the rest. We propose three dynamic effort mechanisms that operate at different levels of granularity viz. neuron, feature, and layer levels. We build D\n            <jats:sc>y<\/jats:sc>\n            VED\n            <jats:sc>eep<\/jats:sc>\n            versions of six popular image recognition benchmarks (CIFAR-10, AlexNet, OverFeat, VGG-16, SqueezeNet, and Deep-Compressed-AlexNet) within the Caffe deep-learning framework. We evaluate D\n            <jats:sc>y<\/jats:sc>\n            VED\n            <jats:sc>eep<\/jats:sc>\n            on two platforms\u2014a high-performance server with a 2.7 GHz Intel Xeon E5-2680 processor and 128 GB memory, and a low-power Raspberry Pi board with an ARM Cortex A53 processor and 1 GB memory. Across all benchmarks, D\n            <jats:sc>y<\/jats:sc>\n            VED\n            <jats:sc>eep<\/jats:sc>\n            achieves 2.47\u00d7--5.15\u00d7 reduction in the number of scalar operations, which translates to 1.94\u00d7--2.23\u00d7 and 1.46\u00d7--3.46\u00d7 performance improvement over well-optimized baselines on the Xeon server and the Raspberry Pi, respectively, with comparable classification accuracy.\n          <\/jats:p>","DOI":"10.1145\/3372882","type":"journal-article","created":{"date-parts":[[2020,6,12]],"date-time":"2020-06-12T01:38:43Z","timestamp":1591925923000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["D\n            <scp>y<\/scp>\n            VED\n            <scp>eep<\/scp>"],"prefix":"10.1145","volume":"19","author":[{"given":"Sanjay","family":"Ganapathy","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Madras, Chennai, Tamil Nadu, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0470-6364","authenticated-orcid":false,"given":"Swagath","family":"Venkataramani","sequence":"additional","affiliation":[{"name":"IBM T.J. Watson Research Center, Yorktown Heights, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Giridhur","family":"Sriraman","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Madras, Chennai, Tamil Nadu, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Balaraman","family":"Ravindran","sequence":"additional","affiliation":[{"name":"Robert Bosch Centre for Data Science and AI, IIT Madras, Chennai, Tamil Nadu, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anand","family":"Raghunathan","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, Indiana, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,6,11]]},"reference":[{"volume-title":"Hinton","year":"2012","author":"Krizhevsky Alex","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","unstructured":"Pierre Sermanet David Eigen Xiang Zhang Michael Mathieu Rob Fergus and Yann Lecun. [n.d.] Overfeat: Integrated recognition localization and detection using convolutional networks. Retrieved from http:\/\/arxiv.org\/abs\/1312.6229.  Pierre Sermanet David Eigen Xiang Zhang Michael Mathieu Rob Fergus and Yann Lecun. [n.d.] Overfeat: Integrated recognition localization and detection using convolutional networks. Retrieved from http:\/\/arxiv.org\/abs\/1312.6229."},{"key":"e_1_2_1_3_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. Retrieved from http:\/\/arxiv.org\/abs\/1409.1556.  Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. Retrieved from http:\/\/arxiv.org\/abs\/1409.1556."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_5_1","unstructured":"Alex Krizhevsky. 2014. One weird trick for parallelizing convolutional neural networks. Retrieved from http:\/\/arxiv.org\/abs\/1404.5997.  Alex Krizhevsky. 2014. One weird trick for parallelizing convolutional neural networks. Retrieved from http:\/\/arxiv.org\/abs\/1404.5997."},{"key":"e_1_2_1_6_1","unstructured":"Dipankar Das Sasikanth Avancha Dheevatsa Mudigere Karthikeyan Vaidyanathan Srinivas Sridharan Dhiraj D. Kalamkar Bharat Kaul and Pradeep Dubey. 2016. Distributed deep learning using synchronous stochastic gradient descent. Retrieved from http:\/\/arxiv.org\/abs\/1602.06709.  Dipankar Das Sasikanth Avancha Dheevatsa Mudigere Karthikeyan Vaidyanathan Srinivas Sridharan Dhiraj D. Kalamkar Bharat Kaul and Pradeep Dubey. 2016. Distributed deep learning using synchronous stochastic gradient descent. Retrieved from http:\/\/arxiv.org\/abs\/1602.06709."},{"volume-title":"Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS\u201912)","author":"Dean Jeffrey","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2014-274"},{"volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR\u201911)","year":"2011","author":"Farabet C.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_2_1_11_1","unstructured":"Norman Jouppi. [n.d.] Google supercharges machine learning tasks with custom chip. Retrieved from https:\/\/cloudplatform.googleblog.com\/2016\/05\/Google-supercharges-machine-learning-tasks-with-custom-chip.html.  Norman Jouppi. [n.d.] Google supercharges machine learning tasks with custom chip. Retrieved from https:\/\/cloudplatform.googleblog.com\/2016\/05\/Google-supercharges-machine-learning-tasks-with-custom-chip.html."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080244"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2744769.2744900"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2627369.2627625"},{"volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201989)","author":"LeCun Yann","key":"e_1_2_1_15_1"},{"volume":"201","journal-title":"William J. Dally.","author":"Han Song","key":"e_1_2_1_16_1"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2014-281"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2627369.2627613"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178146"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472822"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00061"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2968456.2968458"},{"volume-title":"Proceedings of the 54th ACM\/EDAC\/IEEE Design Automation Conference (DAC\u201917)","author":"Tann H.","key":"e_1_2_1_23_1"},{"volume-title":"Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan.","year":"2018","author":"Choi Jungwook","key":"e_1_2_1_24_1"},{"volume-title":"Proceedings of the International Conference on Hardware\/Software Codesign and System Synthesis (CODES+ISSS\u201915)","year":"2015","author":"Park E.","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Surat Teerapittayanon Bradley McDanel and H. T. Kung. 2017. BranchyNet: Fast inference via early exiting from deep neural networks. Retrieved from http:\/\/arxiv.org\/abs\/1709.01686.  Surat Teerapittayanon Bradley McDanel and H. T. Kung. 2017. BranchyNet: Fast inference via early exiting from deep neural networks. Retrieved from http:\/\/arxiv.org\/abs\/1709.01686.","DOI":"10.1109\/ICPR.2016.7900006"},{"key":"e_1_2_1_27_1","unstructured":"Yoshua Bengio. 2013. Estimating or propagating gradients through stochastic neurons. Retrieved from http:\/\/arxiv.org\/abs\/1305.2982.  Yoshua Bengio. 2013. Estimating or propagating gradients through stochastic neurons. Retrieved from http:\/\/arxiv.org\/abs\/1305.2982."},{"volume-title":"Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS\u201913)","author":"Ba Lei Jimmy","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2744769.2744904"},{"volume-title":"Janardhan Rao Doppa, and Partha Pande","year":"2018","author":"Jayakodi Nitthilan Kannappan","key":"e_1_2_1_30_1"},{"volume-title":"Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE\u201916)","author":"Panda P.","key":"e_1_2_1_31_1"},{"key":"e_1_2_1_32_1","unstructured":"Wenlin Chen James T. Wilson Stephen Tyree Kilian Q. Weinberger and Yixin Chen. 2015. Compressing neural networks with the hashing trick. Retrieved from http:\/\/arxiv.org\/abs\/1504.04788.  Wenlin Chen James T. Wilson Stephen Tyree Kilian Q. Weinberger and Yixin Chen. 2015. Compressing neural networks with the hashing trick. Retrieved from http:\/\/arxiv.org\/abs\/1504.04788."},{"volume":"201","journal-title":"William J. Dally.","author":"Han Song","key":"e_1_2_1_33_1"},{"volume-title":"Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.","year":"2014","author":"Jia Yangqing","key":"e_1_2_1_34_1"},{"volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919)","year":"2019","author":"Wu C.","key":"e_1_2_1_35_1"},{"key":"e_1_2_1_36_1","unstructured":"BVLC. Caffe model zoo. [n.d.]. Retrieved from https:\/\/github.com\/BVLC\/caffe\/wiki\/Model-Zoo.  BVLC. Caffe model zoo. [n.d.]. Retrieved from https:\/\/github.com\/BVLC\/caffe\/wiki\/Model-Zoo."},{"key":"e_1_2_1_37_1","unstructured":"BVLC. Caffe cifar-10 network. [n.d.]. Retrieved from https:\/\/github.com\/BVLC\/caffe\/blob\/master\/examples\/cifar10\/cifar10_quick_train_test.prototxt.  BVLC. Caffe cifar-10 network. [n.d.]. Retrieved from https:\/\/github.com\/BVLC\/caffe\/blob\/master\/examples\/cifar10\/cifar10_quick_train_test.prototxt."},{"key":"e_1_2_1_39_1","unstructured":"Forrest N. Iandola Matthew W. Moskewicz Khalid Ashraf Song Han William J. Dally and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50\u00d7 fewer parameters and &lt;1 MB model size. Retrieved from http:\/\/arxiv.org\/abs\/1602.07360.  Forrest N. Iandola Matthew W. Moskewicz Khalid Ashraf Song Han William J. Dally and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50\u00d7 fewer parameters and &lt;1 MB model size. Retrieved from http:\/\/arxiv.org\/abs\/1602.07360."},{"volume-title":"Keckler","year":"2016","author":"Rhu Minsoo","key":"e_1_2_1_40_1"},{"key":"e_1_2_1_41_1","unstructured":"Emily Denton Wojciech Zaremba Joan Bruna Yann LeCun and Rob Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. Retrieved from http:\/\/arxiv.org\/abs\/1404.0736.  Emily Denton Wojciech Zaremba Joan Bruna Yann LeCun and Rob Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. Retrieved from http:\/\/arxiv.org\/abs\/1404.0736."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.28.88"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","year":"2015","author":"Liu Baoyuan","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","unstructured":"Micha\u00ebl Mathieu Mikael Henaff and Yann LeCun. 2013. Fast training of convolutional networks through FFTs. Retrieved from http:\/\/arxiv.org\/abs\/1312.5851.  Micha\u00ebl Mathieu Mikael Henaff and Yann LeCun. 2013. Fast training of convolutional networks through FFTs. Retrieved from http:\/\/arxiv.org\/abs\/1312.5851."},{"key":"e_1_2_1_45_1","unstructured":"Michael Figurnov Dmitry P. Vetrov and Pushmeet Kohli. 2015. PerforatedCNNs: Acceleration through elimination of redundant convolutions. Retrieved from http:\/\/arxiv.org\/abs\/1504.08362.  Michael Figurnov Dmitry P. Vetrov and Pushmeet Kohli. 2015. PerforatedCNNs: Acceleration through elimination of redundant convolutions. Retrieved from http:\/\/arxiv.org\/abs\/1504.08362."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372882","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3372882","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:09Z","timestamp":1750200069000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372882"}},"subtitle":["Dynamic Variable Effort Deep Neural Networks"],"short-title":[],"issued":{"date-parts":[[2020,5,31]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,5,31]]}},"alternative-id":["10.1145\/3372882"],"URL":"https:\/\/doi.org\/10.1145\/3372882","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2020,5,31]]},"assertion":[{"value":"2019-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-06-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}