{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:03:26Z","timestamp":1750309406475,"version":"3.41.0"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T00:00:00Z","timestamp":1726012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>\n            Certain data compression techniques like pruning lead to\n            <jats:italic>unstructured sparse Convolutional Neural Network (CNN)<\/jats:italic>\n            models without directly leveraging sparsity in optimizing both memory consumption and inference latency of a model having low to medium sparsity. State-of-the-art storage techniques either optimize model size at the cost of execution latency or optimize inference latency at the overhead of the memory consumption of the model. This tradeoff is largely due to the absence of a storage selection methodology addressing\n            <jats:italic>sparsity sensitivity<\/jats:italic>\n            , arising from varied\n            <jats:italic>sparsity<\/jats:italic>\n            and positions of nonzero values called\n            <jats:italic>sparsity structure<\/jats:italic>\n            across different sparse layers of a model. 
However, this issue remains unexplored due to the lack of support for handling sparse data in the current deployment standards for edge devices.\n          <\/jats:p>\n          <jats:p>\n            This article introduces a data compaction strategy for\n            <jats:italic>unstructured<\/jats:italic>\n            sparse data that\n            <jats:italic>not only compresses nonzero data but also encodes it, leveraging the memory consumption and latency reduction benefits of both data compression and data encoding techniques<\/jats:italic>\n            . We propose a novel storage representation, named\n            <jats:italic>Encoded Partitioned Hybrid Sparse<\/jats:italic>\n            (EPaHS) format, which addresses sparsity sensitivity by customizing data storage based on the sparsity structure of the data. Our data compaction technique and storage solution optimize the tradeoff between the memory consumption and inference latency of a sparse model without altering the network architecture or affecting its accuracy. Our solution easily extends to higher-dimensional data and outperforms standard storage solutions. 
It proves to be beneficial to all the valid mode orientations of multi-dimensional data.\n          <\/jats:p>\n          <jats:p>\n            For an important health and wellness application, a single-lead short-time ECG classification model, EPaHS achieves up to\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt 16.18\\%}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            reduction in size and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt 15.16\\%}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            reduction in latency when compared to its original model of\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt 42}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            MB size and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt 26.35}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            sec latency, having\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt \\approx 59\\%}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            sparsity. 
For a ResNet50 model handling higher-dimensional data, it achieves\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt 21.33\\%}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            size reduction and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt 53.9\\%}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            latency gain against the original model of\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt 3265}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            KB size and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt 1.7}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            sec latency, having\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\tt \\approx 67\\%}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            sparsity.\n          <\/jats:p>","DOI":"10.1145\/3687239","type":"journal-article","created":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T11:49:36Z","timestamp":1724327376000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Efficient Low-Memory Implementation of Sparse CNNs Using Encoded Partitioned Hybrid Sparse Format"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3470-130X","authenticated-orcid":false,"given":"Barnali","family":"Basak","sequence":"first","affiliation":[{"name":"TCS Research, Tata Consultancy Services Ltd Kolkata, Kolkata, 
India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2178-8154","authenticated-orcid":false,"given":"Pallab","family":"Dasgupta","sequence":"additional","affiliation":[{"name":"Computer Science & Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9101-8051","authenticated-orcid":false,"given":"Arpan","family":"Pal","sequence":"additional","affiliation":[{"name":"Innovation Lab, Tata Consultancy Services Ltd., Kolkata, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,9,11]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3005348"},{"key":"e_1_3_2_3_2","article-title":"ONNX: Open Neural Network Exchange","author":"Bai Junjie","year":"2023","unstructured":"Junjie Bai, Fang Lu, Ke Zhang, et\u00a0al. 2023. ONNX: Open Neural Network Exchange. Retrieved from https:\/\/github.com\/onnx\/onnx"},{"key":"e_1_3_2_4_2","volume-title":"Compiler Support for Sparse Matrix Computations","author":"Bik Aart J. C.","year":"1996","unstructured":"Aart J. C. Bik. 1996. Compiler Support for Sparse Matrix Computations. Ph. D. Dissertation. Leiden University."},{"key":"e_1_3_2_5_2","unstructured":"Robert David Jared Duke Advait Jain Vijay Janapa Reddi Nat Jeffries Jian Li Nick Kreeger Ian Nappier Meghna Natraj Shlomi Regev Rocky Rhodes Tiezhen Wang and Pete Warden. 2021. TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems. Retrieved from https:\/\/arxiv.org\/abs\/2010.08678"},{"key":"e_1_3_2_6_2","unstructured":"William Fedus Barret Zoph and Noam Shazeer. 2022. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. 
Retrieved from https:\/\/arxiv.org\/abs\/2101.03961"},{"key":"e_1_3_2_7_2","unstructured":"Jonathan Frankle. 2022. OpenLTH: A Framework for Lottery Tickets and Beyond. Retrieved from https:\/\/github.com\/facebookresearch\/open_lth"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1966.1053907"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","unstructured":"Song Han Huizi Mao and William J. Dally. 2015. Deep Compression: Compressing Deep Neural Networks with Pruning Trained Quantization and Huffman Coding. DOI:10.48550\/ARXIV.1510.00149","DOI":"10.48550\/ARXIV.1510.00149"},{"key":"e_1_3_2_10_2","first-page":"1135","volume-title":"28th International Conference on Neural Information Processing Systems (NIPS\u201915)","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning both weights and connections for efficient neural networks. In 28th International Conference on Neural Information Processing Systems (NIPS\u201915). MIT Press, Cambridge, MA, USA, 1135\u20131143."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2015. Deep Residual Learning for Image Recognition. DOI:10.48550\/ARXIV.1512.03385","DOI":"10.48550\/ARXIV.1512.03385"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3334614"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00022"},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","unstructured":"Tailin Liang John Glossner Lei Wang and Shaobo Shi. 2021. Pruning and quantization for deep neural network acceleration: A survey. 
Retrieved from https:\/\/arxiv.org\/abs\/2101.09671","DOI":"10.1016\/j.neucom.2021.07.045"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2017.75"},{"key":"e_1_3_2_18_2","unstructured":"G. M. Morton. 1966. A computer Oriented Geodetic Data Base; and a New Technique in File Sequencing. Technical Report Ottawa Canada: IBM Ltd. https:\/\/domino.research.ibm.com\/library\/cyberdig.nsf\/papers\/0DABF9473B9C86D48525779800566A39\/$File\/Morton1966.pdf"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356216"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00023"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080254"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","unstructured":"Pranav Rajpurkar Awni Y. Hannun Masoumeh Haghpanahi Codie Bourn and Andrew Y. Ng. 2017. Cardiologist-level Arrhythmia Detection with Convolutional Neural Networks. DOI:10.48550\/ARXIV.1707.01836","DOI":"10.48550\/ARXIV.1707.01836"},{"key":"e_1_3_2_23_2","unstructured":"Rohit Sharma. 2022. deepC. Retrieved from https:\/\/github.com\/ai-techsystems\/deepC"},{"key":"e_1_3_2_24_2","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).","journal-title":"arXiv preprint arXiv:1409.1556"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/2833179.2833183"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"Christian Szegedy Wei Liu Yangqing Jia Pierre Sermanet Scott E. Reed Dragomir Anguelov Dumitru Erhan Vincent Vanhoucke and Andrew Rabinovich. 2014. Going deeper with convolutions. 
Retrieved from http:\/\/arxiv.org\/abs\/1409.4842","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582047"},{"key":"e_1_3_2_28_2","unstructured":"Manzil Zaheer Guru Guruganesh Avinava Dubey Joshua Ainslie Chris Alberti Santiago Onta\u00f1\u00f3n Philip Pham Anirudh Ravula Qifan Wang Li Yang and Amr Ahmed. 2020. Big Bird: Transformers for longer sequences. Retrieved from https:\/\/arxiv.org\/abs\/2007.14062"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687239","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687239","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:58:01Z","timestamp":1750294681000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687239"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,11]]},"references-count":27,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3687239"],"URL":"https:\/\/doi.org\/10.1145\/3687239","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2024,9,11]]},"assertion":[{"value":"2023-05-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}