{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:25:03Z","timestamp":1750220703844,"version":"3.41.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2019,10,8]],"date-time":"2019-10-08T00:00:00Z","timestamp":1570492800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Macronix International Co., Ltd."},{"name":"Ministry of Science and Technology","award":["106-2221-E-002-037-MY3, 107-2923-E-001-001-MY3, 108-2218-E-002- 048, 108-2221-E-001-001-MY3, 108-2221-E-001-004-MY3, 106-2218-E-194-012-MY3"],"award-info":[{"award-number":["106-2221-E-002-037-MY3, 107-2923-E-001-001-MY3, 108-2218-E-002- 048, 108-2221-E-001-001-MY3, 108-2221-E-001-004-MY3, 106-2218-E-194-012-MY3"]}]},{"DOI":"10.13039\/501100001869","name":"Academia Sinica","doi-asserted-by":"crossref","award":["AS-CDA-107-M05"],"award-info":[{"award-number":["AS-CDA-107-M05"]}],"id":[{"id":"10.13039\/501100001869","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2019,10,31]]},"abstract":"<jats:p>Neural networks over conventional computing platforms are heavily restricted by the data volume and performance concerns. While non-volatile memory offers potential solutions to data volume issues, challenges must be faced over performance issues, especially with asymmetric read and write performance. Beside that, critical concerns over endurance must also be resolved before non-volatile memory could be used in reality for neural networks. This work addresses the performance and endurance concerns altogether by proposing a data-aware programming scheme. We propose to consider neural network training jointly with respect to the data-flow and data-content points of view. In particular, methodologies with approximate results over Dual-SET operations were presented. Encouraging results were observed through a series of experiments, where great efficiency and lifetime enhancement is seen without sacrificing the result accuracy.<\/jats:p>","DOI":"10.1145\/3358191","type":"journal-article","created":{"date-parts":[[2019,10,10]],"date-time":"2019-10-10T13:13:05Z","timestamp":1570713185000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Achieving Lossless Accuracy with Lossy Programming for Efficient Neural-Network Training on NVM-Based Systems"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9435-6598","authenticated-orcid":false,"given":"Wei-Chen","family":"Wang","sequence":"first","affiliation":[{"name":"Macronix Emerging System Lab., Macronix International Co., Ltd., Taiwan and Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1282-2111","authenticated-orcid":false,"given":"Yuan-Hao","family":"Chang","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Academia Sinica, Taipei City, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tei-Wei","family":"Kuo","sequence":"additional","affiliation":[{"name":"College of Engineering, City University of Hong Kong, Hong Kong and Department of Computer Science and Information Engineering, National Taiwan University, Taiwan and Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei City, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3460-8674","authenticated-orcid":false,"given":"Chien-Chung","family":"Ho","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu-Ming","family":"Chang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hung-Sheng","family":"Chang","sequence":"additional","affiliation":[{"name":"Macronix Emerging System Lab., Macronix International Co., Ltd., Hsinchu City, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,10,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"1","article-title":"Onyx: A prototype phase change memory storage array","volume":"1","author":"Akel A.","year":"2011","unstructured":"A. Akel , A. M. Caulfield , T. I. Mollov , R. K. Gupta , and S. Swanson . 2011 . Onyx: A prototype phase change memory storage array . HotStorage 1 (2011), 1 . A. Akel, A. M. Caulfield, T. I. Mollov, R. K. Gupta, and S. Swanson. 2011. Onyx: A prototype phase change memory storage array. HotStorage 1 (2011), 1.","journal-title":"HotStorage"},{"volume-title":"Proceedings of the 2014 International Conference on Hardware\/Software Codesign and System Synthesis (CODES\u201914)","author":"Chang B.","unstructured":"B. Chang , Y. Chang , H. Chang , T. Kuo , and H. Li . 2014. A PCM translation layer for integrated memory and storage management . In Proceedings of the 2014 International Conference on Hardware\/Software Codesign and System Synthesis (CODES\u201914) . 6:1--6:10. B. Chang, Y. Chang, H. Chang, T. Kuo, and H. Li. 2014. A PCM translation layer for integrated memory and storage management. In Proceedings of the 2014 International Conference on Hardware\/Software Codesign and System Synthesis (CODES\u201914). 6:1--6:10.","key":"e_1_2_1_2_1"},{"volume-title":"2015 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). 22--29","author":"Chang H.","unstructured":"H. Chang , Y. Chang , T. Kuo , and H. Li . 2015. A light-weighted software-controlled cache for PCM-based main memory systems . In 2015 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). 22--29 . H. Chang, Y. Chang, T. Kuo, and H. Li. 2015. A light-weighted software-controlled cache for PCM-based main memory systems. In 2015 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). 22--29.","key":"e_1_2_1_3_1"},{"volume-title":"Proceedings of the 2016 International Conference on Management of Data (SIGMOD\u201916)","author":"Chen S.","unstructured":"S. Chen , S. Jiang , B. He , and X. Tang . 2016. A study of sorting algorithms on approximate memory . In Proceedings of the 2016 International Conference on Management of Data (SIGMOD\u201916) . 647--662. S. Chen, S. Jiang, B. He, and X. Tang. 2016. A study of sorting algorithms on approximate memory. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD\u201916). 647--662.","key":"e_1_2_1_4_1"},{"volume-title":"2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 27--39","author":"Chi P.","unstructured":"P. Chi , S. Li , C. Xu , T. Zhang , J. Zhao , Y. Liu , Y. Wang , and Y. Xie . 2016. PRIME: A novel processing-in-memory architecture for neural network computation in reram-based main memory . In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 27--39 . P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie. 2016. PRIME: A novel processing-in-memory architecture for neural network computation in reram-based main memory. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 27--39.","key":"e_1_2_1_5_1"},{"volume-title":"2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 347--357","author":"Cho S.","unstructured":"S. Cho and H. Lee . 2009. Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance . In 2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 347--357 . S. Cho and H. Lee. 2009. Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance. In 2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 347--357.","key":"e_1_2_1_6_1"},{"unstructured":"M. Courbariaux Y. Bengio and J. P. David. 2015. BinaryConnect: Training deep neural networks with binary weights during propagations. CoRR abs\/1511.00363 (2015).  M. Courbariaux Y. Bengio and J. P. David. 2015. BinaryConnect: Training deep neural networks with binary weights during propagations. CoRR abs\/1511.00363 (2015).","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 25th International Conference on Neural Information Processing Systems -","volume":"1","author":"Dean J.","unstructured":"J. Dean , G. S. Corrado , R. Monga , K. Chen , M. Devin , Q. V. Le , M. Z. Mao , M. Ranzato , A. Senior , P. Tucker , K. Yang , and A. Y. Ng . 2012. Large scale distributed deep networks . In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (NIPS\u201912). 1223--1231. J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. 2012. Large scale distributed deep networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (NIPS\u201912). 1223--1231."},{"volume-title":"2018 IEEE International Memory Workshop (IMW). 1--4.","author":"Deguchi Y.","unstructured":"Y. Deguchi , K. Maeda , S. Suzuki , T. Nakamura , and K. Takeuchi . 2018. Error-reduction controller techniques of TaOx-based ReRAM for deep neural networks to extend data-retention lifetime by over 1700x . In 2018 IEEE International Memory Workshop (IMW). 1--4. Y. Deguchi, K. Maeda, S. Suzuki, T. Nakamura, and K. Takeuchi. 2018. Error-reduction controller techniques of TaOx-based ReRAM for deep neural networks to extend data-retention lifetime by over 1700x. In 2018 IEEE International Memory Workshop (IMW). 1--4.","key":"e_1_2_1_9_1"},{"volume-title":"2018 IEEE International Memory Workshop (IMW). 1--4.","author":"Deguchi Y.","unstructured":"Y. Deguchi and K. Takeuchi . 2018. 3D-NAND flash solid-state drive (SSD) for deep neural network weight storage of IoT edge devices with 700x data-retention lifetime extention . In 2018 IEEE International Memory Workshop (IMW). 1--4. Y. Deguchi and K. Takeuchi. 2018. 3D-NAND flash solid-state drive (SSD) for deep neural network weight storage of IoT edge devices with 700x data-retention lifetime extention. In 2018 IEEE International Memory Workshop (IMW). 1--4.","key":"e_1_2_1_10_1"},{"volume-title":"Proceedings of the 55th Annual Design Automation Conference. 169","author":"Donato M.","unstructured":"M. Donato , B. Reagen , L. Pentecost , U. Gupta , D. Brooks , and G. Wei . 2018. On-chip deep neural network storage with multi-level eNVM . In Proceedings of the 55th Annual Design Automation Conference. 169 . M. Donato, B. Reagen, L. Pentecost, U. Gupta, D. Brooks, and G. Wei. 2018. On-chip deep neural network storage with multi-level eNVM. In Proceedings of the 55th Annual Design Automation Conference. 169.","key":"e_1_2_1_11_1"},{"unstructured":"S. Gupta A. Agrawal K. Gopalakrishnan and P. Narayanan. 2015. Deep learning with limited numerical precision. CoRR abs\/1502.02551 (2015).  S. Gupta A. Agrawal K. Gopalakrishnan and P. Narayanan. 2015. Deep learning with limited numerical precision. CoRR abs\/1502.02551 (2015).","key":"e_1_2_1_12_1"},{"unstructured":"P. Gysel M. Motamedi and S. Ghiasi. 2016. Hardware-oriented approximation of convolutional neural networks. CoRR abs\/1604.03168 (2016).  P. Gysel M. Motamedi and S. Ghiasi. 2016. Hardware-oriented approximation of convolutional neural networks. CoRR abs\/1604.03168 (2016).","key":"e_1_2_1_13_1"},{"volume-title":"International Workshop on Artificial Neural Networks. 195--201","author":"Han J.","unstructured":"J. Han and C. Moraga . 1995. The influence of the sigmoid function parameters on the speed of backpropagation learning . In International Workshop on Artificial Neural Networks. 195--201 . J. Han and C. Moraga. 1995. The influence of the sigmoid function parameters on the speed of backpropagation learning. In International Workshop on Artificial Neural Networks. 195--201.","key":"e_1_2_1_14_1"},{"unstructured":"A. G. Howard M. Zhu B. Chen D. Kalenichenko W. Wang T. Weyand M. Andreetto and H. Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs\/1704.04861 (2017).  A. G. Howard M. Zhu B. Chen D. Kalenichenko W. Wang T. Weyand M. Andreetto and H. Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs\/1704.04861 (2017).","key":"e_1_2_1_15_1"},{"unstructured":"F. N. Iandola M. W. Moskewicz K. Ashraf S. Han W. J. Dally and K. Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and &lt;1MB model size. CoRR abs\/1602.07360 (2016).  F. N. Iandola M. W. Moskewicz K. Ashraf S. Han W. J. Dally and K. Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and &lt;1MB model size. CoRR abs\/1602.07360 (2016).","key":"e_1_2_1_16_1"},{"volume-title":"Proceedings of the 22Nd ACM International Conference on Multimedia (MM\u201914)","author":"Jia Y.","unstructured":"Y. Jia , E. Shelhamer , J. Donahue , S. Karayev , J. Long , R. Girshick , S. Guadarrama , and T. Darrell . 2014. Caffe: Convolutional architecture for fast feature embedding . In Proceedings of the 22Nd ACM International Conference on Multimedia (MM\u201914) . 675--678. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22Nd ACM International Conference on Multimedia (MM\u201914). 675--678.","key":"e_1_2_1_17_1"},{"volume-title":"2010 Symposium on VLSI Technology. 203--204","author":"Kim I. S.","unstructured":"I. S. Kim , S. L. Cho , D. H. Im , E. H. Cho , D. H. Kim , G. H. Oh , D. H. Ahn , S. O. Park , S. W. Nam , J. T. Moon , and C. H. Chung . 2010. High performance PRAM cell scalable to sub-20nm technology with below 4F2 cell size, extendable to DRAM applications . In 2010 Symposium on VLSI Technology. 203--204 . I. S. Kim, S. L. Cho, D. H. Im, E. H. Cho, D. H. Kim, G. H. Oh, D. H. Ahn, S. O. Park, S. W. Nam, J. T. Moon, and C. H. Chung. 2010. High performance PRAM cell scalable to sub-20nm technology with below 4F2 cell size, extendable to DRAM applications. In 2010 Symposium on VLSI Technology. 203--204.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","volume-title":"One weird trick for parallelizing convolutional neural networks. CoRR abs\/1404.5997","author":"Krizhevsky A.","year":"2014","unstructured":"A. Krizhevsky . 2014. One weird trick for parallelizing convolutional neural networks. CoRR abs\/1404.5997 ( 2014 ). A. Krizhevsky. 2014. One weird trick for parallelizing convolutional neural networks. CoRR abs\/1404.5997 (2014)."},{"unstructured":"A. Krizhevsky V. Nair and G. Hinton. 2009. CIFAR-10 (canadian institute for advanced research). (2009). http:\/\/www.cs.toronto.edu\/ kriz\/cifar.html.  A. Krizhevsky V. Nair and G. Hinton. 2009. CIFAR-10 (canadian institute for advanced research). (2009). http:\/\/www.cs.toronto.edu\/ kriz\/cifar.html.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 25th International Conference on Neural Information Processing Systems -","volume":"1","author":"Krizhevsky A.","unstructured":"A. Krizhevsky , I. Sutskever , and G. Hinton . 2012. ImageNet classification with deep convolutional neural networks . In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (NIPS\u201912). 1097--1105. A. Krizhevsky, I. Sutskever, and G. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (NIPS\u201912). 1097--1105."},{"unstructured":"A. Kuznetsova H. Rom N. Alldrin J. Uijlings I. Krasin J. Pont-Tuset S. Kamali S. Popov M. Malloci T. Duerig and V. Ferrari. 2018. The open images dataset V4: Unified image classification object detection and visual relationship detection at scale. arXiv:1811.00982 (2018).  A. Kuznetsova H. Rom N. Alldrin J. Uijlings I. Krasin J. Pont-Tuset S. Kamali S. Popov M. Malloci T. Duerig and V. Ferrari. 2018. The open images dataset V4: Unified image classification object detection and visual relationship detection at scale. arXiv:1811.00982 (2018).","key":"e_1_2_1_22_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.1109\/5.726791"},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.1145\/1555815.1555758"},{"volume-title":"2014 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA). 1--6.","author":"Li B.","unstructured":"B. Li , Y. Hu , and X. Li . 2014. Short-SET: An energy-efficient write scheme for MLC PCM . In 2014 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA). 1--6. B. Li, Y. Hu, and X. Li. 2014. Short-SET: An energy-efficient write scheme for MLC PCM. In 2014 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA). 1--6.","key":"e_1_2_1_25_1"},{"volume-title":"Proceedings of the Conference on Design, Automation 8 Test in Europe (DATE\u201914)","author":"Li B.","unstructured":"B. Li , S. Shan , Y. Hu , and X. Li . 2014. Partial-SET: Write speedup of PCM main memory . In Proceedings of the Conference on Design, Automation 8 Test in Europe (DATE\u201914) . 53:1--53:4. B. Li, S. Shan, Y. Hu, and X. Li. 2014. Partial-SET: Write speedup of PCM main memory. In Proceedings of the Conference on Design, Automation 8 Test in Europe (DATE\u201914). 53:1--53:4.","key":"e_1_2_1_26_1"},{"volume-title":"2018 IEEE Symposium on VLSI Technology.","author":"Lue H.","unstructured":"H. Lue , W. Chen , H. Chang , K. Wang , and C. Lu . 2018. A novel 3D AND-type NVM architecture capable of high-density, low-power in-memory sum-of-product computation for artificial intelligence application . In 2018 IEEE Symposium on VLSI Technology. H. Lue, W. Chen, H. Chang, K. Wang, and C. Lu. 2018. A novel 3D AND-type NVM architecture capable of high-density, low-power in-memory sum-of-product computation for artificial intelligence application. In 2018 IEEE Symposium on VLSI Technology.","key":"e_1_2_1_27_1"},{"volume-title":"2016 26th International Conference on Field Programmable Logic and Applications (FPL).","author":"Ma Y.","unstructured":"Y. Ma , N. Suda , Y. Cao , J. Seo , and S. Vrudhula . 2016. Scalable and modularized RTL compilation of convolutional neural networks onto FPGA . In 2016 26th International Conference on Field Programmable Logic and Applications (FPL). Y. Ma, N. Suda, Y. Cao, J. Seo, and S. Vrudhula. 2016. Scalable and modularized RTL compilation of convolutional neural networks onto FPGA. In 2016 26th International Conference on Field Programmable Logic and Applications (FPL).","key":"e_1_2_1_28_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.1109\/TCAD.2018.2858360"},{"volume-title":"MT29F128G08AE[C\/E]BB, MT29F256G08AK[C\/E]BB.","year":"2013","unstructured":"Micron. 2013. NAND flash memory MT29F64G08AB[C\/E]BB , MT29F128G08AE[C\/E]BB, MT29F256G08AK[C\/E]BB. ( 2013 ). Micron. 2013. NAND flash memory MT29F64G08AB[C\/E]BB, MT29F128G08AE[C\/E]BB, MT29F256G08AK[C\/E]BB. (2013).","key":"e_1_2_1_30_1"},{"volume-title":"Proceedings of the 27th International Conference on Machine Learning (ICML-10)","author":"Nair V.","unstructured":"V. Nair and G. E. Hinton . 2010. Rectified linear units improve restricted boltzmann machines . In Proceedings of the 27th International Conference on Machine Learning (ICML-10) . 807--814. V. Nair and G. E. Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10). 807--814.","key":"e_1_2_1_31_1"},{"unstructured":"NVIDIA. 2018. NVIDIA GeForce GTX 1080. (2018). https:\/\/www.nvidia.com\/en-us\/geforce\/products\/10series\/geforce-gtx-1080\/.  NVIDIA. 2018. NVIDIA GeForce GTX 1080. (2018). https:\/\/www.nvidia.com\/en-us\/geforce\/products\/10series\/geforce-gtx-1080\/.","key":"e_1_2_1_32_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_33_1","DOI":"10.1145\/2366231.2337203"},{"volume-title":"Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA\u201909)","author":"Qureshi M. K.","unstructured":"M. K. Qureshi , V. Srinivasan , and J. A. Rivers . 2009. Scalable high performance main memory system using phase-change memory technology . In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA\u201909) . 24--33. M. K. Qureshi, V. Srinivasan, and J. A. Rivers. 2009. Scalable high performance main memory system using phase-change memory technology. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA\u201909). 24--33.","key":"e_1_2_1_34_1"},{"doi-asserted-by":"crossref","unstructured":"M. Rastegari V. Ordonez J. Redmon and A. Farhadi. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. CoRR abs\/1603.05279 (2016).  M. Rastegari V. Ordonez J. Redmon and A. Farhadi. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. CoRR abs\/1603.05279 (2016).","key":"e_1_2_1_35_1","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_2_1_36_1","volume-title":"Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems. 693--701.","author":"Recht B.","year":"2011","unstructured":"B. Recht , C. Re , S. Wright , and F. Niu . 2011 . Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems. 693--701. B. Recht, C. Re, S. Wright, and F. Niu. 2011. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems. 693--701."},{"volume-title":"The 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-49)","author":"Rhu M.","unstructured":"M. Rhu , N. Gimelshein , J. Clemons , A. Zulfiqar , and S. W. Keckler . 2016. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design . In The 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-49) . 18:1--18:13. M. Rhu, N. Gimelshein, J. Clemons, A. Zulfiqar, and S. W. Keckler. 2016. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design. In The 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-49). 18:1--18:13.","key":"e_1_2_1_37_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_38_1","DOI":"10.1007\/s11263-015-0816-y"},{"volume-title":"2013 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 25--36","author":"Sampson A.","unstructured":"A. Sampson , J. Nelson , K. Strauss , and L. Ceze . 2013. Approximate storage in solid-state memories . In 2013 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 25--36 . A. Sampson, J. Nelson, K. Strauss, and L. Ceze. 2013. Approximate storage in solid-state memories. In 2013 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 25--36.","key":"e_1_2_1_39_1"},{"volume-title":"Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916)","author":"Shafiee A.","unstructured":"A. Shafiee , A. Nag , N. Muralimanohar , R. Balasubramonian , J. P. Strachan , M. Hu , R. S. Williams , and V. Srikumar . 2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars . In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916) . 14--26. A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, and V. Srikumar. 2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916). 14--26.","key":"e_1_2_1_40_1"},{"unstructured":"K. Simonyan and A. Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs\/1409.1556 (2014).  K. Simonyan and A. Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs\/1409.1556 (2014).","key":"e_1_2_1_41_1"},{"volume-title":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1--9.","author":"Szegedy C.","unstructured":"C. Szegedy , W. Liu , Y. Jia , P. Sermanet , S. Reed , D. Anguelov , D. Erhan , V. Vanhoucke , and A. Rabinovich . 2015. Going deeper with convolutions . In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1--9. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2015. Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1--9.","key":"e_1_2_1_42_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_43_1","DOI":"10.1109\/TVLSI.2018.2819210"},{"doi-asserted-by":"publisher","key":"e_1_2_1_44_1","DOI":"10.1109\/TCAD.2018.2858459"},{"unstructured":"O. Yadan K. Adams Y. Taigman and M. Ranzato. 2013. Multi-GPU training of ConvNets. CoRR abs\/1312.5853 (2013).  O. Yadan K. Adams Y. Taigman and M. Ranzato. 2013. Multi-GPU training of ConvNets. CoRR abs\/1312.5853 (2013).","key":"e_1_2_1_45_1"},{"volume-title":"2007 IEEE International Symposium on Circuits and Systems.","author":"Yang B.","unstructured":"B. Yang , J. Lee , J. Kim , J. Cho , S. Lee , and B. Yu . 2007. A low power phase-change random access memory using a data-comparison write scheme . In 2007 IEEE International Symposium on Circuits and Systems. B. Yang, J. Lee, J. Kim, J. Cho, S. Lee, and B. Yu. 2007. A low power phase-change random access memory using a data-comparison write scheme. In 2007 IEEE International Symposium on Circuits and Systems.","key":"e_1_2_1_46_1"},{"volume-title":"2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA). 282--293","author":"Yue J.","unstructured":"J. Yue and Y. Zhu . 2013. Accelerating write by exploiting PCM asymmetries . In 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA). 282--293 . J. Yue and Y. Zhu. 2013. Accelerating write by exploiting PCM asymmetries. In 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA). 282--293.","key":"e_1_2_1_47_1"},{"volume-title":"2017 IEEE International Conference on Computer Design (ICCD). 585--588","author":"Zhang M.","unstructured":"M. Zhang , L. Zhang , L. Jiang , F. T. Chong , and Z. Liu . 2017. Quick-and-Dirty: Improving performance of MLC PCM by using temporary short writes . In 2017 IEEE International Conference on Computer Design (ICCD). 585--588 . M. Zhang, L. Zhang, L. Jiang, F. T. Chong, and Z. Liu. 2017. Quick-and-Dirty: Improving performance of MLC PCM by using temporary short writes. In 2017 IEEE International Conference on Computer Design (ICCD). 585--588.","key":"e_1_2_1_48_1"},{"doi-asserted-by":"crossref","unstructured":"P. Zhou B. Zhao J. Yang and Y. Zhang. 2009. A durable and energy efficient main memory using phase change memory technology. (Jun 2009).  P. Zhou B. Zhao J. Yang and Y. Zhang. 2009. A durable and energy efficient main memory using phase change memory technology. (Jun 2009).","key":"e_1_2_1_49_1","DOI":"10.1145\/1555754.1555759"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3358191","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3358191","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:32:58Z","timestamp":1750199578000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3358191"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,8]]},"references-count":49,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2019,10,31]]}},"alternative-id":["10.1145\/3358191"],"URL":"https:\/\/doi.org\/10.1145\/3358191","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2019,10,8]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}