{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T04:52:54Z","timestamp":1772254374348,"version":"3.50.1"},"reference-count":99,"publisher":"Association for Computing Machinery (ACM)","issue":"5","funder":[{"name":"Intel Strategic Research Sector (SRS) - Emerging Technology"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>Back propagation (BP) is the default solution for gradient computation in neural network training. However, implementing BP-based training on various edge devices such as FPGAs, microcontrollers (MCUs), and analog computing platforms faces multiple major challenges, such as the lack of hardware resources, long time-to-market, and dramatic errors in a low-precision setting. This article presents a simple BP-free training scheme on an MCU, which makes edge training hardware design as easy as inference hardware design. We adopt a quantized zeroth-order method to estimate the gradients of quantized model parameters, which can overcome the error of a straight-through estimator in a low-precision BP scheme. We further employ a few dimension reduction methods (e.g., node perturbation, sparse training) to improve the convergence of zeroth-order training. Experimental results show that our BP-free training achieves performance comparable to BP-based training when adapting a pre-trained image classifier to various corrupted data on resource-constrained edge devices (e.g., an MCU with 1024-KB SRAM for dense full-model training, or an MCU with 256-KB SRAM for sparse training). 
This method is most suitable for application scenarios where memory cost and time-to-market are the major concerns, but longer latency can be tolerated.<\/jats:p>","DOI":"10.1145\/3745772","type":"journal-article","created":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T07:14:31Z","timestamp":1751354071000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Poor Man\u2019s Training on MCUs: A Memory-Efficient Quantized Back-Propagation-Free Approach"],"prefix":"10.1145","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-2785-6789","authenticated-orcid":false,"given":"Yequan","family":"Zhao","sequence":"first","affiliation":[{"name":"Electrical and Computer Engineering, University of California Santa Barbara","place":["Santa Barbara, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7668-569X","authenticated-orcid":false,"given":"Hai","family":"Li","sequence":"additional","affiliation":[{"name":"Technology Research - Exploratory Integrated Circuits, Intel Corporation","place":["Hillsboro, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4017-5265","authenticated-orcid":false,"given":"Ian","family":"Young","sequence":"additional","affiliation":[{"name":"Technology Research - Exploratory Integrated Circuits, Intel Corporation","place":["Hillsboro, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2292-0030","authenticated-orcid":false,"given":"Zheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of California Santa Barbara","place":["Santa Barbara, United 
States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,8,12]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Armen Aghajanyan Sonal Gupta and Luke Zettlemoyer. 2021. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 7319\u20137328.","DOI":"10.18653\/v1\/2021.acl-long.568"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2016.07.006"},{"issue":"153","key":"e_1_3_2_4_2","first-page":"1","article-title":"Automatic differentiation in machine learning: A survey","volume":"18","author":"Baydin Atilim Gunes","year":"2018","unstructured":"Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2018. Automatic differentiation in machine learning: A survey. Journal of Machine Learning Research 18, 153 (2018), 1\u201343.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_5_2","unstructured":"At\u0131l\u0131m G\u00fcne\u015f Baydin Barak A. Pearlmutter Don Syme Frank Wood and Philip Torr. 2022. Gradients without backpropagation. arXiv preprint arXiv:2202.08587 (2022)."},{"key":"e_1_3_2_6_2","unstructured":"Yoshua Bengio Nicholas L\u00e9onard and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)."},{"key":"e_1_3_2_7_2","first-page":"10809","article-title":"A mathematical model for automatic differentiation in machine learning","volume":"33","author":"Bolte J\u00e9r\u00f4me","year":"2020","unstructured":"J\u00e9r\u00f4me Bolte and Edouard Pauwels. 2020. A mathematical model for automatic differentiation in machine learning. 
Advances in Neural Information Processing Systems 33 (2020), 10809\u201310819.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10599-4_29"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7908-2604-3_16"},{"key":"e_1_3_2_10_2","first-page":"11285","article-title":"Tinytl: Reduce memory, not parameters for efficient on-device learning","volume":"33","author":"Cai Han","year":"2020","unstructured":"Han Cai, Chuang Gan, Ligeng Zhu, and Song Han. 2020. Tinytl: Reduce memory, not parameters for efficient on-device learning. Advances in Neural Information Processing Systems 33 (2020), 11285\u201311297.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_11_2","first-page":"1193","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Cai HanQin","year":"2021","unstructured":"HanQin Cai, Yuchen Lou, Daniel McKenzie, and Wotao Yin. 2021. A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization. In Proceedings of the International Conference on Machine Learning. PMLR, 1193\u20131203."},{"key":"e_1_3_2_12_2","unstructured":"Aochuan Chen Yimeng Zhang Jinghan Jia James Diffenderfer Jiancheng Liu Konstantinos Parasyris Yihua Zhang Zheng Zhang Bhavya Kailkhura and Sijia Liu. 2023. Deepzero: Scaling up zeroth-order optimization for deep model training. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_13_2","first-page":"883","article-title":"A statistical framework for low-bitwidth training of deep neural networks","volume":"33","author":"Chen Jianfei","year":"2020","unstructured":"Jianfei Chen, Yu Gai, Zhewei Yao, Michael W. Mahoney, and Joseph E. Gonzalez. 2020. A statistical framework for low-bitwidth training of deep neural networks. 
Advances in Neural Information Processing Systems 33 (2020), 883\u2013894.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3128572.3140448"},{"key":"e_1_3_2_15_2","first-page":"15834","article-title":"The lottery ticket hypothesis for pre-trained bert networks","volume":"33","author":"Chen Tianlong","year":"2020","unstructured":"Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, and Michael Carbin. 2020. The lottery ticket hypothesis for pre-trained bert networks. Advances in Neural Information Processing Systems 33 (2020), 15834\u201315846.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_16_2","article-title":"Improving black-box adversarial attacks with a transfer-based prior","volume":"32","author":"Cheng Shuyu","year":"2019","unstructured":"Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. 2019. Improving black-box adversarial attacks with a transfer-based prior. Advances in Neural Information Processing Systems 32 (2019).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_17_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Chiang Ping-yeh","year":"2022","unstructured":"Ping-yeh Chiang, Renkun Ni, David Yu Miller, Arpit Bansal, Jonas Geiping, Micah Goldblum, and Tom Goldstein. 2022. Loss landscapes are all you need: Neural network generalization can be explained without the implicit bias of gradient descent. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_2_18_2","article-title":"Separable physics-informed neural networks","volume":"36","author":"Cho Junwoo","year":"2024","unstructured":"Junwoo Cho, Seungtae Nam, Hyunmo Yang, Seok-Bae Yun, Youngjoon Hong, and Eunbyung Park. 2024. Separable physics-informed neural networks. 
Advances in Neural Information Processing Systems 36 (2024).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_19_2","unstructured":"Aakanksha Chowdhery Pete Warden Jonathon Shlens Andrew Howard and Rocky Rhodes. 2019. Visual wake words dataset. arXiv preprint arXiv:1906.05721 (2019)."},{"key":"e_1_3_2_20_2","unstructured":"Sander Dalm Marcel van Gerven and Nasir Ahmad. 2023. Effective learning with node perturbation in deep neural networks. arXiv preprint arXiv:2310.00965 (2023)."},{"key":"e_1_3_2_21_2","first-page":"800","article-title":"Tensorflow lite micro: Embedded machine learning for tinyml systems","volume":"3","author":"David Robert","year":"2021","unstructured":"Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, et\u00a0al. 2021. Tensorflow lite micro: Embedded machine learning for tinyml systems. Proceedings of Machine Learning and Systems 3 (2021), 800\u2013811.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_22_2","first-page":"4937","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Dellaferrera Giorgia","year":"2022","unstructured":"Giorgia Dellaferrera and Gabriel Kreiman. 2022. Error-driven input modulation: Solving the credit assignment problem without a backward pass. In Proceedings of the International Conference on Machine Learning. PMLR, 4937\u20134955."},{"key":"e_1_3_2_23_2","article-title":"A guide through the zoo of biased SGD","volume":"36","author":"Demidovich Yury","year":"2024","unstructured":"Yury Demidovich, Grigory Malinovsky, Igor Sokolov, and Peter Richt\u00e1rik. 2024. A guide through the zoo of biased SGD. 
Advances in Neural Information Processing Systems 36 (2024).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_25_2","unstructured":"Hao Di Haishan Ye Yueling Zhang Xiangyu Chang Guang Dai and Ivor W. Tsang. 2024. Double variance reduction: A smoothing trick for composite optimization problems without first-order gradient. In Proceedings of the 41st International Conference on Machine Learning. 10792\u201310810."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2015.2409256"},{"key":"e_1_3_2_27_2","first-page":"10249","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Fournier Louis","year":"2023","unstructured":"Louis Fournier, St\u00e9phane Rivaud, Eugene Belilovsky, Michael Eickenberg, and Edouard Oyallon. 2023. Can forward gradient match backpropagation?. In Proceedings of the International Conference on Machine Learning. PMLR, 10249\u201310264."},{"key":"e_1_3_2_28_2","unstructured":"Jonathan Frankle and Michael Carbin. 2018. The lottery ticket hypothesis: Finding sparse trainable neural networks. In International Conference on Learning Representations."},{"key":"e_1_3_2_29_2","unstructured":"Jonathan Frankle David J. Schwab and Ari S. Morcos. 2020. Training batchnorm and only batchnorm: On the expressive power of random features in CNNs. In International Conference on Learning Representations."},{"key":"e_1_3_2_30_2","first-page":"7077","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Gao Katelyn","year":"2022","unstructured":"Katelyn Gao and Ozan Sener. 2022. Generalizing Gaussian smoothing for random search. In Proceedings of the International Conference on Machine Learning. 
PMLR, 7077\u20137101."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1137\/120880811"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i9.16928"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218593"},{"key":"e_1_3_2_34_2","first-page":"8649","article-title":"L2ight: Enabling on-chip learning for optical neural networks via efficient in-situ subspace optimization","volume":"34","author":"Gu Jiaqi","year":"2021","unstructured":"Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Zixuan Jiang, Ray Chen, and David Pan. 2021. L2ight: Enabling on-chip learning for optical neural networks via efficient in-situ subspace optimization. Advances in Neural Information Processing Systems 34 (2021), 8649\u20138661.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Zixian Guo Yuxiang Wei Ming Liu Zhilong Ji Jinfeng Bai Yiwen Guo and Wangmeng Zuo. 2023. Black-box tuning of vision-language models with effective gradient approximation. In Findings of the Association for Computational Linguistics: EMNLP 2023. 5356\u20135368.","DOI":"10.18653\/v1\/2023.findings-emnlp.356"},{"key":"e_1_3_2_36_2","unstructured":"Dan Hendrycks and Thomas Dietterich. 2019. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations."},{"key":"e_1_3_2_37_2","unstructured":"Geoffrey Hinton. 2022. The forward-forward algorithm: Some preliminary investigations. arXiv preprint arXiv:2212.13345 (2022)."},{"key":"e_1_3_2_38_2","first-page":"31929","article-title":"On the stability and scalability of node perturbation learning","volume":"35","author":"Hiratani Naoki","year":"2022","unstructured":"Naoki Hiratani, Yash Mehta, Timothy Lillicrap, and Peter E. Latham. 2022. On the stability and scalability of node perturbation learning. 
Advances in Neural Information Processing Systems 35 (2022), 31929\u201331941.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2025.3528366"},{"key":"e_1_3_2_40_2","unstructured":"Kai Huang Hanyun Yin Heng Huang and Wei Gao. 2023. Towards green AI in fine-tuning large language models via adaptive backpropagation. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_41_2","first-page":"448","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning. PMLR, 448\u2013456."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_3_2_43_2","first-page":"3100","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ji Kaiyi","year":"2019","unstructured":"Kaiyi Ji, Zhe Wang, Yi Zhou, and Yingbin Liang. 2019. Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization. In Proceedings of the International Conference on Machine Learning. PMLR, 3100\u20133109."},{"key":"e_1_3_2_44_2","unstructured":"Jinyang Jiang Zeliang Zhang Chenliang Xu Zhaofei Yu and Yijie Peng. 2023. One forward is enough for neural network training via likelihood ratio method. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_45_2","unstructured":"Qing Jin Jian Ren Richard Zhuang Sumant Hanumante Zhengang Li Zhiyu Chen Yanzhi Wang Kaiyuan Yang and Sergey Tulyakov. 2023. F8Net: Fixed-point 8-bit only multiplication for network quantization. 
In International Conference on Learning Representations."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE48585.2020.9116273"},{"key":"e_1_3_2_47_2","unstructured":"Diederik P. Kingma. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2013.77"},{"key":"e_1_3_2_49_2","unstructured":"Alex Krizhevsky Geoffrey Hinton et\u00a0al. 2009. Learning multiple layers of features from tiny images. (2009)."},{"key":"e_1_3_2_50_2","first-page":"25812","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Kwon Young D.","year":"2024","unstructured":"Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, and Cecilia Mascolo. 2024. TinyTrain: Resource-aware task-adaptive sparse training of DNNs at the data-scarce edge. In Proceedings of the 41st International Conference on Machine Learning. 25812\u201325843."},{"key":"e_1_3_2_51_2","unstructured":"L. Lai N. Suda and V. Chandra. 2018. CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs. arXiv preprint arXiv:1801.06601 (2018)."},{"key":"e_1_3_2_52_2","first-page":"21","volume-title":"Proceedings of the 1988 Connectionist Models Summer School","volume":"1","author":"LeCun Yann","year":"1988","unstructured":"Yann LeCun, D. Touresky, G. Hinton, and T. Sejnowski. 1988. A theoretical framework for back-propagation. In Proceedings of the 1988 Connectionist Models Summer School, Vol. 1. 21\u201328."},{"key":"e_1_3_2_53_2","unstructured":"Yoonho Lee Annie S. Chen Fahim Tajwar Ananya Kumar Huaxiu Yao Percy Liang and Chelsea Finn. 2023. Surgical fine-tuning improves adaptation to distribution shifts. 
In The Eleventh International Conference on Learning Representations."},{"issue":"3","key":"e_1_3_2_54_2","first-page":"3411","article-title":"Low dimensional trajectory hypothesis is true: DNNs can be trained in tiny subspaces","volume":"45","author":"Li Tao","year":"2022","unstructured":"Tao Li, Lei Tan, Zhehao Huang, Qinghua Tao, Yipeng Liu, and Xiaolin Huang. 2022. Low dimensional trajectory hypothesis is true: DNNs can be trained in tiny subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 3 (2022), 3411\u20133420.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_55_2","first-page":"11711","article-title":"MCUNet: Tiny deep learning on IoT devices","volume":"33","author":"Lin Ji","year":"2020","unstructured":"Ji Lin, Wei-Ming Chen, Yujun Lin, Chuang Gan, and Song Han. 2020. MCUNet: Tiny deep learning on IoT devices. Advances in Neural Information Processing Systems 33 (2020), 11711\u201311722.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_56_2","first-page":"22941","article-title":"On-device training under 256kb memory","volume":"35","author":"Lin Ji","year":"2022","unstructured":"Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, and Song Han. 2022. On-device training under 256kb memory. Advances in Neural Information Processing Systems 35 (2022), 22941\u201322954.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_57_2","first-page":"288","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Liu Sijia","year":"2018","unstructured":"Sijia Liu, Jie Chen, Pin-Yu Chen, and Alfred Hero. 2018. Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications. In Proceedings of the International Conference on Artificial Intelligence and Statistics. 
PMLR, 288\u2013297."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2020.3003837"},{"key":"e_1_3_2_59_2","first-page":"53038","article-title":"Fine-tuning language models with just forward passes","volume":"36","author":"Malladi Sadhika","year":"2023","unstructured":"Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, and Sanjeev Arora. 2023. Fine-tuning language models with just forward passes. Advances in Neural Information Processing Systems 36 (2023), 53038\u201353075.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_60_2","first-page":"23610","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Malladi Sadhika","year":"2023","unstructured":"Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, and Sanjeev Arora. 2023. A kernel-based view of language model fine-tuning. In Proceedings of the International Conference on Machine Learning. PMLR, 23610\u201323641."},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1305"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v008.i14"},{"key":"e_1_3_2_63_2","unstructured":"Pramod Kaushik Mudrakarta Mark Sandler Andrey Zhmoginov and Andrew Howard. 2018. K for the price of 1: Parameter-efficient multi-task and transfer learning. arXiv preprint arXiv:1810.10703 (2018)."},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10208-015-9296-2"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICVGIP.2008.47"},{"key":"e_1_3_2_66_2","unstructured":"Shuaicheng Niu Chunyan Miao Guohao Chen Pengcheng Wu and Peilin Zhao. 2024. Test-time model adaptation with only forward passes. In International Conference on Machine Learning. 
PMLR 38298\u201338315."},{"key":"e_1_3_2_67_2","article-title":"Direct feedback alignment provides learning in deep neural networks","volume":"29","author":"N\u00f8kland Arild","year":"2016","unstructured":"Arild N\u00f8kland. 2016. Direct feedback alignment provides learning in deep neural networks. Advances in Neural Information Processing Systems 29 (2016).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_68_2","article-title":"Tensorizing neural networks","volume":"28","author":"Novikov Alexander","year":"2015","unstructured":"Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, and Dmitry P. Vetrov. 2015. Tensorizing neural networks. Advances in Neural Information Processing Systems 28 (2015).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02320"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01424"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.ade8450"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/1113316.1113319"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248092"},{"key":"e_1_3_2_74_2","unstructured":"Ruizhong Qiu and Hanghang Tong. 2024. Gradient compressed sensing: A query-efficient gradient estimator for high-dimensional zeroth-order optimization. In Proceedings of the 41st International Conference on Machine Learning. 41717\u201341748."},{"key":"e_1_3_2_75_2","first-page":"18293","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Rachwan John","year":"2022","unstructured":"John Rachwan, Daniel Z\u00fcgner, Bertrand Charpentier, Simon Geisler, Morgane Ayle, and Stephan G\u00fcnnemann. 2022. Winning the lottery ahead of time: Efficient early network pruning. In Proceedings of the International Conference on Machine Learning. 
PMLR, 18293\u201318309."},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN52387.2021.9533927"},{"key":"e_1_3_2_77_2","unstructured":"Mengye Ren Simon Kornblith Renjie Liao and Geoffrey Hinton. 2022. Scaling forward gradient with local losses. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_78_2","unstructured":"Tao Ren Zishi Zhang Jinyang Jiang Guanghao Li Zeliang Zhang Mingqian Feng and Yijie Peng. 2024. FLOPS: Forward learning with optimal sampling. In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_2_79_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Shu Yao","year":"2023","unstructured":"Yao Shu, Zhongxiang Dai, Weicong Sng, Arun Verma, Patrick Jaillet, and Bryan Kian Hsiang Low. 2023. Zeroth-order optimization with trajectory-informed derivative estimation. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/9.119632"},{"key":"e_1_3_2_81_2","first-page":"24193","article-title":"Training neural networks with fixed sparse masks","volume":"34","author":"Sung Yi-Lin","year":"2021","unstructured":"Yi-Lin Sung, Varun Nair, and Colin A. Raffel. 2021. Training neural networks with fixed sparse masks. Advances in Neural Information Processing Systems 34 (2021), 24193\u201324205.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_82_2","first-page":"9614","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tsai Yun-Yun","year":"2020","unstructured":"Yun-Yun Tsai, Pin-Yu Chen, and Tsung-Yi Ho. 2020. Transfer learning without knowing: Reprogramming black-box machine learning models with scarce data and limited resources. In Proceedings of the International Conference on Machine Learning. 
PMLR, 9614\u20139624."},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.3301742"},{"key":"e_1_3_2_84_2","unstructured":"Chaoqi Wang Guodong Zhang and Roger Grosse. 2020. Picking winning tickets before training by preserving gradient flow. In International Conference on Learning Representations."},{"key":"e_1_3_2_85_2","unstructured":"Dequan Wang Evan Shelhamer Shaoteng Liu Bruno Olshausen and Trevor Darrell. 2021. Tent: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations."},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00706"},{"key":"e_1_3_2_87_2","first-page":"44017","article-title":"SODA: Robust training of test-time data adaptors","volume":"36","author":"Wang Zige","year":"2023","unstructured":"Zige Wang, Yonggang Zhang, Zhen Fang, Long Lan, Wenjing Yang, and Bo Han. 2023. SODA: Robust training of test-time data adaptors. Advances in Neural Information Processing Systems 36 (2023), 44017\u201344038.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_88_2","unstructured":"Peter Welinder Steve Branson Takeshi Mita Catherine Wah Florian Schroff Serge Belongie and Pietro Perona. 2010. Caltech-UCSD birds 200. (2010)."},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-021-04223-6"},{"key":"e_1_3_2_90_2","doi-asserted-by":"crossref","unstructured":"Yifan Yang Kai Zhen Ershad Banijamal Athanasios Mouchtaris and Zheng Zhang. 2024. AdaZeta: Adaptive zeroth-order tensor-train adaption for memory-efficient large language models fine-tuning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 977\u2013995.","DOI":"10.18653\/v1\/2024.emnlp-main.56"},{"key":"e_1_3_2_91_2","unstructured":"Zi Yang Samridhi Choudhary Xinfeng Xie Cao Gao Siegfried Kunzmann and Zheng Zhang. 2024. 
CoMERA: Computing-and memory-efficient training via rank-adaptive tensor optimization. Advances in Neural Information Processing Systems 37 (2024) 77200\u201377225."},{"key":"e_1_3_2_92_2","unstructured":"Haishan Ye Zhichao Huang Cong Fang Chris Junchi Li and Tong Zhang. 2018. Hessian-aware zeroth-order optimization for black-box adversarial attack. arXiv preprint arXiv:1812.11377 (2018)."},{"key":"e_1_3_2_93_2","unstructured":"Xinling Yu Sean Hooten Ziyue Liu Yequan Zhao Marco Fiorentino Thomas Van Vaerenbergh and Zheng Zhang. 2024. Separable operator networks. Transactions on Machine Learning Research (2024)."},{"key":"e_1_3_2_94_2","unstructured":"Kaiqi Zhang Cole Hawkins Xiyuan Zhang Cong Hao and Zheng Zhang. 2021. On-FPGA training with ultra memory reduction: A low-precision tensor method. arXiv preprint arXiv:2104.03420 (2021)."},{"key":"e_1_3_2_95_2","unstructured":"Yihua Zhang Pingzhi Li Junyuan Hong Jiaxiang Li Yimeng Zhang Wenqing Zheng Pin-Yu Chen Jason D. Lee Wotao Yin Mingyi Hong et\u00a0al. 2024. Revisiting zeroth-order optimization for memory-efficient LLM fine-tuning: A benchmark. In International Conference on Machine Learning. PMLR 59173\u201359190."},{"key":"e_1_3_2_96_2","first-page":"18309","article-title":"Advancing model pruning via bi-level optimization","volume":"35","author":"Zhang Yihua","year":"2022","unstructured":"Yihua Zhang, Yuguang Yao, Parikshit Ram, Pu Zhao, Tianlong Chen, Mingyi Hong, Yanzhi Wang, and Sijia Liu. 2022. Advancing model pruning via bi-level optimization. Advances in Neural Information Processing Systems 35 (2022), 18309\u201318326.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_97_2","unstructured":"Yequan Zhao Xian Xiao Xinling Yu Ziyue Liu Zhixiong Chen Geza Kurczveil Raymond G. Beausoleil and Zheng Zhang. 2023. Real-Time FJ\/MAC PDE solvers via tensorized back-propagation-free optical PINN training. 
arXiv preprint arXiv:2401.00413 (2023)."},{"key":"e_1_3_2_98_2","unstructured":"Yequan Zhao Xinling Yu Zhixiong Chen Ziyue Liu Sijia Liu and Zheng Zhang. 2023. Tensor-compressed back-propagation-free training for (physics-informed) neural networks. arXiv preprint arXiv:2308.09858 (2023)."},{"key":"e_1_3_2_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00204"},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613424.3614307"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3745772","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T13:16:18Z","timestamp":1755004578000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3745772"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,12]]},"references-count":99,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3745772"],"URL":"https:\/\/doi.org\/10.1145\/3745772","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,12]]},"assertion":[{"value":"2024-11-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-04","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}