{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:29:57Z","timestamp":1760239797391,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2020,12,31]],"date-time":"2020-12-31T00:00:00Z","timestamp":1609372800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"U.S. Department of Energy, Office of Science, Early Career Research Program","award":["ERKJ314"],"award-info":[{"award-number":["ERKJ314"]}]},{"name":"U.S. Department of Energy, Office of Advanced Scientific Computing Research","award":["ERKJ331","ERKJ345"],"award-info":[{"award-number":["ERKJ331","ERKJ345"]}]},{"name":"National Science Foundation, Division of Mathematical Sciences, Computational Mathematics program","award":["DMS1620280"],"award-info":[{"award-number":["DMS1620280"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>In this effort, we propose a new deep architecture utilizing residual blocks inspired by implicit discretization schemes. As opposed to the standard feed-forward networks, the outputs of the proposed implicit residual blocks are defined as the fixed points of the appropriately chosen nonlinear transformations. We show that this choice leads to the improved stability of both forward and backward propagations, has a favorable impact on the generalization power, and allows control of the robustness of the network with only a few hyperparameters. In addition, the proposed reformulation of ResNet does not introduce new parameters and can potentially lead to a reduction in the number of required layers due to improved forward stability. 
Finally, we derive the memory-efficient training algorithm, propose a stochastic regularization technique, and provide numerical results in support of our findings.<\/jats:p>","DOI":"10.3390\/make3010003","type":"journal-article","created":{"date-parts":[[2020,12,31]],"date-time":"2020-12-31T14:31:49Z","timestamp":1609425109000},"page":"34-55","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Robust Learning with Implicit Residual Networks"],"prefix":"10.3390","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1545-4462","authenticated-orcid":false,"given":"Viktor","family":"Reshniak","sequence":"first","affiliation":[{"name":"Data Analysis and Machine Learning, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1375-0359","authenticated-orcid":false,"given":"Clayton G.","family":"Webster","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Tennessee at Knoxville, Knoxville, TN 37996, USA"},{"name":"Behavioral Reinforcement Learning Lab (BReLL), Lirio LLC, Knoxville, TN 37923, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hanin, B. (2019). Universal function approximation by deep neural nets with bounded width and ReLU activations. Mathematics, 7.","DOI":"10.3390\/math7100992"},{"key":"ref_3","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). The Expressive Power of Neural Networks: A View from the Width. 
Advances in Neural Information Processing Systems 30: 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4\u20139 December 2017, Curran Associates, Inc."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1109\/72.279181","article-title":"Learning long-term dependencies with gradient descent is difficult","volume":"5","author":"Bengio","year":"1994","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s40304-017-0103-z","article-title":"A Proposal on Machine Learning via Dynamical Systems","volume":"5","author":"Weinan","year":"2017","journal-title":"Commun. Math. Stat."},{"key":"ref_6","first-page":"10","article-title":"A mean-field optimal control formulation of deep learning","volume":"6","author":"Weinan","year":"2018","journal-title":"Res. Math. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"352","DOI":"10.1007\/s10851-019-00903-1","article-title":"Deep Neural Networks Motivated by Partial Differential Equations","volume":"62","author":"Ruthotto","year":"2020","journal-title":"J. Math. Imaging Vis."},{"key":"ref_8","unstructured":"Sonoda, S., and Murata, N. (2017, January 10). Double continuum limit of deep neural networks. Proceedings of the ICML Workshop on Principled Approaches to Deep Learning (ICML 2017), Sydney, Australia."},{"key":"ref_9","unstructured":"Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (2018). Neural Ordinary Differential Equations. 
Advances in Neural Information Processing Systems 31: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, QC, Canada, 3\u20138 December 2018, Curran Associates, Inc."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"014004","DOI":"10.1088\/1361-6420\/aa9a90","article-title":"Stable architectures for deep neural networks","volume":"34","author":"Haber","year":"2017","journal-title":"Inverse Probl."},{"key":"ref_11","unstructured":"Touretzky, D., Hinton, G., and Sejnowsky, T. (1988). A theoretical framework for back-propagation. Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, CMU."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"630","DOI":"10.1007\/978-3-319-46493-0_38","article-title":"Identity Mappings in Deep Residual Networks","volume":"Volume 9908","author":"Leibe","year":"2016","journal-title":"Proceedings of the 14th European Conference on Computer Vision (ECCV 2016)"},{"key":"ref_13","unstructured":"Hairer, E., N\u00f8rsett, S.P., and Wanner, G. (1993). Solving Ordinary Differential Equations I, Nonstiff Problems, Springer."},{"key":"ref_14","unstructured":"Hairer, E., Lubich, C., and Wanner, G. (2006). Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, Springer."},{"key":"ref_15","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). The Reversible Residual Network: Backpropagation Without Storing Activations. Advances in Neural Information Processing Systems 30: 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4\u20139 December 2017, Curran Associates, Inc."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., and Holtham, E. (2018, January 2\u20137). Reversible architectures for arbitrarily deep residual neural networks. 
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11668"},{"key":"ref_17","unstructured":"Targ, S., Almeida, D., and Lyman, K. (2016). Resnet in resnet: Generalizing residual architectures. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely Connected Convolutional Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_19","unstructured":"Chaudhuri, K., and Salakhutdinov, R. (2019, January 9\u201315). IMEXnet A Forward Stable Deep Neural Network. Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA."},{"key":"ref_20","unstructured":"El Ghaoui, L., Gu, F., Travacca, B., and Askari, A. (2019). Implicit deep learning. arXiv."},{"key":"ref_21","unstructured":"Wallach, H., Larochelle, H., Beygelzimer, A., d\u2019Alch\u00e9-Buc, F., Fox, E., and Garnett, R. (2020). Deep Equilibrium Models. Advances in Neural Information Processing Systems 32: 32nd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8\u201314 December 2019, Curran Associates, Inc."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2229","DOI":"10.1103\/PhysRevLett.59.2229","article-title":"Generalization of back-propagation to recurrent neural networks","volume":"59","author":"Pineda","year":"1987","journal-title":"Phys. Rev. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1083","DOI":"10.1016\/0005-1098(92)90053-I","article-title":"Neural networks for control systems\u2014A survey","volume":"28","author":"Hunt","year":"1992","journal-title":"Automatica"},{"key":"ref_24","unstructured":"Li, M., He, L., and Lin, Z. (2020, January 12\u201318). 
Implicit Euler Skip Connections: Enhancing Adversarial Robustness via Numerical Stability. Proceedings of the 37th International Conference on Machine Learning (ICML 2020), Vienna, Austria."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Reshniak, V., and Webster, C. (2020). Robust Learning with Implicit Residual Networks, Unpublished work.","DOI":"10.3390\/make3010003"},{"key":"ref_26","unstructured":"Nocedal, J., and Wright, S. (2006). Numerical Optimization, Springer. [2nd ed.]."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"4265","DOI":"10.1109\/TSP.2017.2708039","article-title":"Robust large margin deep neural networks","volume":"65","author":"Giryes","year":"2017","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_28","unstructured":"Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (2018). Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. Advances in Neural Information Processing Systems 31: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, QC, Canada, 3\u20138 December 2018, Curran Associates, Inc."},{"key":"ref_29","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2018). Improved Training of Wasserstein GANs. Advances in Neural Information Processing Systems 30: 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4\u20139 December 2017, Curran Associates, Inc."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1118","DOI":"10.1007\/s11263-019-01265-2","article-title":"Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities","volume":"128","author":"Qi","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ferrari, V., Hebert, M., Sminchisescu, C., and Weiss, Y. (2018, January 8\u201314). 
Improving DNN robustness to adversarial attacks using Jacobian regularization. Proceedings of the 15th European Conference on Computer Vision (ECCV 2018), Munich, Germany. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-030-01216-8"},{"key":"ref_32","unstructured":"Hoffman, J., Roberts, D.A., and Yaida, S. (2019). Robust learning with Jacobian regularization. arXiv."},{"key":"ref_33","first-page":"3563","article-title":"What regularized auto-encoders learn from the data-generating distribution","volume":"15","author":"Alain","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_34","unstructured":"Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (May, January 30). Spectral normalization for generative adversarial networks. Proceedings of the Sixth International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Gouk, H., Frank, E., Pfahringer, B., and Cree, M. (2020). Regularisation of neural networks by enforcing Lipschitz continuity. Mach. Learn.","DOI":"10.1007\/s10994-020-05929-w"},{"key":"ref_36","unstructured":"Yoshida, Y., and Miyato, T. (2017). Spectral norm regularization for improving the generalizability of deep learning. arXiv."},{"key":"ref_37","unstructured":"Behrmann, J., Grathwohl, W., Chen, R.T., Duvenaud, D., and Jacobsen, J.H. (2019, January 9\u201315). Invertible Residual Networks. Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA."},{"key":"ref_38","unstructured":"Golub, G.H., and Van Loan, C.F. (2013). Matrix computations, The Johns Hopkins University Press. [4 ed.]."},{"key":"ref_39","unstructured":"Finlay, C., Jacobsen, J.H., Nurbekyan, L., and Oberman, A.M. (2020, January 12\u201318). How to train your neural ODE: The world of Jacobian and kinetic regularization. 
Proceedings of the 37th International Conference on Machine Learning (ICML 2020), Vienna, Austria."},{"key":"ref_40","unstructured":"Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., and Lin, H.T. (2020, January 6\u201312). Learning Differential Equations that are Easy to Solve. Proceedings of the 34th Annual Conference on Neural Information Processing Systems (NIPS 2020), Vancouver, BC, Canada."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1059","DOI":"10.1080\/03610918908812806","article-title":"A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines","volume":"18","author":"Hutchinson","year":"1989","journal-title":"Commun. Stat. Simul. Comput."},{"key":"ref_42","unstructured":"Grathwohl, W., Chen, R.T., Bettencourt, J., Sutskever, I., and Duvenaud, D. (2019, January 6\u20139). Ffjord: Free-form continuous dynamics for scalable reversible generative models. Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1214","DOI":"10.1016\/j.apnum.2007.01.003","article-title":"An estimator for the diagonal of a matrix","volume":"57","author":"Bekas","year":"2007","journal-title":"Appl. Numer. Math."},{"key":"ref_44","unstructured":"Wallach, H., Larochelle, H., Beygelzimer, A., d\u2019Alch\u00e9-Buc, F., Fox, E., and Garnett, R. (2020). Augmented neural ODEs. 
Advances in Neural Information Processing Systems 32: 32nd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8\u201314 December 2019, Curran Associates, Inc."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/3\/1\/3\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:48:20Z","timestamp":1760179700000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/3\/1\/3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,31]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["make3010003"],"URL":"https:\/\/doi.org\/10.3390\/make3010003","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2020,12,31]]}}}