{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T05:14:55Z","timestamp":1767590095082,"version":"3.48.0"},"reference-count":30,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T00:00:00Z","timestamp":1767139200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>When training a neural network, the choice of activation function can greatly impact its performance. A function with a larger derivative may cause the coefficients of the latter layers to deviate further from the calculated direction, making deep learning more difficult to train. However, an activation function with a derivative amplitude of less than one can result in the problem of a vanishing gradient. To overcome this drawback, we propose the application of pseudo-normalization to enlarge some gradients by dividing them by the root mean square. This amplification is performed every few layers to ensure that the amplitudes are larger than one, thus avoiding the condition of vanishing gradient and preventing gradient explosion. We successfully applied this approach to several deep learning networks with hyperbolic tangent activation for image classifications. To gain a deeper understanding of the algorithm, we employed interpretability techniques to examine the network\u2019s prediction outcomes. We discovered that, in contrast to popular networks that learn picture characteristics, the networks primarily employ the contour information of images for categorization. This suggests that our technique can be utilized in addition to other widely used algorithms.<\/jats:p>","DOI":"10.3390\/e28010057","type":"journal-article","created":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T16:34:23Z","timestamp":1767198863000},"page":"57","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method"],"prefix":"10.3390","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0405-9512","authenticated-orcid":false,"given":"Yun","family":"Bu","sequence":"first","affiliation":[{"name":"School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6138-3082","authenticated-orcid":false,"given":"Wenbo","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, China"}]},{"given":"Gang","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, China"}]},{"given":"Qiang","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Sciences, Southwest Petroleum University, Chengdu 610500, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","article-title":"A Fast Learning Algorithm for Deep Belief Nets","volume":"18","author":"Hinton","year":"2006","journal-title":"Neural Comput."},{"key":"ref_2","unstructured":"Glorot, X., and Bengio, Y. 
(2010, May 13\u201315). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy."},{"key":"ref_3","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, December 3\u20136). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA."},{"key":"ref_4","unstructured":"Ioffe, S., and Szegedy, C. (2015, July 6\u201311). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the International Conference on Machine Learning (ICML), Lille, France."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_6","unstructured":"Simonyan, K., and Zisserman, A. (2015, May 7\u20139). Very Deep Convolutional Networks for Large-Scale Image Recognition. Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"6999","DOI":"10.1109\/TNNLS.2021.3084827","article-title":"A survey of convolutional neural networks: Analysis, applications, and prospects","volume":"33","author":"Li","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_8","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2014, December 8\u201313). Distilling the knowledge in a neural network. Proceedings of the Neural Information Processing Systems (NIPS), Montreal, QC, Canada."},{"key":"ref_9","unstructured":"Cao, H., Tan, C., Gao, Z., Xu, Y., Chen, G., Heng, P.-A., and Li, S.Z. (2023). A survey on generative diffusion model. arXiv."},{"key":"ref_10","unstructured":"Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative Adversarial Networks. arXiv."},{"key":"ref_11","first-page":"3523","article-title":"Image segmentation using deep learning: A survey","volume":"44","author":"Minaee","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/TNNLS.2020.2978386","article-title":"A comprehensive survey on graph neural networks","volume":"31","author":"Wu","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1109\/TNNLS.2021.3070843","article-title":"A survey on knowledge graphs: Representation, acquisition, and applications","volume":"33","author":"Ji","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_14","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, December 4\u20139). Attention Is All You Need. Proceedings of the Neural Information Processing Systems (NIPS), Long Beach, CA, USA."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1109\/TPAMI.2022.3152247","article-title":"A survey on vision transformer","volume":"45","author":"Han","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach.
Intell."},{"key":"ref_16","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","first-page":"5149","article-title":"Meta-learning in neural networks: A survey","volume":"44","author":"Hospedales","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_19","unstructured":"Nielsen, M.A. (2015). Neural Networks and Deep Learning, Determination Press."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Tan, H.H., and Lim, K.H. (2019, January 28\u201330). Vanishing gradient mitigation with deep learning neural network optimization. Proceedings of the 7th International Conference on Smart Computing & Communications (ICSCC), Sarawak, Malaysia.","DOI":"10.1109\/ICSCC.2019.8843652"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1109\/TNNLS.2020.2979121","article-title":"A novel learning algorithm to optimize deep neural networks: Evolved gradient direction optimizer (EVGO)","volume":"32","author":"Karabayir","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3007","DOI":"10.1109\/TCSVT.2017.2734838","article-title":"An end-to-end compression framework based on convolutional neural networks","volume":"28","author":"Tao","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1049\/iet-rsn.2019.0307","article-title":"Doppler net: A convolutional neural network for recognizing targets in real scenarios using a persistent range doppler radar","volume":"14","author":"Roldan","year":"2020","journal-title":"IET Radar Sonar Navig."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Le, H., Doan, V.-S., Le, D.P., Nguyen, H.-H., Huynh-The, T., Le-Ha, K., and Hoang, V.-P. (2020, January 8\u201310). Micro-Doppler-Rada-Based UAV Detection Using Inception-Residual Neural Network. Proceedings of the International Conference on Advanced Technologies for Communications, Nha Trang, Vietnam.","DOI":"10.1109\/ATC50776.2020.9255454"},{"key":"ref_25","unstructured":"Maclaurin, D., Duvenaud, D., and Adams, R.P. (2015, January 6\u201311). Gradient-based hyperparameter optimization through reversible learning. Proceedings of the International Conference on Machine Learning (ICML), Lille, France."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.eswa.2016.05.022","article-title":"Deep learning with adaptive learning rate using Laplacian score","volume":"63","author":"Chandra","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_27","unstructured":"Smith, L.N. (2015). Cyclical learning rates for training neural networks. arXiv."},{"key":"ref_28","unstructured":"Ishida, T., Yamane, I., Sakai, T., Niu, G., and Sugiyama, M. (2020). Do we need zeros training loss after achieving zeros training error?. arXiv."},{"key":"ref_29","unstructured":"Loshchilov, I., and Hutter, F. (2017, January 6\u201311). SGDR: Stochastic gradient decent with warm restarts. 
Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1007\/s11263-019-01228-7","article-title":"Grad-CAM: Visual explanations from deep networks via gradient-based localization","volume":"128","author":"Selvaraju","year":"2020","journal-title":"Int. J. Comput. Vis."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/28\/1\/57\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T05:12:04Z","timestamp":1767589924000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/28\/1\/57"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,31]]},"references-count":30,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1]]}},"alternative-id":["e28010057"],"URL":"https:\/\/doi.org\/10.3390\/e28010057","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2025,12,31]]}}}
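
The abstract in this record describes the pseudo-normalizing method only at a high level: divide the backpropagated gradient by its root mean square every few layers so its amplitude cannot decay toward zero. As a concrete illustration, below is a minimal sketch of one plausible reading of that idea, not the authors' implementation; it assumes PyTorch, a small hyperbolic-tangent MLP, and a full backward hook as the rescaling mechanism, and the names `TanhMLP`, `rms_rescale_hook`, and the `rescale_every` interval are illustrative choices rather than anything taken from the paper.

```python
import torch
import torch.nn as nn

def rms_rescale_hook(module, grad_input, grad_output):
    """Divide the gradient flowing into this layer by its root mean square.

    The rescaled gradient (RMS pinned to 1) is what propagates to earlier
    layers, so its amplitude can neither vanish nor explode with depth."""
    rescaled = []
    for g in grad_input:
        if g is None:
            rescaled.append(None)
            continue
        rms = g.pow(2).mean().sqrt().clamp_min(1e-8)  # guard against division by zero
        rescaled.append(g / rms)
    return tuple(rescaled)

class TanhMLP(nn.Module):
    """Plain tanh MLP with RMS gradient rescaling every `rescale_every` layers."""
    def __init__(self, sizes, rescale_every=3):
        super().__init__()
        layers = []
        for i in range(len(sizes) - 1):
            layers.append(nn.Linear(sizes[i], sizes[i + 1]))
            if i < len(sizes) - 2:
                layers.append(nn.Tanh())
        self.net = nn.Sequential(*layers)
        # Attach the pseudo-normalizing hook to every `rescale_every`-th linear layer.
        linears = [m for m in self.net if isinstance(m, nn.Linear)]
        for idx, lin in enumerate(linears):
            if idx > 0 and idx % rescale_every == 0:
                lin.register_full_backward_hook(rms_rescale_hook)

    def forward(self, x):
        return self.net(x)

# Toy usage: gradients are RMS-rescaled every few layers during the backward pass.
model = TanhMLP([784, 256, 256, 256, 256, 256, 10])
x = torch.randn(32, 784)
loss = model(x).pow(2).mean()  # stand-in loss, just to drive backward()
loss.backward()
```

Dividing by the RMS pins the rescaled gradient's root mean square at exactly one, which is one way to read the abstract's claim that the method simultaneously avoids vanishing gradients and prevents gradient explosion; how many layers count as "every few" is a tunable choice the record does not specify.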