{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T12:37:26Z","timestamp":1781181446804,"version":"3.54.1"},"reference-count":27,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T00:00:00Z","timestamp":1764806400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deanship of Graduate Studies and Scientific Research at Qassim University","award":["QU-APC-2025"],"award-info":[{"award-number":["QU-APC-2025"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Axioms"],"abstract":"<jats:p>Although wavelet neural networks (WNNs) combine the expressive capability of neural models with multiscale localization, there are currently few theoretical guarantees for their training. We investigate the weight decay (L2 regularization) optimization dynamics of gradient descent (GD) for WNNs. Using explicit rates controlled by the spectrum of the regularized Gram matrix, we first demonstrate global linear convergence to the unique ridge solution for the feature regime when wavelet atoms are fixed and only the linear head is trained. Second, for fully trainable WNNs, we demonstrate linear rates in regions satisfying a Polyak\u2013\u0141ojasiewicz (PL) inequality and establish convergence of GD to stationary locations under standard smoothness and boundedness of wavelet parameters; weight decay enlarges these regions by suppressing flat directions. Third, we characterize the implicit bias in the over-parameterized neural tangent kernel (NTK) regime: GD converges to the minimum reproducing kernel Hilbert space (RKHS) norm interpolant associated with the WNN kernel with L2. In addition to an assessment process on synthetic regression, denoising, and ablations across \u03bb and stepsize, we supplement the theory with useful recommendations on initialization, stepsize schedules, and regularization scales. Together, our findings give a principled prescription for dependable training that has broad applicability to signal processing applications and shed light on when and why L2-regularized GD is stable and quick for WNNs.<\/jats:p>","DOI":"10.3390\/axioms14120899","type":"journal-article","created":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T16:41:41Z","timestamp":1765212101000},"page":"899","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["The Module Gradient Descent Algorithm via L2 Regularization for Wavelet Neural Networks"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8309-6121","authenticated-orcid":false,"given":"Khidir Shaib","family":"Mohamed","sequence":"first","affiliation":[{"name":"Department of Mathematics, College of Sciences, Qassim University, Buraydah 51452, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ibrahim. M. A.","family":"Suliman","sequence":"additional","affiliation":[{"name":"General Department of College of Technical Engineering, Bright Star University, Al-Brega P.O. Box 858, Libya"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2735-8208","authenticated-orcid":false,"given":"Abdalilah","family":"Alhalangy","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alawia","family":"Adam","sequence":"additional","affiliation":[{"name":"Department of Mathematics, College of Sciences, Qassim University, Buraydah 51452, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Muntasir","family":"Suhail","sequence":"additional","affiliation":[{"name":"Department of Mathematics, College of Sciences, Qassim University, Buraydah 51452, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Habeeb","family":"Ibrahim","sequence":"additional","affiliation":[{"name":"Department of Mathematics, College of Sciences, Qassim University, Buraydah 51452, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mona A.","family":"Mohamed","sequence":"additional","affiliation":[{"name":"Department of Mathematics, College of Sciences, Qassim University, Buraydah 51452, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sofian A. A.","family":"Saad","sequence":"additional","affiliation":[{"name":"Department of Mathematics, College of Sciences, Qassim University, Buraydah 51452, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yousif Shoaib","family":"Mohammed","sequence":"additional","affiliation":[{"name":"Department of Physics, College of Science, Qassim University, Buraydah 51452, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"131648","DOI":"10.1016\/j.neucom.2025.131648","article-title":"Wavelet-integrated deep neural networks: A systematic review of applications and synergistic architectures","volume":"657","author":"Wu","year":"2025","journal-title":"Neurocomputing"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kio, A.E., Xu, J., Gautam, N., and Ding, Y. (2024). Wavelet decomposition and neural networks: A potent combination for short term wind speed and power forecasting. Front. Energy Res., 12.","DOI":"10.3389\/fenrg.2024.1277464"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"102620","DOI":"10.1016\/j.trc.2020.102620","article-title":"Learning traffic as a graph: A gated graph wavelet recurrent neural network for network-scale traffic prediction","volume":"115","author":"Cui","year":"2020","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1007\/s12530-020-09328-3","article-title":"Fault detection in smart grids with time-varying distributed generation using wavelet energy and evolving neural networks","volume":"11","author":"Lucas","year":"2020","journal-title":"Evol. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Baharlouei, Z., Rabbani, H., and Plonka, G. (2023). Wavelet scattering transform application in classification of retinal abnormalities using OCT images. Sci. Rep., 13.","DOI":"10.1038\/s41598-023-46200-1"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, J., Wang, Z., Li, J., and Wu, J. (2018, January 19\u201323). Multilevel wavelet decomposition network for interpretable time series analysis. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK.","DOI":"10.1145\/3219819.3220060"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1007\/s10589-023-00542-8","article-title":"The continuous stochastic gradient method: Part I\u2013convergence theory","volume":"87","author":"Grieshammer","year":"2024","journal-title":"Comput. Optim. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"753","DOI":"10.1007\/s10589-025-00656-1","article-title":"On the convergence of the gradient descent method with stochastic fixed-point rounding errors under the Polyak\u2013\u0141ojasiewicz inequality","volume":"90","author":"Xia","year":"2025","journal-title":"Comput. Optim. Appl."},{"key":"ref_9","unstructured":"Galanti, T., Siegel, Z.S., Gupte, A., and Poggio, T. (2025, November 28). SGD and Weight Decay Provably Induce a Low-Rank Bias in Neural Networks. Available online: https:\/\/hdl.handle.net\/1721.1\/148231."},{"key":"ref_10","first-page":"1","article-title":"Neural tangent kernel: Convergence and generalization in neural networks","volume":"31","author":"Jacot","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3743128","article-title":"A survey on kolmogorov-arnold network","volume":"58","author":"Somvanshi","year":"2025","journal-title":"ACM Comput. Surv."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"52624","DOI":"10.52202\/079017-1668","article-title":"Overfitting behaviour of gaussian kernel ridgeless regression: Varying bandwidth or dimensionality","volume":"37","author":"Medvedev","year":"2024","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/s10033-023-00838-0","article-title":"Denoising fault-aware wavelet network: A signal processing informed neural network for fault diagnosis","volume":"36","author":"Shang","year":"2023","journal-title":"Chin. J. Mech. Eng."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Saragadam, V., LeJeune, D., Tan, J., Balakrishnan, G., Veeraraghavan, A., and Baraniuk, R.G. (2023, January 17\u201324). Wire: Wavelet implicit neural representations. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01775"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"020014","DOI":"10.1063\/5.0255422","article-title":"Wavelet neural networks in signal parameter estimation: A comprehensive review for next-generation wireless systems","volume":"Volume 3255","author":"Sadoon","year":"2025","journal-title":"Proceedings of the AIP Conference Proceedings"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Akujuobi, C.M. (2022). Wavelets and Wavelet Transform Systems and Their Applications, Springer International Publishing.","DOI":"10.1007\/978-3-030-87528-2"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, P., and Wen, Z. (2024). A spatiotemporal graph wavelet neural network (ST-GWNN) for association mining in timely social media data. Sci. Rep., 14.","DOI":"10.1038\/s41598-024-82433-4"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Uddin, Z., Ganga, S., Asthana, R., and Ibrahim, W. (2023). Wavelets based physics informed neural networks to solve non-linear differential equations. Sci. Rep., 13.","DOI":"10.1038\/s41598-023-29806-3"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Jung, H., Lodhi, B., and Kang, J. (2019). An automatic nuclei segmentation method based on deep convolutional neural networks for histopathology images. BMC Biomed. Eng., 1.","DOI":"10.1186\/s42490-019-0026-8"},{"key":"ref_20","first-page":"63847","article-title":"An alternating optimization method for bilevel problems under the Polyak-\u0141ojasiewicz condition","volume":"36","author":"Xiao","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"524","DOI":"10.1109\/LCSYS.2021.3082800","article-title":"Asynchronous parallel nonconvex optimization under the polyak-\u0142ojasiewicz condition","volume":"6","author":"Yazdani","year":"2021","journal-title":"IEEE Control Syst. Lett."},{"key":"ref_22","unstructured":"Chen, K., Yi, C., and Yang, H. (2024). Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks. arXiv."},{"key":"ref_23","first-page":"4481","article-title":"Weight decay induces low-rank attention layers","volume":"37","author":"Kobayashi","year":"2024","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","unstructured":"Seleznova, M., and Kutyniok, G. (2022). Analyzing finite neural networks: Can we trust neural tangent kernel theory?. Mathematical and Scientific Machine Learning, PMLR."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1007\/s13735-023-00318-0","article-title":"How does a kernel based on gradients of infinite-width neural networks come to be widely used: A review of the neural tangent kernel","volume":"13","author":"Tan","year":"2024","journal-title":"Int. J. Multimed. Inf. Retr."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Tang, A., Wang, J.B., Pan, Y., Wu, T., Chen, Y., Yu, H., and Elkashlan, M. (2025). Revisiting XL-MIMO channel estimation: When dual-wideband effects meet near field. IEEE Trans. Wirel. Commun.","DOI":"10.1109\/TWC.2025.3609466"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"025011","DOI":"10.1088\/1361-6420\/ad1a3c","article-title":"Deep unfolding as iterative regularization for imaging inverse problems","volume":"40","author":"Cui","year":"2024","journal-title":"Inverse Probl."}],"container-title":["Axioms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2075-1680\/14\/12\/899\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T16:44:48Z","timestamp":1765212288000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2075-1680\/14\/12\/899"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,4]]},"references-count":27,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["axioms14120899"],"URL":"https:\/\/doi.org\/10.3390\/axioms14120899","relation":{},"ISSN":["2075-1680"],"issn-type":[{"value":"2075-1680","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,4]]}}}