{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T19:17:43Z","timestamp":1767986263337,"version":"3.49.0"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T00:00:00Z","timestamp":1758240000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T00:00:00Z","timestamp":1758240000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004434","name":"Universit\u00e0 degli Studi di Firenze","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004434","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Optim Appl"],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    A pruning-aware adaptive gradient method is proposed which classifies the variables in two sets before updating them using different strategies. This technique extends the \u201crelevant\/irrelevant\" approach of Ding et al. (Adv Neural Inf Process Syst 32, 2019) and Zimmer et al. (Mathematical optimization for machine learning: proceedings of the MATH+ thematic Einstein semester 2023, 2025) and allows a posteriori sparsification of the solution of model parameter fitting problems. The new method is proved to be convergent with a global rate of decrease of the averaged gradient\u2019s norm of the form\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\mathcal{O}(\\log (k)\/\\sqrt{k+1})$$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    . Numerical experiments on several applications show that it is competitive with existing pruning-aware Frank-Wolfe algorithms, see e.g. Zimmer et al. (Mathematical optimization for machine learning: proceedings of the MATH+ thematic Einstein semester 2023, 2025).\n                  <\/jats:p>","DOI":"10.1007\/s10589-025-00723-7","type":"journal-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T10:22:55Z","timestamp":1758277375000},"page":"85-119","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["prunAdag: an adaptive pruning-aware gradient method"],"prefix":"10.1007","volume":"93","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0183-1204","authenticated-orcid":false,"given":"Margherita","family":"Porcelli","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0482-6805","authenticated-orcid":false,"given":"Giovanni","family":"Seraghiti","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6166-1860","authenticated-orcid":false,"given":"Philippe L.","family":"Toint","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,9,19]]},"reference":[{"key":"723_CR1","unstructured":"Ding, X., Zhou, X., Guo, Y., Han, J., Liu, J. et\u00a0al.: Global sparse momentum SGD for pruning very deep neural networks. In: Advances in Neural Information Processing Systems, vol.\u00a032 (2019)"},{"key":"723_CR2","doi-asserted-by":"crossref","unstructured":"Zimmer, M., Spiegel, C., Pokutta, S.: Compression aware training of neural networks using frank-wolfe. In: Mathematical Optimization for Machine Learning: Proceedings of the MATH+ Thematic Einstein Semester 2023, p.\u00a0137 (2025)","DOI":"10.1515\/9783111376776-010"},{"issue":"7","key":"723_CR3","first-page":"2121","volume":"12","author":"J Duchi","year":"2011","unstructured":"Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(7), 2121\u20132159 (2011)","journal-title":"J. Mach. Learn. Res."},{"key":"723_CR4","unstructured":"McMahan, H.B., Streeter, M.: Adaptive bound optimization for online convex optimization. arXiv preprint arXiv:1002.4908 (2010)"},{"key":"723_CR5","unstructured":"Kingma, D.P.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)"},{"key":"723_CR6","unstructured":"Tieleman, T., Hinton, G.: Lecture 6.5-Rmsprop, coursera: Neural networks for machine learning. University of Toronto, Technical Report, vol.\u00a06 (2012)"},{"key":"723_CR7","unstructured":"Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)"},{"key":"723_CR8","unstructured":"Gratton, S., Jerad, S., Toint, P.L.: First-order objective-function-free optimization algorithms and their complexity. arXiv preprint arXiv:2203.01757 (2022)"},{"issue":"4","key":"723_CR9","doi-asserted-by":"publisher","first-page":"2772","DOI":"10.1137\/23M1553455","volume":"33","author":"S Gratton","year":"2023","unstructured":"Gratton, S., Kopani\u010d\u00e1kov\u00e1, A., Toint, P.L.: Multilevel objective-function-free optimization with an application to neural networks training. SIAM J. Optim. 33(4), 2772\u20132800 (2023)","journal-title":"SIAM J. Optim."},{"key":"723_CR10","doi-asserted-by":"crossref","unstructured":"Gratton, S., Jerad, S., Toint, P.L.: Complexity of a class of first-order objective-function-free optimization algorithms. Optim. Methods Softw. pp.\u00a01\u201331 (2024)","DOI":"10.1080\/10556788.2023.2296431"},{"key":"723_CR11","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719857","volume-title":"Trust Region Methods","author":"AR Conn","year":"2000","unstructured":"Conn, A.R., Gould, N.I., Toint, P.L.: Trust Region Methods. SIAM, Philadelphia (2000)"},{"key":"723_CR12","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1007\/s10107-015-0893-2","volume":"151","author":"Y-X Yuan","year":"2015","unstructured":"Yuan, Y.-X.: Recent advances in trust region algorithms. Math. Program. 151, 249\u2013281 (2015)","journal-title":"Math. Program."},{"issue":"5","key":"723_CR13","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1109\/72.248452","volume":"4","author":"R Reed","year":"1993","unstructured":"Reed, R.: Pruning algorithms-a survey. IEEE Trans. Neural Netw. 4(5), 740\u2013747 (1993)","journal-title":"IEEE Trans. Neural Netw."},{"key":"723_CR14","unstructured":"LeCun, Y., Denker, J., Solla, S.: Optimal brain damage. In: Advances in Neural Information Processing Systems, vol.\u00a02 (1989)"},{"key":"723_CR15","unstructured":"Zhu, M., Gupta, S.: To prune, or not to prune: exploring the efficacy of pruning for model compression. arXiv preprint arXiv:1710.01878 (2017)"},{"issue":"10","key":"723_CR16","doi-asserted-by":"publisher","first-page":"1943","DOI":"10.1109\/TPAMI.2015.2502579","volume":"38","author":"X Zhang","year":"2015","unstructured":"Zhang, X., Zou, J., He, K., Sun, J.: Accelerating very deep convolutional networks for classification and detection. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 1943\u20131955 (2015)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"723_CR17","unstructured":"Denton, E.L., Zaremba, W., Bruna, J., LeCun, Y., Fergus, R.: Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, vol.\u00a027 (2014)"},{"key":"723_CR18","doi-asserted-by":"crossref","unstructured":"Yu, X., Liu, T., Wang, X., Tao, D.: On compressing deep models by low rank and sparse decomposition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.\u00a07370\u20137379 (2017)","DOI":"10.1109\/CVPR.2017.15"},{"key":"723_CR19","unstructured":"Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830 (2016)"},{"key":"723_CR20","doi-asserted-by":"crossref","unstructured":"Wang, P., Hu, Q., Zhang, Y., Zhang, C., Liu, Y., Cheng, J.: Two-step quantization for low-bit neural networks. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp.\u00a04376\u20134384 (2018)","DOI":"10.1109\/CVPR.2018.00460"},{"key":"723_CR21","first-page":"20415","volume":"33","author":"J Kim","year":"2020","unstructured":"Kim, J., Yoo, K., Kwak, N.: Position-based scaled gradient for model quantization and pruning. Adv. Neural. Inf. Process. Syst. 33, 20415\u201320426 (2020)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"723_CR22","unstructured":"Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in neural information processing systems, vol.\u00a028 (2015)"},{"key":"723_CR23","doi-asserted-by":"crossref","unstructured":"Yu, D., Seide, F., Li, G., Deng, L.: Exploiting sparseness in deep neural networks for large vocabulary speech recognition. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.\u00a04409\u20134412, IEEE (2012)","DOI":"10.1109\/ICASSP.2012.6288897"},{"key":"723_CR24","unstructured":"Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through $$\\ell _0$$ regularization. arXiv preprint arXiv:1712.01312 (2017)"},{"key":"723_CR25","unstructured":"Alvarez, J.M., Salzmann, M.: Compression-aware training of deep networks. In: Advances in neural information processing systems, vol.\u00a030 (2017)"},{"key":"723_CR26","unstructured":"Hu, H.: Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250 (2016)"},{"issue":"241","key":"723_CR27","first-page":"1","volume":"22","author":"T Hoefler","year":"2021","unstructured":"Hoefler, T., Alistarh, D., Ben-Nun, T., Dryden, N., Peste, A.: Sparsity in deep learning: pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22(241), 1\u2013124 (2021)","journal-title":"J. Mach. Learn. Res."},{"key":"723_CR28","unstructured":"Vaskevicius, T., Kanade, V., Rebeschini, P.: Implicit regularization for optimal sparse recovery. In: Advances in Neural Information Processing Systems, vol.\u00a032 (2019)"},{"key":"723_CR29","unstructured":"Li, J., Nguyen, T.V., Hegde, C., Wong, R.K.: Implicit regularization for group sparsity. arXiv preprint arXiv:2301.12540 (2023)"},{"key":"723_CR30","unstructured":"Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient DNNs. In: Advances in neural information processing systems, vol.\u00a029 (2016)"},{"issue":"1","key":"723_CR31","doi-asserted-by":"publisher","first-page":"2383","DOI":"10.1038\/s41467-018-04316-3","volume":"9","author":"DC Mocanu","year":"2018","unstructured":"Mocanu, D.C., Mocanu, E., Stone, P., Nguyen, P.H., Gibescu, M., Liotta, A.: Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nat. Commun. 9(1), 2383 (2018)","journal-title":"Nat. Commun."},{"key":"723_CR32","doi-asserted-by":"crossref","unstructured":"He, Y., Kang, G., Dong, X., Fu, Y., Yang, Y.: Soft filter pruning for accelerating deep convolutional neural networks. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI\u201918, p.\u00a02234\u20132240, AAAI Press (2018)","DOI":"10.24963\/ijcai.2018\/309"},{"key":"723_CR33","unstructured":"Pokutta, S., Spiegel, C., Zimmer, M.: Deep neural network training with Frank-Wolfe. arXiv preprint arXiv:2010.07243 (2020)"},{"key":"723_CR34","unstructured":"Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J.: Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440 (2016)"},{"key":"723_CR35","unstructured":"Theis, L., Korshunova, I., Tejani, A., Husz\u00e1r, F.: Faster gaze prediction with dense networks and Fisher pruning. arXiv preprint arXiv:1801.05787 (2018)"},{"key":"723_CR36","unstructured":"Lu, M., Luo, X., Chen, T., Chen, W., Liu, D., Wang, Z.: Learning pruning-friendly networks via Frank-Wolfe: One-shot, any-sparsity, and no retraining. In: International Conference on Learning Representations (2022)"},{"key":"723_CR37","unstructured":"Argyriou, A., Foygel, R., Srebro, N.: Sparse prediction with the $$ k $$-support norm. In: Advances in Neural Information Processing Systems, vol.\u00a025 (2012)"},{"key":"723_CR38","doi-asserted-by":"crossref","unstructured":"Rao, N., Dud\u00edk, M., Harchaoui, Z.: The group $$k$$-support norm for learning with structured sparsity. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.\u00a02402\u20132406, IEEE (2017)","DOI":"10.1109\/ICASSP.2017.7952587"},{"issue":"1\u20132","key":"723_CR39","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1002\/nav.3800030109","volume":"3","author":"M Frank","year":"1956","unstructured":"Frank, M., Wolfe, P., et al.: An algorithm for quadratic programming. Naval Res Logistics Q. 3(1\u20132), 95\u2013110 (1956)","journal-title":"Naval Res Logistics Q."},{"issue":"5","key":"723_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/0041-5553(66)90114-5","volume":"6","author":"ES Levitin","year":"1966","unstructured":"Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Comput. Math. Math. Phys. 6(5), 1\u201350 (1966)","journal-title":"USSR Comput. Math. Math. Phys."},{"key":"723_CR41","doi-asserted-by":"crossref","unstructured":"Reddi, S.J., Sra, S., P\u00f3czos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp.\u00a01244\u20131251, IEEE (2016)","DOI":"10.1109\/ALLERTON.2016.7852377"},{"key":"723_CR42","unstructured":"Ward, R., Wu, X., Bottou, L.: AdaGrad stepsizes: Sharp convergence over nonconvex landscapes. In: Proceedings of the 36th International Conference on Machine Learning (K.\u00a0Chaudhuri and R.\u00a0Salakhutdinov, eds.), vol.\u00a097 of Proceedings of Machine Learning Research, (Long Beach, California, USA), pp.\u00a06677\u20136686, PMLR (2019)"},{"key":"723_CR43","unstructured":"D\u00e9fossez, A., Bottou, L., Bach, F., Usunier, N.: A simple convergence proof of Adam and Adagrad. Trans. Mach. Learn. Res. (2022)"},{"key":"723_CR44","unstructured":"Wu, X., Ward, R., Bottou, L.: WNGrad: Learn the learning rate in gradient descent. arXiv preprint arXiv:1803.02865 (2018)"},{"key":"723_CR45","doi-asserted-by":"publisher","first-page":"329","DOI":"10.1007\/BF02124750","volume":"5","author":"RM Corless","year":"1996","unstructured":"Corless, R.M., Gonnet, G.H., Hare, D.E., Jeffrey, D.J., Knuth, D.E.: On the Lambert W function. Adv. Comput. Math. 5, 329\u2013359 (1996)","journal-title":"Adv. Comput. Math."},{"key":"723_CR46","unstructured":"van\u00a0den Berg, E., Friedlander, M., Hennenfent, G., Herrmann, F., Saab, R., Y\u0131lmaz, O.: Sparco: A testing framework for sparse reconstruction, Dept. Comput. Sci., Univ. British Columbia, Vancouver, Tech. Rep. TR-2007-20. http:\/\/www.cs.ubc.ca\/labs\/scl\/sparco (2007)"},{"key":"723_CR47","doi-asserted-by":"crossref","unstructured":"Gratton, S., Toint, P.L.: S2MPJ and CUTEst optimization problems for Matlab, Python and Julia. arXiv:2407.07812 (2024)","DOI":"10.1080\/10556788.2025.2490640"},{"issue":"4","key":"723_CR48","doi-asserted-by":"publisher","first-page":"1832","DOI":"10.1137\/090747695","volume":"32","author":"Z Wen","year":"2010","unstructured":"Wen, Z., Yin, W., Goldfarb, D., Zhang, Y.: A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation. SIAM J. Sci. Comput. 32(4), 1832\u20131857 (2010)","journal-title":"SIAM J. Sci. Comput."},{"issue":"3","key":"723_CR49","doi-asserted-by":"publisher","first-page":"565","DOI":"10.1007\/s10589-014-9653-0","volume":"59","author":"M Porcelli","year":"2014","unstructured":"Porcelli, M., Rinaldi, F.: A variable fixing version of the two-block nonlinear constrained Gauss\u2013Seidel algorithm for $$\\ell $$ 1-regularized least-squares. Comput. Optim. Appl. 59(3), 565\u2013589 (2014)","journal-title":"Comput. Optim. Appl."},{"issue":"11","key":"723_CR50","doi-asserted-by":"publisher","first-page":"4311","DOI":"10.1109\/TSP.2006.881199","volume":"54","author":"M Aharon","year":"2006","unstructured":"Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311\u20134322 (2006)","journal-title":"IEEE Trans. Signal Process."},{"key":"723_CR51","unstructured":"Aharon, M., Elad, M., Bruckstein, A.: UCI machine learning repository (2013)"},{"issue":"3","key":"723_CR52","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1961189.1961199","volume":"2","author":"C-C Chang","year":"2011","unstructured":"Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1\u201327 (2011)","journal-title":"ACM Trans. Intell. Syst. Technol."}],"container-title":["Computational Optimization and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10589-025-00723-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10589-025-00723-7","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10589-025-00723-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T14:26:15Z","timestamp":1767968775000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10589-025-00723-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,19]]},"references-count":52,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["723"],"URL":"https:\/\/doi.org\/10.1007\/s10589-025-00723-7","relation":{},"ISSN":["0926-6003","1573-2894"],"issn-type":[{"value":"0926-6003","type":"print"},{"value":"1573-2894","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,19]]},"assertion":[{"value":"11 February 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 July 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 September 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no Conflict of interest.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}