{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T11:04:43Z","timestamp":1773486283851,"version":"3.50.1"},"reference-count":49,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T00:00:00Z","timestamp":1764720000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T00:00:00Z","timestamp":1764720000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Korea Advanced Institute of Science and Technology"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Pattern Anal Applic"],"published-print":{"date-parts":[[2026,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Several pruning methods prune a neural network at initialization. These methods carefully determine the importance of each weight, retaining only the important ones and pruning the others. However, subsequent studies have shown that random pruning can perform similarly, provided that the number of remaining weights within each layer is the same as the number of important weights within the corresponding layer. How can this random pruning, which disregards weight importance, especially within layers, still achieve comparable performance? In this work, we shed light on this question. Specifically, we demonstrate that by simply setting the number of weights to retain within layers in this manner, a large number of important weights tend to remain after pruning, and the importance of the remaining weights tends to be high. 
These statistical benefits are shown by comparing this random pruning method, which retains weights in a layer equal to the number of important weights in that layer, to another random pruning method that uses a uniform pruning ratio across all layers. With randomness applied, where important weights cannot be selectively distinguished from unimportant ones, the superiority of the former over the latter should be clarified. Theoretical proofs are provided, as well as empirical results from various architectures and datasets.<\/jats:p>","DOI":"10.1007\/s10044-025-01574-y","type":"journal-article","created":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T18:59:53Z","timestamp":1764788393000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Prior knowledge of layer-specific pruning numbers guarantees effective random pruning at initialization"],"prefix":"10.1007","volume":"29","author":[{"given":"Minju","family":"Jung","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sunghyun","family":"Baek","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunho","family":"Jeon","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junmo","family":"Kim","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,12,3]]},"reference":[{"key":"1574_CR1","unstructured":"LeCun Y, Denker J, Solla S (1989) Optimal brain damage. Adv Neural Inf Proc Syst 2"},{"key":"1574_CR2","unstructured":"Hassibi B, Stork D (1992) Second order derivatives for network pruning: optimal brain surgeon. Adv Neural Inf Proc Syst 5"},{"key":"1574_CR3","unstructured":"Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. 
Adv Neural Inf Proc Syst 28"},{"key":"1574_CR4","unstructured":"Guo Y, Yao A, Chen Y (2016) Dynamic network surgery for efficient dnns. Adv Neural Inf Proc Syst 29"},{"key":"1574_CR5","doi-asserted-by":"crossref","unstructured":"Liu Z, Li J, Shen Z, Huang G, Yan S, Zhang C (2017) Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2736\u20132744","DOI":"10.1109\/ICCV.2017.298"},{"key":"1574_CR6","unstructured":"Renda A, Frankle J, Carbin M (2020) Comparing rewinding and fine-tuning in neural network pruning. In: International Conference on Learning Representations"},{"key":"1574_CR7","unstructured":"Frankle J, Carbin M (2019) The lottery ticket hypothesis: finding sparse, trainable neural networks. In: International Conference on Learning Representations"},{"key":"1574_CR8","unstructured":"Lee N, Ajanthan T, Torr P (2019) Snip: single-shot network pruning based on connection sensitivity. In: International Conference on Learning Representations"},{"key":"1574_CR9","unstructured":"Wang C, Zhang G, Grosse R (2020) Picking winning tickets before training by preserving gradient flow. In: International Conference on Learning Representations"},{"key":"1574_CR10","unstructured":"Lubana ES, Dick R (2021) A gradient flow framework for analyzing network pruning. In: International Conference on Learning Representations"},{"key":"1574_CR11","first-page":"20390","volume":"33","author":"J Su","year":"2020","unstructured":"Su J, Chen Y, Cai T, Wu T, Gao R, Wang L, Lee JD (2020) Sanity-checking pruning methods: random tickets can win the jackpot. Adv Neural Inf Process Syst 33:20390\u201320401","journal-title":"Adv Neural Inf Process Syst"},{"key":"1574_CR12","unstructured":"Frankle J, Dziugaite GK, Roy D, Carbin M (2021) Pruning neural networks at initialization: why are we missing the mark? 
In: International Conference on Learning Representations"},{"key":"1574_CR13","unstructured":"Liu S, Chen T, Chen X, Shen L, Mocanu DC, Wang Z, Pechenizkiy M (2021) The unreasonable effectiveness of random pruning: return of the most naive baseline for sparse training. In: International Conference on Learning Representations"},{"key":"1574_CR14","unstructured":"Gadhikar AH, Mukherjee S, Burkholz R (2023) Why random pruning is all we need to start sparse. In: International Conference on Machine Learning, pp 10542\u201310570. PMLR"},{"key":"1574_CR15","unstructured":"Golubeva A, Gur-Ari G, Neyshabur B (2020) Are wider nets better given the same number of parameters? In: International Conference on Learning Representations"},{"key":"1574_CR16","first-page":"6974","volume":"35","author":"X Chang","year":"2021","unstructured":"Chang X, Li Y, Oymak S, Thrampoulidis C (2021) Provable benefits of overparameterization in model compression: from double descent to pruning neural networks. Proc AAAI Conf Artif Intell 35:6974\u20136983","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"1574_CR17","unstructured":"Gadhikar AH, Mukherjee S, Burkholz R (2023) How Erd\u00f6s and R\u00e9nyi Win the Lottery"},{"key":"1574_CR18","doi-asserted-by":"crossref","unstructured":"Ramanujan V, Wortsman M, Kembhavi A, Farhadi A, Rastegari M (2020) What\u2019s hidden in a randomly weighted neural network? In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 11893\u201311902","DOI":"10.1109\/CVPR42600.2020.01191"},{"key":"1574_CR19","unstructured":"Cunha A, Natale E, Viennot L (2022) Proving the lottery ticket hypothesis for convolutional neural networks. In: International Conference on Learning Representations"},{"key":"1574_CR20","unstructured":"Burkholz R (2022) Convolutional and residual networks provably contain lottery tickets. In: International Conference on Machine Learning, pp 2414\u20132433. 
PMLR"},{"key":"1574_CR21","doi-asserted-by":"publisher","first-page":"18707","DOI":"10.52202\/068431-1359","volume":"35","author":"R Burkholz","year":"2022","unstructured":"Burkholz R (2022) Most activation functions can win the lottery without excessive depth. Adv Neural Inf Process Syst 35:18707\u201318720","journal-title":"Adv Neural Inf Process Syst"},{"key":"1574_CR22","unstructured":"Malach E, Yehudai G, Shalev-Shwartz S, Shamir O (2020) Proving the lottery ticket hypothesis: pruning is all you need. In: International Conference on Machine Learning, pp 6682\u20136691. PMLR"},{"key":"1574_CR23","unstructured":"Liu Z, Sun M, Zhou T, Huang G, Darrell T (2019) Rethinking the value of network pruning. In: International Conference on Learning Representations"},{"issue":"6","key":"1574_CR24","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1109\/MSP.2012.2211477","volume":"29","author":"L Deng","year":"2012","unstructured":"Deng L (2012) The mnist database of handwritten digit images for machine learning research. IEEE Signal Process Mag 29(6):141\u2013142","journal-title":"IEEE Signal Process Mag"},{"key":"1574_CR25","unstructured":"Krizhevsky A, Hinton G et al. (2009) Learning multiple layers of features from tiny images"},{"issue":"7","key":"1574_CR26","first-page":"3","volume":"7","author":"Y Le","year":"2015","unstructured":"Le Y, Yang X (2015) Tiny imagenet visual recognition challenge. CS 231N 7(7):3","journal-title":"CS 231N"},{"key":"1574_CR27","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556"},{"key":"1574_CR28","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"1574_CR29","unstructured":"Hu Z, Huang H (2021) On the random conjugate kernel and neural tangent kernel. In: International Conference on Machine Learning, pp 4359\u20134368. PMLR"},{"key":"1574_CR30","unstructured":"Schoenholz SS, Gilmer J, Ganguli S, Sohl-Dickstein J (2017) Deep information propagation. In: International Conference on Learning Representations"},{"key":"1574_CR31","unstructured":"Yang G, Schoenholz S (2017) Mean field residual networks: on the edge of chaos. Adv Neural Inf Proc Syst 30"},{"key":"1574_CR32","unstructured":"Yang G (2019) Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation. arXiv preprint arXiv:1902.04760"},{"key":"1574_CR33","unstructured":"Hayou S, Doucet A, Rousseau J (2019) On the impact of the activation function on deep neural networks training. In: International Conference on Machine Learning, pp 2672\u20132680. PMLR"},{"key":"1574_CR34","unstructured":"Hayou S, Ton J-F, Doucet A, Teh YW (2021) Robust pruning at initialization. In: International Conference on Learning Representations"},{"issue":"1","key":"1574_CR35","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. The J Mach Learn Res 15(1):1929\u20131958","journal-title":"The J Mach Learn Res"},{"key":"1574_CR36","unstructured":"Wager S, Wang S, Liang PS (2013) Dropout training as adaptive regularization. Adv Neural Inf Proc Syst 26"},{"key":"1574_CR37","unstructured":"Gal Y, Ghahramani Z (2016) Dropout as a bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp 1050\u20131059. 
PMLR"},{"issue":"11","key":"1574_CR38","doi-asserted-by":"publisher","first-page":"7421","DOI":"10.1109\/TPAMI.2024.3395484","volume":"46","author":"N Ye","year":"2024","unstructured":"Ye N, Zeng Z, Zhou J, Zhu L, Duan Y, Wu Y, Wu J, Zeng H, Gu Q, Wang X et al (2024) Ood-control: generalizing control in unseen environments. IEEE Trans Pattern Anal Mach Intell 46(11):7421\u20137433","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"9","key":"1574_CR39","doi-asserted-by":"publisher","first-page":"3375","DOI":"10.1007\/s11263-024-02036-4","volume":"132","author":"L Zhu","year":"2024","unstructured":"Zhu L, Yin W, Yang Y, Wu F, Zeng Z, Gu Q, Wang X, Zhou C, Ye N (2024) Vision-language alignment learning under affinity and divergence principles for few-shot out-of-distribution generalization. Int J Comput Vision 132(9):3375\u20133407","journal-title":"Int J Comput Vision"},{"key":"1574_CR40","doi-asserted-by":"crossref","unstructured":"Zhu L, Yin W, Wu F, Gu Q, Wang X, Zhou C, Ye N (2025) Bayes-cal: Robust cross-modal alignment by bayesian approach for few-shot ood generalization. Int J Comput Vision 1\u201334","DOI":"10.1007\/s11263-025-02527-y"},{"key":"1574_CR41","doi-asserted-by":"crossref","unstructured":"Ye N, Li K, Bai H, Yu R, Hong L, Zhou F, Li Z, Zhu J (2022) Ood-bench: quantifying and understanding two dimensions of out-of-distribution generalization. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 7947\u20137958","DOI":"10.1109\/CVPR52688.2022.00779"},{"issue":"1","key":"1574_CR42","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/s44172-023-00074-3","volume":"2","author":"N Ye","year":"2023","unstructured":"Ye N, Cao L, Yang L, Zhang Z, Fang Z, Gu Q, Yang G-Z (2023) Improving the robustness of analog deep neural networks through a bayes-optimized noise injection approach. 
Commun Eng 2(1):25","journal-title":"Commun Eng"},{"key":"1574_CR43","doi-asserted-by":"publisher","first-page":"52293","DOI":"10.52202\/075280-2276","volume":"36","author":"C Chen","year":"2023","unstructured":"Chen C, Fu Z, Liu K, Chen Z, Tao M, Ye J (2023) Optimal parameter and neuron pruning for out-of-distribution detection. Adv Neural Inf Process Syst 36:52293\u201352311","journal-title":"Adv Neural Inf Process Syst"},{"key":"1574_CR44","unstructured":"CAI R, Li H, Kot A Towards domain generalized pruning by scoring out-of-distribution importance. In: NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications"},{"key":"1574_CR45","unstructured":"Qiao F, Peng X (2024) Ensemble pruning for out-of-distribution generalization. In: Forty-first International Conference on Machine Learning"},{"key":"1574_CR46","unstructured":"Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: Icml"},{"key":"1574_CR47","unstructured":"Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp 448\u2013456. PMLR"},{"key":"1574_CR48","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1026\u20131034","DOI":"10.1109\/ICCV.2015.123"},{"key":"1574_CR49","unstructured":"Xiao L, Bahri Y, Sohl-Dickstein J, Schoenholz S, Pennington J (2018) Dynamical isometry and a mean field theory of cnns: how to train 10,000-layer vanilla convolutional neural networks. In: International Conference on Machine Learning, pp 5393\u20135402. 
PMLR"}],"container-title":["Pattern Analysis and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10044-025-01574-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10044-025-01574-y","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10044-025-01574-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T10:38:22Z","timestamp":1773484702000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10044-025-01574-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,3]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,3]]}},"alternative-id":["1574"],"URL":"https:\/\/doi.org\/10.1007\/s10044-025-01574-y","relation":{},"ISSN":["1433-7541","1433-755X"],"issn-type":[{"value":"1433-7541","type":"print"},{"value":"1433-755X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,3]]},"assertion":[{"value":"10 April 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 October 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 December 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no Conflict of interest to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of 
interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval and consent to participate"}},{"value":"All authors have agreed to the publication.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}],"article-number":"3"}}