{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,15]],"date-time":"2026-02-15T21:11:06Z","timestamp":1771189866792,"version":"3.50.1"},"reference-count":45,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T00:00:00Z","timestamp":1692748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 Research and Innovation action","award":["101016776"],"award-info":[{"award-number":["101016776"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Transformer models are being increasingly used in end-to-end speech recognition systems for their performance. However, their substantial size poses challenges for deploying them in real-world applications. These models heavily rely on attention and feedforward layers, with the latter containing a vast number of parameters that significantly contribute to the model\u2019s memory footprint. Consequently, it becomes pertinent to consider pruning these layers to reduce the model\u2019s size. In this article, our primary focus is on the feedforward layers. We conduct a comprehensive analysis of their parameter count and distribution. Specifically, we examine the weight distribution within each layer and observe how the weight values progress across the transformer model\u2019s blocks. Our findings demonstrate a correlation between the depth of the feedforward layers and the magnitude of their weights. Consequently, layers with higher weight values require less pruning. Building upon this insight, we propose a novel pruning algorithm based on variable rates. This approach sets the pruning rate according to the significance and location of each feedforward layer within the network. 
To evaluate our new pruning method, we conduct experiments on various datasets. The results reveal its superiority over conventional pruning techniques, such as local pruning and global pruning.<\/jats:p>","DOI":"10.3390\/a16090398","type":"journal-article","created":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T08:01:21Z","timestamp":1692777681000},"page":"398","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Variable Scale Pruning for Transformer Model Compression in End-to-End Speech Recognition"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0474-3229","authenticated-orcid":false,"given":"Leila","family":"Ben Letaifa","sequence":"first","affiliation":[{"name":"LINEACT, UR-EA 7527, CESI Nancy, 54500 Vand\u0153uvre-l\u00e8s-Nancy, France"},{"name":"LaBRI, CNRS UMR 5800, University of Bordeaux, Bordeaux INP, 33405 Talence, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1933-0504","authenticated-orcid":false,"given":"Jean-Luc","family":"Rouas","sequence":"additional","affiliation":[{"name":"LaBRI, CNRS UMR 5800, University of Bordeaux, Bordeaux INP, 33405 Talence, France"}]}],"member":"1968","published-online":{"date-parts":[[2023,8,23]]},"reference":[{"key":"ref_1","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention Is All You Need. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA."},{"key":"ref_2","unstructured":"Han, S., Pool, J., Tran, J., and Dally, W.J. (2015, January 7\u201312). Learning both Weights and Connections for Efficient Neural Networks. 
Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"55939","DOI":"10.1109\/ACCESS.2021.3071485","article-title":"Perceptual Borderline for Balancing Multi-Class Spontaneous Emotional Data","volume":"9","author":"Letaifa","year":"2021","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Soplin, N.E.Y., Heymann, J., Wiesner, M., and Chen, N. (2018, January 2\u20136). ESPnet: End-to-End Speech Processing Toolkit. Proceedings of the INTERSPEECH, Hyderabad, India.","DOI":"10.21437\/Interspeech.2018-1456"},{"key":"ref_5","unstructured":"Rabiner, L., and Juang, B.H. (1993). Fundamentals of Speech Recognition, Prentice Hall."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/TASL.2011.2134090","article-title":"Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition","volume":"20","author":"Dahl","year":"2012","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Li, S., Raj, D., Lu, X., Shen, P., Kawahara, T., and Kawai, H. (2019). Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation, INTERSPEECH.","DOI":"10.21437\/Interspeech.2019-2112"},{"key":"ref_8","first-page":"42","article-title":"Embedded Real Time Speech Recognition System for Smart Home Environment","volume":"8","author":"Zouari","year":"2017","journal-title":"Int. J. Sci. Eng. Res."},{"key":"ref_9","unstructured":"Fan, A., Grave, E., and Joulin, A. (2020, January 26\u201330). Reducing Transformer Depth on Demand with Structured Dropout. 
Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, Z., Li, H., and Meng, L. (2023). Model Compression for Deep Neural Networks: A Survey. Computers, 12.","DOI":"10.3390\/computers12030060"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3411","DOI":"10.1109\/TSP.2020.2993164","article-title":"Analyzing upper bounds on mean absolute errors for deep neural network-based vector-to-vector regression","volume":"68","author":"Qi","year":"2020","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"633","DOI":"10.1109\/TASLP.2022.3231714","article-title":"Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing","volume":"31","author":"Qi","year":"2023","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_13","unstructured":"Letaifa, L.B., and Rouas, J.L. (2022, August 29\u2013September 2). Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices. Proceedings of the European Signal Processing Conference, EUSIPCO, Belgrade, Serbia."},{"key":"ref_14","unstructured":"LeCun, Y., Denker, J., and Solla, S. (1989, January 27\u201330). Optimal Brain Damage. Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kim, H.G., Na, H., Lee, H., Lee, J., Kang, T.G., Lee, M.J., and Choi, Y.S. (2019, January 12\u201317). Knowledge Distillation Using Output Errors for Self-attention End-to-end Models. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP, Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682775"},{"key":"ref_16","unstructured":"Noach, M.B., and Goldberg, Y. (2020, January 4\u20137). Compressing Pre-trained Language Models by Matrix Decomposition. 
Proceedings of the International Joint Conference on Natural Language Processing, Suzhou, China."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lu, Z., Sindhwani, V., and Sainath, T. (2016, January 20\u201325). Learning Compact Recurrent Neural Networks. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7472821"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"He, T., Fan, Y., Qian, Y., Tan, T., and Yu, K. (2014, January 4\u20139). Reshaping deep neural network for fast decoding by node-pruning. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP, Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853595"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cao, S., Zhang, C., Yao, Z., Xiao, W., Nie, L., Zhan, D., Liu, Y., Wu, M., and Zhang, L. (2019, January 24\u201326). Efficient and effective sparse LSTM on FPGA with Bank-Balanced Sparsity. Proceedings of the SIGDA International Symposium on FPGA, Seaside, CA, USA.","DOI":"10.1145\/3289602.3293898"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chen, S., Sun, W., and Huang, L. (2023, January 4\u201310). WHC: Weighted Hybrid Criterion for Filter Pruning on Convolutional Neural Networks. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP, Rhodes Island, Greece.","DOI":"10.1109\/ICASSP49357.2023.10094874"},{"key":"ref_21","unstructured":"Han, S., Pool, J., Narang, S., Mao, H., Gong, E., Tang, S., Elsen, E., Vajda, P., Paluri, M., and Tran, J. (2017, January 24\u201326). DSD: Dense-sparse-dense training for deep neural networks. Proceedings of the ICLR, Toulon, France."},{"key":"ref_22","unstructured":"Bie, A., Venkitesh, B., Monteiro, J., Haidar, M.A., and Rezagholizadeh, M. (2023, August 09). 
A Simplified Fully Quantized Transformer for End-to-End Speech Recognition. Available online: https:\/\/arxiv.org\/pdf\/1911.03604.pdf."},{"key":"ref_23","unstructured":"Letaifa, L.B., and Rouas, J.L. (2022, January 12\u201314). Fine-grained analysis of the transformer model for efficient pruning. Proceedings of the International Conference on Machine Learning and Applications ICMLA, Nassau, Bahamas."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Peng, Y., Sudo, Y., Muhammad, S., and Watanabe, S. (2023). DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. INTERSPEECH. arXiv.","DOI":"10.21437\/Interspeech.2023-1213"},{"key":"ref_25","first-page":"61","article-title":"Compression of Deep Learning Models for Text: A Survey","volume":"16","author":"Gupta","year":"2020","journal-title":"ACM Trans. Knowl. Discov. Data"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1145\/3005348","article-title":"Structured Pruning of Deep Convolutional Neural Networks","volume":"13","author":"Anwar","year":"2017","journal-title":"ACM J. Emerg. Technol. Comput. Syst."},{"key":"ref_27","unstructured":"Voita, E., Talbot, D., Moiseev, F., Sennrich, R., and Titov, I. (2019, July 28\u2013August 2). Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_28","unstructured":"Michel, P., Levy, O., and Neubig, G. (2019, January 8\u201314). Are Sixteen Heads Really Better than One? Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_29","unstructured":"Han, S., Mao, H., and Dally, W.J. (2016, January 2\u20134). Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. 
Proceedings of the ICLR, San Juan, Puerto Rico."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Molchanov, P., Mallya, A., Tyree, S., Frosio, I., and Kautz, J. (2019, January 15\u201320). Importance Estimation for Neural Network Pruning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition CVPR, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01152"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1162\/tacl_a_00413","article-title":"Compressing Large-Scale Transformer-Based Models: A Case Study on BERT","volume":"9","author":"Ganesh","year":"2021","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_32","unstructured":"Goharian, N., Jain, A., and Sun, Q. (2003). Comparative analysis of sparse matrix algorithms for information retrieval. Computer, 2."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Amirshahi, A., Klein, J.A., Ansaloni, G., and Atienza, D. (2023, January 16\u201319). TiC-SAT: Tightly-coupled Systolic Accelerator for Transformers. Proceedings of the 28th Asia and South Pacific Design Automation Conference, Tokyo, Japan.","DOI":"10.1145\/3566097.3567867"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1109\/PROC.1977.10514","article-title":"A survey of sparse matrix research","volume":"65","author":"Duff","year":"1977","journal-title":"Proc. IEEE"},{"key":"ref_35","unstructured":"Blalock, D., Gonzalez Ortiz, J.J., Frankle, J., and Guttag, J. (2020, January 16\u201318). What is the State of Neural Network Pruning? Proceedings of the Machine Learning and Systems, Cambridge, MA, USA."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"See, A., Luong, M.T., and Manning, C.D. (2016, January 11\u201312). Compression of Neural Machine Translation Models via Pruning. 
Proceedings of the SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany.","DOI":"10.18653\/v1\/K16-1029"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Dong, L., Xu, S., and Xu, B. (2018, January 15\u201320). Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462506"},{"key":"ref_38","unstructured":"Kocabiyikoglu, A.C., Besacier, L., and Kraif, O. (2018, January 7\u201312). Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation. Proceedings of the LREC, Miyazaki, Japan."},{"key":"ref_39","unstructured":"(2023, August 17). Voxforge (Italian). Available online: http:\/\/www.repository.voxforge1.org\/downloads."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. (2015, January 19\u201324). LIBRISPEECH: An ASR corpus based on public domain audio books. Proceedings of the ICASSP, South Brisbane, QLD, Australia.","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Ko, T., Peddinti, V., Povey, D., and Khudanpur, S. (2015, January 6\u201310). Audio Augmentation for Speech Recognition. Proceedings of the INTERSPEECH, Dresden, Germany.","DOI":"10.21437\/Interspeech.2015-711"},{"key":"ref_42","unstructured":"Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., and Schwarz, P. (2011, January 11\u201315). The Kaldi Speech Recognition Toolkit. Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU. IEEE Signal Processing Society, Waikoloa, HI, USA."},{"key":"ref_43","unstructured":"Kingma, D.P., and Ba, J.L. (2014, January 14\u201316). Adam: A Method for Stochastic Optimization. 
Proceedings of the International Conference on Learning Representations, Banff, AB, Canada."},{"key":"ref_44","unstructured":"B\u00e9rard, A., Besacier, L., Kocabiyikoglu, A.C., and Pietquin, O. (2018, January 15\u201320). End-to-End Automatic Speech Translation of Audiobooks. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, AB, Canada."},{"key":"ref_45","unstructured":"Karita, S., Chen, N., Hayashi, T., Hori, T., Inaguma, H., Jiang, Z., Someki, M., Soplin, N., Yamamoto, R., and Wang, X. (2019, December 14\u201318). A Comparative Study on Transformer vs. RNN in Speech Applications. Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop ASRU, Singapore."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/9\/398\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:40:36Z","timestamp":1760128836000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/9\/398"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,23]]},"references-count":45,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["a16090398"],"URL":"https:\/\/doi.org\/10.3390\/a16090398","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,23]]}}}