{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T08:46:01Z","timestamp":1765961161019,"version":"3.48.0"},"reference-count":27,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T00:00:00Z","timestamp":1765929600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>In the growing field of Natural Language Processing (NLP), transformers have become excessively large, pushing the boundaries of both training and inference compute. Given the size and widespread use of these models, there is now a strong emphasis on improving both training and inference efficiency. We propose an approach to reduce the computational requirements of transformers. We specifically tested this approach using BERT for sentiment classification. In particular, we reduced the number of attention heads in the model using the lottery ticket hypothesis and an adapted search strategy from a genetic-based lottery ticket pruning algorithm. This search process removes any need for full-sized model training and additionally reduces the training data by up to 95% through lottery sample selection. We achieve leading results in lossless head pruning with a 70% reduction in heads, and up to a 90% reduction with only a 1% F1 loss allocated. The search process was efficiently performed using 5% of training samples under random selection and was further shown to work with just 0.5% of samples by selecting a diverse set of sample embeddings. Inference time was also improved by up to 47.2%. We plan to generalize this work to Large Language Models (LLMs) and language generation tasks to improve both their training and inference requirements.<\/jats:p>","DOI":"10.3390\/a18120798","type":"journal-article","created":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T08:32:27Z","timestamp":1765960347000},"page":"798","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Genetic-Based Lottery Ticket Pruning for Transformers in Sentiment Classification: Realized Through Lottery Sample Selection"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9777-0711","authenticated-orcid":false,"given":"Ryan","family":"Bluteau","sequence":"first","affiliation":[{"name":"School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0563-0250","authenticated-orcid":false,"given":"Robin","family":"Gras","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada"}]},{"given":"Gabriel","family":"Peralta","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,17]]},"reference":[{"key":"ref_1","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. 
Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_2","first-page":"24824","article-title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models","volume":"35","author":"Wei","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_3","unstructured":"DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., and Wang, P. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv."},{"key":"ref_4","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"400","DOI":"10.3390\/make5020024","article-title":"Lottery Ticket Search on Untrained Models with Applied Lottery Sample Selection","volume":"5","author":"Bluteau","year":"2023","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_6","first-page":"2009","article-title":"Twitter sentiment classification using distant supervision","volume":"1","author":"Go","year":"2009","journal-title":"CS224N Proj. Rep. Stanf."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bluteau, R., and Gras, R. (2025). Improving Sentiment Classification Using 0-Shot Generated Labels for Custom Transformer Embeddings. Eur. J. Artif. Intell., 1\u201313.","DOI":"10.1177\/30504554251326853"},{"key":"ref_8","unstructured":"Frankle, J., and Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks (2018). arXiv."},{"key":"ref_9","unstructured":"Morcos, A.S., Yu, H., Paganini, M., and Tian, Y. (2019). One ticket to win them all: Generalizing lottery ticket initializations across datasets and optimizers. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Girish, S., Maiya, S.R., Gupta, K., Chen, H., Davis, L.S., and Shrivastava, A. (2021, January 20\u201325). The lottery ticket hypothesis for object recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00082"},{"key":"ref_11","first-page":"15834","article-title":"The Lottery Ticket Hypothesis for Pre-trained BERT Networks","volume":"33","author":"Larochelle","year":"2020","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_12","unstructured":"McCarley, J.S., Chakravarti, R., and Sil, A. (2019). Structured Pruning of a BERT-based Question Answering Model. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Prasanna, S., Rogers, A., and Rumshisky, A. (2020). When BERT Plays the Lottery, All Tickets Are Winning. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-main.259"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chen, X., Cheng, Y., Wang, S., Gan, Z., Wang, Z., and Liu, J. (2021). EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets. arXiv.","DOI":"10.18653\/v1\/2021.acl-long.171"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1016\/j.ins.2023.03.122","article-title":"Your lottery ticket is damaged: Towards all-alive pruning for extremely sparse networks","volume":"634","author":"Kim","year":"2023","journal-title":"Inf. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liu, Y., Meng, F., Lin, Z., Fu, P., Cao, Y., Wang, W., and Zhou, J. (2022). 
Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training. arXiv.","DOI":"10.18653\/v1\/2022.naacl-main.428"},{"key":"ref_17","unstructured":"Gao, Y., Colombo, N., and Wang, W. (2021). Adapting by Pruning: A Case Study on BERT. arXiv."},{"key":"ref_18","unstructured":"Michel, P., Levy, O., and Neubig, G. (2019). Are Sixteen Heads Really Better than One?. arXiv."},{"key":"ref_19","unstructured":"Parnami, A., Singh, R., and Joshi, T. (2021). Pruning Attention Heads of Transformer Models Using A* Search: A Novel Approach to Compress Big NLP Architectures. arXiv."},{"key":"ref_20","unstructured":"Webber, B., Cohn, T., He, Y., and Liu, Y. (2020, January 16\u201320). Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Qi, F., Liu, Z., Liu, Q., and Sun, M. (2020). Know What You Don\u2019t Need: Single-Shot Meta-Pruning for Attention Heads. arXiv.","DOI":"10.1016\/j.aiopen.2021.05.003"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Li, B., Wang, Z., Huang, S., Bragin, M.A., Li, J., and Ding, C. (2023, January 19\u201325). Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23), Macao, China.","DOI":"10.24963\/ijcai.2023\/568"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.neucom.2020.03.082","article-title":"Network pruning using sparse learning and genetic algorithm","volume":"404","author":"Wang","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1016\/j.neunet.2011.06.003","article-title":"Genetic algorithm pruning of probabilistic neural networks in medical disease estimation","volume":"24","author":"Mantzaris","year":"2011","journal-title":"Neural Netw."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hancock, P.J. (1992). Pruning neural nets by genetic algorithm. Artificial Neural Networks, Elsevier.","DOI":"10.1016\/B978-0-444-89488-5.50036-1"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yang, C., An, Z., Li, C., Diao, B., and Xu, Y. (2019, January 17\u201319). Multi-objective pruning for cnns using genetic algorithm. Proceedings of the International Conference on Artificial Neural Networks, Munich, Germany.","DOI":"10.1007\/978-3-030-30484-3_25"},{"key":"ref_27","unstructured":"Adams, C.J., Borkan, D., Sorensen, J., Dixon, L., Vasserman, L., and Thain, N. (2025, December 01). Jigsaw Unintended Bias in Toxicity Classification. Kaggle. 
Available online: https:\/\/kaggle.com\/competitions\/jigsaw-unintended-bias-in-toxicity-classification."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/798\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T08:36:58Z","timestamp":1765960618000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/798"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,17]]},"references-count":27,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["a18120798"],"URL":"https:\/\/doi.org\/10.3390\/a18120798","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,17]]}}}
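
The record above is the standard Crossref REST API "work" envelope for this article's DOI. A minimal sketch of fetching and unpacking it, assuming network access and the third-party requests package (field names mirror the record above):

# Minimal sketch: fetch this Crossref work record and unpack a few fields.
# Uses the public Crossref REST API (https://api.crossref.org); error
# handling beyond raise_for_status() is elided for brevity.
import requests

DOI = "10.3390/a18120798"
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]  # same envelope as the record above

print(work["title"][0])                    # article title
print(", ".join(f'{a["given"]} {a["family"]}' for a in work["author"]))
print(work["container-title"][0], "vol.", work["volume"], "p.", work["page"])
print(work["references-count"], "references")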
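
The abstract outlines the method: a genetic search over attention-head masks, with fitness measured on a small "lottery sample" of the training data rather than the full set. A toy sketch of that idea follows, assuming BERT-base's 12x12 head grid; the evaluate() stub, the operator choices, and the fitness weighting are illustrative assumptions, not the paper's implementation:

# Toy sketch of a genetic head-mask search: each individual is a binary mask
# over BERT-base's 144 attention heads, and fitness rewards F1 (scored on a
# small lottery sample, e.g., 5% of training data) plus head sparsity.
import random

N_HEADS = 12 * 12          # BERT-base: 12 layers x 12 heads
POP, GENS, MUT_RATE = 20, 30, 0.02

def evaluate(mask):
    """Placeholder: prune heads where mask[i] == 0, then score F1 on the
    lottery sample. Stubbed with a random score for illustration."""
    f1 = random.random()                     # stand-in for a real F1 score
    sparsity = 1 - sum(mask) / N_HEADS
    return f1 + 0.5 * sparsity               # reward accuracy and pruning

def crossover(a, b):
    cut = random.randrange(N_HEADS)          # single-point crossover
    return a[:cut] + b[cut:]

def mutate(mask):
    return [bit ^ (random.random() < MUT_RATE) for bit in mask]  # bit flips

pop = [[random.randint(0, 1) for _ in range(N_HEADS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=evaluate, reverse=True)
    elite = pop[: POP // 2]                  # keep the fitter half
    pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                   for _ in range(POP - len(elite))]

best = max(pop, key=evaluate)
print(f"kept {sum(best)}/{N_HEADS} heads")

Because every fitness call only runs inference on the small lottery sample, the search never trains the full-sized model, which is the efficiency claim the abstract makes.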
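
The abstract also reports that the search still works with just 0.5% of samples when "selecting a diverse set of sample embeddings." One plausible reading of that selection step, sketched as greedy farthest-point selection over stand-in embeddings; the greedy strategy and dimensions are assumptions, not details confirmed by this record:

# Toy sketch of diversity-based lottery sample selection: greedily pick the
# point farthest from everything chosen so far, yielding a small, spread-out
# subset. Random vectors stand in for real BERT sentence embeddings.
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 768))           # stand-in sentence embeddings
k = max(2, int(0.005 * len(emb)))            # keep ~0.5% of the samples

chosen = [0]                                 # seed with an arbitrary point
dist = np.linalg.norm(emb - emb[0], axis=1)  # distance to the chosen set
for _ in range(k - 1):
    nxt = int(dist.argmax())                 # farthest from current subset
    chosen.append(nxt)
    dist = np.minimum(dist, np.linalg.norm(emb - emb[nxt], axis=1))

print(f"selected {len(chosen)} diverse samples:", chosen)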