{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T17:01:57Z","timestamp":1769187717700,"version":"3.49.0"},"reference-count":70,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T00:00:00Z","timestamp":1769126400000},"content-version":"vor","delay-in-days":7,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,1,16]]},
"abstract":"<jats:title>Abstract<\/jats:title> <jats:p>Recent advances in large language model (LLM) pruning have shown state-of-the-art (SotA) compression results in post-training and retraining-free settings while maintaining high predictive performance. However, previous research mainly considered calibrating based on English text, despite the multilingual nature of modern LLMs and their frequent use in non-English languages. This analysis paper conducts an in-depth investigation of the performance and internal representation changes associated with pruning multilingual language models for monolingual applications. We present the first comprehensive empirical study, comparing different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques. We further analyze the latent subspaces, pruning masks, and individual neurons within pruned models. Our results reveal that while calibration on the target language effectively retains perplexity and yields high signal-to-noise ratios, it does not consistently improve downstream task performance. Further analysis of internal representations at three different levels highlights broader limitations of current pruning approaches: While they effectively preserve dominant information like language-specific features, this is insufficient to counteract the loss of nuanced, language-agnostic features that are crucial for knowledge retention and reasoning.<\/jats:p>",
"DOI":"10.1162\/tacl.a.599","type":"journal-article","created":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T15:20:22Z","timestamp":1769181622000},"page":"167-192","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":0,"title":["On the Limitations of Language-targeted Pruning: Investigating the Calibration Language Impact in Multilingual LLM Pruning"],"prefix":"10.1162","volume":"14","author":[{"given":"Simon","family":"Kurz","sequence":"first","affiliation":[{"name":"Department of Computer Science, TU Dortmund University, Germany"},{"name":"Lamarr Institute for Machine Learning and Artificial Intelligence, Germany"}]},{"given":"Jian-Jia","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, TU Dortmund University, Germany"},{"name":"Lamarr Institute for Machine Learning and Artificial Intelligence, Germany"}]},{"given":"Lucie","family":"Flek","sequence":"additional","affiliation":[{"name":"Bonn-Aachen International Center for Information Technology, University of Bonn, Germany"},{"name":"Lamarr Institute for Machine Learning and Artificial Intelligence, Germany"}]},{"given":"Zhixue","family":"Zhao","sequence":"additional","affiliation":[{"name":"Computer Science School, University of Sheffield, United Kingdom. zhixue.zhao@sheffield.ac.uk"}]}],
"member":"281","published-online":{"date-parts":[[2026,1,16]]},"reference":[{"key":"2026012310201496900_bib1","unstructured":"Marah I. Abdin, Sam AdeJacobs, Ammar AhmadAwan, JyotiAneja, AhmedAwadallah, HanyAwadalla, NguyenBach, AmitBahree, ArashBakhtiari, Harkirat S.Behl, AlonBenhaim, MishaBilenko, JohanBjorck, S\u00e9bastienBubeck, MartinCai, Caio C\u00e9sar TeodoroMendes, WeizhuChen, VishravChaudhary, ParulChopra, AllieDel Giorno, Gustavode Rosa, MatthewDixon, RonenEldan, DanIter, AmitGarg, AbhishekGoswami, SuriyaGunasekar, EmmanHaider, JunhengHao, Russell J.Hewett, JamieHuynh, MojanJavaheripi, XinJin, PieroKauffmann, NikosKarampatziakis, DongwooKim, MahoudKhademi, LevKurilenko, James R.Lee, Yin TatLee, YuanzhiLi, ChenLiang, WeishungLiu, EricLin, ZeqiLin, PiyushMadan, ArindamMitra, HardikModi, AnhNguyen, BrandonNorick, BarunPatra, DanielPerez-Becker, ThomasPortet, ReidPryzant, HeyangQin, MarkoRadmilac, CorbyRosset, SambudhaRoy, OlatunjiRuwase, OlliSaarikivi, AminSaied, AdilSalim, MichaelSantacroce, ShitalShah, NingShang, HiteshiSharma, XiaSong, MasahiroTanaka, XinWang, RachelWard, GuanhuaWang, PhilippWitte, MichaelWyatt, CanXu, JiahangXu, SonaliYadav, FanYang, ZiyiYang, DonghanYu, ChengruidongZhang, CyrilZhang, JianwenZhang, LiLyna Zhang, YiZhang, YueZhang, YunanZhang, and XirenZhou. 2024. Phi-3 technical report: A highly capable language model locally on your phone. CoRR, abs\/2404.14219."},
{"key":"2026012310201496900_bib2","first-page":"242","article-title":"A convergence theory for deep learning via over-parameterization","volume-title":"Proceedings of the 36th International Conference on Machine Learning","author":"Allen-Zhu","year":"2019"},{"key":"2026012310201496900_bib3","article-title":"Aya 23: Open weight releases to further multilingual progress","author":"Aryabumi","year":"2024","journal-title":"arXiv preprint arXiv:2405.15032v2"},{"key":"2026012310201496900_bib4","doi-asserted-by":"publisher","first-page":"18089","DOI":"10.18653\/v1\/2024.emnlp-main.1004","article-title":"Is C4 dataset optimal for pruning? An investigation of calibration data for LLM pruning","volume-title":"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing","author":"Bandari","year":"2024"},{"key":"2026012310201496900_bib5","doi-asserted-by":"publisher","first-page":"749","DOI":"10.18653\/v1\/2024.acl-long.44","article-title":"The Belebele benchmark: A parallel reading comprehension dataset in 122 language variants","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Bandarkar","year":"2024"},{"key":"2026012310201496900_bib6","first-page":"995","article-title":"An empirical investigation of statistical significance in NLP","volume-title":"Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning","author":"Berg-Kirkpatrick","year":"2012"},{"key":"2026012310201496900_bib7","first-page":"1347","article-title":"Monolingual or multilingual instruction tuning: Which makes a better alpaca","volume-title":"Findings of the Association for Computational Linguistics: EACL 2024","author":"Chen","year":"2024"},{"key":"2026012310201496900_bib8","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.emnlp-main.1369","article-title":"M-wanda: Improving one-shot pruning for multilingual LLMs","author":"Choenni","year":"2025","journal-title":"CoRR"},
{"key":"2026012310201496900_bib9","doi-asserted-by":"publisher","first-page":"1163","DOI":"10.1162\/tacl_a_00695","article-title":"Investigating hallucinations in pruned large language models for abstractive summarization","volume":"12","author":"Chrysostomou","year":"2024","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2026012310201496900_bib10","article-title":"Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge","author":"Clark","year":"2018","journal-title":"arXiv:1803.05457v1"},{"key":"2026012310201496900_bib11","article-title":"DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning","author":"DeepSeek-AI","year":"2025","journal-title":"arXiv preprint arXiv:2501.12948v1"},{"key":"2026012310201496900_bib12","article-title":"Multilingual jailbreak challenges in large language models","volume-title":"The Twelfth International Conference on Learning Representations","author":"Deng","year":"2024"},{"key":"2026012310201496900_bib13","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-4541-9","volume-title":"An Introduction to the Bootstrap","author":"Efron","year":"1993"},{"key":"2026012310201496900_bib14","article-title":"SparseGPT: Massive language models can be accurately pruned in one-shot","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Frantar","year":"2023"},{"key":"2026012310201496900_bib15","article-title":"OPTQ: Accurate quantization for generative pre-trained transformers","volume-title":"The Eleventh International Conference on Learning Representations","author":"Frantar","year":"2023"},{"key":"2026012310201496900_bib16","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.12608602","article-title":"A framework for few-shot language model evaluation","author":"Gao","year":"2024"},
{"key":"2026012310201496900_bib17","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1201\/9781003162810-13","article-title":"A survey of quantization methods for efficient neural network inference","volume-title":"Low-Power Computer Vision","author":"Gholami","year":"2022"},{"key":"2026012310201496900_bib18","article-title":"SlimLLM: Accurate structured pruning for large language models","volume-title":"Forty-second International Conference on Machine Learning","author":"Guo","year":"2025"},{"key":"2026012310201496900_bib19","article-title":"Measuring massive multitask language understanding","volume-title":"International Conference on Learning Representations","author":"Hendrycks","year":"2021"},{"key":"2026012310201496900_bib20","article-title":"Revisiting pruning at initialization through the lens of Ramanujan graph","volume-title":"The Eleventh International Conference on Learning Representations","author":"Hoang","year":"2023"},{"issue":"1","key":"2026012310201496900_bib21","first-page":"1","article-title":"Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks","volume":"22","author":"Hoefler","year":"2021","journal-title":"Journal of Machine Learning Research"},{"key":"2026012310201496900_bib22","first-page":"92","article-title":"Bridging the resource gap: Exploring the efficacy of English and multilingual LLMs for Swedish","volume-title":"Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)","author":"Holmstr\u00f6m","year":"2023"},{"key":"2026012310201496900_bib23","doi-asserted-by":"publisher","first-page":"12365","DOI":"10.18653\/v1\/2023.findings-emnlp.826","article-title":"Not all languages are created equal in LLMs: Improving multilingual capability by cross-lingual-thought prompting","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023","author":"Huang","year":"2023"},
{"key":"2026012310201496900_bib24","article-title":"Compressing LLMs: The truth is rarely pure and never simple","volume-title":"The Twelfth International Conference on Learning Representations","author":"Jaiswal","year":"2024"},{"key":"2026012310201496900_bib25","article-title":"Mistral 7b","author":"Jiang","year":"2023","journal-title":"arXiv preprint arXiv:2310.06825"},{"key":"2026012310201496900_bib26","doi-asserted-by":"publisher","first-page":"131","DOI":"10.18653\/v1\/2022.blackboxnlp-1.11","article-title":"Are multilingual sentiment models equally right for the right reasons?","volume-title":"Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP","author":"J\u00f8rgensen","year":"2022"},{"key":"2026012310201496900_bib27","doi-asserted-by":"publisher","first-page":"9921","DOI":"10.18653\/v1\/2024.findings-emnlp.580","article-title":"Pruning multilingual large language models for multilingual inference","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2024","author":"Kim","year":"2024"},{"key":"2026012310201496900_bib28","article-title":"Pruning vs quantization: Which is better?","volume-title":"Thirty-seventh Conference on Neural Information Processing Systems","author":"Kuzmin","year":"2023"},{"key":"2026012310201496900_bib29","first-page":"14651","article-title":"Fp8 quantization: The power of the exponent","volume":"35","author":"Kuzmin","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2026012310201496900_bib30","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1162\/tacl_a_00276","article-title":"Natural questions: A benchmark for question answering research","volume":"7","author":"Kwiatkowski","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},
{"key":"2026012310201496900_bib31","doi-asserted-by":"publisher","first-page":"318","DOI":"10.18653\/v1\/2023.emnlp-demo.28","article-title":"Okapi: Instruction-tuned large language models in multiple languages with reinforcement learning from human feedback","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Lai","year":"2023"},{"key":"2026012310201496900_bib32","article-title":"Optimal brain damage","volume-title":"Advances in Neural Information Processing Systems","author":"LeCun","year":"1989"},{"key":"2026012310201496900_bib33","article-title":"GPTAQ: Efficient finetuning-free quantization for asymmetric calibration","volume-title":"Forty-second International Conference on Machine Learning","author":"Li","year":"2025"},{"key":"2026012310201496900_bib34","first-page":"9908","article-title":"Sparse training via boosting pruning plasticity with neuroregeneration","volume-title":"Proceedings of the 35th International Conference on Neural Information Processing Systems","author":"Liu","year":"2021"},{"key":"2026012310201496900_bib35","doi-asserted-by":"publisher","first-page":"8181","DOI":"10.18653\/v1\/2024.emnlp-main.467","article-title":"VPTQ: Extreme low-bit vector post-training quantization for large language models","volume-title":"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing","author":"Liu","year":"2024"},{"key":"2026012310201496900_bib36","doi-asserted-by":"publisher","first-page":"1389","DOI":"10.1162\/tacl_a_00433","article-title":"MKQA: A linguistically diverse benchmark for multilingual open domain question answering","volume":"9","author":"Longpre","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2026012310201496900_bib37","doi-asserted-by":"publisher","first-page":"5328","DOI":"10.18653\/v1\/2021.acl-long.414","article-title":"Language model evaluation beyond perplexity","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Meister","year":"2021"},
{"key":"2026012310201496900_bib38","article-title":"Locating and editing factual associations in GPT","volume-title":"Advances in Neural Information Processing Systems","author":"Meng","year":"2022"},{"key":"2026012310201496900_bib39","article-title":"Mass-editing memory in a transformer","volume-title":"The Eleventh International Conference on Learning Representations","author":"Meng","year":"2023"},{"key":"2026012310201496900_bib40","unstructured":"Meta. 2024. Introducing Meta Llama 3: The most capable openly available LLM to date\u2014 ai.meta.com. https:\/\/ai.meta.com\/blog\/meta-llama-3\/. Accessed 15-07-2024."},
{"key":"2026012310201496900_bib41","unstructured":"OpenAI, JoshAchiam, StevenAdler, SandhiniAgarwal, LamaAhmad, IlgeAkkaya, Florencia LeoniAleman, DiogoAlmeida, JankoAltenschmidt, SamAltman, ShyamalAnadkat, RedAvila, IgorBabuschkin, SuchirBalaji, ValerieBalcom, PaulBaltescu, HaimingBao, MohammadBavarian, JeffBelgum, IrwanBello, JakeBerdine, GabrielBernadett-Shapiro, ChristopherBerner, LennyBogdonoff, OlegBoiko, MadelaineBoyd, Anna-LuisaBrakman, GregBrockman, TimBrooks, MilesBrundage, KevinButton, TrevorCai, RosieCampbell, AndrewCann, BrittanyCarey, ChelseaCarlson, RoryCarmichael, BrookeChan, CheChang, FotisChantzis, DerekChen, SullyChen, RubyChen, JasonChen, MarkChen, BenChess, ChesterCho, CaseyChu, Hyung WonChung, DaveCummings, JeremiahCurrier, YunxingDai, CoryDecareaux, ThomasDegry, NoahDeutsch, DamienDeville, ArkaDhar, DavidDohan, SteveDowling, SheilaDunning, AdrienEcoffet, AttyEleti, TynaEloundou, DavidFarhi, LiamFedus, NikoFelix, Sim\u00f3n PosadaFishman, JustonForte, IsabellaFulford, LeoGao, ElieGeorges, ChristianGibson, VikGoel, TarunGogineni, GabrielGoh, RaphaGontijo-Lopes, JonathanGordon, MorganGrafstein, ScottGray, RyanGreene, JoshuaGross, Shixiang ShaneGu, YufeiGuo, ChrisHallacy, JesseHan, JeffHarris, YuchenHe, MikeHeaton, JohannesHeidecke, ChrisHesse, AlanHickey, WadeHickey, PeterHoeschele, BrandonHoughton, KennyHsu, ShengliHu, XinHu, JoostHuizinga, ShantanuJain, ShawnJain, JoanneJang, AngelaJiang, RogerJiang, HaozhunJin, DennyJin, ShinoJomoto, BillieJonn, HeewooJun, TomerKaftan, \u0141ukaszKaiser, AliKamali, IngmarKanitscheider, Nitish ShirishKeskar, TabarakKhan, LoganKilpatrick, Jong WookKim, ChristinaKim, YongjikKim, Jan HendrikKirchner, JamieKiros, MattKnight, DanielKokotajlo, \u0141ukaszKondraciuk, AndrewKondrich, ArisKonstantinidis, KyleKosic, GretchenKrueger, VishalKuo, MichaelLampe, IkaiLan, TeddyLee, JanLeike, JadeLeung, DanielLevy, Chak MingLi, RachelLim, MollyLin, StephanieLin, MateuszLitwin, TheresaLopez, RyanLowe, PatriciaLue, AnnaMakanju, KimMalfacini, SamManning, TodorMarkov, YanivMarkovski, BiancaMartin, KatieMayer, AndrewMayne, BobMcGrew, Scott MayerMcKinney, ChristineMcLeavey, PaulMcMillan, JakeMcNeil, DavidMedina, AalokMehta, JacobMenick, LukeMetz, AndreyMishchenko, PamelaMishkin, VinnieMonaco, EvanMorikawa, DanielMossing, TongMu, MiraMurati, OlegMurk, DavidM\u00e9ly, AshvinNair, ReiichiroNakano, RajeevNayak, ArvindNeelakantan, RichardNgo, HyeonwooNoh, LongOuyang, CullenO\u2019Keefe, JakubPachocki, AlexPaino, JoePalermo, AshleyPantuliano, GiambattistaParascandolo, JoelParish, EmyParparita, AlexPassos, MikhailPavlov, AndrewPeng, AdamPerelman, Filipede Avila Belbute Peres, MichaelPetrov, HenriquePonde de Oliveira Pinto, MichaelPokorny, MichellePokrass, Vitchyr H.Pong, TollyPowell, AletheaPower, BorisPower, ElizabethProehl, RaulPuri, AlecRadford, JackRae, AdityaRamesh, CameronRaymond, FrancisReal, KendraRimbach, CarlRoss, BobRotsted, HenriRoussez, NickRyder, MarioSaltarelli, TedSanders, ShibaniSanturkar, GirishSastry, HeatherSchmidt, DavidSchnurr, JohnSchulman, DanielSelsam, KylaSheppard, TokiSherbakov, JessicaShieh, SarahShoker, PranavShyam, SzymonSidor, EricSigler, MaddieSimens, JordanSitkin, KatarinaSlama, IanSohl, BenjaminSokolowsky, YangSong, NatalieStaudacher, Felipe PetroskiSuch, NatalieSummers, IlyaSutskever, JieTang, NikolasTezak, Madeleine B.Thompson, PhilTillet, AminTootoonchian, ElizabethTseng, PrestonTuggle, NickTurley, JerryTworek, Juan Felipe Cer\u00f3nUribe, AndreaVallone, ArunVijayvergiya, ChelseaVoss, CarrollWainwright, Justin JayWang, AlvinWang, BenWang, JonathanWard, JasonWei, C. J.Weinmann, AkilaWelihinda, PeterWelinder, JiayiWeng, LilianWeng, MattWiethoff, DaveWillner, ClemensWinter, SamuelWolrich, HannahWong, LaurenWorkman, SherwinWu, JeffWu, MichaelWu, KaiXiao, TaoXu, SarahYoo, KevinYu, QimingYuan, WojciechZaremba, RowanZellers, ChongZhang, MarvinZhang, ShengjiaZhao, TianhaoZheng, JuntangZhuang, WilliamZhuk, and BarretZoph. 2024. GPT-4 technical report. arXiv preprint arXiv:2303.08774v6."},
{"key":"2026012310201496900_bib42","first-page":"492","article-title":"The roles of English in evaluating multilingual language models","volume-title":"Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa\/Baltic-HLT 2025)","author":"Poelman","year":"2025"},{"issue":"1","key":"2026012310201496900_bib43","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2026012310201496900_bib44","first-page":"135","article-title":"Language-specific pruning for efficient reduction of large language models","volume-title":"Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024","author":"Shamrai","year":"2024"},{"key":"2026012310201496900_bib45","article-title":"Language models are multilingual chain-of-thought reasoners","volume-title":"The Eleventh International Conference on Learning Representations","author":"Shi","year":"2023"},
{"key":"2026012310201496900_bib46","doi-asserted-by":"publisher","first-page":"1182","DOI":"10.18653\/v1\/2024.emnlp-main.68","article-title":"Rethinking pruning large language models: Benefits and pitfalls of reconstruction error minimization","volume-title":"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing","author":"Shin","year":"2024"},{"key":"2026012310201496900_bib47","article-title":"A simple and effective pruning approach for large language models","volume-title":"The Twelfth International Conference on Learning Representations","author":"Sun","year":"2024"},{"key":"2026012310201496900_bib48","doi-asserted-by":"publisher","first-page":"5701","DOI":"10.18653\/v1\/2024.acl-long.309","article-title":"Language-specific neurons: The key to multilingual capabilities in large language models","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Tang","year":"2024"},{"key":"2026012310201496900_bib49","article-title":"LLaMA: Open and efficient foundation language models","author":"Touvron","year":"2023","journal-title":"arXiv preprint arXiv:2302.13971v1"},{"key":"2026012310201496900_bib50","doi-asserted-by":"publisher","first-page":"6213","DOI":"10.18653\/v1\/2025.findings-acl.323","article-title":"Blessing of multilinguality: A systematic analysis of multilingual in-context learning","volume-title":"Findings of the Association for Computational Linguistics: ACL 2025","author":"Yilei","year":"2025"},{"key":"2026012310201496900_bib51","article-title":"Revisiting the primacy of english in zero-shot cross-lingual transfer","author":"Turc","year":"2021","journal-title":"CoRR"},{"key":"2026012310201496900_bib52","doi-asserted-by":"publisher","first-page":"12159","DOI":"10.18653\/v1\/2024.findings-acl.724","article-title":"Probing the emergence of cross-lingual alignment during LLM training","volume-title":"Findings of the Association for Computational Linguistics ACL 2024","author":"Wang","year":"2024"},
{"key":"2026012310201496900_bib53","doi-asserted-by":"publisher","first-page":"5075","DOI":"10.18653\/v1\/2025.acl-long.253","article-title":"Lost in multilinguality: Dissecting cross-lingual factual inconsistency in transformer language models","volume-title":"Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Wang","year":"2025"},{"key":"2026012310201496900_bib54","article-title":"All languages matter: On the multilingual safety of large language models","author":"Wang","year":"2023","journal-title":"CoRR"},{"key":"2026012310201496900_bib55","doi-asserted-by":"publisher","first-page":"10100","DOI":"10.18653\/v1\/2024.acl-long.544","article-title":"On the impact of calibration data in post-training quantization and pruning","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Williams","year":"2024"},{"key":"2026012310201496900_bib56","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"},{"key":"2026012310201496900_bib57","doi-asserted-by":"publisher","first-page":"5617","DOI":"10.18653\/v1\/2022.emnlp-main.379","article-title":"Discovering low-rank subspaces for language-agnostic multilingual representations","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Xie","year":"2022"},{"key":"2026012310201496900_bib58","first-page":"1155","article-title":"Over-parameterization exponentially slows down gradient descent for learning a single neuron","volume-title":"The Thirty Sixth Annual Conference on Learning Theory","author":"Weihang","year":"2023"},
{"issue":"11","key":"2026012310201496900_bib59","doi-asserted-by":"publisher","first-page":"1911362","DOI":"10.1007\/s11704-024-40579-4","article-title":"A survey on multilingual large language models: corpora, alignment, and bias","volume":"19","author":"Yuemei","year":"2025","journal-title":"Frontiers of Computer Science"},{"key":"2026012310201496900_bib60","doi-asserted-by":"publisher","first-page":"4321","DOI":"10.18653\/v1\/2025.findings-acl.224","article-title":"Wanda++: Pruning large language models via regional gradients","volume-title":"Findings of the Association for Computational Linguistics: ACL 2025","author":"Yang","year":"2025"},{"key":"2026012310201496900_bib61","first-page":"20838","article-title":"Mest: Accurate and fast memory-economic sparse training framework on the edge","volume":"34","author":"Yuan","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2026012310201496900_bib62","article-title":"Prune once for all: Sparse pre-trained language models","author":"Zafrir","year":"2021","journal-title":"arXiv preprint arXiv:2111.05754v1"},{"key":"2026012310201496900_bib63","doi-asserted-by":"publisher","first-page":"4791","DOI":"10.18653\/v1\/P19-1472","article-title":"HellaSwag: Can a machine really finish your sentence?","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zellers","year":"2019"},{"key":"2026012310201496900_bib64","first-page":"11794","article-title":"Multilingual brain surgeon: Large language models can be compressed leaving no language behind","volume-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)","author":"Zeng","year":"2024"},{"key":"2026012310201496900_bib65","article-title":"Understanding deep learning requires rethinking generalization","volume-title":"International Conference on Learning Representations","author":"Zhang","year":"2017"},
{"key":"2026012310201496900_bib66","doi-asserted-by":"publisher","DOI":"10.20944\/preprints202310.1487.v2","article-title":"Plug-and-play: An efficient post-training pruning method for large language models","volume-title":"The Twelfth International Conference on Learning Representations","author":"Zhang","year":"2024"},{"key":"2026012310201496900_bib67","doi-asserted-by":"publisher","DOI":"10.20944\/preprints202207.0139.v2","article-title":"Epitopological sparse ultra-deep learning: A brain-network topological theory carves communities in sparse and percolated hyperbolic ANNs","author":"Zhang","year":"2023","journal-title":"Preprints"},{"key":"2026012310201496900_bib68","article-title":"How do large language models handle multilingualism?","volume-title":"The Thirty-eighth Annual Conference on Neural Information Processing Systems","author":"Zhao","year":"2024"},{"key":"2026012310201496900_bib69","doi-asserted-by":"publisher","first-page":"3226","DOI":"10.18653\/v1\/2024.naacl-long.178","article-title":"Comparing explanation faithfulness between multilingual and monolingual fine-tuned language models","volume-title":"Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)","author":"Zhao","year":"2024"},{"key":"2026012310201496900_bib70","doi-asserted-by":"publisher","first-page":"1556","DOI":"10.1162\/tacl_a_00704","article-title":"A survey on model compression for large language models","volume":"12","author":"Zhu","year":"2024","journal-title":"Transactions of the Association for Computational Linguistics"}],
"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/TACL.a.599\/2577928\/tacl.a.599.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/TACL.a.599\/2577928\/tacl.a.599.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T15:20:30Z","timestamp":1769181630000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/TACL.a.599\/135000\/On-the-Limitations-of-Language-targeted-Pruning"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,16]]},"references-count":70,"URL":"https:\/\/doi.org\/10.1162\/tacl.a.599","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,16]]}}}