{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T23:50:00Z","timestamp":1773013800651,"version":"3.50.1"},"reference-count":62,"publisher":"Institution of Engineering and Technology (IET)","issue":"1","license":[{"start":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T00:00:00Z","timestamp":1750636800000},"content-version":"vor","delay-in-days":173,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Information Security"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:p>Large language models (LLMs) have evolved significantly, achieving unprecedented linguistic capabilities that underpin a wide range of AI applications. However, they also pose risks and challenges, such as ethical concerns, bias and computational sustainability. Balancing their high performance in revolutionising information processing against the risks they pose is critical to their future development. Because LLMs are a type of NLP model, many LLM risks have also been encountered by NLP in the past. We, therefore, summarise these risks, focusing on the underlying understanding of the risks and the relevant technical tools, rather than simply describing their occurrence in LLMs. In this paper, we first discuss and compare the current state of research on the four main risk areas in the process of developing LLMs: data, system, pretraining and inference, and then summarise the rationale, complexity, prospects and challenges of the key issues in each phase. 
Finally, this review concludes with a discussion of the fundamental issues that are of most concern and should be addressed in the early stages of modelling research, including the correlated issues of privacy preservation, countering attacks and model robustness. From the perspective of the LLM research and development (R&amp;D) process, this review summarises the actual risks and provides guidance on research directions, with the aim of helping researchers identify risk points and technology directions worth investigating, as well as helping to establish a safe and efficient R&amp;D process.<\/jats:p>","DOI":"10.1049\/ise2\/7358963","type":"journal-article","created":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T00:20:00Z","timestamp":1750724400000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["From Data to Deployment: A Comprehensive Analysis of Risks in Large Language Model Research and Development"],"prefix":"10.1049","volume":"2025","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-1715-1459","authenticated-orcid":false,"given":"Tianshu","family":"Zhang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0058-4088","authenticated-orcid":false,"given":"Ruidan","family":"Su","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0006-1705-6723","authenticated-orcid":false,"given":"Anli","family":"Zhong","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5619-3465","authenticated-orcid":false,"given":"Minwei","family":"Fang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4870-1493","authenticated-orcid":false,"given":"Yu-dong","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"265","published-online":{"date-parts":[[2025,6,23]]},"reference":[{"key":"e_1_2_10_1_2","doi-asserted-by":"crossref","unstructured":"BenderE. 
GebruT. McMillan-MajorA. andShmitchellS. On the Dangers of Stochastic Parrots: Can Language Models be too Big? Proceedings of the 2021 ACM Conference on Fairness Accountability and Transparency 2021 ACM 610\u2013623.","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_2_10_2_2","unstructured":"WeidingerL. MellorJ. F. J. andRauhM. et al.Ethical and Social Risks of Harm From Language Models 2021 ArXiv abs\/2112.04359."},{"key":"e_1_2_10_3_2","unstructured":"CarliniN. LiuC. Erlingsson\u00da. KosJ. andSongD. The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks 2018 In USENIX Security Symposium."},{"key":"e_1_2_10_4_2","unstructured":"ShenT. JinR. andHuangY. et al.Large Language Model Alignment: A Survey 2023 ArXiv abs\/2309.15025."},{"key":"e_1_2_10_5_2","unstructured":"WangY. ZhongW. andLiL. et al.Aligning Large Language Models With Human: A Survey 2023 ArXiv 2309 no. 11235."},{"key":"e_1_2_10_6_2","unstructured":"KaddourJ. HarrisJ. MozesM. BradleyH. RaileanuR. andMcHardyR. Challenges and Applications of Large Language Models 2023 ArXiv abs\/2307.10169."},{"key":"e_1_2_10_7_2","unstructured":"WallaceE. ZhaoT. Z. FengS. andSinghS. Concealed Data Poisoning Attacks on NLP Models 2020 arXiv preprint arXiv: 2010.12563."},{"key":"e_1_2_10_8_2","unstructured":"SteinhardtJ. KohP. Wei andLiangP. S. Certified Defenses for Data Poisoning Attacks 2017 Advances in neural information processing systems."},{"key":"e_1_2_10_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-021-06119-y"},{"key":"e_1_2_10_10_2","unstructured":"LiJ. Z. Principled Approaches to Robust Machine Learning and Beyond 2018."},{"key":"e_1_2_10_11_2","unstructured":"SteinhardtJ. Robust Learning: Information Theory and Algorithms 2018."},{"key":"e_1_2_10_12_2","unstructured":"BowenD. MurphyB. CaiW. KhachaturovD. GleaveA. andPelrineK. 
Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws 2024."},{"key":"e_1_2_10_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/11787006_1"},{"key":"e_1_2_10_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/11681878_14"},{"key":"e_1_2_10_15_2","unstructured":"OhtaS.andNishioT. \u039b-Split: A Privacy- Preserving Split Computing Framework for Cloud-Powered Generative AI 2023 arXiv preprint arXiv: 2310.14651."},{"key":"e_1_2_10_16_2","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.4404340"},{"key":"e_1_2_10_17_2","unstructured":"VyasN. KakadeS. andBarakB. Provable Copyright Protection for Generative Models 2023 arXiv preprint arXiv: 2302.10870."},{"key":"e_1_2_10_18_2","doi-asserted-by":"crossref","unstructured":"DingJ. LiX. andGudivadaV. N. Augmentation and Evaluation of Training Data for Deep Learning 2017 IEEE International Conference on Big Data (Big Data) 2017 IEEE 2603\u20132611.","DOI":"10.1109\/BigData.2017.8258220"},{"key":"e_1_2_10_19_2","doi-asserted-by":"crossref","unstructured":"MishraN. SahuG. CalixtoI. Abu-HannaA. andLaradjiI. H. LLM Aided Semisupervision for Extractive Dialog Summarization 2023 arXiv preprint arXiv:2311.11462.","DOI":"10.18653\/v1\/2023.findings-emnlp.670"},{"key":"e_1_2_10_20_2","unstructured":"RafailovR. SharmaA. MitchellE. ErmonS. ManningC. D. andFinnC. Direct Preference Optimization: Your Language Model is Secretly a Reward Model 2023 arXiv preprint arXiv: 2305.18290."},{"key":"e_1_2_10_21_2","doi-asserted-by":"crossref","unstructured":"ZhouD. WangK. andGuJ. et al.Dataset Quantization 2023 IEEE\/CVF International Conference on Computer Vision (ICCV) 2023 IEEE 17159\u201317170.","DOI":"10.1109\/ICCV51070.2023.01578"},{"key":"e_1_2_10_22_2","unstructured":"GriggsT. LiuX. andYuJ. et al.M\u00e9lange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity 2024 arXiv preprint arXiv: 2404.14527."},{"key":"e_1_2_10_23_2","unstructured":"IbrahimA. Th\u00e9rienB. andGuptaK. 
et al.Simple and Scalable Strategies to Continually Pre-Train Large Language Models 2024 arXiv preprint arXiv: 2403.08763."},{"key":"e_1_2_10_24_2","unstructured":"JiangZ. LinH. andZhongY. et al.Megascale: Scaling Large Language Model Training to More Than 10 000 GPUs 2024 ArXiv abs\/2402.15627."},{"key":"e_1_2_10_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2023.3280970"},{"key":"e_1_2_10_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3064966"},{"key":"e_1_2_10_27_2","doi-asserted-by":"crossref","unstructured":"SaxenaV. JayaramK. R. BasuS. SabharwalY. andVermaA. Effective Elastic Scaling of Deep Learning Workloads 2020 28th International Symposium on Modeling Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) 2020 IEEE 1\u20138.","DOI":"10.1109\/MASCOTS50786.2020.9285954"},{"key":"e_1_2_10_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2640087.2644155"},{"key":"e_1_2_10_29_2","doi-asserted-by":"crossref","unstructured":"RajbhandariS. RuwaseO. RasleyJ. SmithS. andHeY. Zero-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning Proceedings of the International Conference for High Performance Computing Networking Storage and Analysis 2021 ACM 1\u201314.","DOI":"10.1145\/3458817.3476205"},{"key":"e_1_2_10_30_2","unstructured":"LiD. ShaoR. WangH. GuoH. XingE. P. andZhangH. Mpcformer: Fast Performant and Private Transformer Inference With Mpc 2022 arXiv preprint arXiv: 2211.01452."},{"key":"e_1_2_10_31_2","unstructured":"DongY. LuW. J. andZhengY. et al.Puma: Secure Inference of LLaMA-7B in Five Minutes 2023 ArXiv abs\/2307.12533."},{"key":"e_1_2_10_32_2","doi-asserted-by":"crossref","unstructured":"ParkJ. QuanC. MoonH. andLeeJ. 
Hyperdimensional Computing as a Rescue for Efficient Privacy-Preserving Machine Learning-as- a-Service 2023 IEEE\/ACM International Conference on Computer Aided Design (ICCAD) 2023 IEEE 1\u20138.","DOI":"10.1109\/ICCAD57390.2023.10323815"},{"key":"e_1_2_10_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.adhoc.2019.101937"},{"key":"e_1_2_10_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2017.2764109"},{"key":"e_1_2_10_35_2","doi-asserted-by":"crossref","unstructured":"MelisL. SongC. De CristofaroE. andShmatikovV. Exploiting Unintended Feature Leakage in Collaborative Learning 2019 IEEE Symposium on Security and Privacy (SP) 2018 IEEE 691\u2013706.","DOI":"10.1109\/SP.2019.00029"},{"key":"e_1_2_10_36_2","unstructured":"ZhuL. LiuZ. andHanS. Deep Leakage From Gradients 2019 Neural Information Processing Systems."},{"key":"e_1_2_10_37_2","doi-asserted-by":"crossref","unstructured":"DengJ. WangY. andLiJ. et al.TAG: Gradient Attack on Transformer-Based Language Models Conference on Empirical Methods in Natural Language Processing 2021 Association for Computational Linguistics 3600\u20133610.","DOI":"10.18653\/v1\/2021.findings-emnlp.305"},{"key":"e_1_2_10_38_2","unstructured":"ChristouN. DiJ. AtlidakisV. RayB. andKemerlisV. P. {IvySyn}: Automated Vulnerability Discovery in Deep Learning Frameworks 32nd USENIX Security Symposium (USENIX Security 23) 2023 USENIX 2383\u20132400."},{"key":"e_1_2_10_39_2","doi-asserted-by":"crossref","unstructured":"ShiJ. XiaoY. andLiY. et al.Acetest: Automated Constraint Extraction for Testing Deep Learning Operators Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis 2023 ACM 690\u2013702.","DOI":"10.1145\/3597926.3598088"},{"key":"e_1_2_10_40_2","doi-asserted-by":"crossref","unstructured":"JustusD. BrennanJ. BonnerS. andMcGoughA. S. 
Predicting the Computational Cost of Deep Learning Models 2018 IEEE International Conference on Big Data (Big Data) 2018 IEEE 3873\u20133882.","DOI":"10.1109\/BigData.2018.8622396"},{"key":"e_1_2_10_41_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics11152316"},{"key":"e_1_2_10_42_2","unstructured":"LinJ. ZhaoH. ZhangA. WuY. PingH.-U. andChenQ. Agentsims: An Open-Source Sandbox for Large Language Model Evaluation 2023 arXiv preprint arXiv: 2308.04026."},{"key":"e_1_2_10_43_2","unstructured":"MoL. WangB. ChenM. andSunH. How Trustworthy are Open-Source LLMS? An Assessment Under Malicious Demonstrations Shows Their Vulnerabilities 2023 arXiv preprint arXiv: 2311.09447."},{"key":"e_1_2_10_44_2","unstructured":"TaoriR. GulrajaniI. andZhangT. et al.Stanford Alpaca: An Instruction- Following Llama Model 2023."},{"key":"e_1_2_10_45_2","unstructured":"OuyangL. WuJ. andJiangX. et al.Training Language Models to Follow Instructions With Human Feedback 2022 ArXiv abs\/2203.02155."},{"key":"e_1_2_10_46_2","unstructured":"WangG. ChengS. ZhanX. LiX. SongS. andLiuY. Openchat: Advancing Open- Source Language Models With Mixed-Quality Data 2023 ArXiv abs\/2309.11235."},{"key":"e_1_2_10_47_2","unstructured":"CaoB. CaoY. LinL. andChenJ. Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM 2023 ArXiv abs\/2309.14348."},{"key":"e_1_2_10_48_2","unstructured":"ZhuS. ZhangR. andAnB. et al.Autodan: Interpretable Gradient-Based Adversarial Attacks on Large Language Models 2023."},{"key":"e_1_2_10_49_2","unstructured":"ZhangC. WangZ. MangalR. FredriksonM. JiaL. andPasareanuC. S. Transfer Attacks and Defenses for Large Language Models on Coding Tasks 2023 ArXiv abs\/2311.13445."},{"key":"e_1_2_10_50_2","unstructured":"YaoJ.-Y. NingK.-P. LiuZ.-H. NingM.-N. andYuanL. LLM Lies: Hallucinations Are Not Bugs but Features as Adversarial Examples 2023 arXiv preprint arXiv: 2310.01469."},{"key":"e_1_2_10_51_2","unstructured":"RobeyA. WongE. HassaniH. andPappasG. J. 
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks 2023 ArXiv abs\/2310.03684."},{"key":"e_1_2_10_52_2","unstructured":"LapidR. LangbergR. andSipperM. Open Sesame! Universal Black-Box Jailbreaking of Large Language Models 2023 arXiv preprint arXiv: 2309.01446."},{"key":"e_1_2_10_53_2","unstructured":"KrishnaK. TomarG. S. ParikhA. P. PapernotN. andIyyerM. Thieves on Sesame Street! Model Extraction of Bert-Based Apis 2019 ArXiv abs\/1910.12366."},{"key":"e_1_2_10_54_2","unstructured":"KandpalN. PillutlaK. OpreaA. KairouzP. Choquette-ChooC. A. andXuZ. User Inference Attacks on Large Language Models 2023 ArXiv abs\/2310.09266."},{"key":"e_1_2_10_55_2","doi-asserted-by":"crossref","unstructured":"SongL. ShokriR. andMittalP. Privacy Risks of Securing Machine Learning Models Against Adversarial Examples Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security 2019 ACM 241\u2013257.","DOI":"10.1145\/3319535.3354211"},{"key":"e_1_2_10_56_2","unstructured":"JalalzaiH. KadocheE. LelucR. andPlassierV. Membership Inference Attacks via Adversarial Examples 2022 arXiv preprint arXiv: 2207.13572."},{"key":"e_1_2_10_57_2","unstructured":"LiuY. JiaY. GengR. JiaJ. andGongN. Z. Prompt Injection Attacks and Defenses in LLM-Integrated Applications 2023 arXiv preprint arXiv: 2310.12815."},{"key":"e_1_2_10_58_2","unstructured":"ChaiY.andLiW. Towards Deep Learning Interpretability: A Topic Modeling Approach 2019 In International Conference on Interaction Sciences."},{"key":"e_1_2_10_59_2","unstructured":"RigottiM. MiksovicC. GiurgiuI. GschwindT. andScottonP. Attention-Based Interpretability With Concept Transformers 2022 International Conference on Learning Representations."},{"key":"e_1_2_10_60_2","unstructured":"YaoY. WangP. andTianB. 
et al.Editing Large Language Models: Problems Meth- Ods and Opportunities 2023 ArXiv abs\/2305.13172."},{"key":"e_1_2_10_61_2","first-page":"17359","volume-title":"Neural Information Processing Systems","author":"Meng K.","year":"2022"},{"key":"e_1_2_10_62_2","unstructured":"MengK. SharmaA. AndonianA. BelinkovY. andBauD. Mass-Editing Memory in a Transformer 2022 ArXiv abs\/2210.07229."}],"container-title":["IET Information Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/ise2\/7358963","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/full-xml\/10.1049\/ise2\/7358963","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/ise2\/7358963","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T22:35:12Z","timestamp":1773009312000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/ise2\/7358963"}},"subtitle":[],"editor":[{"given":"Jiwei","family":"Tian","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,1]]},"references-count":62,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10.1049\/ise2\/7358963"],"URL":"https:\/\/doi.org\/10.1049\/ise2\/7358963","archive":["Portico"],"relation":{},"ISSN":["1751-8709","1751-8717"],"issn-type":[{"value":"1751-8709","type":"print"},{"value":"1751-8717","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1]]},"assertion":[{"value":"2024-12-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-04-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-23","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"7358963"}}