{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:07:09Z","timestamp":1772136429261,"version":"3.50.1"},"reference-count":79,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,5,10]],"date-time":"2025-05-10T00:00:00Z","timestamp":1746835200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,10]],"date-time":"2025-05-10T00:00:00Z","timestamp":1746835200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000065","name":"National Institute of Neurological Disorders and Stroke","doi-asserted-by":"publisher","award":["R00NS114850"],"award-info":[{"award-number":["R00NS114850"]}],"id":[{"id":"10.13039\/100000065","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Large-language models (LLMs) show promise for clinical note information extraction, but deployment challenges include high computational costs and privacy concerns. We used synthetic data distillation to fine-tune smaller, open-source LLMs to achieve performance comparable to larger models while enabling local hardware deployment or reduced cloud costs. Using Llama-3.1-70B-Instruct, we generated synthetic question-answer training pairs to fine-tune smaller Llama models. 
We evaluated performance across three tasks: synthetic clinical trial criteria, the i2b2 2018 Clinical Trial Eligibility Challenge, and apixaban trial criteria questions. The 8B-parameter model achieved high accuracy across all tasks and sometimes outperformed the 70B-Instruct teacher model. Fine-tuning with only the most challenging questions still improved performance, demonstrating the value of targeted training. Results from 3B- and 1B-parameter models showed a clear size-performance tradeoff. This work demonstrates synthetic data distillation\u2019s potential for enabling scalable clinical information extraction.<\/jats:p>","DOI":"10.1038\/s41746-025-01681-4","type":"journal-article","created":{"date-parts":[[2025,5,10]],"date-time":"2025-05-10T09:17:03Z","timestamp":1746868623000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Synthetic data distillation enables the extraction of clinical information at scale"],"prefix":"10.1038","volume":"8","author":[{"given":"Elizabeth Geena","family":"Woo","sequence":"first","affiliation":[]},{"given":"Michael C.","family":"Burkhart","sequence":"additional","affiliation":[]},{"given":"Emily","family":"Alsentzer","sequence":"additional","affiliation":[]},{"given":"Brett K.","family":"Beaulieu-Jones","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,10]]},"reference":[{"key":"1681_CR1","unstructured":"Goel, A. et al. LLMs Accelerate Annotation for Medical Information Extraction. arXiv [cs.CL] (2023)."},{"key":"1681_CR2","unstructured":"Pangakis, N., Wolken, S. & Fasching, N. Automated annotation with generative AI requires validation. arXiv [cs.CL] (2023)."},{"key":"1681_CR3","doi-asserted-by":"crossref","unstructured":"Agrawal, M., Hegselmann, S., Lang, H., Kim, Y. & Sontag, D. Large language models are few-shot clinical information extractors. 
arXiv [cs.CL] (2022).","DOI":"10.18653\/v1\/2022.emnlp-main.130"},{"key":"1681_CR4","doi-asserted-by":"crossref","unstructured":"McInerney, D. J., Young, G., van de Meent, J.-W. & Wallace, B. C. CHiLL: Zero-shot custom interpretable feature extraction from clinical notes with large language models. arXiv [cs.CL] (2023).","DOI":"10.18653\/v1\/2023.findings-emnlp.568"},{"key":"1681_CR5","doi-asserted-by":"crossref","unstructured":"He, K. et al. A survey of large language models for Healthcare: From data, technology, and applications to accountability and ethics. arXiv [cs.CL] (2023).","DOI":"10.2139\/ssrn.4809363"},{"key":"1681_CR6","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1136\/amiajnl-2011-000465","volume":"18","author":"WW Chapman","year":"2011","unstructured":"Chapman, W. W. et al. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J. Am. Med. Inform. Assoc.: JAMIA 18, 540\u2013543 (2011).","journal-title":"J. Am. Med. Inform. Assoc.: JAMIA"},{"key":"1681_CR7","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1016\/j.jbi.2017.11.011","volume":"77","author":"Y Wang","year":"2018","unstructured":"Wang, Y. et al. Clinical information extraction applications: A literature review. J. Biomed. Inform. 77, 34\u201349 (2018).","journal-title":"J. Biomed. Inform."},{"key":"1681_CR8","unstructured":"OpenAI, et al. GPT-4 Technical Report. arXiv [cs.CL] (2023)."},{"key":"1681_CR9","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1038\/d41586-023-03803-y","volume":"624","author":"A Toma","year":"2023","unstructured":"Toma, A., Senkaiahliyan, S., Lawler, P. R., Rubin, B. & Wang, B. Generative AI could revolutionize health care - but not if control is ceded to big tech. 
Nature 624, 36\u201338 (2023).","journal-title":"Nature"},{"key":"1681_CR10","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1186\/s13012-024-01357-9","volume":"19","author":"S Reddy","year":"2024","unstructured":"Reddy, S. Generative AI in healthcare: an implementation science informed translational path on application, integration and governance. Implement. Sci. 19, 27 (2024).","journal-title":"Implement. Sci."},{"key":"1681_CR11","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1001\/jama.2023.9651","volume":"330","author":"T Minssen","year":"2023","unstructured":"Minssen, T., Vayena, E. & Cohen, I. G. The challenges for regulating medical use of ChatGPT and other large language models. JAMA 330, 315\u2013316 (2023).","journal-title":"JAMA"},{"key":"1681_CR12","unstructured":"Blogs, M. C. Microsoft and Epic expand AI collaboration to accelerate generative AI\u2019s impact in healthcare, addressing the industry\u2019s most pressing needs. The Official Microsoft Blog https:\/\/blogs.microsoft.com\/blog\/2023\/08\/22\/microsoft-and-epic-expand-ai-collaboration-to-accelerate-generative-ais-impact-in-healthcare-addressing-the-industrys-most-pressing-needs\/ (2023)."},{"key":"1681_CR13","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1038\/s41746-024-01239-w","volume":"7","author":"G Zhang","year":"2024","unstructured":"Zhang, G. et al. Closing the gap between open source and commercial large language models for medical evidence summarization. NPJ Digit. Med. 7, 239 (2024).","journal-title":"NPJ Digit. Med."},{"key":"1681_CR14","unstructured":"Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots. https:\/\/gradio.app\/."},{"key":"1681_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-024-01233-2","volume":"7","author":"IC Wiest","year":"2024","unstructured":"Wiest, I. C. et al. Privacy-preserving large language models for structured medical information retrieval. NPJ Digit. Med. 
7, 1\u20139 (2024).","journal-title":"NPJ Digit. Med."},{"key":"1681_CR16","unstructured":"Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. arXiv [stat.ML] (2015)."},{"key":"1681_CR17","unstructured":"Papamakarios, G. Distilling model knowledge. arXiv [stat.ML] (2015)."},{"key":"1681_CR18","first-page":"30675","volume":"14","author":"S Ding","year":"2024","unstructured":"Ding, S., Ye, J., Hu, X. & Zou, N. Distilling the knowledge from large-language model for health event prediction. Health Inform. 14, 30675 (2024).","journal-title":"Health Inform."},{"key":"1681_CR19","doi-asserted-by":"crossref","unstructured":"Li, R., Wang, X. & Yu, H. LlamaCare: An instruction fine-tuned large language model for clinical NLP. In Proc. of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (eds Calzolari, N. et al.) 10632\u201310641 (ELRA and ICCL, Torino, Italia, 2024).","DOI":"10.18653\/v1\/2024.findings-acl.341"},{"key":"1681_CR20","doi-asserted-by":"publisher","unstructured":"Qin, D. et al. Efficient medical image segmentation based on knowledge distillation. arXiv [eess.IV] https:\/\/doi.org\/10.1109\/TMI.2021.3098703 (2021).","DOI":"10.1109\/TMI.2021.3098703"},{"key":"1681_CR21","doi-asserted-by":"publisher","first-page":"4170","DOI":"10.1109\/JBHI.2024.3385098","volume":"28","author":"X Qi","year":"2024","unstructured":"Qi, X. et al. Exploring generalizable distillation for efficient medical image segmentation. IEEE J. Biomed. Health Inform. 28, 4170\u20134183 (2024).","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"1681_CR22","unstructured":"Wang, T., Zhu, J.-Y., Torralba, A. & Efros, A. A. Dataset Distillation. arXiv [cs.LG] (2018)."},{"key":"1681_CR23","unstructured":"Wang, Z., Yu, A. W., Firat, O. & Cao, Y. Towards zero-label language learning. arXiv [cs.CL] (2021)."},{"key":"1681_CR24","unstructured":"Shirgaonkar, A., Pandey, N., Abay, N. C., Aktas, T. 
& Aski, V. Knowledge distillation using frontier open-source LLMs: Generalizability and the role of synthetic data. arXiv [cs.LG] (2024)."},{"key":"1681_CR25","unstructured":"Yu, P., Xu, J., Weston, J. & Kulikov, I. Distilling System 2 into System 1. arXiv [cs.CL] (2024)."},{"key":"1681_CR26","doi-asserted-by":"crossref","unstructured":"Ding, B. et al. Data augmentation using large language models: Data perspectives, learning paradigms and challenges. arXiv [cs.CL] (2024).","DOI":"10.18653\/v1\/2024.findings-acl.97"},{"key":"1681_CR27","unstructured":"Peng, B., Li, C., He, P., Galley, M. & Gao, J. Instruction Tuning with GPT-4. arXiv [cs.CL] (2023)."},{"key":"1681_CR28","doi-asserted-by":"publisher","unstructured":"Fitzsimmons, L., Frau, F., Bozzi, S., Chandross, K. J. & Beaulieu-Jones, B. K. Characterizing the connection between Parkinson\u2019s disease progression and healthcare utilization. medRxiv 2024.09.15.24313708 https:\/\/doi.org\/10.1101\/2024.09.15.24313708 (2024).","DOI":"10.1101\/2024.09.15.24313708"},{"key":"1681_CR29","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1038\/s41531-024-00667-5","volume":"10","author":"BK Beaulieu-Jones","year":"2024","unstructured":"Beaulieu-Jones, B. K. et al. Disease progression strikingly differs in research and real-world Parkinson\u2019s populations. NPJ Parkinsons Dis. 10, 58 (2024).","journal-title":"NPJ Parkinsons Dis."},{"key":"1681_CR30","doi-asserted-by":"publisher","first-page":"212","DOI":"10.1038\/s41746-023-00957-x","volume":"6","author":"E Alsentzer","year":"2023","unstructured":"Alsentzer, E. et al. Zero-shot interpretable phenotyping of postpartum hemorrhage using large language models. NPJ Digit. Med. 6, 212 (2023).","journal-title":"NPJ Digit. Med."},{"key":"1681_CR31","doi-asserted-by":"crossref","unstructured":"Peikos, G., Symeonidis, S., Kasela, P. & Pasi, G. Utilizing ChatGPT to enhance clinical trial enrollment. 
arXiv [cs.IR] (2023).","DOI":"10.2139\/ssrn.4492872"},{"key":"1681_CR32","first-page":"1324","volume":"2023","author":"J Yuan","year":"2023","unstructured":"Yuan, J., Tang, R., Jiang, X. & Hu, X. Large language models for healthcare data augmentation: An example on patient-trial matching. AMIA Annu. Symp. Proc. 2023, 1324\u20131333 (2023).","journal-title":"AMIA Annu. Symp. Proc."},{"key":"1681_CR33","unstructured":"Wong, C. et al. Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology. in Machine Learning for Healthcare Conference 846\u2013862 (PMLR, 2023)."},{"key":"1681_CR34","doi-asserted-by":"publisher","first-page":"305","DOI":"10.1038\/s41746-024-01274-7","volume":"7","author":"S Gupta","year":"2024","unstructured":"Gupta, S. et al. PRISM: Patient Records Interpretation for Semantic clinical trial Matching system using large language models. NPJ Digit. Med. 7, 305 (2024).","journal-title":"NPJ Digit. Med."},{"key":"1681_CR35","doi-asserted-by":"crossref","unstructured":"Jin, Q. et al. Matching patients to clinical trials with large language models. arXiv [cs.CL] (2023).","DOI":"10.1038\/s41467-024-53081-z"},{"key":"1681_CR36","doi-asserted-by":"publisher","first-page":"1953","DOI":"10.1093\/jamia\/ocae073","volume":"31","author":"M Nievas","year":"2024","unstructured":"Nievas, M., Basu, A., Wang, Y. & Singh, H. Distilling large language models for matching patients to clinical trials. J. Am. Med. Inform. Assoc. 31, 1953\u20131963 (2024).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1681_CR37","unstructured":"Dettmers, T., et al. vol. 36 10088\u201310115 (Curran Associates, Inc., 2023)."},{"key":"1681_CR38","unstructured":"Terms of use. https:\/\/openai.com\/policies\/row-terms-of-use\/."},{"key":"1681_CR39","unstructured":"Snell, C., Klein, D. & Zhong, R. Learning by distilling context. arXiv [cs.CL] (2022)."},{"key":"1681_CR40","doi-asserted-by":"crossref","unstructured":"Huang, J. et al. 
Large Language Models can self-improve. arXiv [cs.CL] (2022).","DOI":"10.18653\/v1\/2023.emnlp-main.67"},{"key":"1681_CR41","doi-asserted-by":"crossref","unstructured":"Hsieh, C.-Y. et al. Distilling step-by-step! Outperforming larger language models with less training data and smaller model sizes. arXiv [cs.CL] (2023).","DOI":"10.18653\/v1\/2023.findings-acl.507"},{"key":"1681_CR42","doi-asserted-by":"publisher","first-page":"981","DOI":"10.1056\/NEJMoa1107039","volume":"365","author":"CB Granger","year":"2011","unstructured":"Granger, C. B. et al. Apixaban versus warfarin in patients with atrial fibrillation. N. Engl. J. Med. 365, 981\u2013992 (2011).","journal-title":"N. Engl. J. Med."},{"key":"1681_CR43","unstructured":"Study Details. https:\/\/www.clinicaltrials.gov\/study\/NCT00496769#participation-criteria."},{"key":"1681_CR44","doi-asserted-by":"publisher","first-page":"1163","DOI":"10.1093\/jamia\/ocz163","volume":"26","author":"A Stubbs","year":"2019","unstructured":"Stubbs, A., Filannino, M., Soysal, E., Henry, S. & Uzuner, \u00d6 Cohort selection for clinical trials: n2c2 2018 shared task track 1. J. Am. Med. Inform. Assoc. 26, 1163\u20131171 (2019).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1681_CR45","unstructured":"Llama 3.2: Revolutionizing edge AI and vision with open, customizable models. Meta AI https:\/\/ai.meta.com\/blog\/llama-3-2-connect-2024-vision-edge-mobile-devices\/."},{"key":"1681_CR46","unstructured":"Wang, T. et al. Self-Taught Evaluators. arXiv [cs.CL] (2024)."},{"key":"1681_CR47","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1038\/s41746-025-01475-8","volume":"8","author":"AJ Goodell","year":"2025","unstructured":"Goodell, A. J., Chu, S. N., Rouholiman, D. & Chu, L. F. Large language model agents can use tools to perform clinical calculations. NPJ Digit. Med. 8, 163 (2025).","journal-title":"NPJ Digit. 
Med."},{"key":"1681_CR48","doi-asserted-by":"publisher","first-page":"2613","DOI":"10.1038\/s41591-024-03097-1","volume":"30","author":"P Hager","year":"2024","unstructured":"Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 30, 2613\u20132622 (2024).","journal-title":"Nat. Med."},{"key":"1681_CR49","unstructured":"Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv [cs.CL] (2020)."},{"key":"1681_CR50","unstructured":"Gao, Y. et al. Retrieval-Augmented Generation for large Language Models: A survey. arXiv [cs.CL] (2023)."},{"key":"1681_CR51","unstructured":"Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. arXiv [cs.CL] (2022)."},{"key":"1681_CR52","unstructured":"Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large Language Models are Zero-Shot Reasoners. arXiv [cs.CL] (2022)."},{"key":"1681_CR53","unstructured":"Paul, M., Ganguli, S. & Dziugaite, G. K. Deep Learning on a Data Diet: Finding Important Examples Early in Training. in Advances in Neural Information Processing Systems (eds. Beygelzimer, A., Dauphin, Y., Liang, P. & Vaughan, J. W.) (2021)."},{"key":"1681_CR54","unstructured":"Sorscher, B., Geirhos, R., Shekhar, S., Ganguli, S. & Morcos, A. S. Beyond neural scaling laws: beating power law scaling via data pruning. in Advances in Neural Information Processing Systems (eds. Oh, A. H., Agarwal, A., Belgrave, D. & Cho, K.) (2022)."},{"key":"1681_CR55","unstructured":"Yang, Y., Bean, A. M., McCraith, R. & Mahdi, A. Fine-tuning Large Language Models with human-inspired learning strategies in medical question answering. arXiv [cs.CL] (2024)."},{"key":"1681_CR56","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.1016\/S0140-6736(22)00232-X","volume":"399","author":"A Arora","year":"2022","unstructured":"Arora, A. & Arora, A. Synthetic patient data in health care: a widening legal loophole. 
Lancet 399, 1601\u20131602 (2022).","journal-title":"Lancet"},{"key":"1681_CR57","doi-asserted-by":"publisher","unstructured":"Beduschi, A. Synthetic data protection: Towards a paradigm change in data regulation? Big Data Soc. 11, https:\/\/doi.org\/10.1177\/20539517241231277 (2024).","DOI":"10.1177\/20539517241231277"},{"key":"1681_CR58","unstructured":"Marwala, T. The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development. https:\/\/unu.edu\/publication\/use-synthetic-data-train-ai-models-opportunities-and-risks-sustainable-development (2023)."},{"key":"1681_CR59","doi-asserted-by":"publisher","first-page":"e24164","DOI":"10.1016\/j.heliyon.2024.e24164","volume":"10","author":"B Draghi","year":"2024","unstructured":"Draghi, B., Wang, Z., Myles, P. & Tucker, A. Identifying and handling data bias within primary healthcare data using synthetic data generators. Heliyon 10, e24164 (2024).","journal-title":"Heliyon"},{"key":"1681_CR60","unstructured":"Gallegos, I. O. et al. Bias and fairness in large language models: A survey. arXiv [cs.CL] (2023)."},{"key":"1681_CR61","doi-asserted-by":"crossref","unstructured":"Gupta, U. et al. Mitigating gender bias in distilled language models via counterfactual role reversal. in Findings of the Association for Computational Linguistics: ACL 2022 658\u2013678 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2022).","DOI":"10.18653\/v1\/2022.findings-acl.55"},{"key":"1681_CR62","doi-asserted-by":"crossref","unstructured":"Delobelle, P. & Berendt, B. FairDistillation: Mitigating stereotyping in language models. arXiv [cs.CL] (2022).","DOI":"10.1007\/978-3-031-26390-3_37"},{"key":"1681_CR63","doi-asserted-by":"crossref","unstructured":"Ahn, J., Lee, H., Kim, J. & Oh, A. Why knowledge distillation amplifies gender bias and how to mitigate from the perspective of DistilBERT. 
in Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) 266\u2013272 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2022).","DOI":"10.18653\/v1\/2022.gebnlp-1.27"},{"key":"1681_CR64","unstructured":"Webster, K. et al. Measuring and reducing gendered correlations in pre-trained models. arXiv [cs.CL] (2020)."},{"key":"1681_CR65","doi-asserted-by":"crossref","unstructured":"Ghanbarzadeh, S., Huang, Y., Palangi, H., Cruz Moreno, R. & Khanpour, H. Gender-tuning: Empowering fine-tuning for debiasing pre-trained language models. in Findings of the Association for Computational Linguistics: ACL 2023 5448\u20135458 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2023).","DOI":"10.18653\/v1\/2023.findings-acl.336"},{"key":"1681_CR66","doi-asserted-by":"publisher","unstructured":"Dixon, L., Li, J., Sorensen, J., Thain, N. & Vasserman, L. Measuring and mitigating unintended bias in text classification. in Proceedings of the 2018 AAAI\/ACM Conference on AI, Ethics, and Society. https:\/\/doi.org\/10.1145\/3278721.3278729 (ACM, New York, NY, USA, 2018).","DOI":"10.1145\/3278721.3278729"},{"key":"1681_CR67","doi-asserted-by":"crossref","unstructured":"Hall Maudslay, R., Gonen, H., Cotterell, R. & Teufel, S. It\u2019s all in the name: Mitigating gender bias with name-based counterfactual data substitution. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 5267\u20135275 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2019).","DOI":"10.18653\/v1\/D19-1530"},{"key":"1681_CR68","unstructured":"Zayed, A. et al. Deep learning on a healthy data diet: Finding important examples for fairness. arXiv [cs.CL] (2022)."},{"key":"1681_CR69","doi-asserted-by":"crossref","unstructured":"Zhang, H., Cisse, M., Dauphin, Y. N. & Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. 
arXiv [cs.LG] (2017).","DOI":"10.1007\/978-1-4899-7687-1_79"},{"key":"1681_CR70","doi-asserted-by":"crossref","unstructured":"Yu, L., Mao, Y., Wu, J. & Zhou, F. Mixup-based unified framework to overcome gender bias resurgence. in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 1755\u20131759 (ACM, New York, NY, USA, 2023).","DOI":"10.1145\/3539618.3591938"},{"key":"1681_CR71","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s44401-024-00003-2","volume":"2","author":"R Yang","year":"2025","unstructured":"Yang, R. et al. Retrieval-augmented generation for generative artificial intelligence in health care. npj Health Syst. 2, 1\u20135 (2025).","journal-title":"npj Health Syst."},{"key":"1681_CR72","doi-asserted-by":"publisher","unstructured":"Dubey, A. et al. The Llama 3 herd of models. arXiv [cs.AI] https:\/\/doi.org\/10.48550\/arXiv.2309.03882 (2024).","DOI":"10.48550\/arXiv.2309.03882"},{"key":"1681_CR73","unstructured":"Hu, E. J. et al. LoRA: Low-Rank Adaptation of Large Language Models. in International Conference on Learning Representations (2022)."},{"key":"1681_CR74","unstructured":"Chang, C.-C., Reitter, D., Aksitov, R. & Sung, Y.-H. KL-divergence guided temperature sampling. arXiv [cs.CL] (2023)."},{"key":"1681_CR75","doi-asserted-by":"crossref","unstructured":"Renze, M. & Guven, E. The effect of sampling temperature on problem solving in Large Language Models. arXiv [cs.CL] (2024).","DOI":"10.18653\/v1\/2024.findings-emnlp.432"},{"key":"1681_CR76","unstructured":"Holtzman, A., Buys, J., Du, L., Forbes, M. & Choi, Y. The Curious Case of Neural Text Degeneration. in International Conference on Learning Representations (2020)."},{"key":"1681_CR77","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1136\/jamia.2009.001560","volume":"17","author":"GK Savova","year":"2010","unstructured":"Savova, G. K. et al. 
Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 17, 507\u2013513 (2010).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1681_CR78","doi-asserted-by":"publisher","unstructured":"Johnson, A. et al. MIMIC-IV. PhysioNet https:\/\/doi.org\/10.13026\/HXP0-HG59 (2024).","DOI":"10.13026\/HXP0-HG59"},{"key":"1681_CR79","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-022-01899-x","volume":"10","author":"AEW Johnson","year":"2023","unstructured":"Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1 (2023).","journal-title":"Sci. Data"}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01681-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01681-4","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01681-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,10]],"date-time":"2025-05-10T09:17:23Z","timestamp":1746868643000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01681-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,10]]},"references-count":79,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1681"],"URL":"https:\/\/doi.org\/10.1038\/s41746-025-01681-4","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.09.27.24314517","asserted-by":"object"}]},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,10]]},"assertion":[{"value":"27 September 
2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 April 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"267"}}