{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T07:18:28Z","timestamp":1774595908730,"version":"3.50.1"},"reference-count":58,"publisher":"American Association for the Advancement of Science (AAAS)","issue":"27","content-domain":{"domain":["www.science.org"],"crossmark-restriction":true},"short-container-title":["Sci. Adv."],"published-print":{"date-parts":[[2025,7,4]]},"abstract":"<jats:p>Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations, can produce inaccurate information, and reinforce existing biases. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: We study vocabulary changes in more than 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the COVID pandemic.<\/jats:p>","DOI":"10.1126\/sciadv.adt3813","type":"journal-article","created":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T17:59:13Z","timestamp":1751479153000},"update-policy":"https:\/\/doi.org\/10.34133\/aaas_crossmark","source":"Crossref","is-referenced-by-count":60,"title":["Delving into LLM-assisted writing in biomedical publications through excess vocabulary"],"prefix":"10.1126","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5639-7209","authenticated-orcid":true,"given":"Dmitry","family":"Kobak","sequence":"first","affiliation":[{"name":"Hertie Institute for AI in Brain Health, University of T\u00fcbingen, 72076 T\u00fcbingen, Germany."}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6840-7979","authenticated-orcid":true,"given":"Rita","family":"Gonz\u00e1lez-M\u00e1rquez","sequence":"additional","affiliation":[{"name":"Hertie Institute for AI in Brain Health, University of T\u00fcbingen, 72076 T\u00fcbingen, Germany."}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7709-1172","authenticated-orcid":true,"given":"Em\u0151ke-\u00c1gnes","family":"Horv\u00e1t","sequence":"additional","affiliation":[{"name":"Northwestern University, Evanston, 60208 IL, USA."}]},{"given":"Jan","family":"Lause","sequence":"additional","affiliation":[{"name":"Hertie Institute for AI in Brain Health, University of T\u00fcbingen, 72076 T\u00fcbingen, Germany."}]}],"member":"221","reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2014.0841"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"D. Hall D. Jurafsky C. D. Manning \u201cStudying the history of ideas using topic models \u201d in Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing Honolulu Hawaii October 2008 (Association for Computational Linguistics) pp. 363\u2013371.","DOI":"10.3115\/1613715.1613763"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.3389\/frai.2020.00073"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-023-01637-2"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.adg9714"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1038\/d41586-023-02980-0"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1038\/d41586-023-00107-z"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-023-01744-0"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-023-01730-6"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-023-41032-5"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3571730"},{"key":"e_1_3_2_13_2","unstructured":"Y. Zhang Y. Li L. Cui D. Cai L. Liu T. Fu X. Huang E. Zhao Y. Zhang Y. Chen L. Wang A. T. Luu W. Bi F. Shi S. Shi Siren\u2019s song in the AI ocean: A survey on hallucination in large language models. arXiv:2309.01219 [cs.CL] (2023)."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1002\/leap.1578"},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"T. Lazebnik A. Rosenfeld Detecting LLM-Assisted writing in scientific communication: Are we there yet? arXiv:2401.16807 [cs.IR] (2024).","DOI":"10.2478\/jdis-2024-0020"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.xcrp.2023.101426"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3624725"},{"key":"e_1_3_2_18_2","unstructured":"A. Akram Quantitative analysis of AI-generated texts in academic research: A study of AI presence in Arxiv submissions using AI detection tool. arXiv:2403.13812 [cs.DL] (2024)."},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"H. Cheng B. Sheng A. Lee V. Chaudhary A. G. Atanasov N. Liu Y. Qiu T. Y. Wong Y.-C. Tham Y.-F. Zheng Have AI-generated texts from LLM infiltrated the realm of scientific writing? A large-scale analysis of preprint platforms. bioRxiv 586710 [Preprint] (2024). https:\/\/doi.org\/10.1101\/2024.03.25.586710.","DOI":"10.1101\/2024.03.25.586710"},{"key":"e_1_3_2_20_2","unstructured":"J. Liu Y. Bu Towards the relationship between AIGC in manuscript writing and author profiles: Evidence from preprints in LLMs. arXiv:2404.15799 [cs.DL] (2024)."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-024-05298-0"},{"key":"e_1_3_2_22_2","unstructured":"W. Liang Y. Zhang Z. Wu H. Lepp W. Ji X. Zhao H. Cao S. Liu S. He Z. Huang D. Yang C. Potts C. D. Manning J. Y. Zou Mapping the increasing use of LLMs in scientific papers. arXiv:2404.01268 [cs.CL] (2024)."},{"key":"e_1_3_2_23_2","unstructured":"W. Liang Z. Izzo Y. Zhang H. Lepp H. Cao X. Zhao L. Chen H. Ye S. Liu Z. Huang D. A. McFarland J. Y. Zou \u201cMonitoring AI-modified content at scale: A case study on the impact of ChatGPT on AI conference peer reviews \u201d in Forty-first International Conference on Machine Learning Vienna Austria 21 to 27 July 2024."},{"key":"e_1_3_2_24_2","unstructured":"M. Geng R. Trotta \u201cIs ChatGPT transforming academics\u2019 writing style?\u201d in Next Generation of AI Safety Workshop at ICML 2024 Vienna Austria 26 July 2024."},{"key":"e_1_3_2_25_2","unstructured":"S. Astarita S. Kruk J. Reerink P. G\u00f3mez Delving into the utilisation of ChatGPT in scientific publications in astronomy. arXiv:2406.17324 [cs.CL] (2024)."},{"key":"e_1_3_2_26_2","unstructured":"A. Gray ChatGPT \u201ccontamination\u201d: Estimating the prevalence of LLMs in the scholarly literature. arXiv:2403.16887 [cs.DL] (2024)."},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"K. Matsui Delving into PubMed records: Some terms in medical writing have drastically changed after the arrival of ChatGPT. medRxiv 24307373 [Preprint] (2024). https:\/\/doi.org\/10.1101\/2024.05.14.24307373.","DOI":"10.1101\/2024.05.14.24307373"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1136\/bmj.n1137"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.69336"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-022-05522-2"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2024.100968"},{"key":"e_1_3_2_32_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten L.","year":"2008","unstructured":"L. Van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579\u20132605 (2008).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_33_2","unstructured":"H. Yakura E. Lopez-Lopez L. Brinkmann I. Serna P. Gupta I. Rahwan Empirical evidence of Large Language Model\u2019s influence on human spoken communication. arXiv:2409.01754 [cs.CY] (2024)."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-024-02855-5"},{"key":"e_1_3_2_35_2","first-page":"39","article-title":"Benchmarking large language models for news summarization","volume":"12","author":"Zhang T.","year":"2024","unstructured":"T. Zhang, F. Ladhak, E. Durmus, P. Liang, K. McKeown, T. B. Hashimoto, Benchmarking large language models for news summarization. Trans. Assoc. Comput. Ling. 12, 39\u201357 (2024).","journal-title":"Trans. Assoc. Comput. Ling."},{"key":"e_1_3_2_36_2","doi-asserted-by":"crossref","unstructured":"L. Tang I. Shalyminov A. W.-m. Wong J. Burnsky J. W. Vincent Y. Yang S. Singh S. Feng H. Song H. Su L. Sun Y. Zhang S. Mansour K. McKeown TofuEval: Evaluating hallucinations of LLMs on topic-focused dialogue summarization. arXiv:2402.13249 [cs.CL] (2024).","DOI":"10.18653\/v1\/2024.naacl-long.251"},{"key":"e_1_3_2_37_2","unstructured":"Y. Kim Y. Chang M. Karpinska A. Garimella V. Manjunatha K. Lo T. Goyal M. Iyyer FABLES: Evaluating faithfulness and content selection in book-length summarization. arXiv:2404.01261 [cs.CL] (2024)."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.amjmed.2023.02.011"},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","unstructured":"E. M. Bender T. Gebru A. McMillan-Major S. Shmitchell \u201cOn the dangers of stochastic parrots: Can language models be too big?\u201d in Proceedings of the 2021 ACM Conference on Fairness Accountability and Transparency virtual event 3 to 10 March 2021 (Association for Computing Machinery 2021) pp. 610\u2013623.","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597307"},{"key":"e_1_3_2_41_2","unstructured":"X. Bai A. Wang I. Sucholutsky T. L. Griffiths Measuring implicit bias in explicitly unbiased large language models. arXiv:2402.04105 [cs.CY] (2024)."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-023-01716-4"},{"key":"e_1_3_2_43_2","first-page":"652","article-title":"How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN","volume":"11","author":"McCoy R. T.","year":"2023","unstructured":"R. T. McCoy, P. Smolensky, T. Linzen, J. Gao, A. Celikyilmaz, How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN. Trans. Assoc. Comput. Ling. 11, 652\u2013670 (2023).","journal-title":"Trans. Assoc. Comput. Ling."},{"key":"e_1_3_2_44_2","unstructured":"V. Padmakumar H. He Does writing with language models reduce content diversity? arXiv:2309.05196 [cs.CL] (2023)."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-024-00986-7"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-023-01652-3"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2020.05.011"},{"key":"e_1_3_2_48_2","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive NLP tasks","volume":"33","author":"Lewis P.","year":"2020","unstructured":"P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. K\u00fcttler, M. Lewis, W.-t. Yih, T. Rockt\u00e4schel, S. Riedel, D. Kiela, Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 33, 9459\u20139474 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_49_2","unstructured":"S. Borgeaud A. Mensch J. Hoffmann T. Cai E. Rutherford K. Millican G. B. Van Den Driessche J.-B. Lespiau B. Damoc A. Clark D. D. L. Casas A. Guy J. Menick R. Ring T. Hennigan S. Huang L. Maggiore C. Jones A. Cassirer A. Brock M. Paganini G. Irving O. Vinyals S. Osindero K. Simonyan J. Rae E. Elsen L. Sifre \u201cImproving language models by retrieving from trillions of tokens \u201d in International Conference on Machine Learning (PMLR 2022) pp. 2206\u20132240."},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.adj8309"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.adh2762"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.adg7879"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-023-01742-2"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-024-01859-y"},{"key":"e_1_3_2_55_2","first-page":"000223","article-title":"Jane, John, ... Leslie? A historical method for algorithmic gender prediction","volume":"9","author":"Blevins C.","year":"2015","unstructured":"C. Blevins, L. Mullen, Jane, John, ... Leslie? A historical method for algorithmic gender prediction. DHQ Digit. Humanit. Quart. 9, 000223 (2015).","journal-title":"DHQ Digit. Humanit. Quart."},{"key":"e_1_3_2_56_2","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa F.","year":"2011","unstructured":"F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, P. Alexandre, C. David, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825\u20132830 (2011).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_57_2","unstructured":"S. Bird E. Klein E. Loper Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit (O\u2019Reilly Media Inc. 2009)."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458754"},{"key":"e_1_3_2_59_2","unstructured":"Retraction Watch Hindawi shuttering four journals overrun by paper mills (2023). https:\/\/web.archive.org\/web\/20230513163806\/https:\/\/retractionwatch.com\/2023\/05\/02\/hindawi-shuttering-four-journals-overrun-by-paper-mills\/."}],"container-title":["Science Advances"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.science.org\/doi\/pdf\/10.1126\/sciadv.adt3813","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T18:01:20Z","timestamp":1751479280000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.science.org\/doi\/10.1126\/sciadv.adt3813"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,4]]},"references-count":58,"journal-issue":{"issue":"27","published-print":{"date-parts":[[2025,7,4]]}},"alternative-id":["10.1126\/sciadv.adt3813"],"URL":"https:\/\/doi.org\/10.1126\/sciadv.adt3813","relation":{},"ISSN":["2375-2548"],"issn-type":[{"value":"2375-2548","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,4]]},"assertion":[{"value":"2024-09-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-27","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"eadt3813"}}