{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:21:58Z","timestamp":1775229718695,"version":"3.50.1"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"8035","license":[{"start":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T00:00:00Z","timestamp":1729641600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T00:00:00Z","timestamp":1729641600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nature"],"published-print":{"date-parts":[[2024,10,24]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Large language models (LLMs) have enabled the generation of high-quality synthetic text, often indistinguishable from human-written content, at a scale that can markedly affect the nature of the information ecosystem<jats:sup>1\u20133<\/jats:sup>. Watermarking can help identify synthetic text and limit accidental or deliberate misuse<jats:sup>4<\/jats:sup>, but has not been adopted in production systems owing to stringent quality, detectability and computational efficiency requirements. Here we describe SynthID-Text, a production-ready text watermarking scheme that preserves text quality and enables high detection accuracy, with minimal latency overhead. SynthID-Text does not affect LLM training and modifies only the sampling procedure; watermark detection is computationally efficient, without using the underlying LLM. To enable watermarking at scale, we develop an algorithm integrating watermarking with speculative sampling, an efficiency technique frequently used in production systems<jats:sup>5<\/jats:sup>. 
Evaluations across multiple LLMs empirically show that SynthID-Text provides improved detectability over comparable methods, and standard benchmarks and human side-by-side ratings indicate no change in LLM capabilities. To demonstrate the feasibility of watermarking in large-scale-production systems, we conducted a live experiment that assessed feedback from nearly 20\u2009million Gemini<jats:sup>6<\/jats:sup> responses, again confirming the preservation of text quality. We hope that the availability of SynthID-Text<jats:sup>7<\/jats:sup> will facilitate further development of watermarking and responsible use of LLM systems.<\/jats:p>","DOI":"10.1038\/s41586-024-08025-4","type":"journal-article","created":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T16:03:57Z","timestamp":1729699437000},"page":"818-823","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":84,"title":["Scalable watermarking for identifying large language model 
outputs"],"prefix":"10.1038","volume":"634","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-4937-9903","authenticated-orcid":false,"given":"Sumanth","family":"Dathathri","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3137-6599","authenticated-orcid":false,"given":"Abigail","family":"See","sequence":"additional","affiliation":[]},{"given":"Sumedh","family":"Ghaisas","sequence":"additional","affiliation":[]},{"given":"Po-Sen","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Rob","family":"McAdam","sequence":"additional","affiliation":[]},{"given":"Johannes","family":"Welbl","sequence":"additional","affiliation":[]},{"given":"Vandana","family":"Bachani","sequence":"additional","affiliation":[]},{"given":"Alex","family":"Kaskasoli","sequence":"additional","affiliation":[]},{"given":"Robert","family":"Stanforth","sequence":"additional","affiliation":[]},{"given":"Tatiana","family":"Matejovicova","sequence":"additional","affiliation":[]},{"given":"Jamie","family":"Hayes","sequence":"additional","affiliation":[]},{"given":"Nidhi","family":"Vyas","sequence":"additional","affiliation":[]},{"given":"Majd 
Al","family":"Merey","sequence":"additional","affiliation":[]},{"given":"Jonah","family":"Brown-Cohen","sequence":"additional","affiliation":[]},{"given":"Rudy","family":"Bunel","sequence":"additional","affiliation":[]},{"given":"Borja","family":"Balle","sequence":"additional","affiliation":[]},{"given":"Taylan","family":"Cemgil","sequence":"additional","affiliation":[]},{"given":"Zahra","family":"Ahmed","sequence":"additional","affiliation":[]},{"given":"Kitty","family":"Stacpoole","sequence":"additional","affiliation":[]},{"given":"Ilia","family":"Shumailov","sequence":"additional","affiliation":[]},{"given":"Ciprian","family":"Baetu","sequence":"additional","affiliation":[]},{"given":"Sven","family":"Gowal","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2812-9917","authenticated-orcid":false,"given":"Demis","family":"Hassabis","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7466-7997","authenticated-orcid":false,"given":"Pushmeet","family":"Kohli","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,10,23]]},"reference":[{"key":"8025_CR1","doi-asserted-by":"publisher","first-page":"106553","DOI":"10.1016\/j.chb.2020.106553","volume":"114","author":"N K\u00f6bis","year":"2021","unstructured":"K\u00f6bis, N. & Mossink, L. D. Artificial intelligence versus Maya Angelou: experimental evidence that people cannot differentiate AI-generated from human-written poetry. Comput. Hum. Behav. 114, 106553 (2021).","journal-title":"Comput. Hum. Behav."},{"key":"8025_CR2","doi-asserted-by":"crossref","unstructured":"Clark, E. et al. All that\u2019s \u2018human\u2019 is not gold: evaluating human evaluation of generated text. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (eds. Zong, C. et al.) 
7282\u20137296 (Association for Computational Linguistics, 2021).","DOI":"10.18653\/v1\/2021.acl-long.565"},{"key":"8025_CR3","doi-asserted-by":"publisher","first-page":"2208839120","DOI":"10.1073\/pnas.2208839120","volume":"120","author":"M Jakesch","year":"2023","unstructured":"Jakesch, M., Hancock, J. T. & Naaman, M. Human heuristics for AI-generated language are flawed. Proc. Natl Acad. Sci. USA 120, 2208839120 (2023).","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"8025_CR4","unstructured":"Wu, J. et al. A survey on LLM-generated text detection: necessity, methods, and future directions. Preprint at https:\/\/arxiv.org\/abs\/2310.14724 (2024)."},{"key":"8025_CR5","unstructured":"Chen, C. et al. Accelerating large language model decoding with speculative sampling. Preprint at https:\/\/arxiv.org\/abs\/2302.01318 (2023)."},{"key":"8025_CR6","unstructured":"Team, G. et al. Gemini: a family of highly capable multimodal models. Preprint at https:\/\/arxiv.org\/abs\/2312.11805 (2023)."},{"key":"8025_CR7","unstructured":"SynthID-Team Code and data. GitHub https:\/\/github.com\/google-deepmind\/synthid-text (2024)."},{"key":"8025_CR8","doi-asserted-by":"publisher","first-page":"755","DOI":"10.1038\/s41586-024-07566-y","volume":"631","author":"I Shumailov","year":"2024","unstructured":"Shumailov, I. et al. AI models collapse when trained on recursively generated data. Nature 631, 755\u2013759 (2024).","journal-title":"Nature"},{"key":"8025_CR9","unstructured":"Alemohammad, S. et al. Self-consuming generative models go MAD. In Proc. Twelfth International Conference on Learning Representations (ICLR, 2024)."},{"key":"8025_CR10","unstructured":"Taori, R. & Hashimoto, T. Data feedback loops: model-driven amplification of dataset biases. In Proc. 40th International Conference on Machine Learning 33883\u201333920 (JMLR, 2023)."},{"key":"8025_CR11","doi-asserted-by":"crossref","unstructured":"Wyllie, S., Shumailov, I. & Papernot, N. 
Fairness feedback loops: training on synthetic data amplifies bias. In Proc. 2024 ACM Conference on Fairness, Accountability, and Transparency 2113\u20132147 (Association for Computing Machinery, 2024).","DOI":"10.1145\/3630106.3659029"},{"key":"8025_CR12","unstructured":"Krishna, K., Song, Y., Karpinska, M., Wieting, J. F. & Iyyer, M. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. In Proc. Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS, 2023)."},{"key":"8025_CR13","unstructured":"Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D. & Finn, C. DetectGPT: zero-shot machine-generated text detection using probability curvature. In Proc. 40th International Conference on Machine Learning 24950\u201324962 (JMLR, 2023)."},{"key":"8025_CR14","doi-asserted-by":"crossref","unstructured":"Verma, V., Fleisig, E., Tomlin, N. & Klein, D. Ghostbuster: detecting text ghostwritten by large language models. In Proc. 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) 1702\u20131717 (Association for Computational Linguistics, 2024).","DOI":"10.18653\/v1\/2024.naacl-long.95"},{"key":"8025_CR15","unstructured":"Hans, A. et al. Spotting LLMs with binoculars: zero-shot detection of machine-generated text. In Proc. 41st International Conference on Machine Learning 17519-17537 (PMLR, 2024)."},{"key":"8025_CR16","doi-asserted-by":"publisher","DOI":"10.1007\/s40979-023-00140-5","volume":"19","author":"AM Elkhatat","year":"2023","unstructured":"Elkhatat, A. M., Elsaid, K. & Almeer, S. Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. Int. J. Educ. Integrity 19, 17 (2023).","journal-title":"Int. J. Educ. 
Integrity"},{"key":"8025_CR17","doi-asserted-by":"publisher","first-page":"100779","DOI":"10.1016\/j.patter.2023.100779","volume":"4","author":"W Liang","year":"2023","unstructured":"Liang, W., Yuksekgonul, M., Mao, Y., Wu, E. & Zou, J. GPT detectors are biased against non-native English writers. Patterns 4, 100779 (2023).","journal-title":"Patterns"},{"key":"8025_CR18","doi-asserted-by":"publisher","first-page":"8011","DOI":"10.1109\/ACCESS.2018.2796585","volume":"6","author":"NS Kamaruddin","year":"2018","unstructured":"Kamaruddin, N. S., Kamsin, A., Por, L. Y. & Rahman, H. A review of text watermarking: theory, methods, and applications. IEEE Access 6, 8011\u20138028 (2018).","journal-title":"IEEE Access"},{"key":"8025_CR19","unstructured":"Gu, C., Huang, C., Zheng, X., Chang, K.-W. & Hsieh, C.-J. Watermarking pre-trained language models with backdooring. Preprint at https:\/\/arxiv.org\/abs\/2210.07543 (2022)."},{"key":"8025_CR20","unstructured":"SynthID-Team Watermarking AI-generated text and video with SynthID. Google DeepMind Blog https:\/\/deepmind.google\/discover\/blog\/watermarking-ai-generated-text-and-video-with-synthid (2024)."},{"key":"8025_CR21","unstructured":"Piet, J., Sitawarin, C., Fang, V., Mu, N. & Wagner, D. Mark my words: analyzing and evaluating language model watermarks. Preprint at https:\/\/arxiv.org\/abs\/2312.00273 (2023)."},{"key":"8025_CR22","unstructured":"Aaronson, S. & Kirchner, H. Watermarking of large language models. Scott Aaronson https:\/\/www.scottaaronson.com\/talks\/watermark.ppt (2022)."},{"key":"8025_CR23","unstructured":"Kirchenbauer, J. et al. A watermark for large language models. In Proc. 40th International Conference on Machine Learning 17061\u201317084 (PMLR, 2023)."},{"key":"8025_CR24","unstructured":"Kuditipudi, R., Thickstun, J., Hashimoto, T. & Liang, P. Robust distortion-free watermarks for language models. Trans. Mach. Learn. Res. 
https:\/\/openreview.net\/pdf?id=FpaCL1MO2C (2024)."},{"key":"8025_CR25","unstructured":"Christ, M., Gunn, S. & Zamir, O. Undetectable watermarks for language models. In Proc. Thirty Seventh Conference on Learning Theory 1125\u20131139 (PMLR, 2024)."},{"key":"8025_CR26","unstructured":"Casper, S. et al. Open problems and fundamental limitations of reinforcement learning from human feedback. Trans. Mach. Learn. Res. https:\/\/openreview.net\/pdf?id=bx24KpJ4Eb (2023)."},{"key":"8025_CR27","unstructured":"Hu, Z. et al. Unbiased watermark for large language models. In Proc. Twelfth International Conference on Learning Representations (ICLR, 2024)."},{"key":"8025_CR28","unstructured":"Team, G. et al. Gemma: open models based on Gemini research and technology. Preprint at https:\/\/arxiv.org\/abs\/2403.08295 (2024)."},{"key":"8025_CR29","unstructured":"Jiang, A. Q. et al. Mistral 7B. Preprint at https:\/\/arxiv.org\/abs\/2310.06825 (2023)."},{"key":"8025_CR30","doi-asserted-by":"crossref","unstructured":"Fan, A. et al. ELI5: long form question answering. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A. et al.) 3558\u20133567 (Association for Computational Linguistics, 2019).","DOI":"10.18653\/v1\/P19-1346"},{"key":"8025_CR31","unstructured":"Cloud, G. TPU v5e. Google Cloud https:\/\/cloud.google.com\/tpu\/docs\/v5e-inference (2024)."},{"key":"8025_CR32","unstructured":"Jovanovi\u0107, N., Staab, R. & Vechev, M. Watermark stealing in large language models. In Proc. 41st International Conference on Machine Learning 22570\u201322593 (PMLR, 2024)."},{"key":"8025_CR33","unstructured":"Zhang, H. et al. Watermarks in the sand: impossibility of strong watermarking for language models. In Proc. 41st International Conference on Machine Learning 58851\u201358880 (PMLR, 2024)."},{"key":"8025_CR34","unstructured":"Holtzman, A., Buys, J., Du, L., Forbes, M. & Choi, Y. The curious case of neural text degeneration. In Proc. 
Eighth International Conference on Learning Representations (ICLR, 2020)."},{"key":"8025_CR35","first-page":"147","volume":"9","author":"DH Ackley","year":"1985","unstructured":"Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A learning algorithm for Boltzmann machines. Cogn. Sci. 9, 147\u2013169 (1985).","journal-title":"Cogn. Sci."},{"key":"8025_CR36","doi-asserted-by":"crossref","unstructured":"Fan, A., Lewis, M. & Dauphin, Y. Hierarchical neural story generation. In Proc. 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds Gurevych, I. & Miyao, Y.) 889\u2013898 (Association for Computational Linguistics, 2018).","DOI":"10.18653\/v1\/P18-1082"}],"container-title":["Nature"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41586-024-08025-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41586-024-08025-4","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41586-024-08025-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T16:07:53Z","timestamp":1729699673000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41586-024-08025-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,23]]},"references-count":36,"journal-issue":{"issue":"8035","published-print":{"date-parts":[[2024,10,24]]}},"alternative-id":["8025"],"URL":"https:\/\/doi.org\/10.1038\/s41586-024-08025-4","relation":{},"ISSN":["0028-0836","1476-4687"],"issn-type":[{"value":"0028-0836","type":"print"},{"value":"1476-4687","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,23]]},"assertion":[{"value":"8 April 
2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 October 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Work funded and performed by Google DeepMind, with some collaborators at Google. S.D., A.S., B.B., S. Ghaisas, P.K., P.-S.H. and J.W. have filed patent applications EP23162983.3, PCT\/EP2024\/057423 and US18611417, currently pending publication, on behalf of DeepMind Technologies Limited, relating to the SynthID-Text watermarking method.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}