{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:30:24Z","timestamp":1760059824160,"version":"build-2065373602"},"reference-count":32,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T00:00:00Z","timestamp":1752451200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"College-level Characteristic Teaching Material Project","award":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"],"award-info":[{"award-number":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"]}]},{"name":"College Teaching Incubation Project","award":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"],"award-info":[{"award-number":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"]}]},{"name":"Ministry of Education Industry-University Cooperation Collaborative Education Project","award":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"],"award-info":[{"award-number":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"]}]},{"name":"Central Universities Basic Scientific Research Fund","award":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"],"award-info":[{"award-number":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"]}]},{"name":"Beijing Higher Education \u201cUndergraduate Teaching Reform and Innovation Project\u201d","award":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"],"award-info":[{"award-number":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"]}]},{"name":"College Discipline Construction Project","award":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"],"award-info":[{"award-number":["20220119Z0221","20220120Z0220","20220163H0211","3282024009","20230051Z0114","20230050Z0114","20220121Z0208","202110018002","20230007Z0452","20230010Z0452"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>The widespread deployment of large language models (LLMs) has raised urgent demands for verifiable content attribution and misuse mitigation. Existing text watermarking techniques often struggle in black-box or sampling-based scenarios due to limitations in robustness, imperceptibility, and detection generality. These challenges are particularly critical in open-access settings, where model internals and generation logits are unavailable for attribution. To address these limitations, we propose CWS (Contrastive Watermarking with Semantic Modeling)\u2014a novel keyless watermarking framework that integrates contrastive semantic token selection and shared embedding space alignment. CWS enables context-aware, fluent watermark embedding while supporting robust detection via a dual-branch mechanism: a lightweight z-score statistical test for public verification and a GRU-based semantic decoder for black-box adversarial robustness. Experiments on GPT-2, OPT-1.3B, and LLaMA-7B over C4 and DBpedia datasets demonstrate that CWS achieves F1 scores up to 99.9% and maintains F1 \u2265 93% under semantic rewriting, token substitution, and lossy compression (\u03b5 \u2264 0.25, \u03b4 \u2264 0.2). The GRU-based detector offers a superior speed\u2013accuracy trade-off (0.42 s\/sample) over LSTM and Transformer baselines. These results highlight CWS as a lightweight, black-box-compatible, and semantically robust watermarking method suitable for practical content attribution across LLM architectures and decoding strategies. Furthermore, CWS maintains a symmetrical architecture between embedding and detection stages via shared semantic representations, ensuring structural consistency and robustness. This semantic symmetry helps preserve detection reliability across diverse decoding strategies and adversarial conditions.<\/jats:p>","DOI":"10.3390\/sym17071124","type":"journal-article","created":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T09:56:54Z","timestamp":1752487014000},"page":"1124","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Contrastive Semantic Watermarking Framework for Large Language Models"],"prefix":"10.3390","volume":"17","author":[{"given":"Jianxin","family":"Wang","sequence":"first","affiliation":[{"name":"Department of Electronics and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070, China"}]},{"given":"Xiangze","family":"Chang","sequence":"additional","affiliation":[{"name":"Department of Electronics and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070, China"}]},{"given":"Chaoen","family":"Xiao","sequence":"additional","affiliation":[{"name":"Department of Electronics and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070, China"}]},{"given":"Lei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Electronics and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,7,14]]},"reference":[{"key":"ref_1","unstructured":"OpenAI (2023). GPT-4 Technical Report, OpenAI."},{"key":"ref_2","first-page":"1","article-title":"PaLM: Scaling Language Models with Pathways","volume":"24","author":"Chowdhery","year":"2023","journal-title":"J. Mach. Learn. Res."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"142733","DOI":"10.1109\/ACCESS.2024.3468368","article-title":"A comprehensive review on generative ai for education","volume":"12","author":"Mittal","year":"2024","journal-title":"IEEE Access"},{"key":"ref_4","first-page":"354","article-title":"Combating misinformation in the age of llms: Opportunities and challenges","volume":"45","author":"Chen","year":"2024","journal-title":"AI Mag."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1162\/coli_a_00549","article-title":"A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions","volume":"51","author":"Wu","year":"2025","journal-title":"Comput. Linguist."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"100988","DOI":"10.1016\/j.patter.2024.100988","article-title":"AI deception: A survey of examples, risks, and potential solutions","volume":"5","author":"Park","year":"2024","journal-title":"Patterns"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","article-title":"Large language models in medicine","volume":"29","author":"Thirunavukarasu","year":"2023","journal-title":"Nat. Med."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Hacker, P., Engel, A., and Mauer, M. (2023, January 12\u201315). Regulating ChatGPT and other large generative AI models. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, IL, USA.","DOI":"10.1145\/3593013.3594067"},{"key":"ref_9","first-page":"1","article-title":"A survey of text watermarking in the era of large language models","volume":"57","author":"Liu","year":"2024","journal-title":"ACM Comput. Surv."},{"key":"ref_10","unstructured":"Liu, A., Pan, L., Hu, X., Li, S., Wen, L., King, I., and Yu, P.S. (2024, January 7\u201311). An unforgeable publicly verifiable watermark for large language models. Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria."},{"key":"ref_11","unstructured":"Huo, M., Somayajula, S.A., Liang, Y., Zhang, R., Koushanfar, F., and Xie, P. (2024, January 7\u201311). Token-specific watermarking with enhanced detectability and semantic coherence for large language models. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Rodriguez, J.D., Hay, T., Gros, D., Shamsi, Z., and Srinivasan, R. (2022, January 10\u201315). Cross-domain detection of GPT-2-generated technical text. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, WA, USA.","DOI":"10.18653\/v1\/2022.naacl-main.88"},{"key":"ref_13","unstructured":"Christ, M., Gunn, S., and Zamir, O. (July, January 30). Undetectable watermarks for language models. Proceedings of the Thirty Seventh Annual Conference on Learning Theory, Edmonton, AB, Canada."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1145\/3655103.3655106","article-title":"Fighting fire with fire: Can ChatGPT detect AI-generated text?","volume":"25","author":"Bhattacharjee","year":"2024","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"ref_15","unstructured":"Zhao, X., Ananth, P., Li, L., and Wang, Y.X. (2024, January 21\u201327). Provable robust watermarking for AI-generated text. Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria."},{"key":"ref_16","unstructured":"Pan, L., Liu, A., Hu, X., Meng, S., and Wen, L. (2024, January 7\u201311). Combating AI-generated fake content with robust watermarking. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Bender, E.M., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021, January 3\u201310). On the dangers of stochastic parrots: Can language models be too big?. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Toronto, ON, Canada.","DOI":"10.1145\/3442188.3445922"},{"key":"ref_18","first-page":"30","article-title":"Spirit-lm: Interleaved spoken and written language model","volume":"13","author":"Nguyen","year":"2025","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_19","unstructured":"Zhan, H., He, X., Xu, Q., Wu, Y., and Stenetorp, P. (2023). G3Detector: General GPT-generated text detector. arXiv."},{"key":"ref_20","unstructured":"Gehrmann, S., Strobelt, H., and Rush, A.M. (August, January 28). GLTR: Statistical detection and visualization of generated text. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Florence, Italy."},{"key":"ref_21","unstructured":"Liu, A., Pan, L., Hu, X., Meng, S., and Wen, L. (2023). A semantic invariant robust watermark for large language models. arXiv."},{"key":"ref_22","unstructured":"Sadasivan, V.S., Kumar, A., Balasubramanian, S., Wang, W., and Feizi, S. (2023). Can AI-generated text be reliably detected?. arXiv."},{"key":"ref_23","unstructured":"He, X., Xu, Q., Lyu, L., Wu, F., and Wang, C. (March, January 22). Protecting IP of language generation APIs with lexical watermark. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA."},{"key":"ref_24","unstructured":"Yang, L., Ma, X., Fu, Y., and Xiong, D. (2023). Syntax-aware watermarking for text generation. arXiv."},{"key":"ref_25","unstructured":"Abdelnabi, S., and Fritz, M. (2021, January 24\u201327). Adversarial watermarking transformer. Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA."},{"key":"ref_26","unstructured":"Zhang, R., Hussain, S.S., and Koushanfar, F. (2023). REMARK-LLM: Robust and efficient watermarking for generative large language models. arXiv."},{"key":"ref_27","unstructured":"Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., and Goldstein, T. (2023, January 23\u201329). A watermark for large language models. Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA."},{"key":"ref_28","unstructured":"Chen, M., Wu, X., Li, L., Wang, Y., Tan, S., and Shi, S. (2023). Improving semantic coherence in watermarked texts with LLaMA2-based clustering. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lee, T., Hong, S., Ahn, J., Hong, I., Lee, H., Yun, S., Shin, J., and Kim, G. (2023). Who wrote this code? Watermarking for code generation. arXiv.","DOI":"10.18653\/v1\/2024.acl-long.268"},{"key":"ref_30","unstructured":"Mitchell, E., Lee, Y., Khazatsky, A., Manning, C.D., and Finn, C. (2023, January 23\u201329). DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. Proceedings of the 40th International Conference on Machine Learning (ICML 2023), Honolulu, HI, USA."},{"key":"ref_31","unstructured":"Yang, X., Zhang, K., Chen, H., Petzold, L., Wang, W.Y., and Cheng, W. (2023). Zero-shot detection of machine-generated codes. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Lin, K., Luo, Y., Zhang, Z., and Luo, P. (2024). Zero-shot generative linguistic steganography. arXiv.","DOI":"10.18653\/v1\/2024.naacl-long.289"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/7\/1124\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:09:32Z","timestamp":1760033372000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/7\/1124"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,14]]},"references-count":32,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["sym17071124"],"URL":"https:\/\/doi.org\/10.3390\/sym17071124","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2025,7,14]]}}}