{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T21:24:38Z","timestamp":1740173078988,"version":"3.37.3"},"reference-count":41,"publisher":"Springer Fachmedien Wiesbaden GmbH","issue":"2","license":[{"start":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T00:00:00Z","timestamp":1708300800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T00:00:00Z","timestamp":1708300800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100006764","name":"Technische Universit\u00e4t Berlin","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006764","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["HMD"],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Transformer-based AI systems have achieved major advances, among other areas in text processing and text comprehension. These deep learning models enable the generation of texts and form the foundation of modern language models. The rapid development of recent years has produced large language models such as ChatGPT, Bard, and VICUNA-13B.<\/jats:p><jats:p>This article presents the development of language models toward large language models. The ongoing evolution of language models opens up diverse opportunities as well as problems, which is why the detection of LLM-generated texts is important. This article presents different approaches of well-known detection methods. Besides statistical classification methods, deep-learning-based and zero-shot methods are also discussed. In addition, a compression-based approach and labeling methods are presented. After a tabular comparison of the methods described in the literature, implemented software detectors are presented. Subsequently, considerations for the design of a training dataset are outlined, laying the groundwork for a dedicated approach to detecting AI-generated texts in German. Furthermore, the architecture and design of this approach, the KI-Inhalte-Detektor (AI content detector), are presented and described, and its limitations are shown.<\/jats:p>","DOI":"10.1365\/s40702-024-01051-w","type":"journal-article","created":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T17:02:21Z","timestamp":1708362141000},"page":"418-435","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Erkennungsverfahren f\u00fcr KI-generierte Texte: \u00dcberblick und Architekturentwurf","Detection Methods for AI-generated Texts: Overview and Architectural Design"],"prefix":"10.1365","volume":"61","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-5235-3801","authenticated-orcid":false,"given":"Thorsten","family":"Pr\u00f6hl","sequence":"first","affiliation":[]},{"given":"Radoslaw","family":"Mohrhardt","sequence":"additional","affiliation":[]},{"given":"Niels","family":"F\u00f6rster","sequence":"additional","affiliation":[]},{"given":"Erik","family":"Putzier","sequence":"additional","affiliation":[]},{"given":"R\u00fcdiger","family":"Zarnekow","sequence":"additional","affiliation":[]}],"member":"93","published-online":{"date-parts":[[2024,2,19]]},"reference":[{"key":"1051_CR1","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1186\/s40537-021-00444-8","volume":"8","author":"L Alzubaidi","year":"2021","unstructured":"Alzubaidi\u00a0L, Zhang\u00a0J, Humaidi\u00a0AJ, Al-Dujaili\u00a0A, Duan\u00a0Y, Al-Shamma\u00a0O, Santamar\u00eda\u00a0J, Fadhel\u00a0MA, Al-Amidie\u00a0M, Farhan\u00a0L (2021) Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J\u00a0Big Data 8:53. https:\/\/doi.org\/10.1186\/s40537-021-00444-8","journal-title":"J Big Data"},{"key":"1051_CR2","doi-asserted-by":"crossref","unstructured":"Chakraborty M, Tonmoy SMTI, Zaman SMM, Sharma K, Barman NR, Gupta C, Gautam S, Kumar T, Jain V, Chadha A, Sheth AP, Das A (2023a) Counter Turing test CT^2: AI-generated text detection is not as easy as you may think\u2014introducing AI detectability index. https:\/\/arxiv.org\/pdf\/2310.05030.pdf","DOI":"10.18653\/v1\/2023.emnlp-main.136"},{"key":"1051_CR3","unstructured":"Chakraborty S, Bedi AS, Zhu S, An B, Manocha D, Huang F (2023b) On the possibilities of AI-generated text detection. https:\/\/arxiv.org\/pdf\/2304.04736.pdf"},{"key":"1051_CR4","doi-asserted-by":"crossref","unstructured":"Chan B, Schweter S, M\u00f6ller T (2020) German\u2019s next language model. https:\/\/arxiv.org\/pdf\/2010.10906.pdf","DOI":"10.18653\/v1\/2020.coling-main.598"},{"key":"1051_CR5","volume-title":"Evaluation metrics for language models","author":"S Chen","year":"1998","unstructured":"Chen\u00a0S, Beeferman\u00a0D, Rosenfeld\u00a0R (1998) Evaluation metrics for language models. Carnegie Mellon University"},{"key":"1051_CR6","unstructured":"Deng Z, Gao H, Miao Y, Zhang H (2023) Efficient detection of LLM-generated texts with a\u00a0Bayesian surrogate model. https:\/\/arxiv.org\/pdf\/2305.16617.pdf"},{"key":"1051_CR7","unstructured":"Devlin J, Chang M\u2011W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. https:\/\/arxiv.org\/pdf\/1810.04805.pdf"},{"key":"1051_CR8","doi-asserted-by":"publisher","first-page":"e443","DOI":"10.7717\/peerj-cs.443","volume":"7","author":"L Fr\u00f6hling","year":"2021","unstructured":"Fr\u00f6hling\u00a0L, Zubiaga\u00a0A (2021) Feature-based detection of automated language models: tackling GPT\u20112, GPT\u20113 and Grover. PeerJ Comput Sci 7:e443. https:\/\/doi.org\/10.7717\/peerj-cs.443","journal-title":"PeerJ Comput Sci"},{"key":"1051_CR9","unstructured":"Gall\u00e9 M, Rozen J, Kruszewski G, Elsahar H (2021) Unsupervised and distributional detection of machine-generated text. https:\/\/arxiv.org\/pdf\/2111.02878.pdf"},{"key":"1051_CR10","unstructured":"Goodside R (2023) There are adversarial attacks for that proposal as well\u2014in particular, generating with emojis after words and then removing them before submitting defeats it. https:\/\/twitter.com\/goodside\/status\/1610682909647671306"},{"key":"1051_CR11","doi-asserted-by":"publisher","first-page":"208","DOI":"10.1080\/00033790.2014.917437","volume":"73","author":"MD Gordin","year":"2016","unstructured":"Gordin\u00a0MD (2016) The Dostoevsky machine in Georgetown: scientific translation in the Cold War. Ann Sci 73:208\u2013223. https:\/\/doi.org\/10.1080\/00033790.2014.917437","journal-title":"Ann Sci"},{"key":"1051_CR12","unstructured":"Guo B, Zhang X, Wang Z, Jiang M, Nie J, Ding Y, Yue J, Wu Y (2023) How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection. https:\/\/arxiv.org\/pdf\/2301.07597.pdf"},{"key":"1051_CR13","first-page":"1","volume-title":"2022 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). 11\u201313 Oct. 2022","author":"J Harguess","year":"2022","unstructured":"Harguess\u00a0J, Ward\u00a0CM (2022) Is the next winter coming for AI? Elements of making secure and robust AI. In: 2022 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). 11\u201313 Oct. 2022. IEEE, Piscataway, S\u00a01\u20137"},{"key":"1051_CR14","doi-asserted-by":"publisher","first-page":"1349","DOI":"10.1001\/jama.2023.5321","volume":"329","author":"CE Haupt","year":"2023","unstructured":"Haupt\u00a0CE, Marks\u00a0M (2023) AI-generated medical advice-GPT and beyond. JAMA 329:1349\u20131350. https:\/\/doi.org\/10.1001\/jama.2023.5321","journal-title":"JAMA"},{"key":"1051_CR15","unstructured":"Hazell J (2023) Large language models can be used to effectively scale spear phishing campaigns. https:\/\/arxiv.org\/pdf\/2305.06972.pdf"},{"key":"1051_CR16","series-title":"Proceedings of the 58th\u00a0Annual Meeting of the Association for Computational Linguistics:","doi-asserted-by":"publisher","first-page":"1808","DOI":"10.18653\/v1\/2020.acl-main.164","volume-title":"Automatic detection of generated text is easiest when humans are fooled","author":"D Ippolito","year":"2020","unstructured":"Ippolito\u00a0D, Duckworth\u00a0D, Callison-Burch\u00a0C, Eck\u00a0D (2020) Automatic detection of generated text is easiest when humans are fooled. Proceedings of the 58th\u00a0Annual Meeting of the Association for Computational Linguistics:, S\u00a01808\u20131822 https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.164"},{"key":"1051_CR17","unstructured":"Kirchenbauer J, Geiping J, Wen Y, Katz J, Miers I, Goldstein T (2023) A watermark for large language models. https:\/\/arxiv.org\/pdf\/2301.10226.pdf"},{"key":"1051_CR18","unstructured":"Kirchner JH, Ahmad L, Aaronson S, Leike J (2023) New AI classifier for indicating AI-written text. https:\/\/openai.com\/blog\/new-ai-classifier-for-indicating-ai-written-text"},{"key":"1051_CR19","unstructured":"Liu Y, Ott M, Goyal N, Du Jingfei, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: a\u00a0robustly optimized BERT pretraining approach. https:\/\/arxiv.org\/pdf\/1907.11692.pdf"},{"key":"1051_CR20","volume-title":"Foundations of statistical natural language processing","author":"CD Manning","year":"2005","unstructured":"Manning\u00a0CD, Sch\u00fctze\u00a0H (2005) Foundations of statistical natural language processing. MIT Press, Cambridge"},{"key":"1051_CR21","series-title":"Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER)","doi-asserted-by":"publisher","first-page":"78","DOI":"10.18653\/v1\/2021.fever-1.9","volume-title":"FANG-COVID: a\u00a0new large-scale benchmark dataset for fake news detection in German","author":"J Mattern","year":"2021","unstructured":"Mattern\u00a0J, Qiao\u00a0Y, Kerz\u00a0E, Wiechmann\u00a0D, Strohmaier\u00a0M (2021) FANG-COVID: a\u00a0new large-scale benchmark dataset for fake news detection in German. Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER), S\u00a078\u201391 https:\/\/doi.org\/10.18653\/v1\/2021.fever-1.9"},{"key":"1051_CR22","unstructured":"Mitchell E, Lee Y, Khazatsky A, Manning CD, Finn C (2023) DetectGPT: zero-shot machine-generated text detection using probability curvature. https:\/\/arxiv.org\/pdf\/2301.11305.pdf"},{"key":"1051_CR23","unstructured":"Orenstrakh MS, Karnalim O, Suarez CA, Liut M (2023) Detecting LLM-generated text in computing education: a\u00a0comparative study for ChatGPT cases. https:\/\/arxiv.org\/pdf\/2307.07411.pdf"},{"key":"1051_CR24","unstructured":"Peng L, Zhang Y, Shang J (2023) Generating efficient training data via LLM-based attribute manipulation. https:\/\/arxiv.org\/pdf\/2307.07099.pdf"},{"key":"1051_CR25","unstructured":"Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2019) Exploring the limits of transfer learning with a\u00a0unified text-to-text transformer. https:\/\/arxiv.org\/pdf\/1910.10683.pdf"},{"key":"1051_CR26","doi-asserted-by":"publisher","first-page":"1213","DOI":"10.18653\/v1\/2022.naacl-main.88","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"J Rodriguez","year":"2022","unstructured":"Rodriguez\u00a0J, Hay\u00a0T, Gros\u00a0D, Shamsi\u00a0Z, Srinivasan\u00a0R (2022) Cross-domain detection of GPT-2-generated technical text. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, S\u00a01213\u20131233 https:\/\/doi.org\/10.18653\/v1\/2022.naacl-main.88"},{"key":"1051_CR27","doi-asserted-by":"publisher","first-page":"386","DOI":"10.1037\/h0042519","volume":"65","author":"F Rosenblatt","year":"1958","unstructured":"Rosenblatt\u00a0F (1958) The perceptron: a\u00a0probabilistic model for information storage and organization in the brain. Psychol Rev 65:386\u2013408. https:\/\/doi.org\/10.1037\/h0042519","journal-title":"Psychol Rev"},{"key":"1051_CR28","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1007\/s42979-022-01043-x","volume":"3","author":"IH Sarker","year":"2022","unstructured":"Sarker\u00a0IH (2022) AI-based modeling: techniques, applications and research issues towards automation, intelligent and smart systems. SN Comput Sci 3:158. https:\/\/doi.org\/10.1007\/s42979-022-01043-x","journal-title":"SN Comput Sci"},{"key":"1051_CR29","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1186\/s12911-022-01753-5","volume":"22","author":"Z Shuai","year":"2022","unstructured":"Shuai\u00a0Z, Xiaolin\u00a0D, Jing\u00a0Y, Yanni\u00a0H, Meng\u00a0C, Yuxin\u00a0W, Wei\u00a0Z (2022) Comparison of different feature extraction methods for applicable automated ICD coding. BMC Med Inform Decis Mak 22:11. https:\/\/doi.org\/10.1186\/s12911-022-01753-5","journal-title":"BMC Med Inform Decis Mak"},{"key":"1051_CR30","unstructured":"Su J, Zhuo TY, Mansurov J, Wang D, Nakov P (2023) Fake news detectors are biased against texts generated by large language models. https:\/\/arxiv.org\/pdf\/2309.08674.pdf"},{"key":"1051_CR31","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2012-65","volume-title":"LSTM neural networks for language modeling","author":"M Sundermeyer","year":"2012","unstructured":"Sundermeyer\u00a0M, Schl\u00fcter\u00a0R, Ney\u00a0H (2012) LSTM neural networks for language modeling. Interspeech"},{"key":"1051_CR32","unstructured":"Tang R, Chuang Y\u2011N, Hu X (2023) The science of detecting LLM-generated texts. https:\/\/arxiv.org\/pdf\/2303.07205.pdf"},{"key":"1051_CR33","volume-title":"2019 International Conference on Computational Science and Computational Intelligence (CSCI)","author":"CC Tappert","year":"2019","unstructured":"Tappert\u00a0CC (2019) Who is the father of deep learning? In: 2019 International Conference on Computational Science and Computational Intelligence (CSCI)"},{"key":"1051_CR34","unstructured":"Torrey J (2023) Meet \u201eZipPy\u201c, a\u00a0fast AI LLM text detector. https:\/\/blog.thinkst.com\/2023\/06\/meet-zippy-a-fast-ai-llm-text-detector.html"},{"key":"1051_CR35","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. https:\/\/arxiv.org\/pdf\/1706.03762.pdf"},{"key":"1051_CR36","unstructured":"Venkit PN, Gautam S, Panchanadikar R, Huang T\u2011H, Wilson S (2023) Nationality bias in text generation. https:\/\/arxiv.org\/pdf\/2302.02463.pdf"},{"key":"1051_CR37","unstructured":"Verma V, Fleisig E, Tomlin N, Klein D (2023) Ghostbuster: detecting text ghostwritten by large language models. https:\/\/arxiv.org\/pdf\/2305.15047.pdf"},{"key":"1051_CR38","unstructured":"Vicuna (2023) Vicuna: an open-source chatbot impressing gpt\u20114 with 90\u202f%* Chatgpt quality. https:\/\/vicuna.lmsys.org\/"},{"key":"1051_CR39","doi-asserted-by":"crossref","unstructured":"Weber-Wulff D, Anohina-Naumeca A, Bjelobaba S, Folt\u00fdnek T, Guerrero-Dib J, Popoola O, \u0160igut P, Waddington L (2023) Testing of detection tools for AI-generated text. https:\/\/arxiv.org\/pdf\/2306.15666.pdf","DOI":"10.1007\/s40979-023-00146-z"},{"key":"1051_CR40","series-title":"Communications of the ACM","doi-asserted-by":"publisher","DOI":"10.1145\/365153.365168","volume-title":"ELIZA\u2014a computer program for the study of natural language communication between man and machine","author":"J Weizenbaum","year":"1966","unstructured":"Weizenbaum\u00a0J (1966) ELIZA\u2014a computer program for the study of natural language communication between man and machine. Communications of the ACM"},{"key":"1051_CR41","unstructured":"Yao Y, Duan J, Xu K, Cai Y, Sun E, Zhang Y (2023) A survey on Large Language Model (LLM) security and privacy: the good, the bad, and the ugly. http:\/\/arxiv.org\/pdf\/2312.02003.pdf"}],"container-title":["HMD Praxis der Wirtschaftsinformatik"],"original-title":[],"language":"de","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1365\/s40702-024-01051-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1365\/s40702-024-01051-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1365\/s40702-024-01051-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,11]],"date-time":"2024-04-11T15:02:20Z","timestamp":1712847740000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1365\/s40702-024-01051-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,19]]},"references-count":41,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["1051"],"URL":"https:\/\/doi.org\/10.1365\/s40702-024-01051-w","relation":{},"ISSN":["1436-3011","2198-2775"],"issn-type":[{"type":"print","value":"1436-3011"},{"type":"electronic","value":"2198-2775"}],"subject":[],"published":{"date-parts":[[2024,2,19]]},"assertion":[{"value":"31 October 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 January 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 February 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}