{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T01:27:39Z","timestamp":1777166859410,"version":"3.51.4"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,12,1]],"date-time":"2024-12-01T00:00:00Z","timestamp":1733011200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,4]],"date-time":"2024-12-04T00:00:00Z","timestamp":1733270400000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"IU Internationale Hochschule GmbH"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Speech Technol"],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Chatbots based on large language models (LLMs) like ChatGPT are available to the wide public. These tools can for instance be used by students to generate essays or whole theses from scratch or by rephrasing an existing text. But how does for instance a teacher know whether a text is written by a student or an AI? In this paper, we investigate <jats:italic>perplexity<\/jats:italic>, <jats:italic>semantic<\/jats:italic>, <jats:italic>list lookup<\/jats:italic>, <jats:italic>document<\/jats:italic>, <jats:italic>error-based<\/jats:italic>, <jats:italic>readability<\/jats:italic>, <jats:italic>AI feedback<\/jats:italic> and <jats:italic>text vector<\/jats:italic> features to classify <jats:italic>human-generated<\/jats:italic> and <jats:italic>AI-generated<\/jats:italic> texts from the educational domain as well as news articles. We analyze two scenarios: (1)\u00a0The detection of text generated by AI from scratch, and (2)\u00a0the detection of text rephrased by AI. 
Since we assumed that classification is more difficult when the AI has been prompted to create or rephrase the text in a way that a human would not recognize that it was generated or rephrased by an AI, we also investigate this <jats:italic>advanced prompting<\/jats:italic> scenario. To train, fine-tune and test the classifiers, we created the <jats:italic>Multilingual Human-AI-Generated Text Corpus<\/jats:italic> which contains <jats:italic>human-generated<\/jats:italic>, <jats:italic>AI-generated<\/jats:italic> and <jats:italic>AI-rephrased<\/jats:italic> texts from the educational domain in English, French, German, and Spanish and English texts from the news domain. We demonstrate that the same features can be used for the detection of <jats:italic>AI-generated<\/jats:italic> and <jats:italic>AI-rephrased<\/jats:italic> texts from the educational domain in all languages and the detection of <jats:italic>AI-generated<\/jats:italic> and <jats:italic>AI-rephrased<\/jats:italic> news texts. Our best systems significantly outperform GPTZero and ZeroGPT\u2014state-of-the-art systems for the detection of <jats:italic>AI-generated<\/jats:italic> text. 
Our best <jats:italic>text rephrasing<\/jats:italic> detection system even outperforms GPTZero by 181.3% relative in F1-score.<\/jats:p>","DOI":"10.1007\/s10772-024-10143-3","type":"journal-article","created":{"date-parts":[[2024,12,4]],"date-time":"2024-12-04T13:31:28Z","timestamp":1733319088000},"page":"935-956","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Classification of human- and AI-generated texts for different languages and domains"],"prefix":"10.1007","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-5286-6548","authenticated-orcid":false,"given":"Kristina","family":"Schaaff","sequence":"first","affiliation":[]},{"given":"Tim","family":"Schlippe","sequence":"additional","affiliation":[]},{"given":"Lorenz","family":"Mindner","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,12,4]]},"reference":[{"key":"10143_CR5","unstructured":"Adiwardana, D., et al. (2020). Towards a human-like open-domain Chatbot. ArXiv Preprint http:\/\/arxiv.org\/abs\/2001.09977."},{"key":"10143_CR3","doi-asserted-by":"crossref","unstructured":"Arteaga, D., Arenas, J., Paz, F., Tupia, M., & Bruzza, M. (2019). Design of information system architecture for the recommendation of tourist sites in the city of Manta, Ecuador through a chatbot, (pp. 1\u20136). IEEE.","DOI":"10.23919\/CISTI.2019.8760669"},{"key":"10143_CR14","doi-asserted-by":"crossref","unstructured":"Baidoo-Anu, D., & Owusu Ansah, L. (2023). Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. Available at SSRN 4337484.","DOI":"10.2139\/ssrn.4337484"},{"key":"10143_CR43","unstructured":"Bird, S., & Loper, E. (2004). NLTK: The natural language toolkit, (pp. 214\u2013217). Association for Computational Linguistics. 
http:\/\/aclanthology.org\/P04-3031."},{"key":"10143_CR31","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L. (2001). Random forests. Machine Learning, 45, 5\u201332.","journal-title":"Machine Learning"},{"key":"10143_CR20","unstructured":"Brown, T. B., Mann, B., Ryder, N.,...., & Amodei, D. (2020). Language models are few-shot learners. CoRR http:\/\/arxiv.org\/abs\/2005.14165."},{"key":"10143_CR45","doi-asserted-by":"publisher","unstructured":"Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system, KDD \u201916, (pp. 785\u2013794). Association for Computing Machinery. https:\/\/doi.org\/10.1145\/2939672.2939785","DOI":"10.1145\/2939672.2939785"},{"key":"10143_CR33","unstructured":"Components. Components (2023). https:\/\/components.one."},{"key":"10143_CR17","unstructured":"Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT 2019, (pp. 4171\u20134186)."},{"key":"10143_CR2","doi-asserted-by":"crossref","unstructured":"Dibitonto, M., Leszczynska, K., Tazzi, F., & Medaglia, C. M. (2018). Chatbot in a campus environment: Design of LiSA, a virtual assistant to help students in their university life, (pp.103\u2013116). Springer.","DOI":"10.1007\/978-3-319-91250-9_9"},{"key":"10143_CR11","unstructured":"Ethnologue. (2023). What are the top 200 most spoken languages? https:\/\/www.ethnologue.com\/insights\/ethnologue200."},{"key":"10143_CR4","doi-asserted-by":"crossref","unstructured":"Falala-S\u00e9chet, C., Antoine, L., Thiriez, I., & Bungener, C. (2019). OWLIE: A Chatbot that provides emotional support for coping with psychological difficulties, (pp. 
236\u2013237).","DOI":"10.1145\/3308532.3329416"},{"issue":"3","key":"10143_CR36","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1037\/h0057532","volume":"32","author":"RF Flesch","year":"1948","unstructured":"Flesch, R. F. (1948). A new readability yardstick. The Journal of Applied Psychology, 32(3), 221\u2013233.","journal-title":"The Journal of Applied Psychology"},{"key":"10143_CR34","volume-title":"GLTR: Statistical Detection and Visualization of Generated Text, 111\u2013116","author":"S Gehrmann","year":"2019","unstructured":"Gehrmann, S., Strobelt, H., & Rush, A. (2019). GLTR: Statistical detection and visualization of generated text, (pp. 111\u2013116). Association for Computational Linguistics."},{"key":"10143_CR28","unstructured":"Guo, B., Zhang, X., Wang, Z., Jiang, M., Nie, J., Ding, Y., Yue, J., & Wu, Y. (2023). How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection. arXiv:2301.07597."},{"key":"10143_CR15","doi-asserted-by":"publisher","first-page":"2817","DOI":"10.1007\/s00330-023-10213-1","volume":"34","author":"K Jeblick","year":"2023","unstructured":"Jeblick, K., Schachtner, B., Dexl, J., Mittermeier, A., St\u00fcber, A. T., Topalis, J., Weber, T., Wesp, P., Sabel, B. O., Ricke, J., & Ingrisch, M. (2023). ChatGPT makes medicine easy to swallow: An exploratory case study on simplified radiology reports. European Radiology, 34, 2817\u20132825.","journal-title":"European Radiology"},{"key":"10143_CR16","unstructured":"Jiao, W., Wang, W., Huang, J.-t., Wang, X., & Tu, Z. (2023). Is ChatGPT a good translator? A preliminary study. ArXiv Preprint arXiv:2301.08745."},{"key":"10143_CR7","doi-asserted-by":"publisher","DOI":"10.1145\/3571730","author":"Z Ji","year":"2023","unstructured":"Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computer Survey. 
https:\/\/doi.org\/10.1145\/3571730","journal-title":"ACM Computer Survey"},{"key":"10143_CR37","volume-title":"Derivation of New Readability Formulas","author":"JP Kincaid","year":"1975","unstructured":"Kincaid, J. P., Fishburne Jr., R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas\u00a0(Automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel."},{"key":"10143_CR35","unstructured":"Kumarage, T., Garland, J., Bhattacharjee, A., Trapeznikov, K., Ruston, S., & Liu, H. (2023). Stylometric detection of AI-generated text in Twitter timelines. arXiv:2303.03697."},{"key":"10143_CR18","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. CoRR arXiv:abs\/1907.11692."},{"key":"10143_CR12","doi-asserted-by":"publisher","first-page":"e48392","DOI":"10.2196\/48392","volume":"25","author":"B Mesko","year":"2023","unstructured":"Mesko, B. (2023). The ChatGPT (generative artificial intelligence) revolution has made artificial intelligence approachable for medical professionals. Journal of Medical Internet Research, 25, e48392.","journal-title":"Journal of Medical Internet Research"},{"key":"10143_CR9","doi-asserted-by":"crossref","unstructured":"Mindner, L., Schlippe, T., & Schaaff, K. (2023). Classification of human- and AI-generated texts: Investigating features for ChatGPT. In Schlippe, T., Cheng, E. C. K. & Wang, T. (Eds.) Artificial intelligence in education technologies: New development and innovative practices, (pp. 152\u2013170). Springer Nature.","DOI":"10.1007\/978-981-99-7947-9_12"},{"key":"10143_CR24","unstructured":"Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D., & Finn, C. (2023). 
DetectGPT: Zero-shot machine-generated text detection using probability curvature."},{"key":"10143_CR21","unstructured":"Mitrovi\u0107, S., Andreoletti, D., & Ayoub, O. (2023). ChatGPT or human? Detect and explain. Explaining decisions of machine learning model for detecting short ChatGPT-generated text. arXiv preprint arXiv:2301.13852."},{"key":"10143_CR44","unstructured":"Mooney, P. (2022). Kaggle machine learning and data science survey 2022. https:\/\/kaggle.com\/competitions\/kaggle-survey-2022."},{"key":"10143_CR32","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1016\/0925-2312(91)90023-5","volume":"2","author":"F Murtagh","year":"1991","unstructured":"Murtagh, F. (1991). Multilayer perceptrons for classification and regression. Neurocomputing, 2, 183\u2013197.","journal-title":"Neurocomputing"},{"key":"10143_CR42","unstructured":"Natalie. (2023). What is ChatGPT? https:\/\/help.openai.com\/en\/articles\/6783457-what-is-chatgpt."},{"key":"10143_CR1","doi-asserted-by":"publisher","first-page":"106855","DOI":"10.1016\/j.chb.2021.106855","volume":"122","author":"C Pelau","year":"2021","unstructured":"Pelau, C., Dabija, D.-C., & Ene, I. (2021). What makes an AI device human-like? The role of interaction quality, empathy and perceived psychological anthropomorphic characteristics in the acceptance of artificial intelligence in the service industry. Computers in Human Behavior, 122, 106855.","journal-title":"Computers in Human Behavior"},{"key":"10143_CR39","doi-asserted-by":"crossref","unstructured":"Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks, (pp. 3982\u20133992). Association for Computational Linguistics. 
https:\/\/aclanthology.org\/D19-1410.","DOI":"10.18653\/v1\/D19-1410"},{"key":"10143_CR19","volume-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","author":"A Roberts","year":"2019","unstructured":"Roberts, A., Raffel, C., Lee, K., Matena, M., Shazee, N., Liu, P. J., Narang, S., Li, W., & Zhou, Y. (2019). Exploring the limits of transfer learning with a unified text-to-text transformer. Google: Tech. Rep."},{"key":"10143_CR30","volume-title":"DistilBERT, a Distilled Version of BERT: Smaller","author":"V Sanh","year":"2019","unstructured":"Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, cheaper and lighter: Faster."},{"key":"10143_CR22","doi-asserted-by":"crossref","unstructured":"Schaaff, K., Reinig, C., & Schlippe, T. (2023). Exploring ChatGPT\u2019s empathic abilities. In 2023 11th international conference on affective computing and intelligent interaction (ACII 2023) (pp. 1\u20138). IEEE Computer Society. https:\/\/doi.ieeecomputersociety.org\/10.1109\/ACII59096.2023.10388208.","DOI":"10.1109\/ACII59096.2023.10388208"},{"key":"10143_CR10","unstructured":"Schaaff, K., Schlippe, T., Mindner, L. Abbas, M., & Freihat, A. A. (eds) (2023). Classification of human- and AI-generated texts for English, French, German, and Spanish. In Abbas, M. & Freihat, A. A. (Eds.) The 6th international conference on natural language and speech processing (ICNLSP 2023), (pp. 1\u201310). Association for Computational Linguistics, Online. https:\/\/aclanthology.org\/2023.icnlsp-1.1."},{"key":"10143_CR26","unstructured":"Shijaku, R., & Canhasi, E. (2023). ChatGPT generated text detection."},{"key":"10143_CR23","unstructured":"Shrivastava, R. (2023). With seed funding secured, AI detection tool GPTZero launches new browser plugin. 
https:\/\/www.forbes.com\/sites\/rashishrivastava\/2023\/05\/09\/with-seed-funding-secured-ai-detection-tool-gptzero-launches-new-browser-plugin."},{"key":"10143_CR38","unstructured":"Solaiman, I., Brundage, M., Clark, J., Askell, A., Herbert-Voss, A., Wu, J., Radford, A., Krueger, G., Kim, J. W., Kreps, S., McCain, M., Newhouse, A., Blazakis, J., McGuffie, K., & Wang, J. (2019). Release strategies and the social impacts of language models. arXiv:1908.09203."},{"key":"10143_CR29","unstructured":"Soni, M., & Wade, V. (2023). Comparing abstractive summaries generated by ChatGPT to real summaries through blinded reviewers and text classification algorithms. arXiv:2303.17650."},{"key":"10143_CR6","doi-asserted-by":"publisher","first-page":"35","DOI":"10.3390\/bdcc7010035","volume":"7","author":"V Taecharungroj","year":"2023","unstructured":"Taecharungroj, V. (2023). \u201cWhat can ChatGPT do?\u201d Analyzing early reactions to the innovative AI chatbot on Twitter. Big Data and Cognitive Computing, 7, 35.","journal-title":"Big Data and Cognitive Computing"},{"key":"10143_CR8","unstructured":"Thompson, P. (2023). A developer built a \u2019Propaganda machine\u2019 using OpenAI Tech to highlight the dangers of mass-produced AI disinformation. https:\/\/www.businessinsider.com\/developer-creates-ai-disinformation-system-using-openai-2023-9."},{"key":"10143_CR13","unstructured":"Touvron, H., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv:2302.13971."},{"key":"10143_CR40","doi-asserted-by":"crossref","unstructured":"Vu, N. T., Schlippe, T., Kraus, F., & Schultz, T. (2010). Rapid bootstrapping of five Eastern European languages using the rapid language adaptation toolkit. 
https:\/\/api.semanticscholar.org\/CorpusID:12942559.","DOI":"10.21437\/Interspeech.2010-292"},{"key":"10143_CR41","doi-asserted-by":"publisher","first-page":"5731","DOI":"10.1007\/s10462-022-10144-1","volume":"55","author":"M Wankhade","year":"2022","unstructured":"Wankhade, M., Rao, A., & Kulkarni, C. (2022). A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review, 55, 5731\u20135780.","journal-title":"Artificial Intelligence Review"},{"key":"10143_CR25","unstructured":"Yu, P., Chen, J., Feng, X., & Xia, Z. (2023). CHEAT: A large-scale dataset for detecting ChatGPT-writtEn AbsTracts. arXiv:2304.12008."},{"key":"10143_CR27","doi-asserted-by":"crossref","unstructured":"Zaitsu, W., & Jin, M. (2023). Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis. arXiv:2304.05534.","DOI":"10.1371\/journal.pone.0288453"}],"container-title":["International Journal of Speech Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10772-024-10143-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10772-024-10143-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10772-024-10143-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,16]],"date-time":"2024-12-16T10:10:33Z","timestamp":1734343833000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10772-024-10143-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12]]},"references-count":45,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["10143"],"URL":"https:\/\/doi.org\/10.
1007\/s10772-024-10143-3","relation":{},"ISSN":["1381-2416","1572-8110"],"issn-type":[{"value":"1381-2416","type":"print"},{"value":"1572-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12]]},"assertion":[{"value":"10 August 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 December 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}