{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T15:22:21Z","timestamp":1777130541128,"version":"3.51.4"},"reference-count":11,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T00:00:00Z","timestamp":1738800000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T00:00:00Z","timestamp":1738800000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The rapid development of artificial intelligence, especially AI assistants, is leading to new forms of plagiarism that are difficult to detect using existing methods. Paraphrasing tools make this problem even more complex and challenging especially in minor languages with inadequate resources and tools. This study explores strategies to help detect plagiarism generated by ChatGPT 4.0 and altered by paraphrasing tools. We propose two new datasets consisting of abstracts of doctoral theses in English and Serbian. Both datasets were subjected to ChatGPT paraphrasing, which allowed us to form two classes of texts: human-written and AI-generated, i.e., AI-paraphrased. We then comprehensively compare 19 widely used classification algorithms based on two feature sets: word unigrams and character multigrams. In addition, we compare these to the results of a commercially available pre-trained ChatGPT content detector, ZeroGPT. The results on the English corpus turn out to be very accurate, achieving an accuracy of 95% or more. In contrast, the results on the Serbian corpus were less accurate, achieving an accuracy of just over 85%. Syntax analysis of the training datasets has shown that in Serbian GPT-paraphrased texts, 33.2% of sentences remain the same, and they are found in 63% of documents. GPT-paraphrased English texts showed that 3.2% of sentences remain the same, and they are found in 16% of documents. Syntax analysis of the test datasets has shown that the change of the model temperature influences syntactic features (average number of words and sentences) in English texts and slightly or not in Serbian texts. We attribute all these differences to GPT\u2019s lower paraphrasing ability in minor languages such as Serbian. Presented findings underscore the necessity for making persistent effort in developing tools made for detecting AI-paraphrased texts in academic and professional settings, particularly for minor languages with limited NLP resources, to preserve content integrity and authenticity.<\/jats:p>","DOI":"10.1186\/s40537-025-01082-0","type":"journal-article","created":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T17:39:51Z","timestamp":1738863591000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Comparison of algorithms for the recognition of ChatGPT paraphrased texts"],"prefix":"10.1186","volume":"12","author":[{"given":"Aleksandar","family":"Kartelj","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Miljana","family":"Mladenovi\u0107","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sta\u0161a","family":"Vuji\u010di\u0107 Stankovi\u0107","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,2,6]]},"reference":[{"key":"1082_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/bs.adcom.2016.09.001","volume":"104","author":"V Blagojevi\u0107","year":"2017","unstructured":"Blagojevi\u0107 V, Boji\u0107 D, Bojovi\u0107 M, Cvetanovi\u0107 M, Djordjevi\u0107 J, Djurdjevi\u0107 D, et al. A systematic approach to generation of new ideas for PhD research in computing. Adv Comput. 2017;104:1\u201331.","journal-title":"Adv Comput"},{"issue":"1","key":"1082_CR2","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1007\/s40979-023-00140-5","volume":"19","author":"AM Elkhatat","year":"2023","unstructured":"Elkhatat AM, Elsaid K, Almeer S. Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. Int J Educ Integr. 2023;19(1):17\u201333. https:\/\/doi.org\/10.1007\/s40979-023-00140-5.","journal-title":"Int J Educ Integr"},{"key":"1082_CR3","doi-asserted-by":"publisher","DOI":"10.1177\/01655515241227531","author":"K Hayawi","year":"2024","unstructured":"Hayawi K, Shahriar S, Mathew SS. The imitation game: detecting human and AI-generated texts in the era of ChatGPT and BARD. J Inf Sci. 2024. https:\/\/doi.org\/10.1177\/01655515241227531.","journal-title":"J Inf Sci."},{"key":"1082_CR4","doi-asserted-by":"publisher","unstructured":"Weber-Wulff D, Anohina-Naumeca A, Bjelobaba S, Folt\u1ef3nek T, Guerrero-Dib J, Popoola O, et\u00a0al. Testing of detection tools for AI-generated text. International Journal for Educational Integrity. 2023;19(26). https:\/\/doi.org\/10.1007\/s40979-023-00146-z.","DOI":"10.1007\/s40979-023-00146-z"},{"key":"1082_CR5","unstructured":"Antoun W, Mouilleron V, Sagot B, Seddah D. Towards a robust detection of language model-generated text: is ChatGPT that easy to detect? In: Servan C, Vilnat A, editors. Actes de CORIA-TALN 2023. Actes de la 30e Conf\u00e9rence sur le Traitement Automatique des Langues Naturelles (TALN), vol. 1 : travaux de recherche originaux \u2013 articles longs. Paris, France: ATALA; 2023. pp. 14\u201327. https:\/\/aclanthology.org\/2023.jeptalnrecital-long.2."},{"issue":"11","key":"1082_CR6","doi-asserted-by":"publisher","DOI":"10.1016\/j.xcrp.2023.101672","volume":"4","author":"H Desaire","year":"2023","unstructured":"Desaire H, Chua AE, Kim MG, Hua D. Accurately detecting AI text when ChatGPT is told to write like a chemist. Cell Rep Phys Sci. 2023;4(11): 101672. https:\/\/doi.org\/10.1016\/j.xcrp.2023.101672.","journal-title":"Cell Rep Phys Sci"},{"key":"1082_CR7","unstructured":"Mitrovi\u0107 S, Andreoletti D, Ayoub O. ChatGPT or human? Detect and explain. Explaining decisions of machine learning model for detecting short ChatGPT-generated text. https:\/\/arxiv.org\/pdf\/2301.13852.pdf."},{"key":"1082_CR8","unstructured":"Zhou C, Qiu C, Acuna DE. Paraphrase identification with deep learning: a review of datasets and methods."},{"key":"1082_CR9","unstructured":"Souradip C, Bedi AS, Zhu S, An B, Manocha D, Huang F. On the possibilities of AI-generated text detection. https:\/\/arxiv.org\/pdf\/2304.04736."},{"key":"1082_CR10","unstructured":"Le Q, Mikolov T. Distributed representations of sentences and documents. In: International conference on machine learning. PMLR; 2014. pp. 1188\u201396."},{"key":"1082_CR11","unstructured":"Honnibal M, Montani I, Van\u00a0Landeghem S, Boyd A, et\u00a0al. spaCy: industrial-strength natural language processing in Python. 2020."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01082-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-025-01082-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01082-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T17:39:55Z","timestamp":1738863595000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-025-01082-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,6]]},"references-count":11,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1082"],"URL":"https:\/\/doi.org\/10.1186\/s40537-025-01082-0","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,6]]},"assertion":[{"value":"29 April 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 January 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 February 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors conceived, designed the study, and collected the data. All authors performed the analysis and manuscript preparation, read, and approved the final manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Author contributions"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Funding"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"28"}}