{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T17:02:02Z","timestamp":1771520522358,"version":"3.50.1"},"reference-count":20,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,4,26]],"date-time":"2023-04-26T00:00:00Z","timestamp":1682467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Deception in computer-mediated communication represents a threat, and there is a growing need to develop efficient methods of detecting it. Machine learning models have, through natural language processing, proven to be extremely successful at detecting lexical patterns related to deception. In this study, four selected machine learning models are trained and tested on data collected through a crowdsourcing platform on the topics of COVID-19 and climate change. The performance of the models was tested by analyzing n-grams (from unigrams to trigrams) and by using psycho-linguistic analysis. A selection of important features was carried out and further deepened with additional testing of the models on different subsets of the obtained features. This study concludes that the subjectivity of the collected data greatly affects the detection of hidden linguistic features of deception. The psycho-linguistic analysis alone and in combination with n-grams achieves better classification results than an n-gram analysis while testing the models on own data, but also while examining the possibility of generalization, especially on trigrams where the combined approach achieves a notably higher accuracy of up to 16%. The n-gram analysis proved to be a more robust method during the testing of the mutual applicability of the models while psycho-linguistic analysis remained most inflexible.<\/jats:p>","DOI":"10.3390\/a16050221","type":"journal-article","created":{"date-parts":[[2023,4,27]],"date-time":"2023-04-27T01:28:28Z","timestamp":1682558908000},"page":"221","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change"],"prefix":"10.3390","volume":"16","author":[{"given":"Barbara","family":"Brzic","sequence":"first","affiliation":[{"name":"Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia"}]},{"given":"Ivica","family":"Boticki","sequence":"additional","affiliation":[{"name":"Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4979-2216","authenticated-orcid":false,"given":"Marina","family":"Bagic Babac","sequence":"additional","affiliation":[{"name":"Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1037\/0022-3514.70.5.979","article-title":"Lying in Everyday Life","volume":"70","author":"DePaulo","year":"1996","journal-title":"J. Pers. Soc. Psychol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1080\/07421222.2004.11045779","article-title":"A comparison of classification methods for predicting deception in computer-mediated communication","volume":"20","author":"Zhou","year":"2004","journal-title":"J. Manag. Inf. Syst."},{"key":"ref_3","unstructured":"Hancock, J.T., Curry, L., Goorha, S., and Woodworth, M. (2005, January 3\u20136). Automated linguistic analysis of deceptive and truthful synchronous computer-mediated communication. Proceedings of the 38th Annual Hawaii International Conference on System Sciences, Big Island, HI, USA."},{"key":"ref_4","unstructured":"Ott, M., Choi, Y., Cardie, C., and Hancock, J.T. (2011, January 19\u201324). Finding deceptive opinion spam by any stretch of the imagination. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_5","unstructured":"Feng, S., Banerjee, R., and Choi, Y. (2012, January 8\u201314). Syntactic stylometry for deception detection. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Republic of Korea."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"P\u00e9rez-Rosas, V., Abouelenien, M., Mihalcea, R., and Burzo, M. (2015, January 9\u201313). Deception detection using real-life trial data. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Washington, DC, USA.","DOI":"10.1145\/2818346.2820758"},{"key":"ref_7","unstructured":"Poesio, M., and Fornaciari, T. (2023, April 25). Detecting Deception in Text Using NLP Methods. Available online: https:\/\/research.signal-ai.com\/assets\/Deception_Detection_with_NLP.pdf."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Mihalcea, R., and Strapparava, C. (2009, January 4). The lie detector: Explorations in the automatic recognition of deceptive language. Proceedings of the ACL-IJCNLP 2009 Conference, Suntec, Singapore.","DOI":"10.3115\/1667583.1667679"},{"key":"ref_9","unstructured":"Zalta, E.N. (2016). The Stanford Encyclopedia of Philosophy (Winter 2016 Edition), Stanford University."},{"key":"ref_10","unstructured":"Isenberg, A. (1973). Aesthetics and the theory of criticism: Selected essays of Arnold Isenberg, University of Chicago Press."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1177\/0093650213485785","article-title":"Deception, Detection, Demeanor, and Truth Bias in Face-To-Face and Computer-Mediated Communication","volume":"42","author":"Braun","year":"2015","journal-title":"Commun. Res."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/01638530701739181","article-title":"On lying and being lied to: A linguistic analysis of deception in computer-mediated communication","volume":"45","author":"Hancock","year":"2008","journal-title":"Discourse Process."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1111\/j.1468-2885.1996.tb00132.x","article-title":"Interpersonal Deception Theory","volume":"6","author":"Burgoon","year":"1996","journal-title":"Commun. Theory"},{"key":"ref_14","unstructured":"National Research Council (2002). The Polygraph and Lie Detection, National Research Council."},{"key":"ref_15","first-page":"1","article-title":"The Development and Psychometric Properties of LIWC2007","volume":"1","author":"Pennebaker","year":"2007","journal-title":"Psychol. Sci. Univ. Tex. Austin Dev."},{"key":"ref_16","unstructured":"Boyd, R., Ashokkumar, A., Seraj, S., and Pennebaker, J. (2023, April 25). The Development and Psychometric Properties of LIWC-22. Available online: https:\/\/www.livc.app."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1177\/0261927X09351676","article-title":"The psychological meaning of words: LIWC and computerized text analysis methods","volume":"29","author":"Tausczik","year":"2010","journal-title":"J. Lang. Soc. Psychol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1007\/s13278-015-0273-1","article-title":"Deception detection in Twitter","volume":"5","author":"Alowibdi","year":"2015","journal-title":"Soc. Netw. Anal. Min."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Barsever, D., Singh, S., and Neftci, E. (2020, January 19\u201324). Building a Better Lie Detector with BERT: The Difference between Truth and Lies. Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK.","DOI":"10.1109\/IJCNN48605.2020.9206937"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Fornaciari, T., Bianchi, F., Poesio, M., and Hovy, D. (2021, January 19\u201323). BERTective: Language models and contextual information for deception detection. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2021.eacl-main.232"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/5\/221\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:23:35Z","timestamp":1760124215000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/5\/221"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,26]]},"references-count":20,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["a16050221"],"URL":"https:\/\/doi.org\/10.3390\/a16050221","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,26]]}}}