{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,1]],"date-time":"2026-03-01T21:33:26Z","timestamp":1772400806649,"version":"3.50.1"},"reference-count":45,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2022,11,10]],"date-time":"2022-11-10T00:00:00Z","timestamp":1668038400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"University of Economic Ho Chi Minh City (UEH) Vietnam","award":["2022-09-09-1144"],"award-info":[{"award-number":["2022-09-09-1144"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The explosive growth of social media has multiplied many kinds of misinformation and is attracting tremendous attention from the research community. One of the most prevalent forms of misleading news is the cheapfake. Cheapfakes rely on non-AI techniques, such as pairing unaltered images with false contextual claims, to create false news, which makes them easy and \u201ccheap\u201d to produce and therefore abundant on social media. Moreover, the development of deep learning has opened up many news-related research domains, such as fake news detection, rumour detection, fact-checking, and verification of claimed images. Nevertheless, despite the impact and harmfulness of cheapfakes for the social community and the real world, there is little research on detecting cheapfakes in the computer science domain. Detecting misused\/false\/out-of-context pairs of images and captions is challenging, even for humans, because of the complex correlation between the attached image and the veracity of the caption content. Existing research focuses mostly on training and evaluating on a given dataset, which limits each proposal to the categories, semantics, and situations reflected in that dataset's characteristics.
In this paper, to address these issues, we leverage textual semantic understanding from a large corpus and integrate it with different combinations of text-image matching and image captioning methods via an ANN\/Transformer boosting scheme to classify a triple (image, caption1, caption2) into OOC (out-of-context) and NOOC (not out-of-context) labels. We customized these combinations according to various exceptional cases observed during data analysis. We evaluated our approach using the dataset and evaluation metrics provided by the COSMOS baseline. Compared to other methods, including the baseline, our method achieves the highest Accuracy, Recall, and F1 scores.<\/jats:p>","DOI":"10.3390\/a15110423","type":"journal-article","created":{"date-parts":[[2022,11,10]],"date-time":"2022-11-10T02:07:48Z","timestamp":1668046068000},"page":"423","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Leverage Boosting and Transformer on Text-Image Matching for Cheap Fakes Detection"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8896-9611","authenticated-orcid":false,"given":"Tuan-Vinh","family":"La","sequence":"first","affiliation":[{"name":"University of Information Technology, Ho Chi Minh City 700000, Vietnam"},{"name":"Vietnam National University, Ho Chi Minh City 700000, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3044-8175","authenticated-orcid":false,"given":"Minh-Son","family":"Dao","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology, Tokyo 184-8795, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7987-6627","authenticated-orcid":false,"given":"Duy-Dong","family":"Le","sequence":"additional","affiliation":[{"name":"University of Economics, Ho Chi Minh City 700000,
Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3933-1301","authenticated-orcid":false,"given":"Kim-Phung","family":"Thai","sequence":"additional","affiliation":[{"name":"University of Economics, Ho Chi Minh City 700000, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6363-0160","authenticated-orcid":false,"given":"Quoc-Hung","family":"Nguyen","sequence":"additional","affiliation":[{"name":"University of Economics, Ho Chi Minh City 700000, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3777-1805","authenticated-orcid":false,"given":"Thuy-Kieu","family":"Phan-Thi","sequence":"additional","affiliation":[{"name":"University of Economics, Ho Chi Minh City 700000, Vietnam"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"40","DOI":"10.22215\/timreview\/1282","article-title":"The emergence of deepfake technology: A review","volume":"9","author":"Westerlund","year":"2019","journal-title":"Technol. Innov. Manag. Rev."},{"key":"ref_2","unstructured":"Collins, A. (2019). Forged Authenticity: Governing Deepfake Risks, EPFL International Risk Governance Center (IRGC). Technical Report."},{"key":"ref_3","unstructured":"Fazio, L. (2022, October 07). Out-of-Context Photos Are a Powerful Low-Tech Form of Misinformation. Available online: https:\/\/mat.miracosta.edu\/mat210_cotnoir\/instructor\/pdfs-for-class\/Out-of-context-photos-are-a-powerful-low-tech-form-of-misinformation.pdf."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Thorne, J., Vlachos, A., Christodoulopoulos, C., and Mittal, A. (2018). Fever: A large-scale dataset for fact extraction and verification. arXiv.","DOI":"10.18653\/v1\/N18-1074"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wang, W.Y. (2017). \u201cLiar, liar pants on fire\u201d: A new benchmark dataset for fake news detection.
arXiv.","DOI":"10.18653\/v1\/P17-2067"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Choudhary, A., and Arora, A. (2021). Linguistic feature based learning model for fake news detection and classification. Expert Syst. Appl., 169.","DOI":"10.1016\/j.eswa.2020.114171"},{"key":"ref_7","unstructured":"Singh, V., Dasgupta, R., Sonagra, D., Raman, K., and Ghosh, I. (2017, January 5\u20138). Automated fake news detection using linguistic analysis and machine learning. Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS), Washington, DC, USA."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1145\/3137597.3137600","article-title":"Fake news detection on social media: A data mining perspective","volume":"19","author":"Shu","year":"2017","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"17","DOI":"10.5121\/ijnlc.2019.8302","article-title":"Fake news detection with semantic features and text mining","volume":"8","author":"Bharadwaj","year":"2019","journal-title":"Int. J. Nat. Lang. Comput. (IJNLC)"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Pan, J.Z., Pavlova, S., Li, C., Li, N., Li, Y., and Liu, J. (2018, January 8\u201312). Content based fake news detection using knowledge graphs. Proceedings of the International Semantic Web Conference, Monterey, CA, USA.","DOI":"10.1007\/978-3-030-00671-6_39"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Hu, L., Yang, T., Zhang, L., Zhong, W., Tang, D., Shi, C., Duan, N., and Zhou, M. (2021, January 1\u20136). Compare to the knowledge: Graph neural fake news detection with external knowledge. 
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online.","DOI":"10.18653\/v1\/2021.acl-long.62"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, Y., Qian, S., Hu, J., Fang, Q., and Xu, C. (2020, January 26\u201329). Fake news detection via knowledge-driven multimodal graph convolutional networks. Proceedings of the 2020 International Conference on Multimedia Retrieval, Dublin, Ireland.","DOI":"10.1145\/3372278.3390713"},{"key":"ref_13","unstructured":"Ma, J., Gao, W., Mitra, P., Kwon, S., Jansen, B.J., Wong, K.F., and Cha, M. (2016, January 9\u201315). Detecting rumors from microblogs with recurrent neural networks. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Ma, J., Gao, W., and Wong, K.F. (2018). Rumor Detection on Twitter with Tree-Structured Recursive Neural Networks, Association for Computational Linguistics.","DOI":"10.18653\/v1\/P18-1184"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"113595","DOI":"10.1016\/j.eswa.2020.113595","article-title":"Rumor detection based on propagation graph neural network with attention mechanism","volume":"158","author":"Wu","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Bian, T., Xiao, X., Xu, T., Zhao, P., Huang, W., Rong, Y., and Huang, J. (2020, January 7\u201312). Rumor detection on social media with bi-directional graph convolutional networks. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i01.5393"},{"key":"ref_17","unstructured":"Mishra, S., Suryavardan, S., Bhaskar, A., Chopra, P., Reganti, A., Patwa, P., Das, A., Chakraborty, T., Sheth, A., and Ekbal, A. (March, January 22). 
Factify: A multi-modal fact verification dataset. Proceedings of the First Workshop on Multimodal Fact-Checking and Hate Speech Detection (DE-FACTIFY), Vancouver, BC, Canada."},{"key":"ref_18","unstructured":"Gao, J., Hoffmann, H.F., Oikonomou, S., Kiskovski, D., and Bandhakavi, A. (2021). Logically at the factify 2022: Multimodal fact verification. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Pan, L., Chen, W., Xiong, W., Kan, M.Y., and Wang, W.Y. (2021). Zero-shot fact verification by claim generation. arXiv.","DOI":"10.18653\/v1\/2021.acl-short.61"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ciampaglia, G.L., Shiralkar, P., Rocha, L.M., Bollen, J., Menczer, F., and Flammini, A. (2015). Computational fact checking from knowledge networks. PLoS ONE, 10.","DOI":"10.1371\/journal.pone.0141938"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, Y., Ma, F., Jin, Z., Yuan, Y., Xun, G., Jha, K., Su, L., and Gao, J. (2018, January 19\u201323). Eann: Event adversarial neural networks for multi-modal fake news detection. Proceedings of the 24th ACM Sigkdd International Conference on Knowledge Discovery & Data Mining, London, UK.","DOI":"10.1145\/3219819.3219903"},{"key":"ref_22","unstructured":"Khattar, D., Goud, J.S., Gupta, M., and Varma, V. (May, January 13). Mvae: Multimodal variational autoencoder for fake news detection. Proceedings of the World Wide Web Conference, San Francisco, CA, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Singhal, S., Shah, R.R., Chakraborty, T., Kumaraguru, P., and Satoh, S. (2019, January 11\u201313). Spotfake: A multi-modal framework for fake news detection. Proceedings of the 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM), Singapore.","DOI":"10.1109\/BigMM.2019.00-44"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"La, T.V., Dao, M.S., Tran, Q.T., Tran, T.P., Tran, A.D., and Nguyen, D.T.D. (2022, January 10\u201314). 
A Combination of Visual-Semantic Reasoning and Text Entailment-based Boosting Algorithm for Cheapfake Detection. Proceedings of the ACM MM 2022, Lisbon, Portugal.","DOI":"10.1145\/3503161.3551595"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zlatkova, D., Nakov, P., and Koychev, I. (2019). Fact-checking meets fauxtography: Verifying claims about images. arXiv.","DOI":"10.18653\/v1\/D19-1216"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., and Parikh, D. (2015, January 7\u201313). Vqa: Visual question answering. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.279"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, X., Yin, X., Li, C., Zhang, P., Hu, X., Zhang, L., Wang, L., Hu, H., Dong, L., and Wei, F. (2020, January 23\u201328). Oscar: Object-semantics aligned pre-training for vision-language tasks. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58577-8_8"},{"key":"ref_28","unstructured":"Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., and Sutskever, I. (2021, January 18\u201324). Zero-shot text-to-image generation. Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lee, K.H., Chen, X., Hua, G., Hu, H., and He, X. (2018, January 8\u201314). Stacked cross attention for image-text matching. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"ref_30","unstructured":"Aneja, S., Bregler, C., and Nie\u00dfner, M. (2021). Cosmos: Catching out-of-context misinformation with self-supervised learning. 
arXiv."},{"key":"ref_31","unstructured":"Aneja, S., Midoglu, C., Dang-Nguyen, D.T., Khan, S.A., Riegler, M., Halvorsen, P., Bregler, C., and Adsumilli, B. (2022). ACM Multimedia Grand Challenge on Detecting Cheapfakes. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_33","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Cer, D., Yang, Y., Kong, S.y., Hua, N., Limtiaco, N., John, R.S., Constant, N., Guajardo-Cespedes, M., Yuan, S., and Tar, C. (2018). Universal sentence encoder. arXiv.","DOI":"10.18653\/v1\/D18-2029"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., and Goel, V. (2017, January 21\u201326). Self-critical sequence training for image captioning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.131"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., and Specia, L. (2017). 
Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation. arXiv.","DOI":"10.18653\/v1\/S17-2001"},{"key":"ref_39","unstructured":"Li, K., Zhang, Y., Li, K., Li, Y., and Fu, Y. (November, January 27). Visual semantic reasoning for image-text matching. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Williams, A., Nangia, N., and Bowman, S.R. (2017). A broad-coverage challenge corpus for sentence understanding through inference. arXiv.","DOI":"10.18653\/v1\/N18-1101"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"2146","DOI":"10.1109\/TASLP.2020.3008390","article-title":"Sbert-wk: A sentence embedding method by dissecting bert-based word models","volume":"28","author":"Wang","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_42","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"La, T.V., Tran, Q.T., Tran, T.P., Tran, A.D., Dang-Nguyen, D.T., and Dao, M.S. (2022, January 27\u201330). Multimodal Cheapfakes Detection by Utilizing Image Captioning for Global Context. Proceedings of the 3rd ACM Workshop on Intelligent Cross-Data Analysis and Retrieval, Newark, NJ, USA.","DOI":"10.1145\/3512731.3534210"},{"key":"ref_44","unstructured":"Akgul, T., Civelek, T.E., Ugur, D., and Begen, A.C. (October, January 28). COSMOS on Steroids: A Cheap Detector for Cheapfakes. Proceedings of the 12th ACM Multimedia Systems Conference, Istanbul, Turkey."},{"key":"ref_45","unstructured":"Boididou, C., Andreadou, K., Papadopoulos, S., Dang-Nguyen, D.T., Boato, G., Riegler, M., and Kompatsiaris, Y. (2015). Verifying multimedia use at mediaeval 2015. 
MediaEval, 3."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/11\/423\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:13:37Z","timestamp":1760145217000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/11\/423"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,10]]},"references-count":45,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["a15110423"],"URL":"https:\/\/doi.org\/10.3390\/a15110423","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,10]]}}}