{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T18:08:47Z","timestamp":1775066927291,"version":"3.50.1"},"reference-count":50,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T00:00:00Z","timestamp":1731628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Comput. Sci."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>The proliferation of social media platforms has facilitated the spread of fake news, posing significant risks to public perception and societal stability. Existing methods for multimodal fake news detection have made important progress in combining textual and visual information but still face challenges in effectively aligning and merging these different types of data. These challenges often result in incomplete or inaccurate feature representations, thereby limiting overall performance.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>To address these limitations, we propose a novel framework named MCOT (<jats:bold>M<\/jats:bold>ultimodal Fake News Detection with <jats:bold>C<\/jats:bold>ontrastive Learning and <jats:bold>O<\/jats:bold>ptimal <jats:bold>T<\/jats:bold>ransport). MCOT integrates textual and visual information through three key components: cross-modal attention mechanism, contrastive learning, and optimal transport. Specifically, we first use cross-modal attention mechanism to enhance the interaction between text and image features. Then, we employ contrastive learning to align related embeddings while distinguishing unrelated pairs, and we apply optimal transport to refine the alignment of feature distributions across modalities.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>This integrated approach results in more precise and robust feature representations, thus enhancing detection accuracy. Experimental results on two public datasets demonstrate that the proposed MCOT outperforms state-of-the-art methods.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Our future work will focus on improving its generalization and expanding its capabilities to additional modalities.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fcomp.2024.1473457","type":"journal-article","created":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T06:14:05Z","timestamp":1731651245000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Multimodal Fake News Detection with Contrastive Learning and Optimal Transport"],"prefix":"10.3389","volume":"6","author":[{"given":"Xiaorong","family":"Shen","sequence":"first","affiliation":[]},{"given":"Maowei","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Zheng","family":"Hu","sequence":"additional","affiliation":[]},{"given":"Shimin","family":"Cai","sequence":"additional","affiliation":[]},{"given":"Tao","family":"Zhou","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,11,15]]},"reference":[{"key":"B1","first-page":"214","article-title":"\u201cWasserstein generative adversarial networks,\u201d","volume-title":"International Conference on Machine Learning","author":"Arjovsky","year":"2017"},{"key":"B2","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1145\/1963405.1963500","article-title":"\u201cInformation credibility on twitter,\u201d","volume-title":"Proceedings of the 20th International Conference on World Wide Web","author":"Castillo","year":"2011"},{"key":"B3","first-page":"1597","article-title":"\u201cA simple framework for contrastive learning of visual representations,\u201d","volume-title":"International Conference on Machine Learning","author":"Chen","year":"2020"},{"key":"B4","first-page":"15750","article-title":"\u201cExploring simple siamese representation learning,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Chen","year":"2021"},{"key":"B5","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1145\/3485447.3511968","article-title":"\u201cCross-modal ambiguity learning for multimodal fake news detection,\u201d","volume-title":"Proceedings of the ACM Web Conference 2022","author":"Chen","year":"2022"},{"key":"B6","doi-asserted-by":"crossref","first-page":"1121","DOI":"10.1145\/3357384.3357950","article-title":"\u201cAttention-residual network with cnn for rumor detection,\u201d","volume-title":"Proceedings of the 28th ACM International Conference on Information and Knowledge Management","author":"Chen","year":"2019"},{"key":"B7","article-title":"\u201cJoint distribution optimal transportation for domain adaptation,\u201d","volume-title":"Advances in Neural Information Processing Systems 30","author":"Courty","year":"2017"},{"key":"B8","article-title":"\u201cSinkhorn distances: lightspeed computation of optimal transport,\u201d","volume-title":"Advances in Neural Information Processing Systems 26","author":"Cuturi","year":"2013"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1810.04805","article-title":"Bert: Pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018","journal-title":"arXiv"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2010.11929","article-title":"An image is worth 16x16 words: transformers for image recognition at scale","author":"Dosovitskiy","year":"2020","journal-title":"arXiv"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.552","article-title":"Simcse: simple contrastive learning of sentence embeddings","author":"Gao","year":"2021","journal-title":"arXiv"},{"key":"B12","doi-asserted-by":"publisher","first-page":"18039","DOI":"10.1609\/aaai.v38i16.29760","article-title":"Customizing language model responses with contrastive in-context learning","volume":"38","author":"Gao","year":"2024","journal-title":"Proc. AAAI Conf. Artif. Intellig"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1421","DOI":"10.3390\/e25101421","article-title":"Multi-modal representation via contrastive learning with attention bottleneck fusion and attentive statistics features","volume":"25","author":"Guo","year":"2023","journal-title":"Entropy"},{"key":"B14","doi-asserted-by":"publisher","first-page":"1159063","DOI":"10.3389\/fcomp.2023.1159063","article-title":"A two-branch multimodal fake news detection model based on multimodal bilinear pooling and attention mechanism","volume":"5","author":"Guo","year":"2023","journal-title":"Front. Comp. Sci"},{"key":"B15","first-page":"9729","article-title":"\u201cMomentum contrast for unsupervised visual representation learning,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"He","year":"2020"},{"key":"B16","doi-asserted-by":"publisher","first-page":"110125","DOI":"10.1016\/j.asoc.2023.110125","article-title":"Multimodal fake news detection through data augmentation-based contrastive learning","volume":"136","author":"Hua","year":"2023","journal-title":"Appl. Soft Comp"},{"key":"B17","first-page":"4904","article-title":"\u201cScaling up visual and vision-language representation learning with noisy text supervision,\u201d","volume-title":"International Conference on Machine Learning","author":"Jia","year":"2021"},{"key":"B18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TGRS.2023.3349076","article-title":"Graphgst: Graph generative structure-aware transformer for hyperspectral image classification","volume":"62","author":"Jiang","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens"},{"key":"B19","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1145\/3123266.3123454","article-title":"\u201cMultimodal fusion with recurrent neural networks for rumor detection on microblogs,\u201d","volume-title":"Proceedings of the 25th ACM international conference on Multimedia","author":"Jin","year":"2017"},{"key":"B20","doi-asserted-by":"publisher","first-page":"598","DOI":"10.1109\/TMM.2016.2617078","article-title":"Novel visual and statistical image features for microblogs news verification","volume":"19","author":"Jin","year":"2016","journal-title":"IEEE Trans. Multimed"},{"key":"B21","doi-asserted-by":"crossref","first-page":"2915","DOI":"10.1145\/3308558.3313552","article-title":"\u201cMVAE: multimodal variational autoencoder for fake news detection,\u201d","volume-title":"The World Wide Web Conference","author":"Khattar","year":"2019"},{"key":"B22","doi-asserted-by":"publisher","first-page":"18426","DOI":"10.1609\/aaai.v38i16.29803","article-title":"Frequency spectrum is more effective for multimodal representation and fusion: a multimodal spectrum rumor detector","volume":"38","author":"Lao","year":"2024","journal-title":"Proc. AAAI Conf. Artif. Intellig"},{"key":"B23","first-page":"9781","article-title":"\u201cInterpretable multimodal misinformation detection with logic reasoning,\u201d","author":"Liu","year":"","journal-title":"Findings of the Association for Computational Linguistics: ACL 2023"},{"key":"B24","doi-asserted-by":"publisher","first-page":"793","DOI":"10.1109\/TIFS.2023.3326368","article-title":"Robust domain misinformation detection via multi-modal feature alignment","volume":"19","author":"Liu","year":"","journal-title":"IEEE Trans. Inform. Forens. Secur"},{"key":"B25","doi-asserted-by":"publisher","first-page":"13918","DOI":"10.1609\/aaai.v38i12.29299","article-title":"Timesurl: Self-supervised contrastive learning for universal time series representation learning","volume":"38","author":"Liu","year":"2024","journal-title":"Proc. AAAI Conf. Artif. Intellig"},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1708.07104","article-title":"Automatic detection of fake news","author":"P\u00e9rez-Rosas","year":"2017","journal-title":"arXiv"},{"key":"B27","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1561\/9781680835519","article-title":"Computational optimal transport: with applications to data science","volume":"11","author":"Peyr\u00e9","year":"2019","journal-title":"Found. Trends Mach. Learn"},{"key":"B28","first-page":"3930","article-title":"\u201cMultimodal learning using optimal transport for sarcasm and humor detection,\u201d","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Pramanick","year":"2022"},{"key":"B29","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1109\/ICDM.2019.00062","article-title":"\u201cExploiting multi-domain visual information for fake news detection,\u201d","volume-title":"2019 IEEE International Conference on Data Mining (ICDM)","author":"Qi","year":"2019"},{"key":"B30","first-page":"8748","article-title":"\u201cLearning transferable visual models from natural language supervision,\u201d","volume-title":"International Conference on Machine Learning","author":"Radford","year":"2021"},{"key":"B31","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1145\/3137597.3137600","article-title":"Fake news detection on social media: a data mining perspective","volume":"19","author":"Shu","year":"2017","journal-title":"ACM SIGKDD Explorat. Newslett"},{"key":"B32","doi-asserted-by":"publisher","first-page":"13915","DOI":"10.1609\/aaai.v34i10.7230","article-title":"Spotfake+: a multimodal framework for fake news detection via transfer learning (student abstract)","volume":"34","author":"Singhal","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intellig"},{"key":"B33","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1109\/BigMM.2019.00-44","article-title":"\u201cSpotfake: A multi-modal framework for fake news detection,\u201d","volume-title":"2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM)","author":"Singhal","year":"2019"},{"key":"B34","first-page":"11","article-title":"Visualizing data using T-SNE","volume":"9","author":"Van der Maaten","year":"2008","journal-title":"J. Mach. Learn. Res"},{"key":"B35","article-title":"\u201cAttention is all you need,\u201d","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems 30"},{"key":"B36","doi-asserted-by":"crossref","first-page":"5696","DOI":"10.1145\/3581783.3613850","article-title":"\u201cCross-modal contrastive learning for multimodal fake news detection,\u201d","volume-title":"Proceedings of the 31st ACM International Conference on Multimedia","author":"Wang","year":"2023"},{"key":"B37","first-page":"849","article-title":"\u201cEANN: Event adversarial neural networks for multi-modal fake news detection,\u201d","author":"Wang","year":"2018","journal-title":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery"},{"key":"B38","doi-asserted-by":"crossref","first-page":"2560","DOI":"10.18653\/v1\/2021.findings-acl.226","article-title":"\u201cMultimodal fusion with co-attention networks for fake news detection,\u201d","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Wu","year":"2021"},{"key":"B39","doi-asserted-by":"crossref","first-page":"2805","DOI":"10.1145\/3583780.3614914","article-title":"\u201cHiPo: Detecting fake news via historical and multi-modal analyses of social media posts,\u201d","volume-title":"Proceedings of the 32nd ACM International Conference on Information and Knowledge Management","author":"Xiao","year":"2023"},{"key":"B40","first-page":"21241","article-title":"\u201cMultimodal optimal transport-based co-attention transformer with global structure consistency for survival prediction,\u201d","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Xu","year":"2023"},{"key":"B41","doi-asserted-by":"publisher","first-page":"102610","DOI":"10.1016\/j.ipm.2021.102610","article-title":"Detecting fake news by exploring the consistency of multimodal data","volume":"58","author":"Xue","year":"2021","journal-title":"Inform. Proc. Manage"},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.393","article-title":"Consert: a contrastive framework for self-supervised sentence representation transfer","author":"Yan","year":"2021","journal-title":"arXiv"},{"key":"B43","doi-asserted-by":"publisher","first-page":"5384","DOI":"10.1609\/aaai.v37i4.25670","article-title":"Bootstrapping multi-view representations for fake news detection","volume":"37","author":"Ying","year":"2023","journal-title":"Proc. AAAI conf. Artif. Intellig"},{"key":"B44","doi-asserted-by":"publisher","first-page":"3901","DOI":"10.24963\/ijcai.2017\/545","article-title":"A convolutional approach for misinformation identification","volume":"2017","author":"Yu","year":"2017","journal-title":"IJCAI"},{"key":"B45","first-page":"11782","article-title":"\u201cProduct1m: Towards weakly supervised instance-level product retrieval via cross-modal pretraining,\u201d","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Zhan","year":"2021"},{"key":"B46","doi-asserted-by":"publisher","first-page":"4884","DOI":"10.1609\/aaai.v37i4.25614","article-title":"TOT: topology-aware optimal transport for multimodal hate detection","volume":"37","author":"Zhang","year":"2023","journal-title":"Proc. AAAI Conf. Artif. Intellig"},{"key":"B47","first-page":"354","article-title":"\u201cSAFE: similarity-aware multi-modal fake news detection,\u201d","author":"Zhou","year":"2020","journal-title":"Advances in Knowledge Discovery and Data Mining"},{"key":"B48","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s40747-024-01473-5","article-title":"Multimodal fake news detection through intra-modality feature aggregation and inter-modality semantic fusion","volume":"2024","author":"Zhu","year":"","journal-title":"Comp. Intellig. Syst"},{"key":"B49","first-page":"568","article-title":"\u201cA general black-box adversarial attack on graph-based fake news detectors,\u201d","author":"Zhu","year":"","journal-title":"Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24"},{"key":"B50","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1007\/978-3-319-67217-5_8","article-title":"\u201cExploiting context for rumour detection in social media,\u201d","volume-title":"Social Informatics: 9th International Conference, SocInfo 2017","author":"Zubiaga","year":"2017"}],"container-title":["Frontiers in Computer Science"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1473457\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,18]],"date-time":"2024-11-18T12:17:59Z","timestamp":1731932279000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1473457\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,15]]},"references-count":50,"alternative-id":["10.3389\/fcomp.2024.1473457"],"URL":"https:\/\/doi.org\/10.3389\/fcomp.2024.1473457","relation":{},"ISSN":["2624-9898"],"issn-type":[{"value":"2624-9898","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,15]]},"article-number":"1473457"}}