{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:28:02Z","timestamp":1778081282464,"version":"3.51.4"},"reference-count":37,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2025,1,17]],"date-time":"2025-01-17T00:00:00Z","timestamp":1737072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Yunnan Provincial Science and Technology Major Project","award":["202202AE090008"],"award-info":[{"award-number":["202202AE090008"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Systems"],"abstract":"<jats:p>Recent advances in graph neural networks (GNNs) have enhanced multimodal recommendation systems\u2019 ability to process complex user\u2013item interactions. However, current approaches face two key limitations: they rely on static similarity metrics for product relationship graphs and they struggle to effectively fuse information across modalities. We propose MR-CSAF, a novel multimodal recommendation algorithm using cross-self-attention fusion. Building on FREEDOM, our approach introduces an adaptive modality selector that dynamically weights each modality\u2019s contribution to product similarity, enabling more accurate product relationship graphs and optimized modality representations. We employ a cross-self-attention mechanism to facilitate both inter- and intra-modal information transfer, while using graph convolution to incorporate updated features into item and product modal representations. 
Experimental results on three public datasets demonstrate that MR-CSAF outperforms eight baseline methods, validating its effectiveness in delivering personalized recommendations in complex multimodal environments.<\/jats:p>","DOI":"10.3390\/systems13010057","type":"journal-article","created":{"date-parts":[[2025,1,17]],"date-time":"2025-01-17T11:24:56Z","timestamp":1737113096000},"page":"57","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Multimodal Recommendation System Based on Cross Self-Attention Fusion"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-5051-0947","authenticated-orcid":false,"given":"Peishan","family":"Li","sequence":"first","affiliation":[{"name":"College of Big Data, Yunnan Agricultural University, Kunming 650201, China"},{"name":"Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming 650201, China"},{"name":"Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming 650201, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2091-9797","authenticated-orcid":false,"given":"Weixiao","family":"Zhan","sequence":"additional","affiliation":[{"name":"College of Computer Science and Engineering, University of California, San Diego, CA 92093, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1097-619X","authenticated-orcid":false,"given":"Lutao","family":"Gao","sequence":"additional","affiliation":[{"name":"College of Big Data, Yunnan Agricultural University, Kunming 650201, China"},{"name":"Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming 650201, China"},{"name":"Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming 650201, 
China"}]},{"given":"Shuran","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Big Data, Yunnan Agricultural University, Kunming 650201, China"},{"name":"Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming 650201, China"},{"name":"Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming 650201, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-2058-0895","authenticated-orcid":false,"given":"Linnan","family":"Yang","sequence":"additional","affiliation":[{"name":"College of Big Data, Yunnan Agricultural University, Kunming 650201, China"},{"name":"Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming 650201, China"},{"name":"Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming 650201, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,1,17]]},"reference":[{"key":"ref_1","first-page":"142","article-title":"Modeling instant user intent and content-level transition for sequential fashion recommendation","volume":"24","author":"Ding","year":"2008","journal-title":"IEEE Trans. Multimed."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1854","DOI":"10.1109\/TKDE.2019.2913394","article-title":"A hierarchical attention model for social contextual image recommendation","volume":"32","author":"Wu","year":"2019","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Yan, C., and Liu, L. (2023). Recommendation Method Based on Heterogeneous Information Network and Multiple Trust Relationship. Systems, 11.","DOI":"10.3390\/systems11040169"},{"key":"ref_4","first-page":"1","article-title":"Kr-gcn: Knowledge-aware reasoning with graph convolution network for explainable recommendation","volume":"41","author":"Ma","year":"2023","journal-title":"ACM Trans. Inf. 
Syst."},{"key":"ref_5","unstructured":"Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2012). BPR: Bayesian personalized ranking from implicit feedback. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., and Wang, M. (2020, January 25\u201330). LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. Proceedings of the SIGIR, Virtual.","DOI":"10.1145\/3397271.3401063"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhou, X., Lin, D., Liu, Y., and Miao, C. (2023, January 3\u20137). Layer-refined graph convolutional networks for recommendation. Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA.","DOI":"10.1109\/ICDE55515.2023.00100"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"101989","DOI":"10.1016\/j.inffus.2023.101989","article-title":"Prompt-based and weak-modality enhanced multimodal recommendation","volume":"101","author":"Dong","year":"2024","journal-title":"Inf. Fusion"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"103003","DOI":"10.1016\/j.jretconser.2022.103003","article-title":"Economic corollaries of personalized recommendations","volume":"68","author":"Molaie","year":"2022","journal-title":"J. Retail. Consum. Serv."},{"key":"ref_10","unstructured":"Zhou, X., and Shen, Z. (November, January 29). A tale of two graphs: Freezing and denoising graph structures for multimodal recommendation. Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada."},{"key":"ref_11","unstructured":"Zhou, H., Zhou, X., Zeng, Z., Zhang, L., and Shen, Z. (2023). A comprehensive survey on multimodal recommender systems: Taxonomy, evaluation, and future directions. arXiv."},{"key":"ref_12","first-page":"1","article-title":"Multimodal recommender systems: A survey","volume":"57","author":"Liu","year":"2024","journal-title":"ACM Comput. 
Surv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"424","DOI":"10.1016\/j.inffus.2022.09.025","article-title":"Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions","volume":"91","author":"Gandhi","year":"2023","journal-title":"Inf. Fusion"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Gadzicki, K., Khamsehashari, R., and Zetzsche, C. (2020, January 6\u20139). Early vs late fusion in multimodal convolutional neural networks. Proceedings of the 2020 IEEE 23rd International Conference on Information Fusion (FUSION), Rustenburg, South Africa.","DOI":"10.23919\/FUSION45008.2020.9190246"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, Y., Xu, X., Yu, W., Xu, R., Cao, Z., and Shen, H.T. (2021, January 5\u20139). Combine early and late fusion together: A hybrid fusion framework for image-text matching. Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China.","DOI":"10.1109\/ICME51207.2021.9428201"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Li, K., Xu, L., Zhu, C., and Zhang, K. (2024). A Multimodal Graph Recommendation Method Based on Cross-Attention Fusion. Mathematics, 12.","DOI":"10.3390\/math12152353"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"116036","DOI":"10.1016\/j.eswa.2021.116036","article-title":"Attention-based dynamic user modeling and deep collaborative filtering recommendation","volume":"188","author":"Wang","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"102277","DOI":"10.1016\/j.ipm.2020.102277","article-title":"Mgat: Multimodal graph attention network for recommendation","volume":"57","author":"Tao","year":"2020","journal-title":"Inf. Process. 
Manag."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"110518","DOI":"10.1016\/j.asoc.2023.110518","article-title":"Collaborative recommendation model based on multi-modal multi-view attention network: Movie and literature cases","volume":"144","author":"Hu","year":"2023","journal-title":"Appl. Soft Comput."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"19809","DOI":"10.1109\/ACCESS.2023.3248618","article-title":"Optimal Recommendation Models Based on Knowledge Representation Learning and Graph Attention Networks","volume":"11","author":"He","year":"2023","journal-title":"IEEE Access"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"7149","DOI":"10.1109\/TMM.2022.3217449","article-title":"Disentangled multimodal representation learning for recommendation","volume":"25","author":"Liu","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, F., Cheng, Z., Sun, C., Wang, Y., Nie, L., and Kankanhalli, M. (2019, January 21\u201325). User diverse preference modeling by multimodal attentive metric learning. Proceedings of the 27th ACM International Conference on Multimedia, Nice, France.","DOI":"10.1145\/3343031.3350953"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wu, C., Wu, F., Qi, T., Zhang, C., Huang, Y., and Xu, T. (2022, January 11\u201315). Mm-rec: Visiolinguistic model empowered multimodal news recommendation. Proceedings of the 45th international ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain.","DOI":"10.1145\/3477495.3531896"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Xun, J., Zhang, S., Zhao, Z., Zhu, J., Zhang, Q., Li, J., He, X., He, X., Chua, T.S., and Wu, F. (2021, January 20\u201324). Why do we click: Visual impression-aware news recommendation. 
Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event.","DOI":"10.1145\/3474085.3475514"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wei, Y., Wang, X., Nie, L., He, X., Hong, R., and Chua, T.S. (2019, January 21\u201325). MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video. Proceedings of the 27th ACM International Conference on Multimedia, Nice, France.","DOI":"10.1145\/3343031.3351034"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wei, Y., Wang, X., Nie, L., He, X., and Chua, T.S. (2020, January 12\u201316). Graph-refined convolutional network for multimedia recommendation with implicit feedback. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413556"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1074","DOI":"10.1109\/TMM.2021.3138298","article-title":"Dualgnn: Dual graph neural network for multimedia recommendation","volume":"25","author":"Wang","year":"2021","journal-title":"IEEE Trans. Multimed."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Chen, F., Wang, J., Wei, Y., Zheng, H.T., and Shao, J. (2022, January 10\u201314). Breaking isolation: Multimodal graph fusion for multimedia recommendation by edge-wise modulation. Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal.","DOI":"10.1145\/3503161.3548399"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Mu, Z., Zhuang, Y., Tan, J., Xiao, J., and Tang, S. (2022, January 10\u201314). Learning hybrid behavior patterns for multimedia recommendation. Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal.","DOI":"10.1145\/3503161.3548119"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, J., Zhu, Y., Liu, Q., Wu, S., Wang, S., and Wang, L. (2021, January 20\u201324). Mining latent structures for multimedia recommendation. 
Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event.","DOI":"10.1145\/3474085.3475259"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3573010","article-title":"Learning the user\u2019s deeper preferences for multi-modal recommendation systems","volume":"19","author":"Lei","year":"2023","journal-title":"ACM Trans. Multimed. Comput. Commun. Appl."},{"key":"ref_32","unstructured":"Arora, S., Liang, Y., and Ma, T. (2017, January 24\u201326). A simple but tough-to-beat baseline for sentence embeddings. Proceedings of the International Conference on Learning Representations, Toulon, France."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 26\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1109\/MIC.2003.1167344","article-title":"Amazon. com recommendations: Item-to-item collaborative filtering","volume":"7","author":"Linden","year":"2003","journal-title":"IEEE Internet Comput."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"He, R., and McAuley, J.J. (2016, January 12\u201317). VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. Proceedings of the AAAI, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.9973"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"5107","DOI":"10.1109\/TMM.2022.3187556","article-title":"Self-Supervised Learning for Multimedia Recommendation","volume":"25","author":"Tao","year":"2023","journal-title":"IEEE Trans. Multim."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhou, X., Zhou, H., Liu, Y., Zeng, Z., Miao, C., Wang, P., You, Y., and Jiang, F. (2023, January 14\u201320). Bootstrap Latent Representations for Multi-modal Recommendation. 
Proceedings of the WWW, Melbourne, Australia.","DOI":"10.1145\/3543507.3583251"}],"container-title":["Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-8954\/13\/1\/57\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T10:31:09Z","timestamp":1759919469000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-8954\/13\/1\/57"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,17]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,1]]}},"alternative-id":["systems13010057"],"URL":"https:\/\/doi.org\/10.3390\/systems13010057","relation":{},"ISSN":["2079-8954"],"issn-type":[{"value":"2079-8954","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,17]]}}}