{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:22:08Z","timestamp":1777735328209,"version":"3.51.4"},"reference-count":85,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,10,10]],"date-time":"2024-10-10T00:00:00Z","timestamp":1728518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:p>\n            The recommender system (RS) has been an integral toolkit of online services. They are equipped with various deep learning techniques to model user preference based on identifier and attribute information. With the emergence of multimedia services, such as short videos, news, and so on, understanding these contents while recommending becomes critical. Besides, multimodal features are also helpful in alleviating the problem of data sparsity in RS. Thus,\n            <jats:bold>M<\/jats:bold>\n            ultimodal\n            <jats:bold>R<\/jats:bold>\n            ecommender\n            <jats:bold>S<\/jats:bold>\n            ystem (MRS) has attracted much attention from both academia and industry recently. In this article, we will give a comprehensive survey of the MRS models, mainly from technical views. First, we conclude the general procedures and major challenges for MRS. Then, we introduce the existing MRS models according to four categories, i.e.,\n            <jats:bold>Modality Encoder<\/jats:bold>\n            ,\n            <jats:bold>Feature Interaction<\/jats:bold>\n            ,\n            <jats:bold>Feature Enhancement<\/jats:bold>\n            , and\n            <jats:bold>Model Optimization<\/jats:bold>\n            . 
Besides, to make it convenient for those who want to research this field, we also summarize the dataset and code resources. Finally, we discuss some promising future directions of MRS and conclude this article. To access more details of the surveyed articles, such as implementation code, we open source a repository.\n            <jats:xref ref-type=\"fn\">\n              <jats:sup>1<\/jats:sup>\n            <\/jats:xref>\n          <\/jats:p>","DOI":"10.1145\/3695461","type":"journal-article","created":{"date-parts":[[2024,9,10]],"date-time":"2024-09-10T14:33:41Z","timestamp":1725978821000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":55,"title":["Multimodal Recommender Systems: A Survey"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0751-2602","authenticated-orcid":false,"given":"Qidong","family":"Liu","sequence":"first","affiliation":[{"name":"Xi'an Jiaotong University, Xi'an, China and City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-3857-9069","authenticated-orcid":false,"given":"Jiaxi","family":"Hu","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8276-7920","authenticated-orcid":false,"given":"Yutian","family":"Xiao","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2926-4416","authenticated-orcid":false,"given":"Xiangyu","family":"Zhao","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4470-5972","authenticated-orcid":false,"given":"Jingtong","family":"Gao","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong 
Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5976-0707","authenticated-orcid":false,"given":"Wanyu","family":"Wang","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3370-471X","authenticated-orcid":false,"given":"Qing","family":"Li","sequence":"additional","affiliation":[{"name":"The Hong Kong Polytechnic University, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7125-3898","authenticated-orcid":false,"given":"Jiliang","family":"Tang","sequence":"additional","affiliation":[{"name":"Michigan State University, East Lansing, United States"}]}],"member":"320","published-online":{"date-parts":[[2024,10,10]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3614978"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548273"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401198"},{"key":"e_1_3_2_5_2","unstructured":"Hong Chen Yudong Chen Xin Wang Ruobing Xie Rui Wang Feng Xia and Wenwu Zhu. 2021. Curriculum disentangled recommendation with noisy multi-feedback. Advances in Neural Information Processing Systems 34 (2021) 26924\u201326936."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330652"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331254"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","unstructured":"Xi Chen Yangsiyi Lu Yuehai Wang and Jianyi Yang. 2021. CMBF: Cross-modal-based fusion recommendation algorithm. Sensors 21 16 (2021) 5275.","DOI":"10.3390\/s21165275"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Xiang Chen Ningyu Zhang Lei Li Shumin Deng Chuanqi Tan Changliang Xu Fei Huang Luo Si and Huajun Chen. 2022. Hybrid transformer with multi-level fusion for multimodal knowledge graph completion. 
In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 904\u2013915.","DOI":"10.1145\/3477495.3531992"},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"Yashar Deldjoo Fatemeh Nazary Arnau Ramisa Julian Mcauley Giovanni Pellegrini Alejandro Bellogin and Tommaso Di Noia. 2023. A review of modern fashion recommender systems. ACM Computing Surveys (CSUR) 56 4 (2023) 1\u201337.","DOI":"10.1145\/3624733"},{"key":"e_1_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Yashar Deldjoo Markus Schedl Paolo Cremonesi and Gabriella Pasi. 2020. Recommender systems leveraging multimedia content. ACM Computing Surveys (CSUR) 53 5 (2020) 1\u201338.","DOI":"10.1145\/3407190"},{"key":"e_1_3_2_12_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Toutanova Kristina. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of naacL-HLT Vol. 1. 2."},{"key":"e_1_3_2_13_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2020. An image is worth \\(16\\times 16\\) words: Transformers for image recognition at scale. In International Conference on Learning Representations."},{"key":"e_1_3_2_14_2","unstructured":"Songhao Han Wei Huang and Xiaotian Luan. 2022. VLSNR: Vision-linguistics coordination time sequence-aware news recommendation. arXiv:2210.02946. Retrieved from https:\/\/arxiv.org\/abs\/2210.02946"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3512079"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","unstructured":"Min Hou Le Wu Enhong Chen Zhi Li Vincent W. Zheng and Qi Liu. 2019. Explainable fashion recommendation: A semantic attribute region guided approach. 
In Proceedings of the 28th International Joint Conference on Artificial Intelligence. 4681\u20134688.","DOI":"10.24963\/ijcai.2019\/650"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3614775"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Umair Javed Kamran Shaukat Ibrahim A. Hameed Farhat Iqbal Talha Mahboob Alam and Suhuai Luo. 2021. A review of content-based and context-based recommendation systems. International Journal of Emerging Technologies in Learning (iJET) 16 3 (2021) 274\u2013306.","DOI":"10.3991\/ijet.v16i03.18851"},{"key":"e_1_3_2_20_2","unstructured":"Xiangen Jia Yihong Dong Feng Zhu Yu Xin and Jiangbo Qian. 2022. Preference-corrected multimodal graph convolutional recommendation network. Applied Intelligence (2022) 1\u201316."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557387"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"Yehuda Koren Robert Bell and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42 8 (2009) 30\u201337.","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401078"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475431"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-acl.29"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3539618.3591739"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583554"},{"key":"e_1_3_2_29_2","unstructured":"Jianxun Lian Iyad Batal Zheng Liu Akshay Soni Eun Yong Kang Yajun Wang and Xing Xie. 2021. Multi-interest-aware user modeling for large-scale sequential recommendations. arXiv:2102.09211. 
Retrieved from https:\/\/arxiv.org\/abs\/2102.09211"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583378"},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","unstructured":"Yujie Lin Pengjie Ren Zhumin Chen Zhaochun Ren Jun Ma and Maarten De Rijke. 2019. Explainable outfit recommendation with joint outfit matching and comment generation. IEEE Transactions on Knowledge and Data Engineering 32 8 (2019) 1502\u20131516.","DOI":"10.1109\/TKDE.2019.2906190"},{"key":"e_1_3_2_32_2","unstructured":"Bo Liu. 2022. Implicit semantic-based personalized micro-videos recommendation. arXiv:2205.03297. Retrieved from https:\/\/arxiv.org\/abs\/2205.03297"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i5.16549"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611886"},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Fan Liu Huilin Chen Zhiyong Cheng Anan Liu Liqiang Nie and Mohan Kankanhalli. 2022. Disentangled multimodal representation learning for recommendation. IEEE Transactions on Multimedia 25 (2022) 7149\u20137159.","DOI":"10.1109\/TMM.2022.3217449"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CCET55412.2022.9906399"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331371"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","unstructured":"Kang Liu Feng Xue Dan Guo Le Wu Shujie Li and Richang Hong. 2023. MEGCF: Multimodal entity graph collaborative filtering for personalized recommendation. 
ACM Transactions on Information Systems 41 2 (2023) 1\u201327.","DOI":"10.1145\/3544106"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313513"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475709"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3512527.3531378"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Junmei Lv Bin Song Jie Guo Xiaojiang Du and Mohsen Guizani. 2019. Interest-related item similarity model based on multimodal data for top-N recommendation. IEEE Access 7 (2019) 12809\u201312821.","DOI":"10.1109\/ACCESS.2019.2893355"},{"key":"e_1_3_2_43_2","unstructured":"Jianxin Ma Chang Zhou Peng Cui Hongxia Yang and Wenwu Zhu. 2019. Learning disentangled representations for recommendation. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","unstructured":"Yunshan Ma Yingzhi He An Zhang Xiang Wang and Tat-Seng Chua. 2022. Crosscbr: Cross-view contrastive learning for bundle recommendation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1233\u20131241.","DOI":"10.1145\/3534678.3539229"},{"key":"e_1_3_2_45_2","unstructured":"Tomas Mikolov Kai Chen Greg Corrado and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781. Retrieved from https:\/\/arxiv.org\/abs\/1301.3781"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548119"},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","unstructured":"Juan Ni Zhenhua Huang Yang Hu and Chen Lin. 2022. A two-stage embedding model for recommendation with multimodal auxiliary information. 
Information Sciences 582 (2022) 22\u201337.","DOI":"10.1016\/j.ins.2021.09.006"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557101"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_50_2","unstructured":"Alec Radford Jong Wook Kim Chris Hallacy Aditya Ramesh Gabriel Goh Sandhini Agarwal Girish Sastry Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger and Ilya Sutskever. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR 8748\u20138763."},{"key":"e_1_3_2_51_2","unstructured":"Aghiles Salah Quoc-Tuan Truong and Hady W. Lauw. 2020. Cornac: A comparative framework for multimodal recommender systems. Journal of Machine Learning Research 21 95 (2020) 1\u20135."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612337"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3411947"},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","unstructured":"Zhulin Tao Yinwei Wei Xiang Wang Xiangnan He Xianglin Huang and Tat-Seng Chua. 2020. MGAT: Multimodal graph attention network for recommendation. Information Processing & Management 57 5 (2020) 102277.","DOI":"10.1016\/j.ipm.2020.102277"},{"key":"e_1_3_2_55_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)."},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313411"},{"key":"e_1_3_2_57_2","doi-asserted-by":"crossref","unstructured":"Jie Wang Fajie Yuan Mingyue Cheng Joemon M. Jose Chenyun Yu Beibei Kong Zhijin Wang Bo Hu and Zang Li. 2024. Transrec: Learning transferable recommendation from mixture-of-modality feedback. 
In Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data. Springer 193\u2013208.","DOI":"10.1007\/978-981-97-7235-3_13"},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","unstructured":"Maolin Wang Yao Zhao Jiajia Liu Jingdong Chen Chenyi Zhuang Jinjie Gu Ruocheng Guo and Xiangyu Zhao. 2024. Large multimodal model compression via iterative efficient pruning and distillation. In Companion Proceedings of the ACM Web Conference 2024 235\u2013244.","DOI":"10.1145\/3589335.3648321"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531867"},{"key":"e_1_3_2_60_2","article-title":"Dualgnn: Dual graph neural network for multimedia recommendation","author":"Wang Qifan","year":"2021","unstructured":"Qifan Wang, Yinwei Wei, Jianhua Yin, Jianlong Wu, Xuemeng Song, and Liqiang Nie. 2021. Dualgnn: Dual graph neural network for multimedia recommendation. IEEE Transactions on Multimedia (2021).","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_61_2","doi-asserted-by":"crossref","unstructured":"Xin Wang Hong Chen Yuwei Zhou Jianxin Ma and Wenwu Zhu. 2023. Disentangled representation learning for recommendation. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 1 (2023) 408\u2013424.","DOI":"10.1109\/TPAMI.2022.3153112"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME51207.2021.9428193"},{"key":"e_1_3_2_63_2","doi-asserted-by":"crossref","unstructured":"Yuequn Wang Liyan Dong Hao Zhang Xintao Ma Yongli Li and Minghui Sun. 2020. An enhanced multi-modal recommendation based on alternate training with knowledge graph representation. 
IEEE Access 8 (2020) 213012\u2013213026.","DOI":"10.1109\/ACCESS.2020.3039388"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583206"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589334.3645359"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351034"},{"key":"e_1_3_2_67_2","doi-asserted-by":"crossref","unstructured":"Chuhan Wu Fangzhao Wu Tao Qi Chao Zhang Yongfeng Huang and Tong Xu. 2022. Mm-rec: Visiolinguistic model empowered multimodal news recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2560\u20132564.","DOI":"10.1145\/3477495.3531896"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412092"},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","unstructured":"Cai Xu Ziyu Guan Wei Zhao Quanzhou Wu Meng Yan Long Chen and Qiguang Miao. 2020. Recommendation by users\u2019 multimodal preferences for smart city applications. IEEE Transactions on Industrial Informatics 17 6 (2020) 4197\u20134205.","DOI":"10.1109\/TII.2020.3008923"},{"key":"e_1_3_2_70_2","doi-asserted-by":"crossref","unstructured":"Jing Yi and Zhenzhong Chen. 2021. Multi-modal variational graph auto-encoder for recommendation systems. IEEE Transactions on Multimedia 24 (2021) 1067\u20131079.","DOI":"10.1109\/TMM.2021.3111487"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3532027"},{"key":"e_1_3_2_72_2","doi-asserted-by":"crossref","unstructured":"Yinwei Wei Xiang Wang Liqiang Nie Xiangnan He and Tat-Seng Chua. 2020. Graph-refined convolutional network for multimedia recommendation with implicit feedback. In Proceedings of the 28th ACM International Conference on Multimedia. 
3541\u20133549.","DOI":"10.1145\/3394171.3413556"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3613915"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475259"},{"key":"e_1_3_2_75_2","doi-asserted-by":"crossref","unstructured":"Jinghao Zhang Yanqiao Zhu Qiang Liu Mengqi Zhang Shu Wu and Liang Wang. 2022. Latent structure mining with contrastive modality fusion for multimedia recommendation. IEEE Transactions on Knowledge and Data Engineering 35 9 (2022) 9154\u20139167.","DOI":"10.1109\/TKDE.2022.3221949"},{"key":"e_1_3_2_76_2","doi-asserted-by":"crossref","unstructured":"Lingzi Zhang Xin Zhou and Zhiqi Shen. 2023. Multimodal pre-training framework for sequential recommendation via contrastive learning. arXiv:2303.11879. Retrieved from https:\/\/arxiv.org\/abs\/2303.11879","DOI":"10.1145\/3682075"},{"key":"e_1_3_2_77_2","doi-asserted-by":"crossref","unstructured":"Xiaoyan Zhang Haihua Luo Bowei Chen and Guibing Guo. 2020. Multi-view visual Bayesian personalized ranking for restaurant recommendation. Applied Intelligence 50 9 (2020) 2901\u20132915.","DOI":"10.1007\/s10489-020-01703-6"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3412239"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482151"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589334.3645553"},{"key":"e_1_3_2_81_2","unstructured":"Hongyu Zhou Xin Zhou Zhiwei Zeng Lingzi Zhang and Zhiqi Shen. 2023. A comprehensive survey on multimodal recommender systems: Taxonomy evaluation and future directions. arXiv:2302.04473. Retrieved from https:\/\/arxiv.org\/abs\/2302.04473"},{"key":"e_1_3_2_82_2","doi-asserted-by":"crossref","unstructured":"Xin Zhou and Zhiqi Shen. 2023. A tale of two graphs: Freezing and denoising graph structures for multimodal recommendation. In Proceedings of the 31st ACM International Conference on Multimedia. 
935\u2013943.","DOI":"10.1145\/3581783.3611943"},{"key":"e_1_3_2_83_2","doi-asserted-by":"crossref","unstructured":"Xin Zhou. 2023. Mmrec: Simplifying multimodal recommendation. In Proceedings of the 5th ACM International Conference on Multimedia in Asia Workshops. 1\u20132.","DOI":"10.1145\/3611380.3628561"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583251"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/3539618.3591950"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539101"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3695461","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3695461","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:58:11Z","timestamp":1750294691000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3695461"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,10]]},"references-count":85,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,28]]}},"alternative-id":["10.1145\/3695461"],"URL":"https:\/\/doi.org\/10.1145\/3695461","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,10]]},"assertion":[{"value":"2023-06-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-22","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-10-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}