{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T14:30:05Z","timestamp":1781879405854,"version":"3.54.5"},"reference-count":145,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2025,3,22]],"date-time":"2025-03-22T00:00:00Z","timestamp":1742601600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Recomm. Syst."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>Recommender systems (RSs) provide customers with a personalized navigation experience within the vast catalogs of products and services offered on popular online platforms. Despite the substantial success of traditional RSs, recommendation remains a highly challenging task, especially in specific scenarios and domains. For example, human affinity for items described through multimedia content (e.g., images, audio, and text), such as fashion products, movies, and music, is multi-faceted and primarily driven by their diverse characteristics. Therefore, by leveraging all available signals in such scenarios, multimodality enables us to tap into richer information sources and construct more refined user\/item profiles for recommendations. Despite the growing number of multimodal techniques proposed for multimedia recommendation, the existing literature lacks a shared and universal schema for modeling and solving the recommendation problem through the lens of multimodality. Given the recent advances in multimodal deep learning for other tasks and scenarios where precise theoretical and applicative procedures exist, we also consider it imperative to formalize a general multimodal schema for multimedia recommendation. 
In this work, we first provide a comprehensive literature review of multimodal approaches for multimedia recommendation from the last eight years. Second, we outline the theoretical foundations of a multimodal pipeline for multimedia recommendation by identifying and formally organizing recurring solutions\/patterns; at the same time, we demonstrate its rationale by conceptually applying it to selected state-of-the-art approaches in multimedia recommendation. Third, we conduct a benchmarking analysis of recent algorithms for multimedia recommendation within Elliot, a rigorous framework for evaluating recommender systems, where we re-implement such multimedia recommendation approaches. Finally, we highlight the significant unresolved challenges in multimodal deep learning for multimedia recommendation and suggest possible avenues for addressing them. The primary aim of this work is to provide guidelines for designing and implementing the next generation of multimodal approaches in multimedia recommendation.<\/jats:p>","DOI":"10.1145\/3662738","type":"journal-article","created":{"date-parts":[[2024,4,29]],"date-time":"2024-04-29T11:23:02Z","timestamp":1714389782000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Formalizing Multimedia Recommendation through Multimodal Deep Learning"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2228-0333","authenticated-orcid":false,"given":"Daniele","family":"Malitesta","sequence":"first","affiliation":[{"name":"CentraleSup\u00e9lec, Universit\u00e9 Paris-Saclay, Gif-sur-Yvette, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5448-9970","authenticated-orcid":false,"given":"Giandomenico","family":"Cornacchia","sequence":"additional","affiliation":[{"name":"IBM Research Europe - Ireland, Dublin, 
Ireland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5206-3909","authenticated-orcid":false,"given":"Claudio","family":"Pomo","sequence":"additional","affiliation":[{"name":"Dipartimento di Ingegneria Elettrica e dell'Informazione, Politecnico di Bari, Bari Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8429-3487","authenticated-orcid":false,"given":"Felice Antonio","family":"Merra","sequence":"additional","affiliation":[{"name":"Amazon Science, Berlin Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0939-5462","authenticated-orcid":false,"given":"Tommaso","family":"Di Noia","sequence":"additional","affiliation":[{"name":"Dipartimento di Ingegneria Elettrica e dell'Informazione, Politecnico di Bari, Bari Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5484-9945","authenticated-orcid":false,"given":"Eugenio","family":"Di Sciascio","sequence":"additional","affiliation":[{"name":"Dipartimento di Ingegneria Elettrica e dell'Informazione, Politecnico di Bari, Bari Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,3,22]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3109859.3109912"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463245"},{"key":"e_1_3_2_4_2","volume-title":"Proceedings of the DL4SR@CIKM.","author":"Anelli Vito Walter","year":"2022","unstructured":"Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Eugenio Di Sciascio, Antonio Ferrara, Daniele Malitesta, and Claudio Pomo. 2022. Reshaping graph recommendation with edge graph collaborative filtering and customer reviews. 
In Proceedings of the DL4SR@CIKM.CEUR-WS.org."},{"key":"e_1_3_2_5_2","volume-title":"Proceedings of the ICLR.","author":"Arora Sanjeev","year":"2017","unstructured":"Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In Proceedings of the ICLR. OpenReview.net."},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","unstructured":"Matteo Attimonelli Danilo Danese Daniele Malitesta Claudio Pomo Giuseppe Gassi and Tommaso Di Noia. 2024. Ducho 2.0: Towards a more up-to-date unified framework for the extraction of multimodal features in recommendation. CoRR abs\/2403.04503 (2024).","DOI":"10.1145\/3589335.3651440"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3418435"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3107990.3107993"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2798607"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.148"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2244870"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2020.102387"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3059508"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548195"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548399"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/339"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080797"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964291"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330652"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331254"},{"key":"e_1_
3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2978618"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3291060"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911491"},{"key":"e_1_3_2_25_2","unstructured":"Sameer Chhabra. [n.d.]. Netflix says 80 percent of watched content is based on algorithmic recommendations. Retrieved March 13 2021 from https:\/\/mobilesyrup.com\/2017\/08\/22\/80-percent-netflix-shows-discovered-recommendation\/"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952585"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2881260"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00445"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-99739-7_10"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401046"},{"issue":"5","key":"e_1_3_2_31_2","first-page":"106:1\u2013106:38","article-title":"Recommender systems leveraging multimedia content","volume":"53","author":"Deldjoo Yashar","year":"2020","unstructured":"Yashar Deldjoo, Markus Schedl, Paolo Cremonesi, and Gabriella Pasi. 2020. Recommender systems leveraging multimedia content. ACM Comput. Surv. 53, 5 (2020), 106:1\u2013106:38.","journal-title":"ACM Comput. 
Surv."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-0716-2197-4_25"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350905"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2807982"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco_a_01273"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00548"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV56688.2023.00223"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24797-2"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123394"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2883037"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.9973"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401063"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2021.108036"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331213"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-short.27"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557506"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11257-015-9165-3"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2015.7363830"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00377"},{"key":"e_1_3_2_52_2","volume-title":"Proceedings of the NeurIPS","author":"Khosla Prannay","year":"2020","unstructured":"Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and 
Dilip Krishnan. 2020. Supervised contrastive learning. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557387"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_2_55_2","volume-title":"Proceedings of the ICLR.","author":"Kipf Thomas N.","year":"2017","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of the ICLR. OpenReview.net."},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01435"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3573010"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.115708"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.126427"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462965"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-013-1825-x"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3414032"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611886"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350953"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3251108"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544106"},{"key":"e_1_3_2_68_2","unstructured":"Qidong Liu Jiaxi Hu Yutian Xiao Jingtong Gao and Xiangyu Zhao. 2023. Multimodal recommender systems: A survey. 
CoRR abs\/2302.03883 (2023)."},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475709"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3512527.3531378"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00258"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01764"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16330"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.108976"},{"key":"e_1_3_2_75_2","article-title":"How retailers can keep up with consumers","volume":"18","author":"MacKenzie Ian","year":"2013","unstructured":"Ian MacKenzie, Chris Meyer, and Steve Noble. 2013. How retailers can keep up with consumers. McKinsey & Company 18 (2013).","journal-title":"McKinsey & Company"},{"key":"e_1_3_2_76_2","volume-title":"Proceedings of the EvalRS@KDD.","author":"Malitesta Daniele","year":"2023","unstructured":"Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, and Tommaso Di Noia. 2023. Disentangling the performance puzzle of multimodal-aware recommender systems. In Proceedings of the EvalRS@KDD.CEUR-WS.org."},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/3606040.3617441"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3613458"},{"key":"e_1_3_2_79_2","unstructured":"Daniele Malitesta Claudio Pomo Vito Walter Anelli Alberto Carlo Maria Mancino Eugenio Di Sciascio and Tommaso Di Noia. 2023. A Topology-aware analysis of graph collaborative filtering. CoRR abs\/2308.10778 (2023)."},{"key":"e_1_3_2_80_2","unstructured":"Daniele Malitesta Emanuele Rossi Claudio Pomo Fragkiskos D. Malliaros and Tommaso Di Noia. 2024. Dealing with missing modalities in multimodal recommendation: A feature propagation-based approach. 
CoRR abs\/2403.19841 (2024)."},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767755"},{"key":"e_1_3_2_82_2","volume-title":"Proceedings of the ICLR (Workshop Poster)","author":"Mikolov Tom\u00e1s","year":"2013","unstructured":"Tom\u00e1s Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the ICLR (Workshop Poster)."},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2958761"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548119"},{"key":"e_1_3_2_85_2","first-page":"689","volume-title":"Proceedings of the ICML","author":"Ngiam Jiquan","year":"2011","unstructured":"Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y. Ng. 2011. Multimodal deep learning. In Proceedings of the ICML. Omnipress, 689\u2013696."},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-014-2339-x"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3125486.3125492"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.308"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.216"},{"key":"e_1_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_2_91_2","volume-title":"Proceedings of the UAI","author":"Rendle Steffen","year":"2009","unstructured":"Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the UAI."},{"key":"e_1_3_2_92_2","first-page":"11","volume-title":"Proceedings of the LoG.","author":"Rossi Emanuele","year":"2022","unstructured":"Emanuele Rossi, Henry Kenlay, Maria I. Gorinova, Benjamin Paul Chamberlain, Xiaowen Dong, and Michael M. Bronstein. 2022. 
On the unreasonable effectiveness of feature propagation in learning on graphs with missing node features. In Proceedings of the LoG.PMLR, 11."},{"key":"e_1_3_2_93_2","first-page":"95:1\u201395:5","article-title":"Cornac: A comparative framework for multimodal recommender systems","volume":"21","author":"Salah Aghiles","year":"2020","unstructured":"Aghiles Salah, Quoc-Tuan Truong, and Hady W. Lauw. 2020. Cornac: A comparative framework for multimodal recommender systems. J. Mach. Learn. Res. 21 (2020), 95:1\u201395:5.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3007330"},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-85820-3_8"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN48605.2020.9206894"},{"key":"e_1_3_2_97_2","volume-title":"Proceedings of the ICLR","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. 
In Proceedings of the ICLR."},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3411947"},{"key":"e_1_3_2_99_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308560.3317303"},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9413461"},{"key":"e_1_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.nlp4convai-1.12"},{"key":"e_1_3_2_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01424-7_27"},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3193288"},{"key":"e_1_3_2_105_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2020.102277"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-020-08834-5"},{"key":"e_1_3_2_107_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460231.3473324"},{"key":"e_1_3_2_108_2","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2610382"},{"key":"e_1_3_2_109_2","doi-asserted-by":"publisher","DOI":"10.1145\/2043932.2043955"},{"key":"e_1_3_2_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigMM52142.2021.00012"},{"key":"e_1_3_2_111_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1373"},{"key":"e_1_3_2_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3138298"},{"key":"e_1_3_2_113_2","doi-asserted-by":"publisher","DOI":"10.1145\/3418211"},{"key":"e_1_3_2_114_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331267"},{"key":"e_1_3_2_115_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3450038"},{"key":"e_1_3_2_116_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583206"},{"key":"e_1_3_2_117_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3088307"},{"key":"e_1_3_2_118_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413556"},{"key":"e_1_3_2_119_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351034"},{"key":"e_1_3_2_120_2","doi-asse
rted-by":"publisher","DOI":"10.1145\/3477495.3531896"},{"key":"e_1_3_2_121_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462862"},{"key":"e_1_3_2_122_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.3013234"},{"issue":"1","key":"e_1_3_2_123_2","first-page":"7:1\u20137:31","article-title":"Yum-Me: A personalized nutrient-based meal recommender system","volume":"36","author":"Yang Longqi","year":"2017","unstructured":"Longqi Yang, Cheng-Kang Hsieh, Hongjian Yang, John P. Pollak, Nicola Dell, Serge J. Belongie, Curtis Cole, and Deborah Estrin. 2017. Yum-Me: A personalized nutrient-based meal recommender system. ACM Trans. Inf. Syst. 36, 1 (2017), 7:1\u20137:31.","journal-title":"ACM Trans. Inf. Syst."},{"key":"e_1_3_2_124_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSS.2020.2986778"},{"key":"e_1_3_2_125_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5362"},{"key":"e_1_3_2_126_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3111487"},{"key":"e_1_3_2_127_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3532027"},{"key":"e_1_3_2_128_2","unstructured":"Shukang Yin Chaoyou Fu Sirui Zhao Ke Li Xing Sun Tong Xu and Enhong Chen. 2023. A survey on multimodal large language models. 
CoRR abs\/2306.13549 (2023)."},{"key":"e_1_3_2_129_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_3_2_130_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3613915"},{"key":"e_1_3_2_131_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350935"},{"key":"e_1_3_2_132_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3532064"},{"key":"e_1_3_2_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3059514"},{"key":"e_1_3_2_134_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539388"},{"key":"e_1_3_2_135_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475259"},{"key":"e_1_3_2_136_2","article-title":"Latent structures mining with contrastive modality fusion for multimedia recommendation","volume":"2111","author":"Zhang Jinghao","year":"2021","unstructured":"Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Mengqi Zhang, Shu Wu, and Liang Wang. 2021. Latent structures mining with contrastive modality fusion for multimedia recommendation. CoRR abs\/2111.00678 (2021).","journal-title":"CoRR"},{"key":"e_1_3_2_137_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/478"},{"key":"e_1_3_2_138_2","unstructured":"Yongfeng Zhang. 2017. Explainable recommendation: Theory and applications. arXiv:1708.06409. Retrieved from https:\/\/arxiv.org\/abs\/1708.06409"},{"key":"e_1_3_2_139_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000066"},{"key":"e_1_3_2_140_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.04.126"},{"key":"e_1_3_2_141_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557680"},{"key":"e_1_3_2_142_2","doi-asserted-by":"publisher","DOI":"10.1145\/3018661.3018665"},{"key":"e_1_3_2_143_2","doi-asserted-by":"publisher","DOI":"10.1145\/3570361.3592517"},{"key":"e_1_3_2_144_2","unstructured":"Hongyu Zhou Xin Zhou Zhiwei Zeng Lingzi Zhang and Zhiqi Shen. 2023. A comprehensive survey on multimodal recommender systems: Taxonomy evaluation and future directions. 
CoRR abs\/2302.04473 (2023)."},{"key":"e_1_3_2_145_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611943"},{"key":"e_1_3_2_146_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583251"}],"container-title":["ACM Transactions on Recommender Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3662738","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3662738","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:57:11Z","timestamp":1750291031000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3662738"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,22]]},"references-count":145,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3662738"],"URL":"https:\/\/doi.org\/10.1145\/3662738","relation":{},"ISSN":["2770-6699"],"issn-type":[{"value":"2770-6699","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,22]]},"assertion":[{"value":"2023-09-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}