{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T17:51:22Z","timestamp":1781891482120,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3548366","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:43:01Z","timestamp":1665416581000},"page":"258-267","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["From Abstract to Details"],"prefix":"10.1145","author":[{"given":"Fangxiong","family":"Xiao","sequence":"first","affiliation":[{"name":"Search and Recommendation Platform Department, JD.COM, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lixi","family":"Deng","sequence":"additional","affiliation":[{"name":"Search and Recommendation Platform Department, JD.COM, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jingjing","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Houye","family":"Ji","sequence":"additional","affiliation":[{"name":"Search and Recommendation Platform Department, JD.COM, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaorui","family":"Yang","sequence":"additional","affiliation":[{"name":"Search and Recommendation Platform Department, JD.COM, Beijing, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhuoye","family":"Ding","sequence":"additional","affiliation":[{"name":"Search and Recommendation Platform Department, JD.COM, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bo","family":"Long","sequence":"additional","affiliation":[{"name":"Search and Recommendation Platform Department, JD.COM, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Abdulmotaleb El Saddik, and Mohan S Kankanhalli","author":"Atrey Pradeep K","year":"2010","unstructured":"Pradeep K Atrey , M Anwar Hossain , Abdulmotaleb El Saddik, and Mohan S Kankanhalli . 2010 . Multimodal fusion for multimedia analysis: a survey. Multimedia systems 16, 6 (2010), 345--379. Pradeep K Atrey, M Anwar Hossain, Abdulmotaleb El Saddik, and Mohan S Kankanhalli. 2010. Multimodal fusion for multimedia analysis: a survey. Multimedia systems 16, 6 (2010), 345--379."},{"key":"e_1_3_2_2_2_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau , Kyung Hyun Cho , and Yoshua Bengio . 2015 . Neural machine translation by jointly learning to align and translate . In 3rd International Conference on Learning Representations, ICLR 2015. Dzmitry Bahdanau, Kyung Hyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015."},{"key":"e_1_3_2_2_3_1","volume-title":"Multimodal machine learning: A survey and taxonomy","author":"Baltru\u0161aitis Tadas","year":"2018","unstructured":"Tadas Baltru\u0161aitis, Chaitanya Ahuja , and Louis-Philippe Morency . 2018. Multimodal machine learning: A survey and taxonomy . IEEE transactions on pattern analysis and machine intelligence 41, 2 ( 2018 ), 423--443. 
Tadas Baltru\u0161aitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE transactions on pattern analysis and machine intelligence 41, 2 (2018), 423--443."},{"key":"e_1_3_2_2_4_1","volume-title":"Representation learning: A review and new perspectives","author":"Bengio Yoshua","year":"2013","unstructured":"Yoshua Bengio , Aaron Courville , and Pascal Vincent . 2013. Representation learning: A review and new perspectives . IEEE transactions on pattern analysis and machine intelligence 35, 8 ( 2013 ), 1798--1828. Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35, 8 (2013), 1798--1828."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3326937.3341261"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2959100.2959190"},{"key":"e_1_3_2_2_7_1","volume-title":"Sparse Fusion for Multimodal Transformers. arXiv preprint arXiv:2111.11992","author":"Ding Yi","year":"2021","unstructured":"Yi Ding , Alex Rich , Mason Wang , Noah Stier , Pradeep Sen , Matthew Turk , and Tobias H\u00f6llerer . 2021. Sparse Fusion for Multimodal Transformers. arXiv preprint arXiv:2111.11992 ( 2021 ). Yi Ding, Alex Rich, Mason Wang, Noah Stier, Pradeep Sen, Matthew Turk, and Tobias H\u00f6llerer. 2021. Sparse Fusion for Multimodal Transformers. arXiv preprint arXiv:2111.11992 (2021)."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3059295"},{"key":"e_1_3_2_2_9_1","article-title":"Adaptive subgradient methods for online learning and stochastic optimization","volume":"12","author":"Duchi John","year":"2011","unstructured":"John Duchi , Elad Hazan , and Yoram Singer . 2011 . Adaptive subgradient methods for online learning and stochastic optimization . Journal of machine learning research 12 , 7 (2011). John Duchi, Elad Hazan, and Yoram Singer. 
2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research 12, 7 (2011).","journal-title":"Journal of machine learning research"},{"key":"e_1_3_2_2_10_1","volume-title":"Generative adversarial nets. Advances in neural information processing systems 27","author":"Goodfellow Ian","year":"2014","unstructured":"Ian Goodfellow , Jean Pouget-Abadie , Mehdi Mirza , Bing Xu , David Warde-Farley , Sherjil Ozair , Aaron Courville , and Yoshua Bengio . 2014. Generative adversarial nets. Advances in neural information processing systems 27 ( 2014 ). Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in neural information processing systems 27 (2014)."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/239"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i14.17548"},{"key":"e_1_3_2_2_13_1","volume-title":"Proceedings of NAACL-HLT. 4171--4186","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of NAACL-HLT. 4171--4186 . Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT. 4171--4186."},{"key":"e_1_3_2_2_14_1","volume-title":"A survey of the recent architectures of deep convolutional neural networks. Artificial intelligence review 53, 8","author":"Khan Asifullah","year":"2020","unstructured":"Asifullah Khan , Anabia Sohail , Umme Zahoora , and Aqsa Saeed Qureshi . 2020. A survey of the recent architectures of deep convolutional neural networks. Artificial intelligence review 53, 8 ( 2020 ), 5455--5516. 
Asifullah Khan, Anabia Sohail, Umme Zahoora, and Aqsa Saeed Qureshi. 2020. A survey of the recent architectures of deep convolutional neural networks. Artificial intelligence review 53, 8 (2020), 5455--5516."},{"key":"e_1_3_2_2_15_1","volume-title":"Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114","author":"Kingma Diederik P","year":"2013","unstructured":"Diederik P Kingma and Max Welling . 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 ( 2013 ). Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)."},{"key":"e_1_3_2_2_16_1","volume-title":"Unifying visualsemantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539","author":"Kiros Ryan","year":"2014","unstructured":"Ryan Kiros , Ruslan Salakhutdinov , and Richard S Zemel . 2014. Unifying visualsemantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539 ( 2014 ). Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel. 2014. Unifying visualsemantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539 (2014)."},{"key":"e_1_3_2_2_17_1","volume-title":"Parameter efficient multimodal transformers for video representation learning. arXiv preprint arXiv:2012.04124","author":"Lee Sangho","year":"2020","unstructured":"Sangho Lee , Youngjae Yu , Gunhee Kim , Thomas Breuel , Jan Kautz , and Yale Song . 2020. Parameter efficient multimodal transformers for video representation learning. arXiv preprint arXiv:2012.04124 ( 2020 ). Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, and Yale Song. 2020. Parameter efficient multimodal transformers for video representation learning. 
arXiv preprint arXiv:2012.04124 (2020)."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467189"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380163"},{"key":"e_1_3_2_2_20_1","volume-title":"Interbert: Vision-and-language interaction for multi-modal pretraining. arXiv preprint arXiv:2003.13198","author":"Lin Junyang","year":"2020","unstructured":"Junyang Lin , An Yang , Yichang Zhang , Jie Liu , Jingren Zhou , and Hongxia Yang . 2020 . Interbert: Vision-and-language interaction for multi-modal pretraining. arXiv preprint arXiv:2003.13198 (2020). Junyang Lin, An Yang, Yichang Zhang, Jie Liu, Jingren Zhou, and Hongxia Yang. 2020. Interbert: Vision-and-language interaction for multi-modal pretraining. arXiv preprint arXiv:2003.13198 (2020)."},{"key":"e_1_3_2_2_21_1","volume-title":"Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2686--2696","author":"Liu Hu","year":"2020","unstructured":"Hu Liu , Jing Lu , Hao Yang , Xiwei Zhao , Sulong Xu , Hao Peng , Zehua Zhang , Wenjie Niu , Xiaokun Zhu , Yongjun Bao , 2020 . Category-Specific CNN for Visual-aware CTR Prediction at JD. com . In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2686--2696 . Hu Liu, Jing Lu, Hao Yang, Xiwei Zhao, Sulong Xu, Hao Peng, Zehua Zhang, Wenjie Niu, Xiaokun Zhu, Yongjun Bao, et al. 2020. Category-Specific CNN for Visual-aware CTR Prediction at JD. com. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2686--2696."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1209"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2488200"},{"key":"e_1_3_2_2_24_1","volume-title":"Conditional generative adversarial nets. 
arXiv preprint arXiv:1411.1784","author":"Mirza Mehdi","year":"2014","unstructured":"Mehdi Mirza and Simon Osindero . 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 ( 2014 ). Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)."},{"key":"e_1_3_2_2_25_1","volume-title":"Md Yasin Kabir, and Even Oldridge","author":"Moreira Gabriel","year":"2021","unstructured":"Gabriel de Souza P Moreira , Sara Rabhi , Ronay Ak , Md Yasin Kabir, and Even Oldridge . 2021 . Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation. arXiv preprint arXiv:2107.05124 (2021). Gabriel de Souza P Moreira, Sara Rabhi, Ronay Ak, Md Yasin Kabir, and Even Oldridge. 2021. Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation. arXiv preprint arXiv:2107.05124 (2021)."},{"key":"e_1_3_2_2_26_1","volume-title":"Mmgan: Generative adversarial networks for multi-modal distributions. arXiv preprint arXiv:1911.06663","author":"Pandeva Teodora","year":"2019","unstructured":"Teodora Pandeva and Matthias Schubert . 2019 . Mmgan: Generative adversarial networks for multi-modal distributions. arXiv preprint arXiv:1911.06663 (2019). Teodora Pandeva and Matthias Schubert. 2019. Mmgan: Generative adversarial networks for multi-modal distributions. arXiv preprint arXiv:1911.06663 (2019)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3284750"},{"key":"e_1_3_2_2_28_1","volume-title":"Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG","author":"Praveen R Gnana","year":"2021","unstructured":"R Gnana Praveen , Eric Granger , and Patrick Cardinal . 2021 . Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition. 
In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021). IEEE, 1--8. R Gnana Praveen, Eric Granger, and Patrick Cardinal. 2021. Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021). IEEE, 1--8."},{"key":"e_1_3_2_2_29_1","volume-title":"Multimodal Topic Learning for Video Recommendation. arXiv preprint arXiv:2010.13373","author":"Pu Shi","year":"2020","unstructured":"Shi Pu , Yijiang He , Zheng Li , and Mao Zheng . 2020. Multimodal Topic Learning for Video Recommendation. arXiv preprint arXiv:2010.13373 ( 2020 ). Shi Pu, Yijiang He, Zheng Li, and Mao Zheng. 2020. Multimodal Topic Learning for Video Recommendation. arXiv preprint arXiv:2010.13373 (2020)."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2010.127"},{"key":"e_1_3_2_2_31_1","volume-title":"Very deep convolutional networks for large-scale visual recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale visual recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale visual recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3090595"},{"key":"e_1_3_2_2_33_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems 30 ( 2017 ). 
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/503"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM50108.2020.00065"},{"key":"e_1_3_2_2_36_1","article-title":"Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion","volume":"11","author":"Vincent Pascal","year":"2010","unstructured":"Pascal Vincent , Hugo Larochelle , Isabelle Lajoie , Yoshua Bengio , Pierre-Antoine Manzagol , and L\u00e9on Bottou . 2010 . Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion . Journal of machine learning research 11 , 12 (2010). Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and L\u00e9on Bottou. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of machine learning research 11, 12 (2010).","journal-title":"Journal of machine learning research"},{"key":"e_1_3_2_2_37_1","volume-title":"A comprehensive survey on cross-modal retrieval. arXiv preprint arXiv:1607.06215","author":"Wang Kaiye","year":"2016","unstructured":"Kaiye Wang , Qiyue Yin , Wei Wang , Shu Wu , and Liang Wang . 2016. A comprehensive survey on cross-modal retrieval. arXiv preprint arXiv:1607.06215 ( 2016 ). Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, and Liang Wang. 2016. A comprehensive survey on cross-modal retrieval. arXiv preprint arXiv:1607.06215 (2016)."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350875"},{"key":"e_1_3_2_2_39_1","volume-title":"International conference on machine learning. 
PMLR, 1083--1092","author":"Wang Weiran","year":"2015","unstructured":"Weiran Wang , Raman Arora , Karen Livescu , and Jeff Bilmes . 2015 . On deep multiview representation learning . In International conference on machine learning. PMLR, 1083--1092 . Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. 2015. On deep multiview representation learning. In International conference on machine learning. PMLR, 1083--1092."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3418211","article-title":"Market2Dish: Health-aware food recommendation","volume":"17","author":"Wang Wenjie","year":"2021","unstructured":"Wenjie Wang , Ling-Yu Duan , Hao Jiang , Peiguang Jing , Xuemeng Song , and Liqiang Nie . 2021 . Market2Dish: Health-aware food recommendation . ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 17 , 1 (2021), 1 -- 19 . Wenjie Wang, Ling-Yu Duan, Hao Jiang, Peiguang Jing, Xuemeng Song, and Liqiang Nie. 2021. Market2Dish: Health-aware food recommendation. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 17, 1 (2021), 1--19.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351034"},{"key":"e_1_3_2_2_42_1","volume-title":"Neural news recommendation with attentive multi-view learning. arXiv preprint arXiv:1907.05576","author":"Wu Chuhan","year":"2019","unstructured":"Chuhan Wu , Fangzhao Wu , Mingxiao An , Jianqiang Huang , Yongfeng Huang , and Xing Xie . 2019. Neural news recommendation with attentive multi-view learning. arXiv preprint arXiv:1907.05576 ( 2019 ). Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. 2019. Neural news recommendation with attentive multi-view learning. arXiv preprint arXiv:1907.05576 (2019)."},{"key":"e_1_3_2_2_43_1","volume-title":"MM-Rec: Multimodal News Recommendation. 
arXiv preprint arXiv:2104.07407","author":"Wu Chuhan","year":"2021","unstructured":"Chuhan Wu , Fangzhao Wu , Tao Qi , and Yongfeng Huang . 2021. MM-Rec: Multimodal News Recommendation. arXiv preprint arXiv:2104.07407 ( 2021 ). Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2021. MM-Rec: Multimodal News Recommendation. arXiv preprint arXiv:2104.07407 (2021)."},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.417"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i12.17289"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.38094\/jastt1224"},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33015941"},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219823"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01174"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548366","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3548366","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:44Z","timestamp":1750186844000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548366"}},"subtitle":["A Generative Multimodal Fusion Framework for 
Recommendation"],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":49,"alternative-id":["10.1145\/3503161.3548366","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3548366","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}