{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T16:56:33Z","timestamp":1770742593322,"version":"3.49.0"},"reference-count":84,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,3,8]],"date-time":"2024-03-08T00:00:00Z","timestamp":1709856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62361002 and 62371330"],"award-info":[{"award-number":["62361002 and 62371330"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Guangxi Key Laboratory of Big Data in Finance and Economics"},{"name":"Doctor Start-up Funds","award":["BS2021025"],"award-info":[{"award-number":["BS2021025"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>As one of the representative types of user-generated contents (UGCs) in social platforms, micro-videos have been becoming popular in our daily life. Although micro-videos naturally exhibit multimodal features that are rich enough to support representation learning, the complex correlations across modalities render valuable information difficult to integrate. In this paper, we introduced a multimodal attentive representation network (MARNET) to learn complete and robust representations to benefit micro-video multi-label classification. 
To address the commonly missing modality issue, we presented a multimodal information aggregation mechanism module to integrate multimodal information, where latent common representations are obtained by modeling the complementarity and consistency in terms of visual-centered modality groupings instead of single modalities. For the label correlation issue, we designed an attentive graph neural network module to adaptively learn the correlation matrix and representations of labels for better compatibility with training data. In addition, a cross-modal multi-head attention module is developed to make the learned common representations label-aware for multi-label classification. Experiments conducted on two micro-video datasets demonstrate the superior performance of MARNET compared with state-of-the-art methods.<\/jats:p>","DOI":"10.1145\/3643888","type":"journal-article","created":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T12:13:40Z","timestamp":1707221620000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Multimodal Attentive Representation Learning for Micro-video Multi-label Classification"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2648-7358","authenticated-orcid":false,"given":"Peiguang","family":"Jing","sequence":"first","affiliation":[{"name":"Tianjin University, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6284-9470","authenticated-orcid":false,"given":"Xianyi","family":"Liu","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-1892-9617","authenticated-orcid":false,"given":"Lijuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5784-1877","authenticated-orcid":false,"given":"Yun","family":"Li","sequence":"additional","affiliation":[{"name":"Guangxi University of Finance and Economics, Nanning, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5949-6587","authenticated-orcid":false,"given":"Yu","family":"Liu","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5165-204X","authenticated-orcid":false,"given":"Yuting","family":"Su","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,3,8]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"1247","volume-title":"Proceedings of International Conference on Machine Learning","author":"Andrew Galen","year":"2013","unstructured":"Galen Andrew, Raman Arora, Jeff Bilmes, and Karen Livescu. 2013. Deep canonical correlation analysis. In Proceedings of International Conference on Machine Learning. 1247\u20131255."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2798607"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2004.03.009"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24673-2_3"},{"key":"e_1_3_2_6_2","article-title":"Heterogeneous hierarchical feature aggregation network for personalized micro-video recommendation","author":"Cai Desheng","year":"2021","unstructured":"Desheng Cai, Shengsheng Qian, Quan Fang, and Changsheng Xu. 2021. Heterogeneous hierarchical feature aggregation network for personalized micro-video recommendation. 
IEEE Transactions on Multimedia (2021).","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_7_2","article-title":"Return of the devil in the details: Delving deep into convolutional nets","author":"Chatfield Ken","year":"2014","unstructured":"Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531 (2014).","journal-title":"arXiv preprint arXiv:1405.3531"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2017.7966144"},{"key":"e_1_3_2_9_2","first-page":"12655","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Chen Hui","year":"2020","unstructured":"Hui Chen, Guiguang Ding, Xudong Liu, Zijia Lin, Ji Liu, and Jungong Han. 2020. IMRAM: Iterative matching with recurrent attention memory for cross-modal image-text retrieval. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 12655\u201312663."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964314"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01065"},{"key":"e_1_3_2_12_2","first-page":"522","volume-title":"Proceedings of IEEE International Conference on Computer Vision","author":"Chen Tianshui","year":"2019","unstructured":"Tianshui Chen, Muxin Xu, Xiaolu Hui, Hefeng Wu, and Liang Lin. 2019. Learning semantic-specific graph representation for multi-label image recognition. In Proceedings of IEEE International Conference on Computer Vision. 522\u2013531."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2978618"},{"key":"e_1_3_2_14_2","first-page":"5177","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Chen Zhao-Min","year":"2019","unstructured":"Zhao-Min Chen, Xiu-Shen Wei, Peng Wang, and Yanwen Guo. 2019. 
Multi-label image recognition with graph convolutional networks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 5177\u20135186."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547943"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964326"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240518"},{"key":"e_1_3_2_18_2","first-page":"4048","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Cheng Zhi-Qi","year":"2017","unstructured":"Zhi-Qi Cheng, Xiao Wu, Yang Liu, and Xian-Sheng Hua. 2017. Video2Shop: Exact matching clothes in videos to online shopping images. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 4048\u20134056."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3078971.3079025"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/3015812.3015987"},{"key":"e_1_3_2_21_2","first-page":"647","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Durand Thibaut","year":"2019","unstructured":"Thibaut Durand, Nazanin Mehrasa, and Greg Mori. 2019. Learning a deep ConvNet for multi-label classification with partial labels. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 647\u2013657."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611976236.40"},{"key":"e_1_3_2_23_2","volume-title":"Proceedings of Advances in Neural Information Processing Systems","volume":"26","author":"Frome Andrea","year":"2013","unstructured":"Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. 2013. DeViSE: A deep visual-semantic embedding model. In Proceedings of Advances in Neural Information Processing Systems, Vol. 
26."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-008-5064-8"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3078560"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.05.118"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"e_1_3_2_28_2","first-page":"1","article-title":"HMNet: A hierarchical multi-modal network for educational video concept prediction","author":"Huang Wei","year":"2023","unstructured":"Wei Huang, Tong Xiao, Qi Liu, Zhenya Huang, Jianhui Ma, and Enhong Chen. 2023. HMNet: A hierarchical multi-modal network for educational video concept prediction. International Journal of Machine Learning and Cybernetics (2023), 1\u201312.","journal-title":"International Journal of Machine Learning and Cybernetics"},{"key":"e_1_3_2_29_2","first-page":"3232","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Jiang Qing-Yuan","year":"2017","unstructured":"Qing-Yuan Jiang and Wu-Jun Li. 2017. Deep cross-modal hashing. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 3232\u20133240."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2670560"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3083079"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2785784"},{"key":"e_1_3_2_33_2","first-page":"1725","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Karpathy Andrej","year":"2014","unstructured":"Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 
1725\u20131732."},{"key":"e_1_3_2_34_2","first-page":"2482","volume-title":"Proceedings of International Conference on Machine Learning","author":"Li Cheng","year":"2016","unstructured":"Cheng Li, Bingyu Wang, Virgil Pavlu, and Javed Aslam. 2016. Conditional Bernoulli mixtures for multi-label classification. In Proceedings of International Conference on Machine Learning. 2482\u20132491."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3086895"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2022.11.111"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080834"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123341"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3121567"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2023.3240889"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350999"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i7.20731"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2848458"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.162"},{"key":"e_1_3_2_45_2","first-page":"689","volume-title":"Proceedings of International Conference on Machine Learning","author":"Ngiam Jiquan","year":"2011","unstructured":"Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y. Ng. 2011. Multimodal deep learning. In Proceedings of International Conference on Machine Learning. 689\u2013696."},{"key":"e_1_3_2_46_2","volume-title":"2017 TREC Video Retrieval Evaluation (TRECVID 2017)","author":"Nguyen Phuong Anh","year":"2017","unstructured":"Phuong Anh Nguyen, Qing Li, Zhi-Qi Cheng, Yi-Jie Lu, Hao Zhang, Xiao Wu, and Chong-Wah Ngo. 2017. VIREO@ TRECVID 2017: Video-to-text, Ad-hoc video search and video hyperlinking. 
In 2017 TREC Video Retrieval Evaluation (TRECVID 2017). National Institute of Standards and Technology (NIST)."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123313"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6853821"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1233"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46478-7_21"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-011-5256-5"},{"key":"e_1_3_2_52_2","first-page":"1234","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Sadanand Sreemanananth","year":"2012","unstructured":"Sreemanananth Sadanand and Jason J. Corso. 2012. Action bank: A high-level representation of activity in video. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1234\u20131241."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291311"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2697059"},{"key":"e_1_3_2_55_2","first-page":"1","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Szegedy Christian","year":"2015","unstructured":"Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 1\u20139."},{"key":"e_1_3_2_56_2","first-page":"4489","volume-title":"Proceedings of IEEE International Conference on Computer Vision","author":"Tran Du","year":"2015","unstructured":"Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of IEEE International Conference on Computer Vision. 
4489\u20134497."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00565"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.10465"},{"issue":"86","key":"e_1_3_2_59_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten Laurens van der","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579\u20132605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_60_2","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017), 6000\u20136010.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_61_2","article-title":"Graph attention networks","author":"Veli\u010dkovi\u0107 Petar","year":"2017","unstructured":"Petar Veli\u010dkovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).","journal-title":"arXiv preprint arXiv:1710.10903"},{"key":"e_1_3_2_62_2","first-page":"5005","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang Liwei","year":"2016","unstructured":"Liwei Wang, Yin Li, and Svetlana Lazebnik. 2016. Learning deep structure-preserving image-text embeddings. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 
5005\u20135013."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.6089"},{"key":"e_1_3_2_64_2","first-page":"4305","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang Limin","year":"2015","unstructured":"Limin Wang, Yu Qiao, and Xiaoou Tang. 2015. Action recognition with trajectory-pooled deep-convolutional descriptors. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 4305\u20134314."},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2958871"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3083978"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2923608"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351034"},{"key":"e_1_3_2_69_2","first-page":"24043","volume-title":"Proceedings of International Conference on Machine Learning","author":"Wu Nan","year":"2022","unstructured":"Nan Wu, Stanislaw Jastrzebski, Kyunghyun Cho, and Krzysztof J. Geras. 2022. Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks. In Proceedings of International Conference on Machine Learning. PMLR, 24043\u201324055."},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380004"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547754"},{"key":"e_1_3_2_72_2","first-page":"5447","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Yang Xitong","year":"2017","unstructured":"Xitong Yang, Palghat Ramesh, Radha Chitta, Sriganesh Madhvanath, Edgar A. Bernal, and Jiebo Luo. 2017. Deep multimodal representation learning from temporal data. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 
5447\u20135455."},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58589-1_39"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10769"},{"key":"e_1_3_2_75_2","article-title":"Central moment discrepancy (CMD) for domain-invariant representation learning","author":"Zellinger Werner","year":"2017","unstructured":"Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschl\u00e4ger, and Susanne Saminger-Platz. 2017. Central moment discrepancy (CMD) for domain-invariant representation learning. arXiv preprint arXiv:1702.08811 (2017).","journal-title":"arXiv preprint arXiv:1702.08811"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.06.003"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2013.39"},{"key":"e_1_3_2_78_2","article-title":"A convex formulation for learning task relationships in multi-task learning","author":"Zhang Yu","year":"2012","unstructured":"Yu Zhang and Dit-Yan Yeung. 2012. A convex formulation for learning task relationships in multi-task learning. arXiv preprint arXiv:1203.3536 (2012).","journal-title":"arXiv preprint arXiv:1203.3536"},{"key":"e_1_3_2_79_2","article-title":"Non-aligned multi-view multi-label classification via learning view-specific labels","author":"Zhao Dawei","year":"2022","unstructured":"Dawei Zhao, Qingwei Gao, Yixiang Lu, and Dong Sun. 2022. Non-aligned multi-view multi-label classification via learning view-specific labels. IEEE Transactions on Multimedia (2022).","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_80_2","article-title":"Multi-label image classification via category prototype compositional learning","author":"Zhou Fengtao","year":"2021","unstructured":"Fengtao Zhou, Sheng Huang, Bo Liu, and Dan Yang. 2021. Multi-label image classification via category prototype compositional learning. 
IEEE Transactions on Circuits and Systems for Video Technology (2021).","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_2_81_2","first-page":"5513","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Zhu Feng","year":"2017","unstructured":"Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, and Xiaogang Wang. 2017. Learning spatial regularization with image-level supervisions for multi-label image classification. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 5513\u20135522."},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2974065"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2023.3282921"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2022.12.022"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2785795"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643888","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643888","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:57:34Z","timestamp":1750291054000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643888"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,8]]},"references-count":84,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3643888"],"URL":"https:\/\/doi.org\/10.1145\/3643888","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,8]]},"assertion":[{"value":"2023-06-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-27","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}