{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,17]],"date-time":"2026-07-17T03:23:24Z","timestamp":1784258604946,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":50,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T00:00:00Z","timestamp":1717027200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,5,30]]},"DOI":"10.1145\/3652583.3658036","type":"proceedings-article","created":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T06:30:40Z","timestamp":1717741840000},"page":"1016-1024","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-7697-9630","authenticated-orcid":false,"given":"Boyue","family":"Xu","sequence":"first","affiliation":[{"name":"State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8111-7339","authenticated-orcid":false,"given":"Ruichao","family":"Hou","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3092-424X","authenticated-orcid":false,"given":"Tongwei","family":"Ren","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1391-1762","authenticated-orcid":false,"given":"Gangshan","family":"Wu","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,6,7]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.160"},{"key":"e_1_3_2_1_2_1","volume-title":"One-Shot Video Object Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Caelles Sergi","year":"2017","unstructured":"Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taix\u00e9, Daniel Cremers, and Luc Van Gool. 2017. One-Shot Video Object Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_3_1","volume-title":"Blazingly Fast Video Object Segmentation with Pixel-wise Metric Learning. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Chen Yuhua","year":"2018","unstructured":"Yuhua Chen, Jordi Pont-Tuset, Alberto Montes, and Luc Van Gool. 2018. Blazingly Fast Video Object Segmentation with Pixel-wise Metric Learning. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19815-1_37"},{"key":"e_1_3_2_1_5_1","unstructured":"Ho Kei Cheng Yu-Wing Tai and Chi-Keung Tang. 2021. Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation. In Neural Information Processing Systems."},{"key":"e_1_3_2_1_6_1","volume-title":"Segflow: Joint Learning for Video Object Segmentation and Optical Flow. In IEEE International Conference on Computer Vision.","author":"Cheng Jingchun","year":"2017","unstructured":"Jingchun Cheng, Yi-Hsuan Tsai, Shengjin Wang, and Ming-Hsuan Yang. 2017. 
Segflow: Joint Learning for Video Object Segmentation and Optical Flow. In IEEE International Conference on Computer Vision."},{"key":"e_1_3_2_1_7_1","volume-title":"Segment and Track Anything. arXiv preprint arXiv:2305.06558","author":"Cheng Yangming","year":"2023","unstructured":"Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, and Yi Yang. 2023. Segment and Track Anything. arXiv preprint arXiv:2305.06558 (2023)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.228"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5214\/ans.0972.7531.200408"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.108945"},{"key":"e_1_3_2_1_11_1","volume-title":"Deep Residual Learning for Image Recognition. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_12_1","volume-title":"MIRNet: A Robust RGBT Tracking Jointly with Multi-Modal Interaction and Refinement. In IEEE International Conference on Multimedia and Expo.","author":"Hou Ruichao","year":"2022","unstructured":"Ruichao Hou, Tongwei Ren, and Gangshan Wu. 2022. MIRNet: A Robust RGBT Tracking Jointly with Multi-Modal Interaction and Refinement. In IEEE International Conference on Multimedia and Expo."},{"key":"e_1_3_2_1_13_1","volume-title":"MTNet: Learning Modality-aware Representation with Transformer for RGBT Tracking. In IEEE International Conference on Multimedia and Expo.","author":"Hou Ruichao","year":"2023","unstructured":"Ruichao Hou, Boyue Xu, Tongwei Ren, and Gangshan Wu. 2023. MTNet: Learning Modality-aware Representation with Transformer for RGBT Tracking. 
In IEEE International Conference on Multimedia and Expo."},{"key":"e_1_3_2_1_14_1","volume-title":"Learning Position and Target Consistency for Memory-Based Video Object Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Hu Li","year":"2021","unstructured":"Li Hu, Peng Zhang, Bang Zhang, Pan Pan, Yinghui Xu, and Rong Jin. 2021. Learning Position and Target Consistency for Memory-Based Video Object Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_15_1","volume-title":"Videomatch: Matching Based Video Object Segmentation. In European Conference on Computer Vision.","author":"Hu Yuan-Ting","year":"2018","unstructured":"Yuan-Ting Hu, Jia-Bin Huang, and Alexander G Schwing. 2018. Videomatch: Matching Based Video Object Segmentation. In European Conference on Computer Vision."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-11009-3_8"},{"key":"e_1_3_2_1_17_1","unstructured":"Anna Khoreva Rodrigo Benenson Eddy Ilg Thomas Brox and Bernt Schiele. 2017. Lucid Data Dreaming for Object Tracking. In The DAVIS Challenge on Video Object Segmentation."},{"key":"e_1_3_2_1_18_1","volume-title":"Segment Anything. In IEEE International Conference on Computer Vision.","author":"Kirillov Alexander","year":"2023","unstructured":"Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. 2023. Segment Anything. In IEEE International Conference on Computer Vision."},{"key":"e_1_3_2_1_19_1","volume-title":"Diffusion-Augmented Depth Prediction with Sparse Annotations. In ACM International Conference on Multimedia.","author":"Li Jiaqi","year":"2023","unstructured":"Jiaqi Li, Yiran Wang, Zihao Huang, Jinghong Zheng, Ke Xian, Zhiguo Cao, and Jianming Zhang. 2023. Diffusion-Augmented Depth Prediction with Sparse Annotations. 
In ACM International Conference on Multimedia."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3017924"},{"key":"e_1_3_2_1_21_1","volume-title":"Learning Quality-Aware Dynamic Memory for Video Object Segmentation. In European Conference on Computer Vision.","author":"Liu Yong","year":"2022","unstructured":"Yong Liu, Ran Yu, Fei Yin, Xinyuan Zhao, Wei Zhao, Weihao Xia, and Yujiu Yang. 2022. Learning Quality-Aware Dynamic Memory for Video Object Segmentation. In European Conference on Computer Vision."},{"key":"e_1_3_2_1_22_1","volume-title":"Decoupled Weight Decay Regularization. arXiv preprint arXiv:1711.05101","author":"Loshchilov Ilya","year":"2017","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled Weight Decay Regularization. arXiv preprint arXiv:1711.05101 (2017)."},{"key":"e_1_3_2_1_23_1","volume-title":"Rethinking Open-World Object Detection in Autonomous Driving Scenarios. In ACM International Conference on Multimedia.","author":"Ma Zeyu","year":"2022","unstructured":"Zeyu Ma, Yang Yang, Guoqing Wang, Xing Xu, Heng Tao Shen, and Mingxing Zhang. 2022. Rethinking Open-World Object Detection in Autonomous Driving Scenarios. In ACM International Conference on Multimedia."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00071"},{"key":"e_1_3_2_1_25_1","volume-title":"Semi-supervised Semantic Segmentation with Cross-Consistency Training. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Ouali Yassine","year":"2020","unstructured":"Yassine Ouali, C\u00e9line Hudelot, and Myriam Tami. 2020. Semi-supervised Semantic Segmentation with Cross-Consistency Training. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIM.2023.3243614"},{"key":"e_1_3_2_1_27_1","volume-title":"DAL: A Deep Depth-aware Long-term Tracker. 
In International Conference on Pattern Recognition.","author":"Qian Yanlin","year":"2021","unstructured":"Yanlin Qian, Song Yan, Alan Luke\u017ei\u010d, Matej Kristan, Joni-Kristian K\u00e4m\u00e4r\u00e4inen, and Ji\u0159\u00ed Matas. 2021. DAL: A Deep Depth-aware Long-term Tracker. In International Conference on Pattern Recognition."},{"key":"e_1_3_2_1_28_1","volume-title":"Video Object Segmentation Using Tracked Object Proposals. arXiv preprint arXiv:1707.06545","author":"Sharir Gilad","year":"2017","unstructured":"Gilad Sharir, Eddie Smolyansky, and Itamar Friedman. 2017. Video Object Segmentation Using Tracked Object Proposals. arXiv preprint arXiv:1707.06545 (2017)."},{"key":"e_1_3_2_1_29_1","volume-title":"Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks. In IEEE International Conference on Computer Vision.","author":"Yoon Jae Shin","year":"2017","unstructured":"Jae Shin Yoon, Francois Rameau, Junsik Kim, Seokju Lee, Seunghak Shin, and In So Kweon. 2017. Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks. In IEEE International Conference on Computer Vision."},{"key":"e_1_3_2_1_30_1","volume-title":"Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines. In IEEE International Conference on Computer Vision.","author":"Song S.","unstructured":"S. Song and J. Xiao. 2014. Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines. In IEEE International Conference on Computer Vision."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.101881"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.480"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2022.10.034"},{"key":"e_1_3_2_1_34_1","volume-title":"Attribute-based Progressive Fusion Network for RGBT Tracking. 
In AAAI Conference on Artificial Intelligence.","author":"Xiao Yun","year":"2022","unstructured":"Yun Xiao, Mengmeng Yang, Chenglong Li, Lei Liu, and Jin Tang. 2022. Attribute-based Progressive Fusion Network for RGBT Tracking. In AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_1_35_1","volume-title":"Jointly Modeling Association and Motion Cues for Robust Infrared UAV Tracking. The Visual Computer","author":"Xu Boyue","year":"2024","unstructured":"Boyue Xu, Ruichao Hou, Jia Bei, Tongwei Ren, and Gangshan Wu. 2024. Jointly Modeling Association and Motion Cues for Robust Infrared UAV Tracking. The Visual Computer (2024), 1\u201312."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3595916.3626441"},{"key":"e_1_3_2_1_37_1","volume-title":"Youtube-VOS: Sequence-to-Sequence Video Object Segmentation. In European Conference on Computer Vision.","author":"Xu Ning","year":"2018","unstructured":"Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, and Thomas Huang. 2018. Youtube-VOS: Sequence-to-Sequence Video Object Segmentation. In European Conference on Computer Vision."},{"key":"e_1_3_2_1_38_1","volume-title":"Reliable Propagation-Correction Modulation for Video Object Segmentation. In AAAI Conference on Artificial Intelligence.","author":"Xu Xiaohao","year":"2022","unstructured":"Xiaohao Xu, Jinglu Wang, Xiao Li, and Yan Lu. 2022. Reliable Propagation-Correction Modulation for Video Object Segmentation. In AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_1_39_1","volume-title":"IEEE International Conference on Computer Vision.","author":"Yan S.","unstructured":"S. Yan, J. Yang, J. K\u00e4pyl\u00e4, F. Zheng, A. Leonardis, and J. K. K\u00e4m\u00e4r\u00e4inen. 2021. DepthTrack: Unveiling the Power of RGBD Tracking. In IEEE International Conference on Computer Vision."},{"key":"e_1_3_2_1_40_1","volume-title":"Prompting for Multi-modal Tracking. 
In the ACM International Conference on Multimedia.","author":"Yang Jinyu","year":"2022","unstructured":"Jinyu Yang, Zhe Li, Feng Zheng, Ales Leonardis, and Jingkuan Song. 2022. Prompting for Multi-modal Tracking. In the ACM International Conference on Multimedia."},{"key":"e_1_3_2_1_41_1","unstructured":"Zongxin Yang Yunchao Wei and Yi Yang. 2021. Associating Objects with Transformers for Video Object Segmentation. In Neural Information Processing Systems."},{"key":"e_1_3_2_1_42_1","unstructured":"Zongxin Yang and Yi Yang. 2022. Decoupling Features in Hierarchical Propagation for Video Object Segmentation. In Neural Information Processing Systems."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3391743"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME55011.2023.00348"},{"key":"e_1_3_2_1_45_1","unstructured":"Dong Zhang Hanwang Zhang Jinhui Tang Xian-Sheng Hua and Qianru Sun. 2020. Causal Intervention for Weakly-Supervised Semantic Segmentation. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_1_46_1","volume-title":"ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Zhao Haojie","year":"2023","unstructured":"Haojie Zhao, Junsong Chen, Lijun Wang, and Huchuan Lu. 2023. ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9413315"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Bineng Zhong Yingju Shen Yan Chen Weibo Xie Zhen Cui Hongbo Zhang Duansheng Chen Tian Wang Xin Liu Shujuan Peng et al. 2015. Online Learning 3D Context for Robust Visual Tracking. Neurocomputing (2015).","DOI":"10.1016\/j.neucom.2014.06.083"},{"key":"e_1_3_2_1_49_1","volume-title":"Visual Prompt Multi-Modal Tracking. 
In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Zhu Jiawen","year":"2023","unstructured":"Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, and Huchuan Lu. 2023. Visual Prompt Multi-Modal Tracking. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i3.25500"}],"event":{"name":"ICMR '24: International Conference on Multimedia Retrieval","location":"Phuket Thailand","acronym":"ICMR '24","sponsor":["SIGMM ACM Special Interest Group on Multimedia","SIGSOFT ACM Special Interest Group on Software Engineering"]},"container-title":["Proceedings of the 2024 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3652583.3658036","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3652583.3658036","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T08:49:58Z","timestamp":1755766198000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3652583.3658036"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,30]]},"references-count":50,"alternative-id":["10.1145\/3652583.3658036","10.1145\/3652583"],"URL":"https:\/\/doi.org\/10.1145\/3652583.3658036","relation":{},"subject":[],"published":{"date-parts":[[2024,5,30]]},"assertion":[{"value":"2024-06-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}