{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:05:08Z","timestamp":1750309508827,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":88,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T00:00:00Z","timestamp":1730073600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Scientific and Technological innovation action plan of Shanghai Science and Technology Committee","award":["22511102202, 22511101502, 21DZ2203300"],"award-info":[{"award-number":["22511102202, 22511101502, 21DZ2203300"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072112"],"award-info":[{"award-number":["62072112"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,28]]},"DOI":"10.1145\/3664647.3680581","type":"proceedings-article","created":{"date-parts":[[2024,10,26]],"date-time":"2024-10-26T06:59:41Z","timestamp":1729925981000},"page":"5151-5160","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4388-9757","authenticated-orcid":false,"given":"Pinxue","family":"Guo","sequence":"first","affiliation":[{"name":"Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University, Shanghai, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-0669-0661","authenticated-orcid":false,"given":"Wanyun","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5739-1271","authenticated-orcid":false,"given":"Hao","family":"Huang","sequence":"additional","affiliation":[{"name":"Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2749-5133","authenticated-orcid":false,"given":"Lingyi","family":"Hong","sequence":"additional","affiliation":[{"name":"Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-5734-1305","authenticated-orcid":false,"given":"Xinyu","family":"Zhou","sequence":"additional","affiliation":[{"name":"Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7112-2596","authenticated-orcid":false,"given":"Zhaoyu","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4930-6284","authenticated-orcid":false,"given":"Jinglun","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2878-0497","authenticated-orcid":false,"given":"Kaixun","family":"Jiang","sequence":"additional","affiliation":[{"name":"Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering 
& Technology, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2358-8543","authenticated-orcid":false,"given":"Wei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3339-8751","authenticated-orcid":false,"given":"Wenqiang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Engineering Research Center of AI & Robotics, Ministry of Education, Academy for Engineering & Technology, Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2024,10,28]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings, Part II 16","author":"Bhat Goutam","year":"2020","unstructured":"Goutam Bhat, Felix J\u00e4remo Lawin, Martin Danelljan, Andreas Robinson, Michael Felsberg, Luc Van Gool, and Radu Timofte. 2020. Learning what to learn for video object segmentation. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part II 16. Springer, 777--794."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.565"},{"key":"e_1_3_2_1_3_1","volume-title":"Federico Perazzi, and Jordi Pont-Tuset.","author":"Caelles Sergi","year":"2018","unstructured":"Sergi Caelles, Alberto Montes, Kevis-Kokitsi Maninis, Yuhua Chen, Luc Van Gool, Federico Perazzi, and Jordi Pont-Tuset. 2018. The 2018 DAVIS Challenge on Video Object Segmentation. arXiv:1803.00557 (2018)."},{"key":"e_1_3_2_1_4_1","volume-title":"The 2019 DAVIS Challenge on VOS: Unsupervised Multi-Object Segmentation. arXiv:1905.00737","author":"Caelles Sergi","year":"2019","unstructured":"Sergi Caelles, Jordi Pont-Tuset, Federico Perazzi, Alberto Montes, Kevis-Kokitsi Maninis, and Luc Van Gool. 2019. 
The 2019 DAVIS Challenge on VOS: Unsupervised Multi-Object Segmentation. arXiv:1905.00737 (2019)."},{"key":"e_1_3_2_1_5_1","first-page":"16664","article-title":"Adaptformer: Adapting vision transformers for scalable visual recognition","volume":"35","author":"Chen Shoufa","year":"2022","unstructured":"Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. 2022. Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems, Vol. 35 (2022), 16664--16678.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00130"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19815-1_37"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00551"},{"key":"e_1_3_2_1_10_1","first-page":"11781","article-title":"Rethinking space-time networks with improved memory coverage for efficient video object segmentation","volume":"34","author":"Cheng Ho Kei","year":"2021","unstructured":"Ho Kei Cheng, Yu-Wing Tai, and Chi-Keung Tang. 2021. Rethinking space-time networks with improved memory coverage for efficient video object segmentation. Advances in Neural Information Processing Systems, Vol. 
34 (2021), 11781--11794.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00774"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20047-2_26"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_14_1","volume-title":"Philip HS Torr, and Song Bai","author":"Ding Henghui","year":"2023","unstructured":"Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Philip HS Torr, and Song Bai. 2023. Mose: A new dataset for video object segmentation in complex scenes. arXiv preprint arXiv:2302.01872 (2023)."},{"key":"e_1_3_2_1_15_1","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00585"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Guillermo Gallego Tobi Delbr\u00fcck Garrick Orchard Chiara Bartolozzi Brian Taba Andrea Censi Stefan Leutenegger Andrew J Davison J\u00f6rg Conradt Kostas Daniilidis et al. 2020. Event-based vision: A survey. IEEE transactions on pattern analysis and machine intelligence Vol. 44 1 (2020) 154--180.","DOI":"10.1109\/TPAMI.2020.3008413"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00122"},{"key":"e_1_3_2_1_19_1","volume-title":"Convmae: Masked convolution meets masked autoencoders. arXiv preprint arXiv:2205.03892","author":"Gao Peng","year":"2022","unstructured":"Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, and Yu Qiao. 2022. Convmae: Masked convolution meets masked autoencoders. 
arXiv preprint arXiv:2205.03892 (2022)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093335"},{"key":"e_1_3_2_1_21_1","volume-title":"ClickVOS: Click Video Object Segmentation. arXiv preprint arXiv:2403.06130","author":"Guo Pinxue","year":"2024","unstructured":"Pinxue Guo, Lingyi Hong, Xinyu Zhou, Shuyong Gao, Wanyun Li, Jinglun Li, Zhaoyu Chen, Xiaoqiang Li, Wei Zhang, and Wenqiang Zhang. 2024. ClickVOS: Click Video Object Segmentation. arXiv preprint arXiv:2403.06130 (2024)."},{"key":"e_1_3_2_1_22_1","volume-title":"OpenVIS: Open-vocabulary video instance segmentation. arXiv preprint arXiv:2305.16835","author":"Guo Pinxue","year":"2023","unstructured":"Pinxue Guo, Tony Huang, Peiyang He, Xuefeng Liu, Tianjun Xiao, Zhaoyu Chen, and Wenqiang Zhang. 2023. OpenVIS: Open-vocabulary video instance segmentation. arXiv preprint arXiv:2305.16835 (2023)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3219230"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01805"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611804"},{"key":"e_1_3_2_1_26_1","volume-title":"International conference on machine learning. PMLR, 2790--2799","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In International conference on machine learning. PMLR, 2790--2799."},{"key":"e_1_3_2_1_27_1","volume-title":"Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685","author":"Hu Edward J","year":"2021","unstructured":"Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. 
arXiv preprint arXiv:2106.09685 (2021)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00152"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01237-3_4"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19827-4_41"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00916"},{"key":"e_1_3_2_1_32_1","volume-title":"Advances in Neural Information Processing Systems","volume":"36","author":"Ke Lei","year":"2024","unstructured":"Lei Ke, Mingqiao Ye, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu, et al. 2024. Segment anything in high quality. Advances in Neural Information Processing Systems, Vol. 36 (2024)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00988"},{"key":"e_1_3_2_1_34_1","volume-title":"The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691","author":"Lester Brian","year":"2021","unstructured":"Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021)."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.106977"},{"key":"e_1_3_2_1_36_1","volume-title":"Event-assisted Low-Light Video Object Segmentation. arXiv preprint arXiv:2404.01945","author":"Li Hebei","year":"2024","unstructured":"Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, and Xiaoyan Sun. 2024. Event-assisted Low-Light Video Object Segmentation. arXiv preprint arXiv:2404.01945 (2024)."},{"key":"e_1_3_2_1_37_1","volume-title":"OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework. arXiv preprint arXiv:2403.08682","author":"Li Wanyun","year":"2024","unstructured":"Wanyun Li, Pinxue Guo, Xinyu Zhou, Lingyi Hong, Yangji He, Xiangyu Zheng, Wei Zhang, and Wenqiang Zhang. 2024. 
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework. arXiv preprint arXiv:2403.08682 (2024)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_2_1_39_1","first-page":"1950","article-title":"Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning","volume":"35","author":"Liu Haokun","year":"2022","unstructured":"Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin A Raffel. 2022. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems, Vol. 35 (2022), 1950--1965.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3117964"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2906195"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2863604"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19818-2_27"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2022.103489"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00716"},{"key":"e_1_3_2_1_46_1","volume-title":"Video object segmentation without temporal information","author":"Maninis K-K","year":"2018","unstructured":"K-K Maninis, Sergi Caelles, Yuhua Chen, Jordi Pont-Tuset, Laura Leal-Taix\u00e9, Daniel Cremers, and Luc Van Gool. 2018. Video object segmentation without temporal information. IEEE transactions on pattern analysis and machine intelligence, Vol. 41, 6 (2018), 1515--1530."},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision. 
1154--1163","author":"Mueller Franziska","year":"2017","unstructured":"Franziska Mueller, Dushyant Mehta, Oleksandr Sotnychenko, Srinath Sridhar, Dan Casas, and Christian Theobalt. 2017. Real-time hand tracking under occlusion from an egocentric rgb-d sensor. In Proceedings of the IEEE International Conference on Computer Vision. 1154--1163."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00770"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00932"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3108405"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.372"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"crossref","unstructured":"F. Perazzi J. Pont-Tuset B. McWilliams L. Van Gool M. Gross and A. Sorkine-Hornung. 2016. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation. In Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR.2016.85"},{"key":"e_1_3_2_1_53_1","volume-title":"The 2017 davis challenge on video object segmentation. arXiv preprint arXiv:1704.00675","author":"Pont-Tuset Jordi","year":"2017","unstructured":"Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbel\u00e1ez, Alex Sorkine-Hornung, and Luc Van Gool. 2017. The 2017 davis challenge on video object segmentation. arXiv preprint arXiv:1704.00675 (2017)."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9412984"},{"key":"e_1_3_2_1_55_1","volume-title":"Proceedings, Part XV 16","author":"Seo Seonguk","year":"2020","unstructured":"Seonguk Seo, Joon-Young Lee, and Bohyung Han. 2020. Urvos: Unified referring video object segmentation network with a large-scale benchmark. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XV 16. 
Springer, 208--223."},{"key":"e_1_3_2_1_56_1","volume-title":"Proceedings, Part XXII 16","author":"Seong Hongje","year":"2020","unstructured":"Hongje Seong, Junhyuk Hyun, and Euntai Kim. 2020. Kernelized memory network for video object segmentation. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXII 16. Springer, 629--645."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01265"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2023.105919"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00971"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00142"},{"key":"e_1_3_2_1_61_1","volume-title":"Semi-supervised video object segmentation with super-trajectories","author":"Wang Wenguan","year":"2018","unstructured":"Wenguan Wang, Jianbing Shen, Fatih Porikli, and Ruigang Yang. 2018. Semi-supervised video object segmentation with super-trajectories. IEEE transactions on pattern analysis and machine intelligence, Vol. 41, 4 (2018), 985--998."},{"key":"e_1_3_2_1_62_1","volume-title":"Visevent: Reliable object tracking via collaboration of frame and event flows","author":"Wang Xiao","year":"2023","unstructured":"Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, Yaowei Wang, Yonghong Tian, and Feng Wu. 2023. Visevent: Reliable object tracking via collaboration of frame and event flows. IEEE Transactions on Cybernetics (2023)."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2017.2706197"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00408"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00125"},{"key":"e_1_3_2_1_66_1","volume-title":"Youtube-vos: A large-scale video object segmentation benchmark. 
arXiv preprint arXiv:1809.03327","author":"Xu Ning","year":"2018","unstructured":"Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, and Thomas Huang. 2018. Youtube-vos: A large-scale video object segmentation benchmark. arXiv preprint arXiv:1809.03327 (2018)."},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i3.20200"},{"key":"e_1_3_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00525"},{"key":"e_1_3_2_1_69_1","volume-title":"Unveiling the Power of Visible-Thermal Video Object Segmentation","author":"Yang Jinyu","year":"2023","unstructured":"Jinyu Yang, Mingqi Gao, Runmin Cong, Chengjie Wang, Feng Zheng, and Ale\u0161 Leonardis. 2023. Unveiling the Power of Visible-Thermal Video Object Segmentation. IEEE Transactions on Circuits and Systems for Video Technology (2023)."},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2024.3374130"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547851"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00529"},{"key":"e_1_3_2_1_73_1","volume-title":"Associating objects with scalable transformers for video object segmentation. arXiv preprint arXiv:2203.11442","author":"Yang Zongxin","year":"2022","unstructured":"Zongxin Yang, Jiaxu Miao, Xiaohan Wang, Yunchao Wei, and Yi Yang. 2022. Associating objects with scalable transformers for video object segmentation. arXiv preprint arXiv:2203.11442 (2022)."},{"key":"e_1_3_2_1_74_1","volume-title":"Proceedings, Part V. Springer, 332--348","author":"Yang Zongxin","year":"2020","unstructured":"Zongxin Yang, Yunchao Wei, and Yi Yang. 2020. Collaborative video object segmentation by foreground-background integration. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part V. 
Springer, 332--348."},{"key":"e_1_3_2_1_75_1","first-page":"2491","article-title":"Associating objects with transformers for video object segmentation","volume":"34","author":"Yang Zongxin","year":"2021","unstructured":"Zongxin Yang, Yunchao Wei, and Yi Yang. 2021. Associating objects with transformers for video object segmentation. Advances in Neural Information Processing Systems, Vol. 34 (2021), 2491--2502.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_76_1","first-page":"4701","article-title":"Collaborative video object segmentation by multi-scale foreground-background integration","volume":"44","author":"Yang Zongxin","year":"2021","unstructured":"Zongxin Yang, Yunchao Wei, and Yi Yang. 2021. Collaborative video object segmentation by multi-scale foreground-background integration. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, 9 (2021), 4701--4712.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_1_77_1","volume-title":"Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration","author":"Yang Z","year":"2022","unstructured":"Z Yang, Y Wei, and Y Yang. 2022. Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)."},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3082763"},{"key":"e_1_3_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00116"},{"key":"e_1_3_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3060862"},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00868"},{"key":"e_1_3_2_1_82_1","volume-title":"End to end video segmentation for driving: Lane detection for autonomous car. arXiv preprint arXiv:1812.05914","author":"Zhang Wenhui","year":"2018","unstructured":"Wenhui Zhang and Tejas Mahale. 2018. 
End to end video segmentation for driving: Lane detection for autonomous car. arXiv preprint arXiv:1812.05914 (2018)."},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00698"},{"key":"e_1_3_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00496"},{"key":"e_1_3_2_1_85_1","volume-title":"Luc Van Gool, and Wenguan Wang","author":"Zhou Tianfei","year":"2022","unstructured":"Tianfei Zhou, Fatih Porikli, David J Crandall, Luc Van Gool, and Wenguan Wang. 2022. A survey on deep learning technique for video segmentation. IEEE transactions on pattern analysis and machine intelligence, Vol. 45, 6 (2022), 7099--7122."},{"key":"e_1_3_2_1_86_1","volume-title":"Advances in Neural Information Processing Systems","volume":"36","author":"Zhou Xinyu","year":"2024","unstructured":"Xinyu Zhou, Pinxue Guo, Lingyi Hong, Jinglun Li, Wei Zhang, Weifeng Ge, and Wenqiang Zhang. 2024. Reading relevant feature from global representation memory for visual object tracking. Advances in Neural Information Processing Systems, Vol. 
36 (2024)."},{"key":"e_1_3_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00918"},{"key":"e_1_3_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3060015"}],"event":{"name":"MM '24: The 32nd ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Melbourne VIC Australia","acronym":"MM '24"},"container-title":["Proceedings of the 32nd ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664647.3680581","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3664647.3680581","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:56Z","timestamp":1750295876000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664647.3680581"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,28]]},"references-count":88,"alternative-id":["10.1145\/3664647.3680581","10.1145\/3664647"],"URL":"https:\/\/doi.org\/10.1145\/3664647.3680581","relation":{},"subject":[],"published":{"date-parts":[[2024,10,28]]},"assertion":[{"value":"2024-10-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}