{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T15:41:35Z","timestamp":1775144495462,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3547851","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:35Z","timestamp":1665416555000},"page":"3492-3500","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":135,"title":["Prompting for Multi-Modal Tracking"],"prefix":"10.1145","author":[{"given":"Jinyu","family":"Yang","sequence":"first","affiliation":[{"name":"Southern University of Science and Technology, Shenzhen, China"}]},{"given":"Zhe","family":"Li","sequence":"additional","affiliation":[{"name":"Southern University of Science and Technology, Shenzhen, China"}]},{"given":"Feng","family":"Zheng","sequence":"additional","affiliation":[{"name":"Southern University of Science and Technology, Shenzhen, China"}]},{"given":"Ales","family":"Leonardis","sequence":"additional","affiliation":[{"name":"University of Birmingham, Birmingham, United Kingdom"}]},{"given":"Jingkuan","family":"Song","sequence":"additional","affiliation":[{"name":"University of Electronic Science and Technology of China, Chengdu, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Visual Prompting: Modifying Pixel Space to Adapt Pre-trained Models. 
arXiv preprint arXiv:2203.17274","author":"Bahng Hyojin","year":"2022","unstructured":"Hyojin Bahng , Ali Jahanian , Swami Sankaranarayanan , and Phillip Isola . 2022 . Visual Prompting: Modifying Pixel Space to Adapt Pre-trained Models. arXiv preprint arXiv:2203.17274 (2022). Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, and Phillip Isola. 2022. Visual Prompting: Modifying Pixel Space to Adapt Pre-trained Models. arXiv preprint arXiv:2203.17274 (2022)."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00628"},{"key":"e_1_3_2_2_3_1","first-page":"5","article-title":"Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling","volume":"4","author":"Camplani Massimo","year":"2015","unstructured":"Massimo Camplani , Sion L Hannuna , Majid Mirmehdi , Dima Damen , Adeline Paiement , Lili Tao , and Tilo Burghardt . 2015 . Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling .. In BMVC , Vol. 4. 5 . Massimo Camplani, Sion L Hannuna, Majid Mirmehdi, Dima Damen, Adeline Paiement, Lili Tao, and Tilo Burghardt. 2015. Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling.. In BMVC, Vol. 4. 5.","journal-title":"BMVC"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350975"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"e_1_3_2_2_7_1","volume-title":"Fusing two-stream convolutional neural networks for RGB-T object tracking. Neurocomputing","year":"2018","unstructured":"Chenglong, Xiaohao, Zhao, Nan, Cao, Xiaochun, Tang, and Jin. 2018. Fusing two-stream convolutional neural networks for RGB-T object tracking. Neurocomputing ( 2018 ). Chenglong, Xiaohao, Zhao, Nan, Cao, Xiaochun, Tang, and Jin. 2018. 
Fusing two-stream convolutional neural networks for RGB-T object tracking. Neurocomputing (2018)."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00479"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00721"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00552"},{"key":"e_1_3_2_2_11_1","volume-title":"Clip-adapter: Better vision-language models with feature adapters. arXiv preprint arXiv:2110.04544","author":"Gao Peng","year":"2021","unstructured":"Peng Gao , Shijie Geng , Renrui Zhang , Teli Ma , Rongyao Fang , Yongfeng Zhang , Hongsheng Li , and Yu Qiao . 2021 . Clip-adapter: Better vision-language models with feature adapters. arXiv preprint arXiv:2110.04544 (2021). Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, and Yu Qiao. 2021. Clip-adapter: Better vision-language models with feature adapters. arXiv preprint arXiv:2110.04544 (2021)."},{"key":"e_1_3_2_2_12_1","first-page":"1","article-title":"DS-KCF: a real-time tracker for RGB-D data","volume":"16","author":"Hannuna Sion","year":"2016","unstructured":"Sion Hannuna , Massimo Camplani , Jake Hall , Majid Mirmehdi , Dima Damen , Tilo Burghardt , Adeline Paiement , and Lili Tao . 2016 . DS-KCF: a real-time tracker for RGB-D data . Journal of Real-Time Image Processing 16 , 5 (2016), 1 -- 20 . Sion Hannuna, Massimo Camplani, Jake Hall, Majid Mirmehdi, Dima Damen, Tilo Burghardt, Adeline Paiement, and Lili Tao. 2016. DS-KCF: a real-time tracker for RGB-D data. Journal of Real-Time Image Processing 16, 5 (2016), 1--20.","journal-title":"Journal of Real-Time Image Processing"},{"key":"e_1_3_2_2_13_1","volume-title":"GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild","author":"Huang Lianghua","year":"2019","unstructured":"Lianghua Huang , Xin Zhao , and Kaiqi Huang . 2019. 
GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild . IEEE Transactions on Pattern Analysis and Machine Intelligence ( 2019 ), 1--1. https:\/\/doi.org\/10.1109\/tpami.2019.2957464 Lianghua Huang, Xin Zhao, and Kaiqi Huang. 2019. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. IEEE Transactions on Pattern Analysis and Machine Intelligence (2019), 1--1. https:\/\/doi.org\/10.1109\/tpami.2019.2957464"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2957464"},{"key":"e_1_3_2_2_15_1","volume-title":"Visual Prompt Tuning. arXiv preprint arXiv:2203.12119","author":"Jia Menglin","year":"2022","unstructured":"Menglin Jia , Luming Tang , Bor-Chun Chen , Claire Cardie , Serge Belongie , Bharath Hariharan , and Ser-Nam Lim . 2022. Visual Prompt Tuning. arXiv preprint arXiv:2203.12119 ( 2022 ). Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. 2022. Visual Prompt Tuning. arXiv preprint arXiv:2203.12119 (2022)."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"crossref","unstructured":"U\u011fur Kart Joni-Kristian K\u00e4m\u00e4r\u00e4inen and Ji\u0159\u00ed Matas. 2018. How to Make an RGBD Tracker?. In ECCVW.  U\u011fur Kart Joni-Kristian K\u00e4m\u00e4r\u00e4inen and Ji\u0159\u00ed Matas. 2018. How to Make an RGBD Tracker?. In ECCVW.","DOI":"10.1007\/978-3-030-11009-3_8"},{"key":"e_1_3_2_2_17_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition.","author":"Alan","year":"2019","unstructured":"U\u011fur Kart, Alan Luke\u017ei\u010d, Matej Kristan , Joni-Kristian K\u00e4m\u00e4r\u00e4inen , and Ji\u0159\u00ed Matas. 2019 . Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters . In IEEE Conference on Computer Vision and Pattern Recognition. U\u011fur Kart, Alan Luke\u017ei\u010d, Matej Kristan, Joni-Kristian K\u00e4m\u00e4r\u00e4inen, and Ji\u0159\u00ed Matas. 2019. 
Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_2_18_1","volume-title":"Martin Danelljan, Alan Lukezic, Ondrej Drbohlav, Linbo He, Yushan Zhang, Song Yan, Jinyu Yang, Gustavo Fernandez, and et al.","author":"Kristan Matej","year":"2020","unstructured":"Matej Kristan , Ales Leonardis , Jiri Matas , Michael Felsberg , Roman Pflugfelder , Joni-Kristian Kamarainen , Luka \u010cehovin Zajc , Martin Danelljan, Alan Lukezic, Ondrej Drbohlav, Linbo He, Yushan Zhang, Song Yan, Jinyu Yang, Gustavo Fernandez, and et al. 2020 . The Eighth Visual Object Tracking VOT2020 Challenge Results . Matej Kristan, Ales Leonardis, Jiri Matas, Michael Felsberg, Roman Pflugfelder, Joni-Kristian Kamarainen, Luka \u010cehovin Zajc, Martin Danelljan, Alan Lukezic, Ondrej Drbohlav, Linbo He, Yushan Zhang, Song Yan, Jinyu Yang, Gustavo Fernandez, and et al. 2020. The Eighth Visual Object Tracking VOT2020 Challenge Results."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00276"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW54120.2021.00305"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2831443"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.106977"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"crossref","unstructured":"C. Li W. Xue Y. Jia Z. Qu B. Luo and J. Tang. 2021. LasHeR: A Large-scale High-diversity Benchmark for RGBT Tracking. (2021).  C. Li W. Xue Y. Jia Z. Qu B. Luo and J. Tang. 2021. LasHeR: A Large-scale High-diversity Benchmark for RGBT Tracking. (2021).","DOI":"10.1109\/TIP.2021.3130533"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_2_25_1","volume-title":"prompt, and predict: A systematic survey of prompting methods in natural language processing. 
arXiv preprint arXiv:2107.13586","author":"Liu Pengfei","year":"2021","unstructured":"Pengfei Liu , Weizhe Yuan , Jinlan Fu , Zhengbao Jiang , Hiroaki Hayashi , and Graham Neubig . 2021. Pre-train , prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586 ( 2021 ). Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586 (2021)."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2863604"},{"key":"e_1_3_2_2_27_1","volume-title":"Duality-gated mutual condition network for RGBT tracking","author":"Lu Andong","year":"2022","unstructured":"Andong Lu , Cun Qian , Chenglong Li , Jin Tang , and Liang Wang . 2022. Duality-gated mutual condition network for RGBT tracking . IEEE Transactions on Neural Networks and Learning Systems ( 2022 ). Andong Lu, Cun Qian, Chenglong Li, Jin Tang, and Liang Wang. 2022. Duality-gated mutual condition network for RGBT tracking. IEEE Transactions on Neural Networks and Learning Systems (2022)."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.01011"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01319"},{"key":"e_1_3_2_2_30_1","volume-title":"TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In European Conference on Computer Vision.","author":"M\u00fcller Matthias","year":"2018","unstructured":"Matthias M\u00fcller , Adel Bibi , Silvio Giancola , Salman Al-Subaihi , and Bernard Ghanem . 2018 . TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In European Conference on Computer Vision. Matthias M\u00fcller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, and Bernard Ghanem. 2018. 
TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In European Conference on Computer Vision."},{"key":"e_1_3_2_2_31_1","volume-title":"International Conference on Pattern Recognition (ICPR).","author":"Qian Yanlin","year":"2020","unstructured":"Yanlin Qian , Song Yan , Alan Luke\u017ei\u010d , Matej Kristan , Joni-Kristian K\u00e4m\u00e4r\u00e4inen , and Ji\u0159\u00ed Matas . 2020 . DAL: A Deep Depth-aware Long-term Tracker . In International Conference on Pattern Recognition (ICPR). Yanlin Qian, Song Yan, Alan Luke\u017ei\u010d, Matej Kristan, Joni-Kristian K\u00e4m\u00e4r\u00e4inen, and Ji\u0159\u00ed Matas. 2020. DAL: A Deep Depth-aware Long-term Tracker. In International Conference on Pattern Recognition (ICPR)."},{"key":"e_1_3_2_2_32_1","volume-title":"International Conference on Machine Learning. PMLR, 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark , 2021 . Learning transferable visual models from natural language supervision . In International Conference on Machine Learning. PMLR, 8748--8763 . Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748--8763."},{"key":"e_1_3_2_2_33_1","volume-title":"Training data-efficient image transformers & distillation through attention. arXiv preprint arXiv:2012.12877","author":"Touvron Hugo","year":"2020","unstructured":"Hugo Touvron , Matthieu Cord , Matthijs Douze , Francisco Massa , Alexandre Sablayrolles , and Herv\u00e9 J\u00e9gou . 2020. Training data-efficient image transformers & distillation through attention. arXiv preprint arXiv:2012.12877 ( 2020 ). 
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2020. Training data-efficient image transformers & distillation through attention. arXiv preprint arXiv:2012.12877 (2020)."},{"key":"e_1_3_2_2_34_1","volume-title":"VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows. arXiv:2108.05015","author":"Li Jianing","year":"2021","unstructured":"Xiao Wang, Jianing Li , Lin Zhu , Zhipeng Zhang , Zhe Chen , Xin Li , Yaowei Wang, Yonghong Tian , and Feng Wu. 2021. VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows. arXiv:2108.05015 ( 2021 ). Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, Yaowei Wang, Yonghong Tian, and Feng Wu. 2021. VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows. arXiv:2108.05015 (2021)."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01028"},{"key":"e_1_3_2_2_36_1","volume-title":"Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation. arXiv preprint arXiv:2012.06815","author":"Yan Bin","year":"2020","unstructured":"Bin Yan , Xinyu Zhang , Dong Wang, Huchuan Lu , and Xiaoyun Yang . 2020 . Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation. arXiv preprint arXiv:2012.06815 (2020). Bin Yan, Xinyu Zhang, Dong Wang, Huchuan Lu, and Xiaoyun Yang. 2020. Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation. arXiv preprint arXiv:2012.06815 (2020)."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01055"},{"key":"e_1_3_2_2_38_1","volume-title":"Cpt: Colorful prompt tuning for pre-trained vision-language models. arXiv preprint arXiv:2109.11797","author":"Yao Yuan","year":"2021","unstructured":"Yuan Yao , Ao Zhang , Zhengyan Zhang , Zhiyuan Liu , Tat-Seng Chua , and Maosong Sun . 2021 . Cpt: Colorful prompt tuning for pre-trained vision-language models. 
arXiv preprint arXiv:2109.11797 (2021). Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, and Maosong Sun. 2021. Cpt: Colorful prompt tuning for pre-trained vision-language models. arXiv preprint arXiv:2109.11797 (2021)."},{"key":"e_1_3_2_2_39_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops. 0--0.","author":"Zhang Lichao","unstructured":"Lichao Zhang , Martin Danelljan , Abel Gonzalez-Garcia , Joost van de Weijer, and Fahad Shahbaz Khan. 2019. Multi-modal fusion for end-to-end rgb-t tracking . In Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops. 0--0. Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost van de Weijer, and Fahad Shahbaz Khan. 2019. Multi-modal fusion for end-to-end rgb-t tracking. In Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops. 0--0."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9413315"},{"key":"e_1_3_2_2_41_1","volume-title":"Chen Change Loy, and Ziwei Liu","author":"Zhou Kaiyang","year":"2021","unstructured":"Kaiyang Zhou , Jingkang Yang , Chen Change Loy, and Ziwei Liu . 2021 . Learning to prompt for vision-language models. arXiv preprint arXiv:2109.01134 (2021). Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. 2021. Learning to prompt for vision-language models. 
arXiv preprint arXiv:2109.01134 (2021)."}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547851","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3547851","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:35Z","timestamp":1750186955000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547851"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":41,"alternative-id":["10.1145\/3503161.3547851","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3547851","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}