{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:16:33Z","timestamp":1777655793403,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T00:00:00Z","timestamp":1730073600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/https:\/\/doi.org\/10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["CE200100025,DP230101196"],"award-info":[{"award-number":["CE200100025,DP230101196"]}],"id":[{"id":"10.13039\/https:\/\/doi.org\/10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/https:\/\/doi.org\/10.13039\/501100000980","name":"Grains Research and Development Corporation","doi-asserted-by":"publisher","award":["UOQ2301-010OPX"],"award-info":[{"award-number":["UOQ2301-010OPX"]}],"id":[{"id":"10.13039\/https:\/\/doi.org\/10.13039\/501100000980","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,28]]},"DOI":"10.1145\/3664647.3680599","type":"proceedings-article","created":{"date-parts":[[2024,10,26]],"date-time":"2024-10-26T06:59:41Z","timestamp":1729925981000},"page":"1593-1601","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Benchmarking In-the-Wild Multimodal Disease Recognition and A Versatile Baseline"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-0134-6438","authenticated-orcid":false,"given":"Tianqi","family":"Wei","sequence":"first","affiliation":[{"name":"The University of Queensland, Brisbane, 
Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9385-144X","authenticated-orcid":false,"given":"Zhi","family":"Chen","sequence":"additional","affiliation":[{"name":"The University of Queensland, Brisbane, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9738-4949","authenticated-orcid":false,"given":"Zi","family":"Huang","sequence":"additional","affiliation":[{"name":"The University of Queensland, Brisbane, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0269-5649","authenticated-orcid":false,"given":"Xin","family":"Yu","sequence":"additional","affiliation":[{"name":"The University of Queensland, Brisbane, Australia"}]}],"member":"320","published-online":{"date-parts":[[2024,10,28]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"George N Agrios. 2005. Plant pathology."},{"key":"e_1_3_2_1_2_1","unstructured":"Jean-Baptiste Alayrac Jeff Donahue Pauline Luc Antoine Miech Iain Barr Yana Hasson Karel Lenc Arthur Mensch Katherine Millican Malcolm Reynolds et al. 2022. Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems (2022) 23716--23736."},{"key":"e_1_3_2_1_3_1","volume-title":"A deep learning based approach for automated plant disease classification using vision transformer. Scientific Reports","author":"Borhani Yasamin","year":"2022","unstructured":"Yasamin Borhani, Javad Khoramdel, and Esmaeil Najafi. 2022. A deep learning based approach for automated plant disease classification using vision transformer. Scientific Reports (2022), 11554."},{"key":"e_1_3_2_1_4_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. 
Advances in neural information processing systems 33 (2020) 1877--1901."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01101"},{"key":"e_1_3_2_1_6_1","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_3_2_1_7_1","volume-title":"Cloob: Modern hopfield networks with infoloob outperform clip. Advances in neural information processing systems","author":"F\u00fcrst Andreas","year":"2022","unstructured":"Andreas F\u00fcrst, Elisabeth Rumetshofer, Johannes Lehner, Viet T Tran, Fei Tang, Hubert Ramsauer, David Kreil, Michael Kopp, G\u00fcnter Klambauer, Angela Bitto, et al. 2022. Cloob: Modern hopfield networks with infoloob outperform clip. Advances in neural information processing systems (2022), 20450--20468."},{"key":"e_1_3_2_1_8_1","volume-title":"Clip-adapter: Better vision-language models with feature adapters. International Journal of Computer Vision","author":"Gao Peng","year":"2024","unstructured":"Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, and Yu Qiao. 2024. Clip-adapter: Better vision-language models with feature adapters. International Journal of Computer Vision (2024), 581--595."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i1.25152"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_11_1","unstructured":"David Hughes Marcel Salath\u00e9 et al. 2015. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv preprint arXiv:1511.08060 (2015)."},{"key":"e_1_3_2_1_12_1","volume-title":"International conference on machine learning. 
4904--4916","author":"Jia Chao","year":"2021","unstructured":"Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning. 4904--4916."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.197"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-022-14004-6"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01832"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611858"},{"key":"e_1_3_2_1_17_1","volume-title":"Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm. arXiv preprint arXiv:2110.05208","author":"Li Yangguang","year":"2021","unstructured":"Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, and Junjie Yan. 2021. Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm. arXiv preprint arXiv:2110.05208 (2021)."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 281--297","author":"James","unstructured":"James MacQueen et al. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 281--297."},{"key":"e_1_3_2_1_19_1","volume-title":"Sanjay Misra.","author":"Oyewola David Opeoluwa","year":"2021","unstructured":"David Opeoluwa Oyewola, Emmanuel Gbenga Dada, Sanjay Misra. 2021. Detecting cassava mosaic disease using a deep residual convolutional neural network with distinct block processing. 
PeerJ Computer Science (2021), e352."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.4249\/scholarpedia.1883"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01438"},{"key":"e_1_3_2_1_22_1","volume-title":"International conference on machine learning. 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. 8748--8763."},{"key":"e_1_3_2_1_23_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever et al. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_3_2_1_24_1","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et al. 2019. Language models are unsupervised multitask learners. OpenAI blog (2019) 9."},{"key":"e_1_3_2_1_25_1","unstructured":"Jack W Rae Sebastian Borgeaud Trevor Cai Katie Millican Jordan Hoffmann Francis Song John Aslanides Sarah Henderson Roman Ring Susannah Young et al. 2021. Scaling language models: Methods analysis & insights from training gopher. arXiv preprint arXiv:2112.11446 (2021)."},{"key":"e_1_3_2_1_26_1","volume-title":"Deep Bi-linear Convolution Neural Network for Plant Disease Identification and Classification. In International Conference on Advanced Informatics for Computing Research. 293--305","author":"Babu Ch Ramesh","year":"2020","unstructured":"Ch Ramesh Babu, Srinivasa Rao Dammavalam, V Sravan Kiran, N Rajasekhar, B Lalith Bharadwaj, Rohit Boddeda, and K Sai Vardhan. 2020. Deep Bi-linear Convolution Neural Network for Plant Disease Identification and Classification. In International Conference on Advanced Informatics for Computing Research. 
293--305."},{"key":"e_1_3_2_1_27_1","volume-title":"Zahid Iqbal","author":"Sharif Muhammad","year":"2018","unstructured":"Muhammad Sharif, Muhammad Attique Khan, Zahid Iqbal, Muhammad Faisal Azam, M Ikram Ullah Lali, and Muhammad Younus Javed. 2018. Detection and classification of citrus diseases in agriculture based on optimized weighted segmentation and feature selection. Computers and electronics in agriculture (2018), 220--234."},{"key":"e_1_3_2_1_28_1","volume-title":"Test-time prompt tuning for zero-shot generalization in vision-language models. Advances in Neural Information Processing Systems","author":"Shu Manli","year":"2022","unstructured":"Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, and Chaowei Xiao. 2022. Test-time prompt tuning for zero-shot generalization in vision-language models. Advances in Neural Information Processing Systems (2022), 14274--14289."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3371158.3371196"},{"key":"e_1_3_2_1_30_1","volume-title":"The Plant Pathology Challenge 2020 data set to classify foliar disease of apples. Applications in plant sciences","author":"Thapa Ranjita","year":"2020","unstructured":"Ranjita Thapa, Kai Zhang, Noah Snavely, Serge Belongie, and Awais Khan. 2020. The Plant Pathology Challenge 2020 data set to classify foliar disease of apples. Applications in plant sciences (2020), e11390."},{"key":"e_1_3_2_1_31_1","volume-title":"Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al.","author":"Thoppilan Romal","year":"2022","unstructured":"Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. 2022. Lamda: Language models for dialog applications. 
arXiv preprint arXiv:2201.08239 (2022)."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00257"},{"key":"e_1_3_2_1_33_1","volume-title":"Visualizing data using t-SNE. Journal of machine learning research","author":"der Maaten Laurens Van","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research (2008)."},{"key":"e_1_3_2_1_34_1","volume-title":"T-CNN: Trilinear convolutional neural networks model for visual detection of plant diseases. Computers and Electronics in Agriculture","author":"Wang Dongfang","year":"2021","unstructured":"Dongfang Wang, Jun Wang, Wenrui Li, and Ping Guan. 2021. T-CNN: Trilinear convolutional neural networks model for visual detection of plant diseases. Computers and Electronics in Agriculture (2021), 106468."},{"key":"e_1_3_2_1_35_1","volume-title":"DHBP: A dual-stream hierarchical bilinear pooling model for plant disease multi-task classification. Computers and Electronics in Agriculture","author":"Wang Dongfang","year":"2022","unstructured":"Dongfang Wang, Jun Wang, Zhuang Ren, and Wenrui Li. 2022. DHBP: A dual-stream hierarchical bilinear pooling model for plant disease multi-task classification. Computers and Electronics in Agriculture (2022), 106788."},{"key":"e_1_3_2_1_36_1","volume-title":"Cpt: Colorful prompt tuning for pre-trained vision-language models. AI Open","author":"Yao Yuan","year":"2024","unstructured":"Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, and Maosong Sun. 2024. Cpt: Colorful prompt tuning for pre-trained vision-language models. 
AI Open (2024), 30--38."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01460"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19833-5_29"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01631"},{"key":"e_1_3_2_1_40_1","volume-title":"Chen Change Loy, and Ziwei Liu","author":"Zhou Kaiyang","year":"2022","unstructured":"Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. 2022. Learning to prompt for vision-language models. International Journal of Computer Vision (2022), 2337--2348."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00249"}],"event":{"name":"MM '24: The 32nd ACM International Conference on Multimedia","location":"Melbourne VIC Australia","acronym":"MM '24","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 32nd ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664647.3680599","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3664647.3680599","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:56Z","timestamp":1750295876000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664647.3680599"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,28]]},"references-count":41,"alternative-id":["10.1145\/3664647.3680599","10.1145\/3664647"],"URL":"https:\/\/doi.org\/10.1145\/3664647.3680599","relation":{},"subject":[],"published":{"date-parts":[[2024,10,28]]},"assertion":[{"value":"2024-10-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}