{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T13:27:43Z","timestamp":1773840463300,"version":"3.50.1"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,10,31]]},"abstract":"<jats:p>\n            <jats:bold>Compositional Zero-shot Learning (CZSL)<\/jats:bold>\n            aims to identify novel compositions via known attribute\u2013object pairs. The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object, consequently decreasing the classification performance toward novel compositions. Previous remarkable works primarily addressed this issue by focusing on disentangling strategy or utilizing object-based conditional probabilities to constrain the selection space of attributes. Unfortunately, few studies have explored the problem from the perspective of modeling the mechanism of visual primitive interactions. Inspired by the success of vanilla adversarial learning in Cross-Domain Few-shot Learning, we take a step further and devise a model-agnostic and\n            <jats:bold>Primitive-based Adversarial Training (PBadv)<\/jats:bold>\n            method to deal with this problem. Besides, the latest studies highlight the weakness of the perception of hard compositions even under data-balanced conditions. To this end, we propose a novel over-sampling strategy with object-similarity guidance to augment target compositional training data. We performed detailed quantitative analysis and retrieval experiments on well-established datasets, such as UT-Zappos50K, MIT-States, and C-GQA, to validate the effectiveness of our proposed method, and the\n            <jats:bold>State-of-the-Art (SOTA)<\/jats:bold>\n            performance demonstrates the superiority of our approach. The code is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/lisuyi\/PBadv_czsl\">https:\/\/github.com\/lisuyi\/PBadv_czsl<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3712596","type":"journal-article","created":{"date-parts":[[2025,1,17]],"date-time":"2025-01-17T11:33:41Z","timestamp":1737113621000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Contextual Interaction via Primitive-based Adversarial Training for Compositional Zero-shot Learning"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-4671-7891","authenticated-orcid":false,"given":"Suyi","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9031-4163","authenticated-orcid":false,"given":"Chenyi","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1023-1286","authenticated-orcid":false,"given":"Shidong","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom of Great Britain and Northern Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2445-6112","authenticated-orcid":false,"given":"Yang","family":"Long","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Durham University, Durham, United Kingdom of Great Britain and Northern Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1470-6998","authenticated-orcid":false,"given":"Zheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology Shenzhen, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4039-7618","authenticated-orcid":false,"given":"Haofeng","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,14]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"Layer normalization","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. In NeurIPS.","journal-title":"NeurIPS"},{"key":"e_1_3_1_3_2","first-page":"7734","volume-title":"ACM MM","author":"Bin Yi","year":"2024","unstructured":"Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, and Heng Tao Shen. 2024. GalleryGPT: Analyzing paintings with large multimodal models. In ACM MM, 7734\u20137743."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-019-8208-z"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00957"},{"key":"e_1_3_1_8_2","volume-title":"ICLR","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3506852"},{"key":"e_1_3_1_10_2","first-page":"24575","volume-title":"CVPR","author":"Fu Yuqian","year":"2023","unstructured":"Yuqian Fu, Yu Xie, Yanwei Fu, and Yu-Gang Jiang. 2023. Styleadv: Meta style adversarial training for cross-domain few-shot learning. In CVPR, 24575\u201324584."},{"key":"e_1_3_1_11_2","first-page":"1","volume-title":"ICLR","author":"Goodfellow Ian J.","year":"2015","unstructured":"Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In ICLR, 1\u201314."},{"key":"e_1_3_1_12_2","first-page":"15315","volume-title":"CVPR","author":"Hao Shaozhe","year":"2023","unstructured":"Shaozhe Hao, Kai Han, and Kwan-Yee K. Wong. 2023. Learning attention as disentangler for compositional zero-shot learning. In CVPR, 15315\u201315324."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20044-1_2"},{"key":"e_1_3_1_15_2","first-page":"24005","volume-title":"CVPR","author":"Huang Siteng","year":"2024","unstructured":"Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, and Donglin Wang. 2024. Troika: Multi-path cross-modal traction for compositional zero-shot learning. In CVPR, 24005\u201324014."},{"key":"e_1_3_1_16_2","first-page":"1383","volume-title":"CVPR","author":"Isola Phillip","year":"2015","unstructured":"Phillip Isola, Joseph J. Lim, and Edward H. Adelson. 2015. Discovering states and transformations in image collections. In CVPR, 1383\u20131391."},{"key":"e_1_3_1_17_2","first-page":"2498","volume-title":"AAAI","author":"Jiang Chenyi","year":"2024","unstructured":"Chenyi Jiang and Haofeng Zhang. 2024. Revealing the proximate long-tail distribution in compositional zero-shot learning. In AAAI, 2498\u20132506."},{"key":"e_1_3_1_18_2","volume-title":"NeurIPS Workshop","author":"Karthik Shyamgopal","year":"2021","unstructured":"Shyamgopal Karthik, Massimiliano Mancini, and Zeynep Akata. 2021. Revisiting visual product for compositional zero-shot learning. In NeurIPS Workshop."},{"key":"e_1_3_1_19_2","first-page":"9336","volume-title":"CVPR","author":"Karthik Shyamgopal","year":"2022","unstructured":"Shyamgopal Karthik, Massimiliano Mancini, and Zeynep Akata. 2022. KG-SP: Knowledge guided simple primitives for open world compositional zero-shot learning. In CVPR, 9336\u20139345."},{"key":"e_1_3_1_20_2","first-page":"5675","volume-title":"ICCV","author":"Kim Hanjae","year":"2023","unstructured":"Hanjae Kim, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn. 2023. Hierarchical visual primitive experts for compositional zero-shot learning. In ICCV, 5675\u20135685."},{"key":"e_1_3_1_21_2","first-page":"1","volume-title":"ICLR","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR, 1\u201314."},{"key":"e_1_3_1_22_2","volume-title":"ICLR","author":"Kurakin Alexey","year":"2017","unstructured":"Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2017. Adversarial machine learning at scale. In ICLR."},{"key":"e_1_3_1_23_2","volume-title":"Towards More Human-Like Concept Learning in Machines: Compositionality, Causality and Learning-to-Learn","author":"Lake Brenden M.","year":"2014","unstructured":"Brenden M. Lake. 2014. Towards More Human-Like Concept Learning in Machines: Compositionality, Causality and Learning-to-Learn. Ph.D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2024.125230"},{"key":"e_1_3_1_25_2","first-page":"11316","volume-title":"CVPR","author":"Li Yong-Lu","year":"2020","unstructured":"Yong-Lu Li, Yue Xu, Xiaohan Mao, and Cewu Lu. 2020. Symmetry and group in attribute-object compositions. In CVPR, 11316\u201311325."},{"key":"e_1_3_1_26_2","volume-title":"ICLR","author":"Liu Yanpei","year":"2017","unstructured":"Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2017. Delving into transferable adversarial examples and black-box attacks. In ICLR."},{"key":"e_1_3_1_27_2","first-page":"23560","volume-title":"CVPR","author":"Lu Xiaocheng","year":"2023","unstructured":"Xiaocheng Lu, Song Guo, Ziming Liu, and Jingcai Guo. 2023. Decomposed soft prompt guided fusion enhancing for compositional zero-shot learning. In CVPR, 23560\u201323569."},{"key":"e_1_3_1_28_2","volume-title":"ICLR","author":"Madry Aleksander","year":"2018","unstructured":"Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In ICLR."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00518"},{"issue":"3","key":"e_1_3_1_30_2","doi-asserted-by":"crossref","first-page":"1545","DOI":"10.1109\/TPAMI.2022.3163667","article-title":"Learning graph embeddings for open world compositional zero-shot learning","volume":"46","author":"Mancini Massimiliano","year":"2022","unstructured":"Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, and Zeynep Akata. 2022. Learning graph embeddings for open world compositional zero-shot learning. IEEE Trans. Pattern Anal. Mach. Intell. 46, 3 (2022), 1545\u20131560.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"e_1_3_1_31_2","first-page":"1792","volume-title":"CVPR","author":"Misra Ishan","year":"2017","unstructured":"Ishan Misra, Abhinav Gupta, and Martial Hebert. 2017. From red wine to red tomato: Composition with context. In CVPR, 1792\u20131801."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.17"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00101"},{"key":"e_1_3_1_34_2","first-page":"169","volume-title":"ECCV","author":"Nagarajan Tushar","year":"2018","unstructured":"Tushar Nagarajan and Kristen Grauman. 2018. Attributes as operators: Factorizing unseen attribute-object compositions. In ECCV, 169\u2013185."},{"key":"e_1_3_1_35_2","first-page":"807","volume-title":"ICML","author":"Nair Vinod","year":"2010","unstructured":"Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In ICML, 807\u2013814."},{"key":"e_1_3_1_36_2","volume-title":"ICLR","author":"Nayak Nihal V.","year":"2023","unstructured":"Nihal V. Nayak, Peilin Yu, and Stephen H. Bach. 2023. Learning to compose soft prompts for compositional zero-shot learning. In ICLR."},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","first-page":"109916","DOI":"10.1016\/j.patcog.2023.109916","article-title":"Compositional zero-shot learning using multi-branch graph convolution and cross-layer knowledge sharing","volume":"145","author":"Panda Aditya","year":"2024","unstructured":"Aditya Panda and Dipti Prasad Mukherjee. 2024. Compositional zero-shot learning using multi-branch graph convolution and cross-layer knowledge sharing. Pattern Recognit. 145 (2024), 109916.","journal-title":"Pattern Recognit"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_1_39_2","first-page":"8748","volume-title":"ICML","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In ICML, 8748\u20138763."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1249"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01329"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_3_1_43_2","unstructured":"Christian Szegedy Wojciech Zaremba Ilya Sutskever Joan Bruna Dumitru Erhan Ian Goodfellow and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv:1312.6199. Retrieved from https:\/\/arxiv.org\/abs\/1312.6199"},{"key":"e_1_3_1_44_2","first-page":"24377","volume-title":"CVPR","author":"Tang Bowen","year":"2024","unstructured":"Bowen Tang, Zheng Wang, Yi Bin, Qi Dou, Yang Yang, and Heng Tao Shen. 2024. Ensemble diversity facilitates adversarial transferability. In CVPR, 24377\u201324386."},{"key":"e_1_3_1_45_2","volume-title":"ICLR","author":"Tram\u00e8r Florian","year":"2018","unstructured":"Florian Tram\u00e8r, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. 2018. Ensemble adversarial training: Attacks and defenses. In ICLR."},{"key":"e_1_3_1_46_2","first-page":"1075","volume-title":"IJCAI","author":"Wang Haoqing","year":"2021","unstructured":"Haoqing Wang and Zhi-Hong Deng. 2021. Cross-domain few-shot classification via adversarial task augmentation. In IJCAI, 1075\u20131081."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3665496"},{"key":"e_1_3_1_48_2","first-page":"11197","volume-title":"CVPR","author":"Wang Qingsheng","year":"2023","unstructured":"Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, and Chunhua Shen. 2023. Learning conditional attributes for compositional zero-shot learning. In CVPR, 11197\u201311206."},{"key":"e_1_3_1_49_2","volume-title":"ICLR","author":"Wang Ren","year":"2021","unstructured":"Ren Wang, Kaidi Xu, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Chuang Gan, and Meng Wang. 2021. On fast adversarial robustness adaptation in model-agnostic meta-learning. In ICLR."},{"key":"e_1_3_1_50_2","first-page":"5774","volume-title":"WACV","author":"Xu Guangyue","year":"2024","unstructured":"Guangyue Xu, Joyce Chai, and Parisa Kordjamshidi. 2024. GIPCOL: Graph-injected soft prompting for compositional zero-shot learning. In WACV, 5774\u20135783."},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3587097"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01026"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3200578"},{"key":"e_1_3_1_54_2","first-page":"192","volume-title":"CVPR","author":"Yu Aron","year":"2014","unstructured":"Aron Yu and Kristen Grauman. 2014. Fine-grained visual comparisons with local learning. In CVPR, 192\u2013199."},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488719"},{"key":"e_1_3_1_56_2","first-page":"339","volume-title":"ECCV","author":"Zhang Tian","year":"2022","unstructured":"Tian Zhang, Kongming Liang, Ruoyi Du, Xian Sun, Zhanyu Ma, and Jun Guo. 2022. Learning invariant visual representations for compositional zero-shot learning. In ECCV, 339\u2013355."},{"key":"e_1_3_1_57_2","first-page":"10823","volume-title":"CVPR","author":"Zhang Xiao","year":"2019","unstructured":"Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, and Hongsheng Li. 2019. Adacos: Adaptively scaling cosine logits for effectively learning deep face representations. In CVPR, 10823\u201310832."},{"key":"e_1_3_1_58_2","first-page":"1721","volume-title":"WACV","author":"Zheng Zhaoheng","year":"2024","unstructured":"Zhaoheng Zheng, Haidong Zhu, and Ram Nevatia. 2024. CAILA: Concept-aware intra-layer adapters for compositional zero-shot learning. In WACV, 1721\u20131731."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3712596","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T21:25:16Z","timestamp":1760477116000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3712596"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,14]]},"references-count":57,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10,31]]}},"alternative-id":["10.1145\/3712596"],"URL":"https:\/\/doi.org\/10.1145\/3712596","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,14]]},"assertion":[{"value":"2024-06-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}