{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T18:06:39Z","timestamp":1773511599702,"version":"3.50.1"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>\n            Many edge computing applications based on computer vision have harnessed the power of deep learning. As an emerging deep learning model for vision, Vision Transformer models have recently achieved record-breaking performance in various vision tasks. However, many recent studies on the robustness of the Vision Transformer have shown that it remains vulnerable to adversarial attacks, which cause the model to misclassify the input. In this work, we ask an intriguing question: \u201cCan Adversarial Perturbations against Vision Transformers be detected with model explanations?\u201d Driven by this question, we observe that benign samples and adversarial examples have different attribution maps after applying the Grad-CAM interpretability method on the Vision Transformer model. We demonstrate that an adversarial example is a\n            <jats:italic toggle=\"yes\">Feature Shift<\/jats:italic>\n            of the input data, which leads to an\n            <jats:italic toggle=\"yes\">Attention Deviation<\/jats:italic>\n            of the visual model. We propose a framework for capturing the\n            <jats:italic toggle=\"yes\">Attention Deviation<\/jats:italic>\n            of vision models to defend against adversarial attacks. 
Furthermore, experiments show that our model achieves the expected results.\n          <\/jats:p>","DOI":"10.1145\/3674981","type":"journal-article","created":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T07:26:45Z","timestamp":1719905205000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Unsupervised Adversarial Example Detection of Vision Transformers for Trustworthy Edge Computing"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7048-9284","authenticated-orcid":false,"given":"Jiaxing","family":"Li","sequence":"first","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6404-8853","authenticated-orcid":false,"given":"Yu\u2019an","family":"Tan","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7574-4512","authenticated-orcid":false,"given":"Jie","family":"Yang","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1534-3658","authenticated-orcid":false,"given":"Zhengdao","family":"Li","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0572-1655","authenticated-orcid":false,"given":"Heng","family":"Ye","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8696-3273","authenticated-orcid":false,"given":"Chenxiao","family":"Xia","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1931-366X","authenticated-orcid":false,"given":"Yuanzhang","family":"Li","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,8,12]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Ahmed Aldahdooh Wassim Hamidouche and Olivier Deforges. 2021. Reveal of vision transformers robustness against adversarial attacks. arXiv:2106.03734. Retrieved from https:\/\/arxiv.org\/abs\/2106.03734"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0130140"},{"key":"e_1_3_1_4_2","article-title":"Are transformers more robust than CNNs?","author":"Bai Yutong","year":"2021","unstructured":"Yutong Bai, Jieru Mei, Alan L Yuille, and Cihang Xie. 2021. Are transformers more robust than CNNs? In Proc. Adv. Neural Inf. Proces. Syst.","journal-title":"Proc. Adv. Neural Inf. Proces. Syst"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","unstructured":"Josh Beal Eric Kim Eric Tzeng Dong Huk Park Andrew Zhai and Dmitry Kislyuk. 2020. Toward transformer-based object detection. arXiv:2012.09958. Retrieved from 10.48550\/arXiv.2012.09958","DOI":"10.48550\/arXiv.2012.09958"},{"key":"e_1_3_1_6_2","first-page":"10231","volume-title":"Proc. IEEE Int. Conf. Comput. Vis.","author":"Bhojanapalli Srinadh","year":"2021","unstructured":"Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, and Andreas Veit. 2021. Understanding robustness of transformers for image classification. In Proc. IEEE Int. Conf. Comput. Vis. 
10231\u201310241."},{"key":"e_1_3_1_7_2","article-title":"Language models are few-shot learners","author":"Brown Tom B","year":"2020","unstructured":"Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Proc. Adv. Neural Inf. Proces. Syst.","journal-title":"Proc. Adv. Neural Inf. Proces. Syst."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","unstructured":"Mathilde Caron Hugo Touvron Ishan Misra Herv\u00e9 J\u00e9gou Julien Mairal Piotr Bojanowski and Armand Joulin. 2021. Emerging properties in self-supervised vision transformers. arXiv:2104.14294. Retrieved from 10.48550\/arXiv.2104.14294","DOI":"10.48550\/arXiv.2104.14294"},{"key":"e_1_3_1_10_2","first-page":"1383","volume-title":"Proc. Int. Conf. Mach. Learn","author":"Chalasani Prasad","year":"2020","unstructured":"Prasad Chalasani, Jiefeng Chen, Amrita Roy Chowdhury, Xi Wu, and Somesh Jha. 2020. Concise explanations of neural networks using adversarial training. In Proc. Int. Conf. Mach. Learn. PMLR, 1383\u20131391."},{"key":"e_1_3_1_11_2","first-page":"839","volume-title":"Proc. IEEE Winter Conf. Appl. Comput. Vis.","author":"Chattopadhay Aditya","year":"2018","unstructured":"Aditya Chattopadhay, Anirban Sarkar, Prantik Howlader, and Vineeth N Balasubramanian. 2018. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In Proc. IEEE Winter Conf. Appl. Comput. Vis. 
IEEE, 839\u2013847."},{"key":"e_1_3_1_12_2","first-page":"397","volume-title":"Proc. IEEE Int. Conf. Comput. Vis","author":"Chefer Hila","year":"2021","unstructured":"Hila Chefer, Shir Gur, and Lior Wolf. 2021. Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers. In Proc. IEEE Int. Conf. Comput. Vis. 397\u2013406."},{"key":"e_1_3_1_13_2","first-page":"782","volume-title":"Proc. IEEE Int. Conf. Computer Vis. Pattern Recognit","author":"Chefer Hila","year":"2021","unstructured":"Hila Chefer, Shir Gur, and Lior Wolf. 2021b. Transformer interpretability beyond attention visualization. In Proc. IEEE Int. Conf. Computer Vis. Pattern Recognit. 782\u2013791."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1285"},{"key":"e_1_3_1_15_2","volume-title":"Proc. Conf. N. Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proc. Conf. N. Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol."},{"key":"e_1_3_1_16_2","volume-title":"Proc. Int. Conf. Learn. Representations","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proc. Int. Conf. Learn. 
Representations."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3506852"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474595"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524619"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","unstructured":"Ruigang Fu Qingyong Hu Xiaohu Dong Yulan Guo Yinghui Gao and Biao Li. 2020. Axiom-based Grad-CAM: Towards accurate visualization and explanation of CNNs. arXiv:2008.02312. Retrieved from 10.48550\/arXiv.2008.02312","DOI":"10.48550\/arXiv.2008.02312"},{"key":"e_1_3_1_21_2","volume-title":"Proc. Int. Conf. Learn. Representations","author":"Fu Yonggan","year":"2022","unstructured":"Yonggan Fu, Shunyao Zhang, Shang Wu, Cheng Wan, and Yingyan Lin. 2022. Patch-Fool: Are vision transformers always robust against adversarial perturbations? In Proc. Int. Conf. Learn. Representations. Retrieved from https:\/\/openreview.net\/forum?id=28ib9tf6zhr"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.3390\/app9112286"},{"key":"e_1_3_1_23_2","unstructured":"Jacob Gildenblat and contributors. 2021. PyTorch Library for CAM Methods. Retrieved from https:\/\/github.com\/jacobgil\/pytorch-grad-cam."},{"key":"e_1_3_1_24_2","volume-title":"Proc. Int. Conf. Learn. Representations","author":"Goodfellow Ian J","year":"2015","unstructured":"Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In Proc. Int. Conf. Learn. Representations."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2019.05.084"},{"key":"e_1_3_1_26_2","volume-title":"Proc. Adv. Neural Inf. Process. Syst","author":"Ignatiev Alexey","year":"2019","unstructured":"Alexey Ignatiev, Nina Narodytska, and Joao Marques-Silva. 2019. On relating explanations and adversarial examples. In Proc. Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_27_2","volume-title":"Proc. Adv. Neural Inf. Proces. 
Syst","author":"Ilyas Andrew","year":"2019","unstructured":"Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. 2019. Adversarial examples are not bugs, they are features. In Proc. Adv. Neural Inf. Proces. Syst."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3089943"},{"key":"e_1_3_1_29_2","first-page":"9215","volume-title":"Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit","author":"Li Kunpeng","year":"2018","unstructured":"Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, and Yun Fu. 2018. Tell me where to look: Guided attention inference network. In Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. 9215\u20139223."},{"issue":"7","key":"e_1_3_1_30_2","first-page":"1","article-title":"Recoverable privacy-preserving image classification through noise-like adversarial examples","volume":"20","author":"Liu Jun","year":"2023","unstructured":"Jun Liu, Jiantao Zhou, Jinyu Tian, and Weiwei Sun. 2023. Recoverable privacy-preserving image classification through noise-like adversarial examples. ACM Transactions on Multimedia Computing, Communications and Applications 20, 7 (2023), 1\u201327.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220027"},{"issue":"2","key":"e_1_3_1_32_2","first-page":"598","article-title":"Tformer: A transmission-friendly ViT model for IoT devices","volume":"34","author":"Lu Zhichao","year":"2022","unstructured":"Zhichao Lu, Chuntao Ding, Felix Juefei-Xu, Vishnu Naresh Boddeti, Shangguang Wang, and Yun Yang. 2022. Tformer: A transmission-friendly ViT model for IoT devices. IEEE Transactions on Parallel and Distributed Systems 34, 2 (2022), 598\u2013610.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_1_33_2","volume-title":"Proc. Int. Conf. Learn. 
Representations","author":"Madry Aleksander","year":"2018","unstructured":"Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In Proc. Int. Conf. Learn. Representations."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","unstructured":"Xiaofeng Mao Gege Qi Yuefeng Chen Xiaodan Li Shaokai Ye Yuan He and Hui Xue. 2021. Rethinking the design principles of robust vision transformer. arXiv:2105.07926. Retrieved from 10.48550\/arXiv.2105.07926","DOI":"10.48550\/arXiv.2105.07926"},{"key":"e_1_3_1_35_2","first-page":"135","volume-title":"Proc. 2017 ACM SIGSAC Conf. Comput. Commun. Secur","author":"Meng Dongyu","year":"2017","unstructured":"Dongyu Meng and Hao Chen. 2017. Magnet: A two-pronged defense against adversarial examples. In Proc. 2017 ACM SIGSAC Conf. Comput. Commun. Secur. 135\u2013147."},{"key":"e_1_3_1_36_2","first-page":"2574","volume-title":"Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit","author":"Moosavi-Dezfooli Seyed-Mohsen","year":"2016","unstructured":"Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. Deepfool: A simple and accurate method to fool deep neural networks. In Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. 2574\u20132582."},{"key":"e_1_3_1_37_2","first-page":"1","volume-title":"Proc. Int. Jt. Conf. Neural Netw.","author":"Muhammad Mohammed Bany","year":"2020","unstructured":"Mohammed Bany Muhammad and Mohammed Yeasin. 2020. Eigen-cam: Class activation map using principal components. In Proc. Int. Jt. Conf. Neural Netw. IEEE, 1\u20137."},{"key":"e_1_3_1_38_2","first-page":"4574","volume-title":"Proc. Int. Conf. Artif. Intell. Stat.","author":"Pawelczyk Martin","year":"2022","unstructured":"Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, and Himabindu Lakkaraju. 2022. 
Exploring counterfactual explanations through the lens of adversarial examples: A theoretical and empirical analysis. In Proc. Int. Conf. Artif. Intell. Stat. PMLR, 4574\u20134594."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643831"},{"key":"e_1_3_1_40_2","first-page":"8748","volume-title":"Proc. Int. Conf. Mach. Learn.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning transferable visual models from natural language supervision. In Proc. Int. Conf. Mach. Learn. PMLR, 8748\u20138763."},{"key":"e_1_3_1_41_2","first-page":"983","volume-title":"Proc. IEEE Winter Conf. Appl. Comput. Vis","author":"Desai S.","year":"2020","unstructured":"S. Desai and Harish Guruprasad Ramaswamy. 2020. Ablation-CAM: Visual explanations for deep convolutional network via gradient-free localization. In Proc. IEEE Winter Conf. Appl. Comput. Vis. 983\u2013991."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","unstructured":"Aditya Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen and Ilya Sutskever. 2021. Zero-shot text-to-image generation. (2021). arXiv:2102.12092. Retrieved from 10.48550\/arXiv.2102.12092","DOI":"10.48550\/arXiv.2102.12092"},{"key":"e_1_3_1_43_2","first-page":"4322","volume-title":"Proc. IEEE Int. Conf. Computer Vis. Pattern Recognit","author":"Rony J\u00e9r\u00f4me","year":"2019","unstructured":"J\u00e9r\u00f4me Rony, Luiz G Hafemann, Luiz S Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. 2019. Decoupling direction and norm for efficient gradient-based l2 adversarial attacks and defenses. In Proc. IEEE Int. Conf. Computer Vis. Pattern Recognit. 4322\u20134330."},{"key":"e_1_3_1_44_2","volume-title":"Proc. AAAI Conf. Artif. 
Intell","volume":"32","author":"Ross Andrew","year":"2018","unstructured":"Andrew Ross and Finale Doshi-Velez. 2018. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Proc. AAAI Conf. Artif. Intell., Vol. 32."},{"key":"e_1_3_1_45_2","first-page":"618","volume-title":"Proc. IEEE Int. Conf. Comput. Vis","author":"Selvaraju Ramprasaath R","year":"2017","unstructured":"Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proc. IEEE Int. Conf. Comput. Vis. 618\u2013626."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","unstructured":"Rulin Shao Zhouxing Shi Jinfeng Yi Pin-Yu Chen and Cho-Jui Hsieh. 2021. On the adversarial robustness of vision transformers. arXiv:2103.15670. Retrieved from 10.48550\/arXiv.2103.15670","DOI":"10.48550\/arXiv.2103.15670"},{"key":"e_1_3_1_47_2","first-page":"3145","volume-title":"Proc. Int. Conf. Mach. Learn.","author":"Shrikumar Avanti","year":"2017","unstructured":"Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In Proc. Int. Conf. Mach. Learn. JMLR, 3145\u20133153."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","unstructured":"Avanti Shrikumar Peyton Greenside Anna Shcherbina and Anshul Kundaje. 2016. Not just a black box: Learning important features through propagating activation differences. arXiv:1605.01713. Retrieved from 10.48550\/arXiv.1605.01713","DOI":"10.48550\/arXiv.1605.01713"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","unstructured":"Daniel Smilkov Nikhil Thorat Been Kim Fernanda Vi\u00e9gas and Martin Wattenberg. 2017. Smoothgrad: Removing noise by adding noise. arXiv:1706.03825. 
Retrieved from 10.48550\/arXiv.1706.03825","DOI":"10.48550\/arXiv.1706.03825"},{"key":"e_1_3_1_50_2","first-page":"4126","volume-title":"Proc. Adv. Neural Inf. Proces. Syst","author":"Srinivas Suraj","year":"2019","unstructured":"Suraj Srinivas and Fran\u00e7ois Fleuret. 2019. Full-gradient representation for neural network visualization. In Proc. Adv. Neural Inf. Proces. Syst. 4126\u20134135."},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","unstructured":"Robin Strudel Ricardo Garcia Ivan Laptev and Cordelia Schmid. 2021. Segmenter: Transformer for semantic segmentation. arXiv:2105.05633. Retrieved from 10.48550\/arXiv.2105.05633","DOI":"10.48550\/arXiv.2105.05633"},{"key":"e_1_3_1_52_2","first-page":"3319","volume-title":"Proc. Int. Conf. Mach. Learn","author":"Sundararajan Mukund","year":"2017","unstructured":"Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proc. Int. Conf. Mach. Learn. PMLR, 3319\u20133328."},{"key":"e_1_3_1_53_2","volume-title":"Proc. Int. Conf. Learn. Representations (Poster)","author":"Szegedy Christian","year":"2014","unstructured":"Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In Proc. Int. Conf. Learn. Representations (Poster)."},{"key":"e_1_3_1_54_2","first-page":"5998","volume-title":"Proc. Adv. Neural Inf. Proces. Syst","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. Adv. Neural Inf. Proces. Syst. 5998\u20136008."},{"key":"e_1_3_1_55_2","first-page":"24","volume-title":"Proc. IEEE Int. Conf. Computer Vis. Pattern Recognit. Workshops","author":"Wang Haofan","year":"2020","unstructured":"Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, and Xia Hu. 2020b. 
Score-CAM: Score-weighted visual explanations for convolutional neural networks. In Proc. IEEE Int. Conf. Computer Vis. Pattern Recognit. Workshops. 24\u201325."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2020.102634"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","unstructured":"Ross Wightman. 2019. PyTorch Image Models. Retrieved from https:\/\/github.com\/rwightman\/pytorch-image-models. 10.5281\/zenodo.4414861","DOI":"10.5281\/zenodo.4414861"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572843"},{"key":"e_1_3_1_59_2","volume-title":"Proc. Int. Conf. Learn. Representations (Poster)","author":"Xu Kaidi","year":"2019","unstructured":"Kaidi Xu, Sijia Liu, Pu Zhao, Pin-Yu Chen, Huan Zhang, Quanfu Fan, Deniz Erdogmus, Yanzhi Wang, and Xue Lin. 2019. Structured adversarial attack: Towards general implementation and better interpretability. In Proc. Int. Conf. Learn. Representations (Poster)."},{"key":"e_1_3_1_60_2","first-page":"6639","volume-title":"Proc. AAAI Conf. Artif. Intell.","volume":"34","author":"Yang Puyudi","year":"2020","unstructured":"Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, and Michael Jordan. 2020. ML-LOO: Detecting adversarial examples with feature attribution. In Proc. AAAI Conf. Artif. Intell., Vol. 34. 6639\u20136647."},{"key":"e_1_3_1_61_2","volume-title":"Proc. Adv. Neural Inf. Proces. Syst.","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In Proc. Adv. Neural Inf. Proces. Syst."},{"key":"e_1_3_1_62_2","first-page":"6881","volume-title":"Proc. IEEE Int. Conf. Computer Vis. Pattern Recognit","author":"Zheng Sixiao","year":"2021","unstructured":"Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr, and Li Zhang. 
2021. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proc. IEEE Int. Conf. Computer Vis. Pattern Recognit. 6881\u20136890."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674981","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T20:36:40Z","timestamp":1755031000000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674981"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,12]]},"references-count":61,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3674981"],"URL":"https:\/\/doi.org\/10.1145\/3674981","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,12]]},"assertion":[{"value":"2024-02-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-15","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}