{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T20:34:34Z","timestamp":1778618074195,"version":"3.51.4"},"reference-count":74,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,1,11]],"date-time":"2021-01-11T00:00:00Z","timestamp":1610323200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2021,1,11]],"date-time":"2021-01-11T00:00:00Z","timestamp":1610323200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"funder":[{"name":"National Science Foundation Center for Big Learning"},{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["K001892-00-S05"],"award-info":[{"award-number":["K001892-00-S05"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["SN COMPUT. 
SCI."],"published-print":{"date-parts":[[2021,2]]},"DOI":"10.1007\/s42979-020-00390-x","type":"journal-article","created":{"date-parts":[[2021,1,11]],"date-time":"2021-01-11T19:26:26Z","timestamp":1610393186000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["An Empirical Study on the Relation Between Network Interpretability and Adversarial Robustness"],"prefix":"10.1007","volume":"2","author":[{"given":"Adam","family":"Noack","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Isaac","family":"Ahern","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dejing","family":"Dou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6230-2376","authenticated-orcid":false,"given":"Boyang","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,1,11]]},"reference":[{"key":"390_CR1","unstructured":"Adebayo J, Gilmer J, Muelly M, Goodfellow IJ, Hardt M, Kim B. Sanity checks for saliency maps. In: Advances in Neural Information Processing Systems (NeurIPS). 2018"},{"key":"390_CR2","unstructured":"Ahern I, Noack A, Guzman-Nateras L, Dou D, Li B, Huan J. NormLime: A new feature importance metric for explaining deep neural networks. 2019. arXiv Preprint arXiv:1909.04200"},{"key":"390_CR3","unstructured":"Anil C, Lucas J, Grosse RB. Sorting out Lipschitz function approximation. In: The International Conference on Machine Learning (ICML). 2018"},{"key":"390_CR4","first-page":"1803","volume":"11","author":"D Baehrens","year":"2010","unstructured":"Baehrens D, Schroeter T, Harmeling S, Kawanabe M, Hansen K, M\u00fcller KR. How to explain individual classification decisions. J Mach Learn Res. 
2010;11:1803\u201331.","journal-title":"J Mach Learn Res"},{"key":"390_CR5","doi-asserted-by":"crossref","unstructured":"Bau D, Zhou B, Khosla A, Oliva A, Torralba A. Network dissection: quantifying interpretability of deep visual representations. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017; 3319\u20133327.","DOI":"10.1109\/CVPR.2017.354"},{"key":"390_CR6","unstructured":"Brendel W, Rauber J, Bethge M. Decision-based adversarial attacks: reliable attacks against black-box machine learning models. 2017. arXiv Preprint arXiv:1712.04248"},{"key":"390_CR7","unstructured":"Bubeck S, Price E, Razenshteyn I. Adversarial examples from computational constraints.\u00a0In: The International Conference on Machine Learning (ICML). 2019."},{"key":"390_CR8","unstructured":"Carlini N, Athalye A, Papernot N, Brendel W, Rauber J, Tsipras D, Goodfellow I, Madry A, Kurakin A. On evaluating adversarial robustness. 2019. arXiv Preprint arXiv:1902.06705"},{"key":"390_CR9","doi-asserted-by":"crossref","unstructured":"Carlini N, Wagner D. Adversarial examples are not easily detected: bypassing ten detection methods. In: The 10th ACM Workshop on Artificial Intelligence and Security. 2017","DOI":"10.1145\/3128572.3140444"},{"key":"390_CR10","doi-asserted-by":"crossref","unstructured":"Carlini N, Wagner DA. Towards evaluating the robustness of neural networks. In: The IEEE Symposium on Security and Privacy, 2017","DOI":"10.1109\/SP.2017.49"},{"key":"390_CR11","unstructured":"Chalasani P, Jha S, Sadagopan A, Wu X. Adversarial learning and explainability in structured datasets. 2018. arXiv Preprint arXiv:1810.06583"},{"key":"390_CR12","unstructured":"Chan A, Tay Y, Ong YS, Fu J. Jacobian adversarially regularized networks for robustness. In: International Conference on Learning Representations (ICLR). 2020"},{"key":"390_CR13","doi-asserted-by":"crossref","unstructured":"Chen PY, Zhang H, Sharma Y, Yi J, Hsieh CJ. 
Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: The 10th ACM Workshop on Artificial Intelligence and Security. 2017;15\u201326","DOI":"10.1145\/3128572.3140448"},{"key":"390_CR14","unstructured":"Cisse M, Bojanowski P, Grave E, Dauphin Y, Usunier N. Parseval networks: Improving robustness to adversarial examples. In: The International Conference on Machine Learning (ICML). 2017;854\u2013863"},{"key":"390_CR15","unstructured":"Dhurandhar A, Chen PY, Luss R, Tu CC, Ting P, Shanmugam K, Das P. Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Advances in Neural Information Processing Systems (NeurIPS). 2018"},{"key":"390_CR16","unstructured":"Dong, Y., Su, H., Zhu, J., Bao, F.: Towards interpretable deep neural networks by leveraging adversarial examples. In: AAAI-19 Workshop on Network Interpretability for Deep Learning. 2017"},{"key":"390_CR17","doi-asserted-by":"crossref","unstructured":"Drucker, H., LeCun, Y.: Double backpropagation increasing generalization performance. In: The International Joint Conference on Neural Networks; 1992. p. 145\u2013150","DOI":"10.1109\/IJCNN.1991.155328"},{"key":"390_CR18","unstructured":"Etmann C, Lunz S, Maass P, Sch\u00f6nlieb CB. On the connection between adversarial robustness and saliency map interpretability. In: The International Conference on Machine Learning (ICML). 2019"},{"key":"390_CR19","doi-asserted-by":"crossref","unstructured":"Fong R, Vedaldi A. Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. 2018. arXiv preprint arXiv:1801.03454","DOI":"10.1109\/CVPR.2018.00910"},{"key":"390_CR20","unstructured":"Ghorbani A, Abid A, Zou JY. Interpretation of neural networks is fragile. In: The AAAI Conference on Artificial Intelligence (AAAI); 2017"},{"key":"390_CR21","unstructured":"Gilmer J, Metz L, Faghri F, Schoenholz SS, Raghu M, Wattenberg M, Goodfellow IJ. 
Adversarial spheres. In: Workshop of International Conference on Learning Representations (ICLR). 2018"},{"key":"390_CR22","unstructured":"Gong Z, Wang W, Ku WS. Adversarial and clean data are not twins. 2017. arXiv Preprint arXiv:1704.04960"},{"key":"390_CR23","unstructured":"Goodfellow I, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (ICLR). 2015"},{"key":"390_CR24","unstructured":"Grosse K, Manoharan P, Papernot N, Backes M, McDaniel P. On the (statistical) detection of adversarial examples. 2017. arXiv Preprint arXiv:1702.06280"},{"key":"390_CR25","unstructured":"Hein M, Andriushchenko M. Formal guarantees on the robustness of a classifier against adversarial manipulation. In: Advances in Neural Information Processing Systems (NeurIPS). 2017"},{"key":"390_CR26","unstructured":"Hoffman J, Roberts DA, Yaida S. Robust learning with jacobian regularization. 2019. arXiv Preprint arXiv:1908.02729"},{"key":"390_CR27","unstructured":"Ilyas A, Engstrom L, Athalye A, Lin J. Black-box adversarial attacks with limited queries and information. In: The International Conference on Machine Learning (ICML). 2018"},{"key":"390_CR28","unstructured":"Ilyas A, Santurkar S, Tsipras D, Engstrom L, Tran B, Madry A. Adversarial examples are not bugs, they are features. In: Advances in Neural Information Processing Systems (NeurIPS). 2019"},{"key":"390_CR29","unstructured":"Jain P, Rao N, Dhillon IS. Structured sparse regression via greedy hard thresholding. In: Advances in Neural Information Processing Systems (NeurIPS). 2016"},{"key":"390_CR30","doi-asserted-by":"crossref","unstructured":"Jakubovitz D, Giryes R. Improving DNN robustness to adversarial attacks using jacobian regularization. In: The European Conference on Computer Vision (ECCV). 
2018","DOI":"10.1007\/978-3-030-01258-8_32"},{"key":"390_CR31","doi-asserted-by":"crossref","unstructured":"Kindermans PJ, Hooker S, Adebayo J, Alber M, Sch\u00fctt KT, D\u00e4hne S, Erhan D, Kim B. The (Un)reliability of saliency methods. In: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, vol. 11700. Springer; 2019","DOI":"10.1007\/978-3-030-28954-6_14"},{"key":"390_CR32","unstructured":"Kindermans PJ, Sch\u00fctt KT, Alber M, M\u00fcller KR, Erhan D, Kim B, D\u00e4hne S. Learning how to explain neural networks: PatternNet and PatternAttribution. 2017"},{"key":"390_CR33","unstructured":"Koh PW, Liang P. Understanding black-box predictions via influence functions. In: International Conference on Machine Learning (ICML); 2017."},{"key":"390_CR34","doi-asserted-by":"crossref","unstructured":"Lamb A, Verma V, Kannala J, Bengio Y. Interpolated adversarial training: achieving robust neural networks without sacrificing too much accuracy. 2019. arXiv Preprint arXiv:1906.06784","DOI":"10.1145\/3338501.3357369"},{"key":"390_CR35","unstructured":"Lampinen AK, Ganguli S. An analytic theory of generalization dynamics and transfer learning in deep linear networks. In: International Conference on Learning Representations (ICLR), 2019"},{"key":"390_CR36","unstructured":"Lanfredi RB, Schroeder JD, Tasdizen T. Quantifying the preferential direction of the model gradient in adversarial training with projected gradient descent. 2020. arXiv Preprint arXiv:1909.04200"},{"key":"390_CR37","unstructured":"Li B, Chen C, Wang W, Carin L. Second-order adversarial attack and certifiable robustness. In: Advances in Neural Information Processing Systems (NeurIPS); 2019"},{"key":"390_CR38","unstructured":"Liu Y, Chen X, Liu C, Song D. Delving into transferable adversarial examples and black-box attacks. In: The International Conference on Learning Representation (ICLR). 
2017"},{"key":"390_CR39","doi-asserted-by":"crossref","unstructured":"L\u00e9cuyer M, Atlidakis V, Geambasu R, Hsu D, Jana S. Certified robustness to adversarial examples with differential privacy. In: 2019 IEEE Symposium on Security and Privacy (SP). 2019;656\u2013672","DOI":"10.1109\/SP.2019.00044"},{"key":"390_CR40","unstructured":"Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (ICLR). 2017"},{"key":"390_CR41","unstructured":"Metzen JH, Genewein T, Fischer V, Bischoff B. On detecting adversarial perturbations. 2017. arXiv Preprint arXiv:1909.04200"},{"key":"390_CR42","doi-asserted-by":"crossref","unstructured":"Montavon G, Samek W, M\u00fcller K. Methods for interpreting and understanding deep neural networks. 2017. arXiv Preprint arXiv:1909.04200","DOI":"10.1016\/j.dsp.2017.10.011"},{"key":"390_CR43","doi-asserted-by":"crossref","unstructured":"Moosavi-Dezfooli S, Fawzi A, Frossard P. Deepfool: a simple and accurate method to fool deep neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016","DOI":"10.1109\/CVPR.2016.282"},{"key":"390_CR44","doi-asserted-by":"crossref","unstructured":"Moosavi-Dezfooli SM, Fawzi A, Fawzi O, Frossard P. Universal adversarial perturbations. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017","DOI":"10.1109\/CVPR.2017.17"},{"key":"390_CR45","unstructured":"Nakkiran P. Adversarial robustness may be at odds with simplicity. 2019. arXiv Preprint arXiv:1909.04200"},{"key":"390_CR46","unstructured":"Oberman AM, Calder J. Lipschitz regularized deep neural networks converge and generalize. 2018. arXiv Preprint arXiv:1909.04200"},{"key":"390_CR47","doi-asserted-by":"crossref","unstructured":"Papernot N, McDaniel P, Goodfellow I, Jha S, Celik ZB, Swami A. Practical black-box attacks against machine learning. 
In: The 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506\u2013519. ACM, New York, NY, USA. 2017","DOI":"10.1145\/3052973.3053009"},{"key":"390_CR48","doi-asserted-by":"crossref","unstructured":"Papernot N, McDaniel PD, Jha S, Fredrikson M, Celik ZB, Swami A. The limitations of deep learning in adversarial settings. IEEE European Symposium on Security and Privacy. 2016","DOI":"10.1109\/EuroSP.2016.36"},{"key":"390_CR49","doi-asserted-by":"crossref","unstructured":"Papernot N, McDaniel PD, Wu X, Jha S, Swami A. Distillation as a defense to adversarial perturbations against deep neural networks. In: IEEE Symposium on Security and Privacy. 2016","DOI":"10.1109\/SP.2016.41"},{"key":"390_CR50","unstructured":"Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A. PyTorch TorchVision. 2017. https:\/\/github.com\/pytorch\/vision. Accessed 9 Jul 2020"},{"key":"390_CR51","doi-asserted-by":"crossref","unstructured":"Ross AS, Doshi-Velez F. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In: AAAI. 2018","DOI":"10.1609\/aaai.v32i1.11504"},{"key":"390_CR52","unstructured":"Roth K, Kilcher Y, Hofmann T. The odds are odd: A statistical test for detecting adversarial examples. In: The International Conference on Machine Learning (ICML). 2019"},{"issue":"23","key":"390_CR53","doi-asserted-by":"publisher","first-page":"11537","DOI":"10.1073\/pnas.1820226116","volume":"116","author":"AM Saxe","year":"2019","unstructured":"Saxe AM, McClelland JL, Ganguli S. A mathematical theory of semantic development in deep neural networks. Proc Natl Acad Sci. 2019;116(23):11537\u201346.","journal-title":"Proc Natl Acad Sci"},{"key":"390_CR54","unstructured":"Schmidt L, Santurkar S, Tsipras D, Talwar K, Madry A. Adversarially robust generalization requires more data. In: Advances in Neural Information Processing Systems (NeurIPS). 
2018"},{"key":"390_CR55","doi-asserted-by":"crossref","unstructured":"Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-cam: Visual explanations from deep networks via gradient-based localization. ICCV. 2017","DOI":"10.1109\/ICCV.2017.74"},{"key":"390_CR56","unstructured":"Shafahi A, Najibi M, Ghiasi MA, Xu Z, Dickerson J, Studer C, Davis LS, Taylor G, Goldstein T. Adversarial training for free! In: Advances in Neural Information Processing Systems (NeurIPS), 2019; 3358\u20133369"},{"key":"390_CR57","unstructured":"Shrikumar A, Greenside P, Kundaje A. Learning important features through propagating activation differences. In: The International Conference on Machine Learning (ICML). 2017"},{"key":"390_CR58","unstructured":"Simon-Gabriel CJ, Ollivier Y, Bottou L, Sch\u00f6lkopf B, Lopez-Paz D. First-order adversarial vulnerability of neural networks and input dimension. In: K.\u00a0Chaudhuri, R.\u00a0Salakhutdinov (eds.) The 36th International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, vol.\u00a097, pp. 5809\u20135817. PMLR, Long Beach, California, USA. 2019"},{"key":"390_CR59","unstructured":"Smilkov D, Thorat N, Kim B, Vi\u00e9gas FB, Wattenberg M. SmoothGrad: removing noise by adding noise. In: The International Conference on Machine Learning (ICML). 2017"},{"key":"390_CR60","doi-asserted-by":"publisher","first-page":"4265","DOI":"10.1109\/TSP.2017.2708039","volume":"65","author":"J Sokolic","year":"2016","unstructured":"Sokolic J, Giryes R, Sapiro G, Rodrigues MRD. Robust large margin deep neural networks. IEEE Trans Signal Process. 2016;65:4265\u201380.","journal-title":"IEEE Trans Signal Process"},{"key":"390_CR61","unstructured":"Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M. Striving for simplicity: The all convolutional net. In: ICLR Workshop. 
2014"},{"key":"390_CR62","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12586-7","volume-title":"Market structure and equilibrium: 1st edition translation into English","author":"H von Stackelberg","year":"2011","unstructured":"von Stackelberg H. Market structure and equilibrium: 1st edition translation into English. Berlin: Springer; 2011."},{"key":"390_CR63","unstructured":"Stutz D, Hein M, Schiele B. Confidence-calibrated adversarial training: Generalizing to unseen attacks. 2019. arXiv Preprint arXiv:1909.04200"},{"key":"390_CR64","unstructured":"Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. In: The International Conference on Machine Learning (ICML). 2017"},{"key":"390_CR65","unstructured":"Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow IJ, Fergus R. Intriguing properties of neural networks. In: The International Conference on Learning Representation (ICLR). 2014"},{"key":"390_CR66","unstructured":"TensorFlow: TensorFlow models repository. 2017. arXiv:1909.04200. Accessed 9 Jul 2020."},{"key":"390_CR67","unstructured":"Tram\u00e8r F, Kurakin A, Papernot N, Goodfellow I, Boneh D, McDaniel P. Ensemble adversarial training: Attacks and defenses. 2017. arXiv Preprint arXiv:1909.04200"},{"key":"390_CR68","unstructured":"Tsipras D, Santurkar S, Engstrom L, Turner A, Madry A. Robustness may be at odds with accuracy. In: The International Conference on Learning Representation (ICLR). 2019"},{"key":"390_CR69","unstructured":"Uesato J, Alayrac JB, Huang PS, Stanforth R, Fawzi A, Kohli P. Are labels required for improving adversarial robustness? In: Advances in Neural Information Processing Systems (NeurIPS). 2019"},{"key":"390_CR70","unstructured":"Uesato J, O\u2019Donoghue B, van\u00a0den Oord A, Kohli P. Adversarial risk and the dangers of evaluating against weak attacks. 2018. 
arXiv preprint arXiv:1909.04200"},{"key":"390_CR71","doi-asserted-by":"crossref","unstructured":"Xie C, Zhang Z, Zhou Y, Bai S, Wang J, Ren Z, Yuille A. Improving transferability of adversarial examples with input diversity. 2018. arXiv Preprint arXiv:1712.04248","DOI":"10.1109\/CVPR.2019.00284"},{"key":"390_CR72","doi-asserted-by":"crossref","unstructured":"Zagoruyko S, Komodakis N. Wide residual networks. In: The British Machine Vision Conference (BMVC). 2016","DOI":"10.5244\/C.30.87"},{"key":"390_CR73","doi-asserted-by":"crossref","unstructured":"Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: The European Conference on Computer Vision (ECCV). 2014; 818\u2013833","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"390_CR74","unstructured":"Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Object detectors emerge in deep scene cnns. In: International Conference on Learning Representations (ICLR). 2015"}],"container-title":["SN Computer Science"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-020-00390-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s42979-020-00390-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-020-00390-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T05:43:24Z","timestamp":1724305404000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s42979-020-00390-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,11]]},"references-count":74,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,2]]}},"alternative-id":["390"],"URL":"https:\/\/doi.org\/10.
1007\/s42979-020-00390-x","relation":{},"ISSN":["2662-995X","2661-8907"],"issn-type":[{"value":"2662-995X","type":"print"},{"value":"2661-8907","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,11]]},"assertion":[{"value":"15 July 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 November 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 January 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with Ethical Standards"}},{"value":"The work done by Adam Noack was funded by the NSF Center for Big Learning (CBL) and a grant from the Air Force Research Laboratory and Defense Advanced Research Projects Agency, under agreement number FA8750-16-C-0166, subcontract K001892-00-S05. Isaac Ahern\u2019s work was funded by the NSF CBL. Dejing Dou was originally funded by the NSF CBL and now works at Baidu. Boyang Li originally worked with Baidu, but now works at Nanyang Technological University.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article does not contain any studies with human participants or animals performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"All of the codes used for our experiments can be found at this url:.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}}],"article-number":"32"}}