{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T15:51:05Z","timestamp":1778860265252,"version":"3.51.4"},"reference-count":78,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2022,11,29]],"date-time":"2022-11-29T00:00:00Z","timestamp":1669680000000},"content-version":"vor","delay-in-days":332,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The proliferation of Deep Neural Networks in various domains has seen an increased need for interpretability of these models. Preliminary work done along this line, and papers that surveyed such, are focused on high-level representation analysis. However, a recent branch of work has concentrated on interpretability at a more granular level of analyzing neurons within these models. 
In this paper, we survey the work done on neuron analysis including: i) methods to discover and understand neurons in a network; ii) evaluation methods; iii) major findings including cross architectural comparisons that neuron analysis has unraveled; iv) applications of neuron probing such as: controlling the model, domain adaptation, and so forth; and v) a discussion on open issues and future research directions.<\/jats:p>","DOI":"10.1162\/tacl_a_00519","type":"journal-article","created":{"date-parts":[[2022,11,29]],"date-time":"2022-11-29T18:33:47Z","timestamp":1669746827000},"page":"1285-1303","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":20,"title":["Neuron-level Interpretation of Deep NLP Models: A Survey"],"prefix":"10.1162","volume":"10","author":[{"given":"Hassan","family":"Sajjad","sequence":"first","affiliation":[{"name":"Faculty of Computer Science, Dalhousie University, Canada. hsajjad@dal.ca"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nadir","family":"Durrani","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute, HBKU, Doha, Qatar. ndurrani@hbku.edu.qa"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fahim","family":"Dalvi","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute, HBKU, Doha, Qatar. 
faimaduddin@hbku.edu.qa"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2022,11,22]]},"reference":[{"key":"2022112918312882300_bib1","article-title":"Fine-grained analysis of sentence embeddings using auxiliary prediction tasks","author":"Adi","year":"2016","journal-title":"arXiv preprint arXiv:1608.04207"},{"key":"2022112918312882300_bib2","article-title":"Interfaces for explaining transformer language models","author":"Alammar","year":"2020"},{"key":"2022112918312882300_bib3","doi-asserted-by":"publisher","first-page":"249","DOI":"10.18653\/v1\/2021.acl-demo.30","article-title":"Ecco: An open source library for the explainability of transformer language models","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations","author":"Alammar","year":"2021"},{"key":"2022112918312882300_bib4","article-title":"Understanding neural networks and individual neuron importance via information-ordered cumulative ablation","author":"Amjad","year":"2018"},{"key":"2022112918312882300_bib5","article-title":"On the pitfalls of analyzing individual neurons in language models","volume-title":"International Conference on Learning Representations","author":"Antverg","year":"2022"},{"key":"2022112918312882300_bib6","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.deeplo-1.3","article-title":"IDANI: Inference-time domain adaptation via neuron-level interventions","author":"Antverg","year":"2022"},{"key":"2022112918312882300_bib7","article-title":"Neural machine translation by jointly learning to align and translate","author":"Bahdanau","year":"2014","journal-title":"arXiv preprint arXiv:1409.0473"},{"key":"2022112918312882300_bib8","article-title":"Identifying and controlling important neurons in neural machine translation","volume-title":"International Conference on Learning 
Representations","author":"Bau","year":"2019"},{"key":"2022112918312882300_bib9","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1080","article-title":"What do neural machine translation models learn about morphology?","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Belinkov","year":"2017"},{"issue":"1","key":"2022112918312882300_bib10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/coli_a_00367","article-title":"On the linguistic representational power of neural machine translation models","volume":"45","author":"Belinkov","year":"2020","journal-title":"Computational Linguistics"},{"issue":"1","key":"2022112918312882300_bib11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/coli_a_00367","article-title":"On the linguistic representational power of neural machine translation models","volume":"46","author":"Belinkov","year":"2020","journal-title":"Computational Linguistics"},{"key":"2022112918312882300_bib12","first-page":"1","article-title":"Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks","volume-title":"Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Belinkov","year":"2017"},{"key":"2022112918312882300_bib13","article-title":"Unsupervised feature learning and deep learning: A review and new perspectives","volume":"abs\/1206.5538","author":"Bengio","year":"2012","journal-title":"CoRR"},{"key":"2022112918312882300_bib14","article-title":"Computing optimal subsets","volume-title":"Proceedings of the Twenty Second AAAI Conference on Artificial Intelligence (AAAI, Oral presentation)","author":"Binshtok","year":"2007"},{"key":"2022112918312882300_bib15","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1198","article-title":"What you can cram into a single vector: Probing sentence embeddings for linguistic 
properties","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Conneau","year":"2018"},{"key":"2022112918312882300_bib16","article-title":"Knowledge neurons in pretrained transformers","volume":"abs\/2104.08696","author":"Dai","year":"2021","journal-title":"CoRR"},{"key":"2022112918312882300_bib17","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016309","article-title":"What is one grain of sand in the desert? Analyzing individual neurons in deep NLP models","volume-title":"Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI, Oral presentation)","author":"Dalvi","year":"2019"},{"key":"2022112918312882300_bib18","article-title":"Discovering latent concepts learned in BERT","volume-title":"International Conference on Learning Representations","author":"Dalvi","year":"2022"},{"key":"2022112918312882300_bib19","doi-asserted-by":"crossref","first-page":"4908","DOI":"10.18653\/v1\/2020.emnlp-main.398","article-title":"Analyzing redundancy in pretrained transformer models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP-2020)","author":"Dalvi","year":"2020"},{"key":"2022112918312882300_bib20","doi-asserted-by":"publisher","first-page":"3243","DOI":"10.18653\/v1\/2020.emnlp-main.262","article-title":"How do decisions emerge across layers in neural models? 
Interpretation with differentiable masking","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"De Cao","year":"2020"},{"key":"2022112918312882300_bib21","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2022112918312882300_bib22","article-title":"The Shapley Taylor interaction index","author":"Dhamdhere","year":"2020"},{"key":"2022112918312882300_bib23","article-title":"How important is a neuron?","author":"Dhamdhere","year":"2018","journal-title":"CoRR"},{"key":"2022112918312882300_bib24","article-title":"Linguistic correlation analysis: Discovering salient neurons in deep NLP models","author":"Durrani","year":"2022"},{"key":"2022112918312882300_bib25","doi-asserted-by":"publisher","first-page":"4947","DOI":"10.18653\/v1\/2021.findings-acl.438","article-title":"How transfer learning impacts linguistic knowledge in deep NLP models?","volume-title":"Findings of the Association for Computational Linguistics: ACL 2021","author":"Durrani","year":"2021"},{"key":"2022112918312882300_bib26","doi-asserted-by":"publisher","first-page":"4865","DOI":"10.18653\/v1\/2020.emnlp-main.395","article-title":"Analyzing individual neurons in pre-trained language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Durrani","year":"2020"},{"key":"2022112918312882300_bib27","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1162\/tacl_a_00359","article-title":"Amnesic probing: Behavioral explanation with amnesic counterfactuals","volume":"9","author":"Elazar","year":"2021","journal-title":"Transactions of the Association for Computational 
Linguistics"},{"key":"2022112918312882300_bib28","unstructured":"Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2009. Visualizing higher-layer features of a deep network. Technical Report 1341, University of Montreal. Also presented at the ICML 2009 Workshop on Learning Feature Hierarchies, Montreal, Canada."},{"key":"2022112918312882300_bib29","doi-asserted-by":"publisher","first-page":"1491","DOI":"10.3115\/v1\/P15-1144","article-title":"Sparse overcomplete word vector representations","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Faruqui","year":"2015"},{"issue":"2","key":"2022112918312882300_bib30","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1162\/coli_a_00404","article-title":"CausaLM: Causal model explanation through counterfactual language models","volume":"47","author":"Feder","year":"2021","journal-title":"Computational Linguistics"},{"key":"2022112918312882300_bib31","doi-asserted-by":"publisher","first-page":"3719","DOI":"10.18653\/v1\/D18-1407","article-title":"Pathologies of neural models make interpretations difficult","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Feng","year":"2018"},{"key":"2022112918312882300_bib32","article-title":"The lottery ticket hypothesis: Finding sparse, trainable neural networks","volume-title":"International Conference on Learning Representations","author":"Frankle","year":"2019"},{"key":"2022112918312882300_bib33","doi-asserted-by":"publisher","first-page":"32","DOI":"10.3115\/v1\/N15-1004","article-title":"A compositional and interpretable semantic space","volume-title":"Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language 
Technologies","author":"Fyshe","year":"2015"},{"key":"2022112918312882300_bib34","doi-asserted-by":"publisher","first-page":"3275","DOI":"10.18653\/v1\/D18-1365","article-title":"Explaining character-aware neural networks for word-level prediction: Do they discover linguistic rules?","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Godin","year":"2018"},{"key":"2022112918312882300_bib35","article-title":"Pruning-then-expanding model for domain adaptation of neural machine translation","author":"Gu","year":"2021"},{"key":"2022112918312882300_bib36","doi-asserted-by":"publisher","first-page":"197","DOI":"10.18653\/v1\/2020.emnlp-main.15","article-title":"Intrinsic probing through dimension selection","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Hennigen","year":"2020"},{"key":"2022112918312882300_bib37","doi-asserted-by":"publisher","first-page":"2733","DOI":"10.18653\/v1\/D19-1275","article-title":"Designing and interpreting probes with control tasks","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Hewitt","year":"2019"},{"key":"2022112918312882300_bib38","unstructured":"Dieuwke Hupkes. 2020. Hierarchy and interpretability in neural models of language processing. Ph.D. thesis. 
University of Amsterdam."},{"key":"2022112918312882300_bib39","doi-asserted-by":"crossref","DOI":"10.24963\/ijcai.2018\/796","article-title":"Visualisation and \u2018diagnostic classifiers\u2019 reveal how recurrent and recursive neural networks process hierarchical structure","author":"Hupkes","year":"2018"},{"key":"2022112918312882300_bib40","first-page":"1359","article-title":"Semantics-based machine translation with hyperedge replacement grammars","volume-title":"Proceedings of COLING 2012","author":"Jones","year":"2012"},{"issue":"4","key":"2022112918312882300_bib41","doi-asserted-by":"publisher","first-page":"761","DOI":"10.1162\/COLI_a_00300","article-title":"Representation of linguistic form and function in recurrent neural networks","volume":"43","author":"K\u00e1d\u00e1r","year":"2017","journal-title":"Computational Linguistics"},{"key":"2022112918312882300_bib42","article-title":"Visualizing and understanding recurrent networks","author":"Karpathy","year":"2015","journal-title":"arXiv preprint arXiv:1506.02078"},{"key":"2022112918312882300_bib43","doi-asserted-by":"publisher","first-page":"11","DOI":"10.18653\/v1\/N19-1002","article-title":"The emergence of number and syntax units in LSTM language models","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Lakretz","year":"2019"},{"key":"2022112918312882300_bib44","first-page":"681","article-title":"Visualizing and understanding neural models in NLP","volume-title":"Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Li","year":"2016"},{"key":"2022112918312882300_bib45","article-title":"Understanding neural networks through representation erasure","author":"Li","year":"2016","journal-title":"CoRR"},{"key":"2022112918312882300_bib46","article-title":"The 
mythos of model interpretability","volume-title":"ICML Workshop on Human Interpretability in Machine Learning (WHI)","author":"Lipton","year":"2016"},{"key":"2022112918312882300_bib47","first-page":"1073","article-title":"Linguistic knowledge and transferability of contextual representations","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Liu","year":"2019"},{"key":"2022112918312882300_bib48","first-page":"4765","article-title":"A unified approach to interpreting model predictions","volume-title":"Advances in Neural Information Processing Systems 30","author":"Lundberg","year":"2017"},{"key":"2022112918312882300_bib49","article-title":"NLP\u2019s generalization problem, and how researchers are tackling it","author":"Marasovi\u0107","year":"2018","journal-title":"The Gradient"},{"key":"2022112918312882300_bib50","article-title":"UMAP: Uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2020"},{"key":"2022112918312882300_bib51","article-title":"Under the hood of neural networks: Characterizing learned representations by functional neuron populations and network ablations","author":"Meyes","year":"2020","journal-title":"CoRR"},{"key":"2022112918312882300_bib52","doi-asserted-by":"publisher","first-page":"6792","DOI":"10.18653\/v1\/2020.emnlp-main.552","article-title":"Asking without telling: Exploring latent ontologies in contextual representations","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Michael","year":"2020"},{"key":"2022112918312882300_bib53","article-title":"Efficient estimation of word representations in vector space","volume-title":"Proceedings of the ICLR Workshop","author":"Mikolov","year":"2013"},{"key":"2022112918312882300_bib54","article-title":"Compositional explanations of 
neurons","author":"Mu","year":"2020","journal-title":"CoRR"},{"key":"2022112918312882300_bib55","unstructured":"W. James Murdoch, Peter J. Liu, and Bin Yu. 2018. Beyond word importance: Contextual decomposition to extract interactions from LSTMs."},{"key":"2022112918312882300_bib56","article-title":"Discovery of natural language concepts in individual units of CNNs","author":"Na","year":"2019","journal-title":"CoRR"},{"key":"2022112918312882300_bib57","doi-asserted-by":"publisher","DOI":"10.23915\/distill.00010","article-title":"The building blocks of interpretability","author":"Olah","year":"2018","journal-title":"Distill"},{"key":"2022112918312882300_bib58","doi-asserted-by":"publisher","first-page":"4609","DOI":"10.18653\/v1\/2020.acl-main.420","article-title":"Information-theoretic probing for linguistic structure","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Pimentel","year":"2020"},{"key":"2022112918312882300_bib59","doi-asserted-by":"publisher","first-page":"325","DOI":"10.18653\/v1\/W18-5437","article-title":"Interpretable textual neuron representations for NLP","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Poerner","year":"2018"},{"issue":"8","key":"2022112918312882300_bib60","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"2022112918312882300_bib61","first-page":"6078","article-title":"SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability","volume-title":"Advances in Neural Information Processing Systems 30","author":"Raghu","year":"2017"},{"key":"2022112918312882300_bib62","article-title":"Effect of post-processing on contextualized word 
representations","author":"Sajjad","year":"2021","journal-title":"CoRR"},{"key":"2022112918312882300_bib63","doi-asserted-by":"publisher","first-page":"3082","DOI":"10.18653\/v1\/2022.naacl-main.225","article-title":"Analyzing encoded concepts in transformer language models","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Sajjad","year":"2022"},{"key":"2022112918312882300_bib64","article-title":"Implicit representations of event properties within contextual language models: Searching for \u201ccausativity neurons\u201d","volume-title":"International Conference on Computational Semantics (IWCS)","author":"Seyffarth","year":"2021"},{"key":"2022112918312882300_bib65","article-title":"A latent-variable model for intrinsic probing","author":"Stanczak","year":"2022","journal-title":"CoRR"},{"issue":"10","key":"2022112918312882300_bib66","doi-asserted-by":"publisher","first-page":"1951","DOI":"10.1002\/asi.21382","article-title":"Concepts and semantic relations in information science","volume":"61","author":"Stock","year":"2010","journal-title":"Journal of the American Society for Information Science and Technology"},{"key":"2022112918312882300_bib67","article-title":"Finding experts in transformer models","author":"Suau","year":"2020","journal-title":"CoRR"},{"key":"2022112918312882300_bib68","article-title":"Axiomatic attribution for deep networks","author":"Sundararajan","year":"2017"},{"key":"2022112918312882300_bib69","article-title":"Sequence to sequence learning with neural networks","author":"Sutskever","year":"2014","journal-title":"CoRR"},{"key":"2022112918312882300_bib70","doi-asserted-by":"publisher","first-page":"4593","DOI":"10.18653\/v1\/P19-1452","article-title":"BERT rediscovers the classical NLP pipeline","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational 
Linguistics","author":"Tenney","year":"2019"},{"key":"2022112918312882300_bib71","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1503","article-title":"The importance of being recurrent for modeling hierarchical structure","author":"Tran","year":"2018","journal-title":"arXiv preprint arXiv:1803.03585"},{"key":"2022112918312882300_bib72","article-title":"Unsupervised transfer learning via BERT neuron selection","author":"Valipour","year":"2019","journal-title":"CoRR"},{"key":"2022112918312882300_bib73","first-page":"1114","article-title":"A survey of formal grammars and algorithms for recognition and transformation in mechanical translation.","volume-title":"IFIP Congress (2)","author":"Vauquois","year":"1968"},{"key":"2022112918312882300_bib74","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2020.emnlp-main.14","article-title":"Information-theoretic probing with minimum description length","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing","author":"Voita","year":"2020"},{"key":"2022112918312882300_bib75","article-title":"Similarity analysis of contextual word representation models","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Wu","year":"2020"},{"key":"2022112918312882300_bib76","doi-asserted-by":"publisher","first-page":"5823","DOI":"10.18653\/v1\/D19-1591","article-title":"What part of the neural network does this? 
Understanding LSTMs by measuring and dissecting neurons","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Xin","year":"2019"},{"key":"2022112918312882300_bib77","doi-asserted-by":"publisher","first-page":"359","DOI":"10.18653\/v1\/W18-5448","article-title":"Language modeling teaches you more than translation does: Lessons learned through auxiliary syntactic task analysis","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Zhang","year":"2018"},{"key":"2022112918312882300_bib78","article-title":"Visualizing deep neural network decisions: Prediction difference analysis","author":"Zintgraf","year":"2017","journal-title":"CoRR"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00519\/2060745\/tacl_a_00519.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00519\/2060745\/tacl_a_00519.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,29]],"date-time":"2022-11-29T18:34:12Z","timestamp":1669746852000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00519\/113852\/Neuron-level-Interpretation-of-Deep-NLP-Models-A"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":78,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00519","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022]]},"published":{"date-parts":[[2022]]}}}