{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T04:40:33Z","timestamp":1760157633289,"version":"build-2065373602"},"publisher-location":"Cham","reference-count":52,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783032083166"},{"type":"electronic","value":"9783032083173"}],"license":[{"start":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T00:00:00Z","timestamp":1760227200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T00:00:00Z","timestamp":1760227200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Concept Activation Vectors (CAVs) are widely used to model human-understandable concepts as directions within the latent space of neural networks. They are trained by identifying directions from the activations of concept samples to those of non-concept samples. However, this method often produces similar, non-orthogonal directions for correlated concepts, such as \u201cbeard\u201d and \u201cnecktie\u201d within the CelebA dataset, which frequently co-occur in images of men. This entanglement complicates the interpretation of concepts in isolation and can lead to undesired effects in CAV applications, such as activation steering. To address this issue, we introduce a post-hoc concept disentanglement method that employs a non-orthogonality loss, facilitating the identification of orthogonal concept directions while preserving directional correctness. We evaluate our approach with real-world and controlled correlated concepts in CelebA and a synthetic FunnyBirds dataset with VGG16 and ResNet18 architectures. We further demonstrate the superiority of orthogonalized concept representations in activation steering tasks, allowing (1) the <jats:italic>insertion<\/jats:italic> of isolated concepts into input images through generative models and (2) the <jats:italic>removal<\/jats:italic> of concepts for effective shortcut suppression with reduced impact on correlated concepts in comparison to baseline CAVs. (Code is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/erenerogullari\/cav-disentanglement\" ext-link-type=\"uri\">https:\/\/github.com\/erenerogullari\/cav-disentanglement<\/jats:ext-link>.)\n<\/jats:p>","DOI":"10.1007\/978-3-032-08317-3_4","type":"book-chapter","created":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T03:36:30Z","timestamp":1760153790000},"page":"68-89","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Post-hoc Concept Disentanglement: From Correlated to\u00a0Isolated Concept Representations"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-1269-0550","authenticated-orcid":false,"given":"Eren","family":"Erogullari","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0762-7258","authenticated-orcid":false,"given":"Sebastian","family":"Lapuschkin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6283-3265","authenticated-orcid":false,"given":"Wojciech","family":"Samek","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5681-6231","authenticated-orcid":false,"given":"Frederik","family":"Pahde","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,10,12]]},"reference":[{"issue":"9","key":"4_CR1","doi-asserted-by":"publisher","first-page":"1006","DOI":"10.1038\/s42256-023-00711-8","volume":"5","author":"R Achtibat","year":"2023","unstructured":"Achtibat, R., et al.: From attribution maps to human-understandable explanations through concept relevance propagation. Nat. Mach. Intell. 5(9), 1006\u20131019 (2023)","journal-title":"Nat. Mach. Intell."},{"key":"4_CR2","doi-asserted-by":"crossref","unstructured":"Anders, C.J., Weber, L., Neumann, D., Samek, W., M\u00fcller, K.R., Lapuschkin, S.: Finding and removing clever hans: using explanation methods to debug and improve deep models. Inf. Fusion 77, 261\u2013295 (2022)","DOI":"10.1016\/j.inffus.2021.07.015"},{"key":"4_CR3","unstructured":"Anders, C.J., Neumann, D., Samek, W., M\u00fcller, K.-R., Lapuschkin, S.: Software for dataset-wide xai: from local explanations to global insights with zennit, corelay, and virelay. arXiv preprint arXiv:2106.13200 (2021)"},{"key":"4_CR4","doi-asserted-by":"crossref","unstructured":"Bach, S., Binder, A., Montavon, G., Klauschen, F., M\u00fcller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one 10(7), (2015)","DOI":"10.1371\/journal.pone.0130140"},{"key":"4_CR5","doi-asserted-by":"crossref","unstructured":"Bareeva, D., Dreyer, M., Pahde, F., Samek, W., Lapuschkin, S.: Reactive model correction: mitigating harm to task-relevant features via conditional bias suppression. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 3532\u20133541 (2024)","DOI":"10.1109\/CVPRW63382.2024.00357"},{"issue":"48","key":"4_CR6","doi-asserted-by":"publisher","first-page":"30071","DOI":"10.1073\/pnas.1907375117","volume":"117","author":"D Bau","year":"2020","unstructured":"Bau, D., Zhu, J.-Y., Strobelt, H., Lapedriza, A., Zhou, B., Torralba, A.: Understanding the role of individual units in a deep neural network. Proc. Natl. Acad. Sci. 117(48), 30071\u201330078 (2020)","journal-title":"Proc. Natl. Acad. Sci."},{"key":"4_CR7","doi-asserted-by":"crossref","unstructured":"Bouchacourt, D., Tomioka, R., Nowozin, S.: Multi-level variational autoencoder: learning disentangled representations from grouped observations. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol.\u00a032 (2018)","DOI":"10.1609\/aaai.v32i1.11867"},{"key":"4_CR8","doi-asserted-by":"crossref","unstructured":"Brinker, T.J., et\u00a0al. Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. Eur. J. Can. 113, 47\u201354 (2019)","DOI":"10.1016\/j.ejca.2019.04.001"},{"key":"4_CR9","doi-asserted-by":"crossref","unstructured":"Brocki, L., Chung, N.C.: Concept saliency maps to visualize relevant features in deep generative models. In: 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1771\u20131778. IEEE (2019)","DOI":"10.1109\/ICMLA.2019.00287"},{"issue":"12","key":"4_CR10","doi-asserted-by":"publisher","first-page":"772","DOI":"10.1038\/s42256-020-00265-z","volume":"2","author":"Z Chen","year":"2020","unstructured":"Chen, Z., Bei, Y., Rudin, C.: Concept whitening for interpretable image recognition. Nat. Mach. Intell. 2(12), 772\u2013782 (2020)","journal-title":"Nat. Mach. Intell."},{"key":"4_CR11","doi-asserted-by":"crossref","unstructured":"Chormai, P., Herrmann, J., M\u00fcller, K.R., Montavon, G.: Disentangled explanations of neural network predictions by finding relevant subspaces. IEEE Trans. Pattern Anal. Mach. Intell. (2024)","DOI":"10.1109\/TPAMI.2024.3388275"},{"key":"4_CR12","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1023\/A:1022627411411","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273\u2013297 (1995)","journal-title":"Mach. Learn."},{"key":"4_CR13","unstructured":"Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, et\u00a0al. Toy models of superposition. arXiv preprint arXiv:2209.10652 (2022)"},{"key":"4_CR14","doi-asserted-by":"crossref","unstructured":"Fel, T., et al.: Craft: concept recursive activation factorization for explainability. In: CVPR, pp. 2711\u20132721 (2023)","DOI":"10.1109\/CVPR52729.2023.00266"},{"issue":"11","key":"4_CR15","doi-asserted-by":"publisher","first-page":"665","DOI":"10.1038\/s42256-020-00257-z","volume":"2","author":"R Geirhos","year":"2020","unstructured":"Geirhos, R., Jacobsen, J.-H., Michaelis, C., Zemel, R., Brendel, W., et al.: Shortcut learning in deep neural networks. Nat Mach Intell 2(11), 665\u2013673 (2020)","journal-title":"Nat Mach Intell"},{"key":"4_CR16","unstructured":"Gutierrez Basulto, V., Schockaert, S.: From knowledge graph embedding to ontology embedding? an analysis of the compatibility between vector space representations and rules (2018)"},{"key":"4_CR17","doi-asserted-by":"crossref","unstructured":"Haufe, S., et al.: On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage 87, 96\u2013110 (2014)","DOI":"10.1016\/j.neuroimage.2013.10.067"},{"key":"4_CR18","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR pp. 770\u2013778 (2016)","DOI":"10.1109\/CVPR.2016.90"},{"key":"4_CR19","doi-asserted-by":"crossref","unstructured":"Hesse, R., Schaub-Meyer, S., Roth, S.: Funnybirds: a synthetic vision dataset for a part-based analysis of explainable ai methods. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 3981\u20133991 (2023)","DOI":"10.1109\/ICCV51070.2023.00368"},{"key":"4_CR20","unstructured":"Hjelm, R.D., et\u00a0al.: Learning deep representations by mutual information estimation and maximization. In: Proceedings of the International Conference on Learning Representations (ICLR) (2019)"},{"key":"4_CR21","doi-asserted-by":"crossref","unstructured":"Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42(1), 80\u201386 (2000)","DOI":"10.1080\/00401706.2000.10485983"},{"key":"4_CR22","first-page":"2250","volume":"2024","author":"M Jackermeier","year":"2024","unstructured":"Jackermeier, M., Chen, J., Horrocks, I.: Dual box embeddings for the description logic el++. Proc. ACM Web Conf. 2024, 2250\u20132258 (2024)","journal-title":"Proc. ACM Web Conf."},{"key":"4_CR23","unstructured":"Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., et\u00a0al.: Interpretability beyond feature attribution: quantitative testing with concept activation vectors (tcav). In: ICML, pp. 2668\u20132677. PMLR (2018)"},{"key":"4_CR24","unstructured":"Kim, H., Mnih, A.: Disentangling by factorising. In: ICML, pp. 2649\u20132658. PMLR (2018)"},{"key":"4_CR25","first-page":"17994","volume":"35","author":"A Kumar","year":"2022","unstructured":"Kumar, A., Tan, C., Sharma, A.: Probing classifiers are unreliable for concept removal and detection. Adv. Neural. Inf. Process. Syst. 35, 17994\u201318008 (2022)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"4_CR26","doi-asserted-by":"crossref","unstructured":"Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of International Conference on Computer Vision (ICCV), December 2015","DOI":"10.1109\/ICCV.2015.425"},{"key":"4_CR27","unstructured":"Lundberg, S.M., Lee, S.-I.: A unified approach to interpreting model predictions. In: NeurIPS, vol. 30 (2017)"},{"key":"4_CR28","doi-asserted-by":"crossref","unstructured":"Nanda, N., Lee, A., Wattenberg, M.: Emergent linear representations in world models of self-supervised sequence models. In: Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pp. 16\u201330 (2023)","DOI":"10.18653\/v1\/2023.blackboxnlp-1.2"},{"key":"4_CR29","doi-asserted-by":"crossref","unstructured":"Neuhaus, Y., Augustin, M., Boreiko, V., Hein, M.: Spurious features everywhere-large-scale detection of harmful spurious features in imagenet. In: ICCV (2023)","DOI":"10.1109\/ICCV51070.2023.01851"},{"key":"4_CR30","unstructured":"Nicolson, A., Schut, L., Noble, J.A., Gal, Y.: Understanding concept activation vectors. In: TMLR, Explaining explainability (2025)"},{"key":"4_CR31","doi-asserted-by":"crossref","unstructured":"Olah, C., Mordvintsev, A., Schubert, L.: Feature visualization. Distill 2(11) (2017)","DOI":"10.23915\/distill.00007"},{"key":"4_CR32","doi-asserted-by":"crossref","unstructured":"\u00d6zcep, \u00d6.L., Leemhuis, M., Wolter, D.: Embedding ontologies in the description logic alc by axis-aligned cones. J. Artif. Intell. Res. 78, 217\u2013267 (2023)","DOI":"10.1613\/jair.1.13939"},{"key":"4_CR33","doi-asserted-by":"crossref","unstructured":"Pahde, F., Wiegand, T., Lapuschkin, S., Samek, W.: Ensuring medical ai safety: explainable ai-driven detection and mitigation of spurious model behavior and associated data. arXiv preprint arXiv:2501.13818 (2025)","DOI":"10.1007\/s10994-025-06834-w"},{"key":"4_CR34","unstructured":"Pahde, F., et al.: Navigating neural space: revisiting concept activation vectors to overcome directional divergence. arXiv preprint arXiv:2202.03482 (2022)"},{"key":"4_CR35","doi-asserted-by":"crossref","unstructured":"Pahde, F., Dreyer, M., Samek, W., Lapuschkin, S.: Reveal to revise: an explainable ai life cycle for iterative bias correction of deep models. In: MICCAI, pp. 596\u2013606. Springer (2023)","DOI":"10.1007\/978-3-031-43895-0_56"},{"key":"4_CR36","doi-asserted-by":"crossref","unstructured":"Preechakul, K., Chatthee, N., Wizadwongsa, S., Suwajanakorn, S.: Diffusion autoencoders: toward a meaningful and decodable representation. In: CVPR, pp. 10619\u201310629 (2022)","DOI":"10.1109\/CVPR52688.2022.01036"},{"key":"4_CR37","unstructured":"Radford, A., Jozefowicz, R., Sutskever, I.: Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444 (2017)"},{"key":"4_CR38","doi-asserted-by":"crossref","unstructured":"Rakowski, A., Monti, R., Huryn, V., Lemanczyk, M., Ohler, U., Lippert, C.: Metadata-guided feature disentanglement for functional genomics. Bioinformatics 40 (2024)","DOI":"10.1093\/bioinformatics\/btae403"},{"key":"4_CR39","doi-asserted-by":"crossref","unstructured":"Ravfogel, S., Elazar, Y., Gonen, H., Twiton, M., Goldberg, Y.: Null it out: guarding protected attributes by iterative nullspace projection. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J. (eds.) ACL, pp. 7237\u20137256, July 2020","DOI":"10.18653\/v1\/2020.acl-main.647"},{"key":"4_CR40","doi-asserted-by":"crossref","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: ICCV, pp. 618\u2013626 (2017)","DOI":"10.1109\/ICCV.2017.74"},{"key":"4_CR41","unstructured":"Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y., (eds.) ICLR 2015 (2015)"},{"key":"4_CR42","doi-asserted-by":"crossref","unstructured":"Singh, K.K., Ojha, U., Lee, Y.J.: Finegan: unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In: CVPR, pp. 6490\u20136499 (2019)","DOI":"10.1109\/CVPR.2019.00665"},{"key":"4_CR43","unstructured":"Smilkov, D., Thorat, N., Kim, B., Vi\u00e9gas, F., Wattenberg, M.: Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017)"},{"key":"4_CR44","unstructured":"Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arxiv 2014. In: Workshop Track at International Conference on Learning Representations (2014)"},{"key":"4_CR45","unstructured":"Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML (2017)"},{"key":"4_CR46","unstructured":"Tr\u00e4uble, F., et al.: On disentangled representations learned from correlated data. In: ICML, pp. 10401\u201310412. PMLR (2021)"},{"key":"4_CR47","doi-asserted-by":"crossref","unstructured":"Travaini, G.V., Pacchioni, F., Bellumore, S., Bosia, M., De Micco, F.: Machine learning and criminal justice: a systematic review of advanced methodology for recidivism risk prediction. Int. J. Environ. Res. Public Health 19(17), 10594 (2022)","DOI":"10.3390\/ijerph191710594"},{"key":"4_CR48","unstructured":"Vielhaben, J., Bluecher, S., Strodthoff, N.: A unifying framework with completeness guarantees. TMLR, Multi-dimensional concept discovery (mcd) (2023)"},{"key":"4_CR49","doi-asserted-by":"crossref","unstructured":"Wang, X., Chen, H., Wu, Z., Zhu, W., et\u00a0al.: Disentangled representation learning. IEEE Trans. Pattern Anal. Mach. Intell. (2024)","DOI":"10.1109\/TPAMI.2024.3420937"},{"key":"4_CR50","unstructured":"Yuksekgonul, M., Wang, M., Zou, J.: Post-hoc concept bottleneck models. In: ICLR Workshops (2022)"},{"key":"4_CR51","doi-asserted-by":"crossref","unstructured":"Zavr\u0161nik, A.: Algorithmic justice: algorithms and big data in criminal justice settings. Eur. J. Criminol. 18(5) (2021)","DOI":"10.1177\/1477370819876762"},{"key":"4_CR52","doi-asserted-by":"crossref","unstructured":"Zhang, R., Madumal, P., Miller, T., Ehinger, K.A., Rubinstein, B.I.: Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol.\u00a035, pp. 11682\u201311690 (2021)","DOI":"10.1609\/aaai.v35i13.17389"}],"container-title":["Communications in Computer and Information Science","Explainable Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-032-08317-3_4","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T04:03:41Z","timestamp":1760155421000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-032-08317-3_4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,12]]},"ISBN":["9783032083166","9783032083173"],"references-count":52,"URL":"https:\/\/doi.org\/10.1007\/978-3-032-08317-3_4","relation":{},"ISSN":["1865-0929","1865-0937"],"issn-type":[{"type":"print","value":"1865-0929"},{"type":"electronic","value":"1865-0937"}],"subject":[],"published":{"date-parts":[[2025,10,12]]},"assertion":[{"value":"12 October 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"xAI","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"World Conference on Explainable Artificial Intelligence","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Istanbul","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"T\u00fcrkiye","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2025","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"9 July 2025","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"11 July 2025","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"3","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"xai2025","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/xaiworldconference.com\/2025\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}