{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T18:22:59Z","timestamp":1770229379015,"version":"3.49.0"},"reference-count":65,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T00:00:00Z","timestamp":1743033600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF GRFP","doi-asserted-by":"publisher","award":["2141064"],"award-info":[{"award-number":["2141064"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF GRFP","doi-asserted-by":"publisher","award":["PHY-2019786"],"award-info":[{"award-number":["PHY-2019786"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF grant","doi-asserted-by":"publisher","award":["2141064"],"award-info":[{"award-number":["2141064"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF grant","doi-asserted-by":"publisher","award":["PHY-2019786"],"award-info":[{"award-number":["PHY-2019786"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Rothberg Family Fund","award":["2141064"],"award-info":[{"award-number":["2141064"]}]},{"name":"Rothberg Family Fund","award":["PHY-2019786"],"award-info":[{"award-number":["PHY-2019786"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. 
We find that this concept universe has interesting structure at three levels: (1) The \u201catomic\u201d small-scale structure contains \u201ccrystals\u201d whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man:woman::king:queen). We find that the quality of such parallelograms and associated function vectors improves greatly when projecting out global distractor directions such as word length, which is efficiently performed with linear discriminant analysis. (2) The \u201cbrain\u201d intermediate-scale structure has significant spatial modularity; for example, math and code features form a \u201clobe\u201d akin to functional lobes seen in neural fMRI images. We quantify the spatial locality of these lobes with multiple metrics and find that clusters of co-occurring features, at coarse enough scale, also cluster together spatially far more than one would expect if feature geometry were random. (3) The \u201cgalaxy\u201d-scale large-scale structure of the feature point cloud is not isotropic, but instead has a power law of eigenvalues with steepest slope in middle layers. 
We also quantify how the clustering entropy depends on the layer.<\/jats:p>","DOI":"10.3390\/e27040344","type":"journal-article","created":{"date-parts":[[2025,3,28]],"date-time":"2025-03-28T04:36:48Z","timestamp":1743136608000},"page":"344","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["The Geometry of Concepts: Sparse Autoencoder Feature Structure"],"prefix":"10.3390","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6496-9991","authenticated-orcid":false,"given":"Yuxiao","family":"Li","sequence":"first","affiliation":[{"name":"Beneficial AI Foundation (BAIF), Cambridge, MA 02139, USA"}]},{"given":"Eric J.","family":"Michaud","sequence":"additional","affiliation":[{"name":"Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"},{"name":"Institute for Artificial Intelligence and Fundamental Interaction, Cambridge, MA 02139, USA"}]},{"given":"David D.","family":"Baek","sequence":"additional","affiliation":[{"name":"Institute for Artificial Intelligence and Fundamental Interaction, Cambridge, MA 02139, USA"},{"name":"Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]},{"given":"Joshua","family":"Engels","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]},{"given":"Xiaoqing","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7670-7190","authenticated-orcid":false,"given":"Max","family":"Tegmark","sequence":"additional","affiliation":[{"name":"Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"},{"name":"Institute for Artificial Intelligence and Fundamental Interaction, 
Cambridge, MA 02139, USA"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,27]]},"reference":[{"key":"ref_1","unstructured":"Hurst, A., Lerer, A., Goucher, A.P., Perelman, A., Ramesh, A., Clark, A., Ostrow, A., Welihinda, A., Hayes, A., and Radford, A. (2024). Gpt-4o system card. arXiv."},{"key":"ref_2","unstructured":"(2025, March 24). The Claude 3 Model Family: Opus, Sonnet, Haiku. Available online: https:\/\/www-cdn.anthropic.com\/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627\/Model_Card_Claude_3.pdf."},{"key":"ref_3","unstructured":"Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., and Bi, X. (2025). Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Slattery, P., Saeri, A.K., Grundy, E.A., Graham, J., Noetel, M., Uuk, R., Dao, J., Pour, S., Casper, S., and Thompson, N. (2024). The ai risk repository: A comprehensive meta-review, database, and taxonomy of risks from artificial intelligence. arXiv.","DOI":"10.70777\/agi.v1i1.10881"},{"key":"ref_5","unstructured":"Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S.R., Cheng, N., Durmus, E., Hatfield-Dodds, Z., and Johnston, S.R. (2023). Towards understanding sycophancy in language models. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Park, P.S., Goldstein, S., O\u2019Gara, A., Chen, M., and Hendrycks, D. (2023). AI deception: A survey of examples, risks, and potential solutions. arXiv.","DOI":"10.1016\/j.patter.2024.100988"},{"key":"ref_7","unstructured":"Marks, S., Treutlein, J., Bricken, T., Lindsey, J., Marcus, J., Mishra-Sharma, S., Ziegler, D., Ameisen, E., Batson, J., and Belonax, T. (2024). Auditing Language Models for Hidden Objectives. arXiv."},{"key":"ref_8","unstructured":"Ngo, R., Chan, L., and Mindermann, S. (2022). The alignment problem from a deep learning perspective. 
arXiv."},{"key":"ref_9","unstructured":"Bereska, L., and Gavves, E. (2024). Mechanistic Interpretability for AI Safety\u2014A Review. arXiv."},{"key":"ref_10","unstructured":"Sharkey, L., Chughtai, B., Batson, J., Lindsey, J., Wu, J., Bushnaq, L., Goldowsky-Dill, N., Heimersheim, S., Ortega, A., and Bloom, J. (2025). Open Problems in Mechanistic Interpretability. arXiv."},{"key":"ref_11","unstructured":"Huben, R., Cunningham, H., Smith, L.R., Ewart, A., and Sharkey, L. (2024, January 7\u201311). Sparse Autoencoders Find Highly Interpretable Features in Language Models. Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_12","unstructured":"Bricken, T., Templeton, A., Batson, J., Chen, B., Jermyn, A., Conerly, T., Turner, N., Anil, C., Denison, C., and Askell, A. (2025, March 24). Towards Monosemanticity: Decomposing Language Models with Dictionary Learning. Transformer Circuits Thread 2023. Available online: https:\/\/transformer-circuits.pub\/2023\/monosemantic-features\/index.html."},{"key":"ref_13","unstructured":"Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., and Jones, A. (2025, March 24). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread 2024. Available online: https:\/\/transformer-circuits.pub\/2024\/scaling-monosemanticity\/index.html."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. (2015). Sparse overcomplete word vector representations. arXiv.","DOI":"10.3115\/v1\/P15-1144"},{"key":"ref_15","unstructured":"Zhang, J., Chen, Y., Cheung, B., and Olshausen, B.A. (2019). Word embedding visualization via dictionary learning. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yun, Z., Chen, Y., Olshausen, B.A., and LeCun, Y. (2021). 
Transformer visualization via dictionary learning: Contextualized embedding as a linear superposition of transformer factors. arXiv.","DOI":"10.18653\/v1\/2021.deelio-1.1"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1038\/381607a0","article-title":"Emergence of simple-cell receptive field properties by learning a sparse code for natural images","volume":"381","author":"Olshausen","year":"1996","journal-title":"Nature"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3311","DOI":"10.1016\/S0042-6989(97)00169-7","article-title":"Sparse coding with an overcomplete basis set: A strategy employed by V1?","volume":"37","author":"Olshausen","year":"1997","journal-title":"Vis. Res."},{"key":"ref_19","unstructured":"Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., and Chen, C. (2025, March 24). Toy Models of Superposition. Transformer Circuits Thread 2022. Available online: https:\/\/transformer-circuits.pub\/2022\/toy_model\/index.html."},{"key":"ref_20","unstructured":"Park, K., Choe, Y.J., and Veitch, V. (2023). The linear representation hypothesis and the geometry of large language models. arXiv."},{"key":"ref_21","unstructured":"Olah, C. (2025, March 24). What is a Linear Representation? What is a Multidimensional Feature? Transformer Circuits Thread 2024. Available online: https:\/\/transformer-circuits.pub\/2024\/july-update\/index.html#linear-representations."},{"key":"ref_22","unstructured":"Engels, J., Liao, I., Michaud, E.J., Gurnee, W., and Tegmark, M. (2024). Not All Language Model Features Are Linear. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Lieberum, T., Rajamanoharan, S., Conmy, A., Smith, L., Sonnerat, N., Varma, V., Kram\u00e1r, J., Dragan, A., Shah, R., and Nanda, N. (2024). Gemma scope: Open sparse autoencoders everywhere all at once on gemma 2. 
arXiv.","DOI":"10.18653\/v1\/2024.blackboxnlp-1.19"},{"key":"ref_24","first-page":"1","article-title":"Intrinsic dimension of data representations in deep neural networks","volume":"32","author":"Ansuini","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3440755","article-title":"Evolution of semantic similarity\u2014A survey","volume":"54","author":"Chandrasekaran","year":"2021","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Watanabe, S. (2009). Algebraic Geometry and Statistical Learning Theory, Cambridge University Press.","DOI":"10.1017\/CBO9780511800474"},{"key":"ref_27","unstructured":"Rushing, C., and Nanda, N. (2024). Explorations of Self-Repair in Language Models. arXiv."},{"key":"ref_28","unstructured":"Belrose, N., Furman, Z., Smith, L., Halawi, D., Ostrovsky, I., McKinney, L., Biderman, S., and Steinhardt, J. (2023). Eliciting latent predictions from transformers with the tuned lens. arXiv."},{"key":"ref_29","first-page":"16318","article-title":"Towards automated circuit discovery for mechanistic interpretability","volume":"36","author":"Conmy","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","unstructured":"Park, K., Choe, Y.J., Jiang, Y., and Veitch, V. (2024). The geometry of categorical and hierarchical concepts in large language models. arXiv."},{"key":"ref_31","unstructured":"Mendel, J. (2025, March 24). SAE Feature Geometry is Outside the Superposition Hypothesis. AI Alignment Forum 2024. Available online: https:\/\/www.alignmentforum.org\/posts\/MFBTjb2qf3ziWmzz6\/sae-feature-geometry-is-outside-the-superposition-hypothesis."},{"key":"ref_32","unstructured":"Smith, L. (2025, March 24). The \u2018Strong\u2019 Feature Hypothesis Could be Wrong. AI Alignment Forum 2024. 
Available online: https:\/\/www.lesswrong.com\/posts\/tojtPCCRpKLSHBdpn\/the-strong-feature-hypothesis-could-be-wrong."},{"key":"ref_33","unstructured":"Bussmann, B., Pearce, M., Leask, P., Bloom, J.I., Sharkey, L., and Nanda, N. (2025, March 24). Showing SAE Latents Are Not Atomic Using Meta-SAEs. AI Alignment Forum 2024. Available online: https:\/\/www.alignmentforum.org\/posts\/TMAmHh4DdMr4nCSr5\/showing-sae-latents-are-not-atomic-using-meta-saes."},{"key":"ref_34","unstructured":"Drozd, A., Gladkova, A., and Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: Beyond king - man + woman = queen. Proceedings of the Coling 2016, the 26th International Conference on Computational Linguistics: Technical Papers, The COLING 2016 Organizing Committee."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C.D. Glove: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ma, L., and Zhang, Y. (November, January 29). Using Word2Vec to process big text data. Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7364114"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Nanda, N., Lee, A., and Wattenberg, M. (2023). Emergent linear representations in world models of self-supervised sequence models. arXiv.","DOI":"10.18653\/v1\/2023.blackboxnlp-1.2"},{"key":"ref_38","unstructured":"Li, K., Hopkins, A.K., Bau, D., Vi\u00e9gas, F., Pfister, H., and Wattenberg, M. (2022). Emergent world representations: Exploring a sequence model trained on a synthetic task. 
arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Michaud, E.J., Liao, I., Lad, V., Liu, Z., Mudide, A., Loughridge, C., Guo, Z.C., Kheirkhah, T.R., Vukeli\u0107, M., and Tegmark, M. (2024). Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code. Entropy, 26.","DOI":"10.3390\/e26121046"},{"key":"ref_40","unstructured":"Marks, S., and Tegmark, M. (2023). The geometry of truth: Emergent linear structure in large language model representations of true\/false datasets. arXiv."},{"key":"ref_41","unstructured":"Gurnee, W., and Tegmark, M. (2023). Language models represent space and time. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Heinzerling, B., and Inui, K. (2024). Monotonic representation of numeric properties in language models. arXiv.","DOI":"10.18653\/v1\/2024.acl-short.18"},{"key":"ref_43","unstructured":"Todd, E., Li, M.L., Sharma, A.S., Mueller, A., Wallace, B.C., and Bau, D. (2023). Function vectors in large language models. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Hendel, R., Geva, M., and Globerson, A. (2023). In-context learning creates task vectors. arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.624"},{"key":"ref_45","unstructured":"Kharlapenko, D., Nanda, N., and Conmy, A. (2024, March 24). Extracting SAE Task features for In-Context Learning. AI Alignment Forum 2024. Available online: https:\/\/www.alignmentforum.org\/posts\/5FGXmJ3wqgGRcbyH7\/extracting-sae-task-features-for-in-context-learning."},{"key":"ref_46","first-page":"17359","article-title":"Locating and editing factual associations in gpt","volume":"35","author":"Meng","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Ahmed, M., Seraj, R., and Islam, S.M.S. (2020). The k-means algorithm: A comprehensive survey and performance evaluation. 
Electronics, 9.","DOI":"10.3390\/electronics9081295"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Xanthopoulos, P., Pardalos, P.M., and Trafalis, T.B. (2013). Linear discriminant analysis. Robust Data Mining, Springer.","DOI":"10.1007\/978-1-4419-9878-1"},{"key":"ref_49","unstructured":"Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., and Nabeshima, N. (2020). The pile: An 800gb dataset of diverse text for language modeling. arXiv."},{"key":"ref_50","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_51","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Vinh, N.X., Epps, J., and Bailey, J. (2009, January 14\u201318). Information theoretic measures for clusterings comparison: Is a correction for chance necessary? Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553511"},{"key":"ref_53","unstructured":"Mueller, A., Brinkmann, J., Li, M., Marks, S., Pal, K., Prakash, N., Rager, C., Sankaranarayanan, A., Sharma, A.S., and Sun, J. (2024). The quest for the right mediator: A history, survey, and theoretical grounding of causal interpretability. arXiv."},{"key":"ref_54","unstructured":"Olah, C. (2025, March 24). Transformer Circuits Thread: Interpretability Dreams; An Informal Note on Future Goals for Mechanistic Interpretability. Transformer Circuits Thread 2023. 
Available online: https:\/\/transformer-circuits.pub\/2023\/interpretability-dreams\/index.html."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"19790","DOI":"10.1073\/pnas.1314922110","article-title":"Quantifying causal emergence shows that macro can beat micro","volume":"110","author":"Hoel","year":"2013","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1146\/annurev.astro.36.1.189","article-title":"Star formation in galaxies along the Hubble sequence","volume":"36","author":"Kennicutt","year":"1998","journal-title":"Annu. Rev. Astron. Astrophys."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1086\/143018","article-title":"Extragalactic Nebulae","volume":"64","author":"Hubble","year":"1926","journal-title":"Astrophys. J."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"281913","DOI":"10.1155\/2010\/281913","article-title":"Dark matter substructure and dwarf galactic satellites","volume":"2010","author":"Kravtsov","year":"2010","journal-title":"Adv. Astron."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1093\/biomet\/20A.1-2.32","article-title":"The generalised product moment distribution in samples from a normal multivariate population","volume":"20","author":"Wishart","year":"1928","journal-title":"Biometrika"},{"key":"ref_60","first-page":"4","article-title":"Distribution of eigenvalues for some sets of random matrices","volume":"72","author":"Marchenko","year":"1967","journal-title":"Mat. Sb."},{"key":"ref_61","unstructured":"Dasarathy, B.V. (1991). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Tutorial."},{"key":"ref_62","first-page":"9","article-title":"Sample estimate of the entropy of a random vector","volume":"23","author":"Kozachenko","year":"1987","journal-title":"Probl. 
Peredachi Informatsii"},{"key":"ref_63","first-page":"223","article-title":"Nouvelles Recherches Sur La Distribution Florale","volume":"44","author":"Jaccard","year":"1908","journal-title":"Bull. De La Soci\u00e9t\u00e9 Vaudoise Des Sci. Nat."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"297","DOI":"10.2307\/1932409","article-title":"Measures of the Amount of Ecologic Association Between Species","volume":"26","author":"Dice","year":"1945","journal-title":"Ecology"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"579","DOI":"10.2307\/2340126","article-title":"On the Methods of Measuring Association Between Two Attributes","volume":"75","author":"Yule","year":"1912","journal-title":"J. R. Stat. Soc."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/4\/344\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:02:48Z","timestamp":1760029368000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/4\/344"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,27]]},"references-count":65,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["e27040344"],"URL":"https:\/\/doi.org\/10.3390\/e27040344","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,27]]}}}