{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T13:59:14Z","timestamp":1774447154756,"version":"3.50.1"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U21B2026, 62372260"],"award-info":[{"award-number":["U21B2026, 62372260"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Program of China","award":["2024YFC3307403"],"award-info":[{"award-number":["2024YFC3307403"]}]},{"name":"Research Project of Quan Cheng Laboratory, China","award":["QCL20250105"],"award-info":[{"award-number":["QCL20250105"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2026,3,31]]},"abstract":"<jats:p>Recommendation model interpretation aims to reveal the relationships between inputs, model internal representations, and outputs to enhance the transparency, interpretability, and trustworthiness of recommendation systems. However, the inherent complexity and opacity of deep learning models pose challenges for model-level interpretation. 
Moreover, most existing methods for interpreting recommendation models are tailored to specific architectures or model types, limiting their generalizability across different types of recommenders.<\/jats:p>\n                  <jats:p>\n                    In this article, we propose RecSAE, a generalizable probing framework that interprets\n                    <jats:italic toggle=\"yes\">Rec<\/jats:italic>\n                    ommendation models with\n                    <jats:italic toggle=\"yes\">S<\/jats:italic>\n                    parse\n                    <jats:italic toggle=\"yes\">A<\/jats:italic>\n                    uto\n                    <jats:italic toggle=\"yes\">E<\/jats:italic>\n                    ncoders. The framework extracts interpretable latents from the internal representations of recommendation models and links them to semantic concepts for interpretation. It does not alter the original models during interpretation and also enables targeted tuning of them. Experiments on three types of recommendation models (general, graph-based, sequential) with four widely used public datasets demonstrate the effectiveness and generalizability of the RecSAE framework. The interpreted concepts are further validated by human experts, showing strong alignment with human perception. Overall, RecSAE serves as a novel step toward model-level interpretation of various types of recommendation models without affecting their functions, while also offering potential for targeted tuning of models. 
The code and data are available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/Alice1998\/RecSAE\">https:\/\/github.com\/Alice1998\/RecSAE<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3795529","type":"journal-article","created":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T12:17:43Z","timestamp":1769861863000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Understanding Internal Representations of Recommendation Models with Sparse Autoencoders"],"prefix":"10.1145","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8875-1850","authenticated-orcid":false,"given":"Jiayin","family":"Wang","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China and Quan Cheng Laboratory, Jinan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0985-6636","authenticated-orcid":false,"given":"Xiaoyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5604-7527","authenticated-orcid":false,"given":"Weizhi","family":"Ma","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9393-4854","authenticated-orcid":false,"given":"Zhiqiang","family":"Guo","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3158-1920","authenticated-orcid":false,"given":"Min","family":"Zhang","sequence":"additional","affiliation":[{"name":"Computer Science, Tsinghua University, Beijing, China and Quan Cheng Laboratory, Jinan, China"}]}],"member":"320","published-online":{"date-parts":[[2026,3,25]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"AI@Meta. 2024. Llama 3 Model Card. 
Retrieved from https:\/\/github.com\/meta-llama\/llama3\/blob\/main\/MODEL_CARD.md"},{"key":"e_1_3_2_3_2","unstructured":"Ann. 2025. On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy. Retrieved from https:\/\/openreview.net\/forum?id=DSOTgzeH3w&utm_source=chatgpt.com"},{"key":"e_1_3_2_4_2","unstructured":"Anthropic. 2024. The Engineering Challenges of Scaling Interpretability. Retrieved from https:\/\/www.anthropic.com\/research\/engineering-challenges-interpretability?utm_source=chatgpt.com"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331211"},{"key":"e_1_3_2_6_2","article-title":"Language models can explain neurons in language models","author":"Bills Steven","year":"2023","unstructured":"Steven Bills, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, and William Saunders. 2023. Language models can explain neurons in language models. OpenAI (2023).","journal-title":"OpenAI"},{"key":"e_1_3_2_7_2","article-title":"Towards monosemanticity: Decomposing language models with dictionary learning","author":"Bricken Trenton","year":"2023","unstructured":"Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nicholas L. Turner, Cem Anil, Carson Denison, Amanda Askell, et al. 2023. Towards monosemanticity: Decomposing language models with dictionary learning. Anthropic (2023).","journal-title":"Anthropic"},{"issue":"4","key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"285","DOI":"10.21512\/comtech.v7i4.3746","article-title":"Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF)","volume":"7","author":"Christian Hans","year":"2016","unstructured":"Hans Christian, Mikhael Pramodana Agus, and Derwin Suhartono. 2016. Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). 
ComTech: Computer, Mathematics and Engineering Applications 7, 4 (2016), 285\u2013294.","journal-title":"ComTech: Computer, Mathematics and Engineering Applications"},{"key":"e_1_3_2_9_2","unstructured":"Hoagy Cunningham Aidan Ewart Logan Riggs Robert Huben and Lee Sharkey. 2023. Sparse autoencoders find highly interpretable features in language models. arXiv:2309.08600. Retrieved from https:\/\/arxiv.org\/abs\/2309.08600"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583303"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3640457.3687094"},{"key":"e_1_3_2_12_2","first-page":"75644","article-title":"Evaluating neuron interpretation methods of NLP models","volume":"36","author":"Fan Yimin","year":"2023","unstructured":"Yimin Fan, Fahim Dalvi, Nadir Durrani, and Hassan Sajjad. 2023. Evaluating neuron interpretation methods of NLP models. Advances in Neural Information Processing Systems 36 (2023), 75644\u201375668.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_13_2","unstructured":"Alex Foote Neel Nanda Esben Kran Ioannis Konstas Shay Cohen and Fazl Barez. 2023. Neuron to graph: Interpreting language model neurons at scale. arXiv:2305.19911. Retrieved from https:\/\/arxiv.org\/abs\/2305.19911"},{"key":"e_1_3_2_14_2","unstructured":"Leo Gao Tom Dupr\u00e9 la Tour Henk Tillman Gabriel Goh Rajan Troll Alec Radford Ilya Sutskever Jan Leike and Jeffrey Wu. 2024. Scaling and evaluating sparse autoencoders. arXiv:2406.04093. Retrieved from https:\/\/arxiv.org\/abs\/2406.04093"},{"key":"e_1_3_2_15_2","unstructured":"Yunfan Gao Tao Sheng Youlin Xiang Yun Xiong Haofen Wang and Jiawei Zhang. 2023. Chat-REC: Towards interactive and explainable LLMs-augmented recommender system. arXiv:2303.14524. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.14524"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3652891"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.5555\/1711907.1711925"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462939"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2827872"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401063"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.conb.2021.08.002","article-title":"Interpreting neural computations by examining intrinsic and embedding dimensionality of neural activity","volume":"70","author":"Jazayeri Mehrdad","year":"2021","unstructured":"Mehrdad Jazayeri and Srdjan Ostojic. 2021. Interpreting neural computations by examining intrinsic and embedding dimensionality of neural activity. Current Opinion in Neurobiology 70 (2021), 113\u2013120.","journal-title":"Current Opinion in Neurobiology"},{"key":"e_1_3_2_22_2","unstructured":"Adam Jermyn and Adly Templeton. 2024. Ghost grads: An improvement on resampling. Transformer Circuits Thread (2024). Retrieved from https:\/\/transformer-circuits.pub\/2024\/jan-update\/index.html#dict-learningresampling"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3450613.3456846"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2018.00035"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics11010141"},{"key":"e_1_3_2_26_2","unstructured":"Yuxuan Lei Jianxun Lian Jing Yao Xu Huang Defu Lian and Xing Xie. 2023. Recexplainer: Aligning large language models for recommendation model interpretability. arXiv:2311.10947. 
Retrieved from https:\/\/arxiv.org\/abs\/2311.10947"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/1232722.1232727"},{"key":"e_1_3_2_28_2","unstructured":"Jiayu Li Hanyu Li Zhiyu He Weizhi Ma Peijie Sun Min Zhang and Shaoping Ma. 2024. ReChorus2. 0: A modular and task-flexible recommendation library. arXiv:2405.18058. Retrieved from https:\/\/arxiv.org\/abs\/2405.18058"},{"key":"e_1_3_2_29_2","unstructured":"Lei Li Yongfeng Zhang and Li Chen. 2021. Personalized transformer for explainable recommendation. arXiv:2105.11601. Retrieved from https:\/\/arxiv.org\/abs\/2105.11601"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3580488"},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.neucom.2020.08.011","article-title":"Explaining the black-box model: A survey of local interpretation methods for deep neural networks","volume":"419","author":"Liang Yu","year":"2021","unstructured":"Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. 2021. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing 419 (2021), 168\u2013182.","journal-title":"Neurocomputing"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3358017"},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.dsp.2017.10.011","article-title":"Methods for interpreting and understanding deep neural networks","volume":"73","author":"Montavon Gr\u00e9goire","year":"2018","unstructured":"Gr\u00e9goire Montavon, Wojciech Samek, and Klaus-Robert M\u00fcller. 2018. Methods for interpreting and understanding deep neural networks. 
Digital Signal Processing 73 (2018), 1\u201315.","journal-title":"Digital Signal Processing"},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.elerap.2016.09.003","article-title":"Recommendation quality, transparency, and website quality for trust-building in recommendation agents","volume":"19","author":"Nilashi Mehrbakhsh","year":"2016","unstructured":"Mehrbakhsh Nilashi, Dietmar Jannach, Othman bin Ibrahim, Mohammad Dalvi Esfahani, and Hossein Ahmadi. 2016. Recommendation quality, transparency, and website quality for trust-building in recommendation agents. Electronic Commerce Research and Applications 19 (2016), 70\u201384.","journal-title":"Electronic Commerce Research and Applications"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2011.134"},{"issue":"1","key":"e_1_3_2_36_2","doi-asserted-by":"crossref","first-page":"851","DOI":"10.12785\/ijcds\/130168","article-title":"Recommendation systems: Types, applications, and challenges","volume":"13","author":"Patel Dhruval","year":"2023","unstructured":"Dhruval Patel, Foram Patel, and Uttam Chauhan. 2023. Recommendation systems: Types, applications, and challenges. International Journal of Computing and Digital Systems 13, 1 (2023), 851\u2013868.","journal-title":"International Journal of Computing and Digital Systems"},{"key":"e_1_3_2_37_2","first-page":"10299","article-title":"Recommender systems with generative retrieval","volume":"36","author":"Rajput Shashank","year":"2023","unstructured":"Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. 2023. Recommender systems with generative retrieval. 
Advances in Neural Information Processing Systems 36 (2023), 10299\u201310315.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_38_2","unstructured":"Steffen Rendle Christoph Freudenthaler Zeno Gantner and Lars Schmidt-Thieme. 2012. BPR: Bayesian personalized ranking from implicit feedback. arXiv:1205.2618. Retrieved from https:\/\/arxiv.org\/abs\/1205.2618"},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: A graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw Peter J.","year":"1987","unstructured":"Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20 (1987), 53\u201365.","journal-title":"Journal of Computational and Applied Mathematics"},{"key":"e_1_3_2_40_2","doi-asserted-by":"crossref","first-page":"1285","DOI":"10.1162\/tacl_a_00519","article-title":"Neuron-level interpretation of deep nlp models: A survey","volume":"10","author":"Sajjad Hassan","year":"2022","unstructured":"Hassan Sajjad, Nadir Durrani, and Fahim Dalvi. 2022. Neuron-level interpretation of deep nlp models: A survey. 
Transactions of the Association for Computational Linguistics 10 (2022), 1285\u20131303.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3498366.3505791"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507777"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/506443.506619"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313710"},{"issue":"5","key":"e_1_3_2_45_2","first-page":"2137","article-title":"Neural attention frameworks for explainable recommendation","volume":"33","author":"Tal Omer","year":"2021","unstructured":"Omer Tal, Yang Liu, Jimmy Huang, Xiaohui Yu, and Bushra Aljbawi. 2021. Neural attention frameworks for explainable recommendation. IEEE Transactions on Knowledge and Data Engineering 33, 5 (2021), 2137\u20132150.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482420"},{"key":"e_1_3_2_47_2","article-title":"Scaling monosemanticity: Extracting interpretable features from claude 3 sonnet","author":"Templeton Adly","year":"2024","unstructured":"Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey, Trenton Bricken, Brian Chen, Adam Pearce, Craig Citro, Emmanuel Ameisen, Andy Jones, et al. 2024. Scaling monosemanticity: Extracting interpretable features from claude 3 sonnet. Anthropic (2024).","journal-title":"Anthropic"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2010.939537"},{"key":"e_1_3_2_49_2","unstructured":"Michael Tsang Dehua Cheng Hanpeng Liu Xue Feng Eric Zhou and Yan Liu. 2020. Feature interaction interpretability: A case for explaining ad-recommendation systems via neural interaction detection. arXiv:2006.10966. 
Retrieved from https:\/\/arxiv.org\/abs\/2006.10966"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557464"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3627673.3679569"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2018.00074"},{"key":"e_1_3_2_53_2","unstructured":"Xiang Wang Dingxian Wang Canran Xu Xiangnan He Yixin Cao and Tat-Seng Chua. 2018. Explainable reasoning over knowledge graphs for recommendation. arXiv:1811.04540. Retrieved from https:\/\/arxiv.org\/abs\/1811.04540"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412038"},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","unstructured":"Jun Xiao Hao Ye Xiangnan He Hanwang Zhang Fei Wu and Tat-Seng Chua. 2017. Attentional factorization machines: learning the weight of feature interactions via attention networks. arXiv:1708.04617. Retrieved from https:\/\/arxiv.org\/abs\/1708.04617","DOI":"10.24963\/ijcai.2017\/435"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3605357"},{"key":"e_1_3_2_57_2","unstructured":"Yelp. 2018. Yelp Open Dataset. 
Retrieved from https:\/\/business.yelp.com\/data\/resources\/open-dataset\/"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3690381"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000066"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609579"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2021.3100641"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371790"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401171"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3795529","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T12:14:46Z","timestamp":1774440886000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3795529"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,25]]},"references-count":62,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,3,31]]}},"alternative-id":["10.1145\/3795529"],"URL":"https:\/\/doi.org\/10.1145\/3795529","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,25]]},"assertion":[{"value":"2025-07-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-21","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-03-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}