{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T14:29:36Z","timestamp":1753885776778,"version":"3.41.2"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"name":"Beijing Municipal Science and Technology Project","award":["Z241100004224009"],"award-info":[{"award-number":["Z241100004224009"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62425206, 62141607"],"award-info":[{"award-number":["62425206, 62141607"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2025,7,31]]},"abstract":"<jats:p>Massive amounts of data are the foundation of data-driven recommendation models. As an inherent nature of big data, data heterogeneity widely exists in real-world recommendation systems. It reflects the differences in the properties among sub-populations. Ignoring the heterogeneity in recommendation data could mislead the models, hurt the sub-populational robustness, and finally limit the performance of recommendation models. However, data heterogeneity has not received substantial attention within the recommendation community, prompting us to adequately explore and exploit data heterogeneity to solve these challenges and enhance data analysis. In this study, we specifically focus on two representative categories of heterogeneity in recommendation data: heterogeneity of prediction mechanism and covariate distribution. To explore the data heterogeneity, we propose an algorithm based on bilevel clustering. 
Additionally, we demonstrate how the explored data heterogeneity can be exploited for prediction and debias in recommendation scenarios, specifically by building models using multiple sub-models and augmenting the propensity score estimation. Extensive experiments conducted on real-world data substantiate the existence of heterogeneity in recommendation data and validate the effectiveness of exploring and exploiting data heterogeneity in improving recommendation performance.<\/jats:p>","DOI":"10.1145\/3737290","type":"journal-article","created":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T12:36:51Z","timestamp":1748349411000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Exploring and Exploiting Data Heterogeneity in Recommendation"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-1180-6395","authenticated-orcid":false,"given":"Zimu","family":"Wang","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9159-1752","authenticated-orcid":false,"given":"Jiashuo","family":"Liu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6000-6936","authenticated-orcid":false,"given":"Hao","family":"Zou","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4788-1127","authenticated-orcid":false,"given":"Xingxuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-1536-1179","authenticated-orcid":false,"given":"Yue","family":"He","sequence":"additional","affiliation":[{"name":"School of Information, Renmin University of China, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6373-5298","authenticated-orcid":false,"given":"Dongxu","family":"Liang","sequence":"additional","affiliation":[{"name":"Beijing Kuaishou Technology Co Ltd, Haidian, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2957-8511","authenticated-orcid":false,"given":"Peng","family":"Cui","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,7,9]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Himan Abdollahpouri and Masoud Mansoury. 2020. Multi-sided exposure bias in recommendation. arXiv:2006.15772. Retrieved from https:\/\/arxiv.org\/abs\/2006.15772"},{"key":"e_1_3_2_3_2","unstructured":"Himan Abdollahpouri Masoud Mansoury Robin Burke and Bamshad Mobasher. 2019. The unfairness of popularity bias in recommendation. arXiv:1907.13286. Retrieved from https:\/\/arxiv.org\/abs\/1907.13286"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3418487"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687713"},{"key":"e_1_3_2_6_2","unstructured":"Martin Arjovsky L\u00e9on Bottou Ishaan Gulrajani and David Lopez-Paz. 2019. Invariant risk minimization. arXiv:1907.02893. Retrieved from https:\/\/arxiv.org\/abs\/1907.02893"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330745"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240360"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"1257","DOI":"10.1109\/IV47402.2020.9304789","volume-title":"2020 IEEE Intelligent Vehicles Symposium (IV)","author":"Breitenstein Jasmin","year":"2020","unstructured":"Jasmin Breitenstein, Jan-Aike Term\u00f6hlen, Daniel Lipinski, and Tim Fingscheidt. 2020. Systematization of corner cases for visual perception in automated driving. In 2020 IEEE Intelligent Vehicles Symposium (IV). 
IEEE, 1257\u20131264."},{"issue":"1","key":"e_1_3_2_10_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/03610927408827101","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Cali\u0144ski Tadeusz","year":"1974","unstructured":"Tadeusz Cali\u0144ski and Jerzy Harabasz. 1974. A dendrite method for cluster analysis. Commun. Stat. -Theory and Methods 3, 1 (1974), 1\u201327.","journal-title":"Commun. Stat. -Theory and Methods"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3209998"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2936475"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1136\/bmjqs-2018-008370"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462919"},{"key":"e_1_3_2_15_2","unstructured":"Jiawei Chen Hande Dong Xiang Wang Fuli Feng Meng Wang and Xiangnan He. 2020. Bias and debias in recommender system: A survey and future directions. arXiv:2010.03240. Retrieved from https:\/\/arxiv.org\/abs\/2010.03240"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271742"},{"key":"e_1_3_2_17_2","unstructured":"Andrew Collins Dominika Tkaczyk Akiko Aizawa and Joeran Beel. 2018. A study of position bias in digital library recommender systems. arXiv:1802.06565. Retrieved from https:\/\/arxiv.org\/abs\/1802.06565"},{"key":"e_1_3_2_18_2","unstructured":"John Duchi and Hongseok Namkoong. 2018. Learning models with uniform performance via distributionally robust optimization. arXiv:1810.08750. 
Retrieved from https:\/\/arxiv.org\/abs\/1810.08750"},{"issue":"1","key":"e_1_3_2_19_2","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1089\/omi.2017.0174","article-title":"Not everyone fits the mold: Intratumor and intertumor heterogeneity and innovative cancer drug design and development","volume":"22","author":"Dzobo Kevin","year":"2018","unstructured":"Kevin Dzobo, Dimakatso Alice Senthebane, Nicholas Ekow Thomford, Arielle Rowe, Collet Dandara, and M. Iqbal Parker. 2018. Not everyone fits the mold: Intratumor and intertumor heterogeneity and innovative cancer drug design and development. Omics: A Journal of Integrative Biology 22, 1 (2018), 17\u201334.","journal-title":"Omics: A Journal of Integrative Biology"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1093\/nsr\/nwt032"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557624"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462917"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437963.3441724"},{"key":"e_1_3_2_24_2","article-title":"Learning the k in k-means","volume":"16","author":"Hamerly Greg","year":"2003","unstructured":"Greg Hamerly and Charles Elkan. 2003. Learning the k in k-means. In Advances in Neural Information Processing Systems, Vol. 16.","journal-title":"Advances in Neural Information Processing Systems, Vol"},{"key":"e_1_3_2_25_2","first-page":"5126","volume-title":"International Joint Conference on Artificial Intelligence","author":"He Jingrui","year":"2017","unstructured":"Jingrui He. 2017. Learning from data heterogeneity: Algorithms and applications. 
In International Joint Conference on Artificial Intelligence, 5126\u20135130."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080777"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401063"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052569"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/1229179.1229181"},{"key":"e_1_3_2_30_2","first-page":"2564","volume-title":"International Conference on Machine Learning","author":"Kearns Michael","year":"2018","unstructured":"Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2018. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning. PMLR, 2564\u20132572."},{"key":"e_1_3_2_31_2","first-page":"117","volume-title":"2022 IEEE International Symposium on Workload Characterization (IISWC)","author":"Young Geun Kim","year":"2022","unstructured":"Young Geun Kim and Carole-Jean Wu. 2022. FedGPO: Heterogeneity-aware global parameter optimization for efficient federated learning. In 2022 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 117\u2013129."},{"issue":"2","key":"e_1_3_2_32_2","first-page":"446","article-title":"Group recommendations: Survey and perspectives","volume":"33","author":"Kompan Michal","year":"2014","unstructured":"Michal Kompan and Maria Bielikova. 2014. Group recommendations: Survey and perspectives. Comput. Inform. 33, 2 (Jun. 2014), 446\u2013476. Retrieved from https:\/\/www.cai.sk\/ojs\/index.php\/cai\/article\/view\/1077","journal-title":"Comput. Inform"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401944"},{"issue":"2","key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"280","DOI":"10.2307\/3545921","article-title":"On definition and quantification of heterogeneity","volume":"73","author":"Li H.","year":"1995","unstructured":"H. Li and J. F. 
Reynolds. 1995. On definition and quantification of heterogeneity. Oikos 73, 2 (1995), 280\u2013284.","journal-title":"Oikos"},{"key":"e_1_3_2_35_2","volume-title":"11th International Conference on Learning Representations","author":"Li Haoxuan","year":"2023","unstructured":"Haoxuan Li, Chunyuan Zheng, and Peng Wu. 2023. StableDR: Stabilized doubly robust learning for recommendation on data missing not at random. In 11th International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=3VO1y5N7K1H"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449866"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3610302"},{"key":"e_1_3_2_38_2","first-page":"1754","volume-title":"24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","author":"Lian Jianxun","unstructured":"Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and Guangzhong Sun. 2018. xdeepfm: Combining explicit and implicit feature interactions for recommender systems. In 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1754\u20131763."},{"key":"e_1_3_2_39_2","first-page":"831","volume-title":"43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Liu Dugang","unstructured":"Dugang Liu, Pengxiang Cheng, Zhenhua Dong, Xiuqiang He, Weike Pan, and Zhong Ming. 2020. A general knowledge distillation framework for counterfactual recommendation via uniform data. In 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 831\u2013840."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460231.3474263"},{"key":"e_1_3_2_41_2","first-page":"6804","volume-title":"International Conference on Machine Learning","author":"Liu Jiashuo","year":"2021","unstructured":"Jiashuo Liu, Zheyuan Hu, Peng Cui, Bo Li, and Zheyan Shen. 2021. Heterogeneous risk minimization. 
In International Conference on Machine Learning. PMLR, 6804\u20136814."},{"key":"e_1_3_2_42_2","unstructured":"Jiashuo Liu Zheyuan Hu Peng Cui Bo Li and Zheyan Shen. 2021. Kernelized heterogeneous risk minimization. arXiv:2110.12425. Retrieved from https:\/\/arxiv.org\/abs\/2110.12425"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220007"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3615471"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3523227.3546759"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3470948"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482297"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/79.543975"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2010.127"},{"issue":"2","key":"e_1_3_2_50_2","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1198\/000313005X42831","article-title":"Heterogeneity and causality: Unit heterogeneity and design sensitivity in observational studies","volume":"59","author":"Rosenbaum Paul R.","year":"2005","unstructured":"Paul R. Rosenbaum. 2005. Heterogeneity and causality: Unit heterogeneity and design sensitivity in observational studies. Am. Stat. 59, 2 (2005), 147\u2013152.","journal-title":"Am. Stat"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3412262"},{"key":"e_1_3_2_52_2","volume-title":"Workshop on Decision Making for Modern Information Retrieval System (WSDM \u201922)","author":"Saito Yuta","year":"2022","unstructured":"Yuta Saito, Suguru Yaginuma, Taketo Naito, and Kazuhide Nakata. 2022. Unbiased recommender learning from biased graded implicit feedback. 
In Workshop on Decision Making for Modern Information Retrieval System (WSDM \u201922)."},{"key":"e_1_3_2_53_2","first-page":"1670","volume-title":"In International Conference on Machine Learning","author":"Schnabel Tobias","year":"2016","unstructured":"Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. 2016. Recommendations as treatments: Debiasing learning and evaluation. In International Conference on Machine Learning. PMLR, 1670\u20131679."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3481941"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3582435"},{"key":"e_1_3_2_56_2","first-page":"7331","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Smith Freddie Bickford","year":"2023","unstructured":"Freddie Bickford Smith, Andreas Kirsch, Sebastian Farquhar, Yarin Gal, Adam Foster, and Tom Rainforth. 2023. Prediction-oriented Bayesian active learning. In International Conference on Artificial Intelligence and Statistics. PMLR, 7331\u20137348."},{"key":"e_1_3_2_57_2","first-page":"1161","volume-title":"28th ACM International Conference on Information and Knowledge Management","author":"Song Weiping","year":"2019","unstructured":"Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, and Jian Tang. 2019. Autoint: Automatic feature interaction learning via self-attentive neural networks. In 28th ACM International Conference on Information and Knowledge Management, 1161\u20131170."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/1835804.1835895"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507160"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3412236"},{"key":"e_1_3_2_61_2","unstructured":"Naftali Tishby Fernando C. Pereira and William Bialek. 2000. The information bottleneck method. arXiv:physics\/0004057. 
Retrieved from https:\/\/arxiv.org\/abs\/physics\/0004057"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1019956318069"},{"issue":"1","key":"e_1_3_2_63_2","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1080\/00031305.1982.10482778","article-title":"Simpson\u2019s paradox in real life","volume":"36","author":"Wagner Clifford H.","year":"1982","unstructured":"Clifford H. Wagner. 1982. Simpson\u2019s paradox in real life. Am. Stat. 36, 1 (1982), 46\u201348.","journal-title":"Am. Stat"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3539618.3591663"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401136"},{"key":"e_1_3_2_66_2","first-page":"6638","volume-title":"International Conference on Machine Learning","author":"Wang Xiaojie","year":"2019","unstructured":"Xiaojie Wang, Rui Zhang, Yu Sun, and Jianzhong Qi. 2019. Doubly robust joint learning for recommendation on data missing not at random. In International Conference on Machine Learning. PMLR, 6638\u20136647."},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3547333"},{"key":"e_1_3_2_68_2","doi-asserted-by":"crossref","unstructured":"Yejing Wang Dong Xu Xiangyu Zhao Zhiren Mao Peng Xiang Ling Yan Yao Hu Zijian Zhang Xuetao Wei and Qidong Liu. 2024. GPRec: Bi-level user modeling for deep recommenders. arXiv:2410.20730. Retrieved from https:\/\/arxiv.org\/abs\/2410.20730","DOI":"10.1109\/ICDM59182.2024.00058"},{"key":"e_1_3_2_69_2","volume-title":"Neural Information Processing Systems (NeurIPS)","author":"Wang Zifeng","year":"2020","unstructured":"Zifeng Wang, Xi Chen, Rui Wen, Shao-Lun Huang, Ercan E. Kuruoglu, and Yefeng Zheng. 2020. Information theoretic counterfactual learning from missing-not-at-random feedback. 
In Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_2_70_2","first-page":"1969","volume-title":"28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","author":"Wang Zimu","year":"2022","unstructured":"Zimu Wang, Yue He, Jiashuo Liu, Wenchao Zou, Philip S. Yu, and Peng Cui. 2022. Invariant preference learning for general debiasing in recommendation. In 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1969\u20131978."},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00357-022-09413-z"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.109973"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599296"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3737290","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,9]],"date-time":"2025-07-09T15:02:53Z","timestamp":1752073373000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3737290"}},"subtitle":[],"editor":[{"name":"Xiaohui Yu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,7,9]]},"references-count":72,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,7,31]]}},"alternative-id":["10.1145\/3737290"],"URL":"https:\/\/doi.org\/10.1145\/3737290","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2025,7,9]]},"assertion":[{"value":"2024-05-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-07-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}