{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,12]],"date-time":"2025-09-12T18:44:39Z","timestamp":1757702679410,"version":"3.41.0"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,7,31]],"date-time":"2024-07-31T00:00:00Z","timestamp":1722384000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Recomm. Syst."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>\n            Fairness-aware recommendation eliminates discrimination issues to build trustworthy recommendation systems. Existing fairness-aware approaches ignore accounting for rich user and item attributes and thus cannot capture the impact of attributes on affecting recommendation fairness. These real-world attributes severely cause unfair recommendations by favoring items with popular attributes, leading to item exposure unfairness in recommendations. Moreover, existing approaches mostly mitigate unfairness for static recommendation models, e.g., collaborative filtering. Static models can not handle dynamic user interactions with the system that reflect users\u2019 preferences shift through time. Thus, static models are limited in their ability to adapt to user behavior shifts to gain long-run user satisfaction. As user and item attributes are largely involved in modern recommenders and user interactions are naturally dynamic, it is essential to develop a novel method that eliminates unfairness caused by attributes meanwhile embrace the dynamic modeling of user behavior shifts. 
In this article, we propose\n            <jats:italic>Constrained Off-policy Learning over Heterogeneous Information for Fairness-aware Recommendation (Fair-HINpolicy)<\/jats:italic>\n            , which uses recent advances in context-aware off-policy learning to produce fairness-aware recommendations with rich attributes from a Heterogeneous Information Network. In particular, we formulate the off-policy learning as a Constrained Markov Decision Process (CMDP) by dynamically constraining the fairness of item exposure at each iteration. We also design an attentive action sampling to reduce the search space for off-policy learning. Our solution adaptively receives HIN-augmented corrections for counterfactual risk minimization, and ultimately yields an effective policy that maximizes long-term user satisfaction. We extensively evaluate our method through simulations on large-scale real-world datasets, obtaining favorable results compared with state-of-the-art methods.\n          <\/jats:p>","DOI":"10.1145\/3629172","type":"journal-article","created":{"date-parts":[[2023,10,26]],"date-time":"2023-10-26T21:44:54Z","timestamp":1698356694000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Constrained Off-policy Learning over Heterogeneous Information for Fairness-aware Recommendation"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3643-3353","authenticated-orcid":false,"given":"Xiangmeng","family":"Wang","sequence":"first","affiliation":[{"name":"Data Science and Machine Intelligence Lab, University of Technology Sydney, Broadway, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8308-9551","authenticated-orcid":false,"given":"Qian","family":"Li","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering Computing and Mathematical Sciences, Curtin University, Perth, 
Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6376-9667","authenticated-orcid":false,"given":"Dianer","family":"Yu","sequence":"additional","affiliation":[{"name":"Data Science and Machine Intelligence Lab, University of Technology Sydney, Broadway, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3370-471X","authenticated-orcid":false,"given":"Qing","family":"Li","sequence":"additional","affiliation":[{"name":"Hong Kong Polytechnic University, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4493-6663","authenticated-orcid":false,"given":"Guandong","family":"Xu","sequence":"additional","affiliation":[{"name":"Data Science and Machine Intelligence Lab, University of Technology Sydney, Broadway, Australia"}]}],"member":"320","published-online":{"date-parts":[[2024,7,31]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3109859.3109912"},{"key":"e_1_3_3_3_2","unstructured":"Himan Abdollahpouri Masoud Mansoury Robin Burke and Bamshad Mobasher. 2019. The unfairness of popularity bias in recommendation. arXiv:1907.13286. Retrieved from https:\/\/arxiv.org\/abs\/1907.13286"},{"key":"e_1_3_3_4_2","doi-asserted-by":"crossref","unstructured":"M. Mehdi Afsar Trafford Crump and Behrouz Far. 2022. Reinforcement learning based recommender systems: A survey. Comput. Surveys 55 7 (2022) 1\u201338.","DOI":"10.1145\/3543846"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371832"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330745"},{"key":"e_1_3_3_7_2","unstructured":"Fan Chen Junyu Zhang and Zaiwen Wen. 2022. A Near-optimal primal-dual method for off-policy learning in CMDP. 
Advances in Neural Information Processing Systems 35 (2022) 10521\u201310532."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174225"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290999"},{"key":"e_1_3_3_10_2","unstructured":"Xiaocong Chen Lina Yao Julian McAuley Guangling Zhou and Xianzhi Wang. 2021. A survey of deep reinforcement learning in recommender systems: A systematic review and future directions. arXiv:2109.03540. Retrieved from https:\/\/arxiv.org\/abs\/2109.03540"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2959100.2959190"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10479-005-5724-z"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3411962"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098036"},{"issue":"7","key":"e_1_3_3_15_2","article-title":"Adaptive subgradient methods for online learning and stochastic optimization.","volume":"12","author":"Duchi John","year":"2011","unstructured":"John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, 7 (2011), 2121\u20132159.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401051"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437963.3441824"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531973"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488560.3498487"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.1994.8753425"},{"key":"e_1_3_3_21_2","unstructured":"William L. Hamilton Rex Ying and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. 
In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach California USA) (NIPS\u201917). Curran Associates Inc. Red Hook NY 1025\u20131035."},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482327"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052569"},{"key":"e_1_3_3_24_2","doi-asserted-by":"crossref","DOI":"10.1002\/9781118548387","volume-title":"Applied Logistic Regression","author":"Jr David W. Hosmer","year":"2013","unstructured":"David W. Hosmer Jr, Stanley Lemeshow, and Rodney X. Sturdivant. 2013. Applied Logistic Regression. John Wiley & Sons."},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.22"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380027"},{"key":"e_1_3_3_27_2","unstructured":"Zhipeng Huang and Nikos Mamoulis. 2017. Heterogeneous information network embedding for meta path based proximity. arXiv:1701.05291. Retrieved from https:\/\/arxiv.org\/abs\/1701.05291"},{"key":"e_1_3_3_28_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Joachims Thorsten","year":"2018","unstructured":"Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. 2018. Deep learning with logged bandit feedback. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3213586.3226206"},{"key":"e_1_3_3_30_2","unstructured":"Aviral Kumar Justin Fu George Tucker and Sergey Levine. 2019. Stabilizing off-policy Q-learning via bootstrapping error reduction. Curran Associates Inc. 
Red Hook NY USA."},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533725"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449866"},{"key":"e_1_3_3_33_2","unstructured":"Yunqi Li Hanxiong Chen Shuyuan Xu Yingqiang Ge Juntao Tan Shuchang Liu and Yongfeng Zhang. 2022. Fairness in recommendation: A survey. arXiv:2205.13619. Retrieved from https:\/\/arxiv.org\/abs\/2205.13619"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462966"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-47426-3_13"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380130"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340631.3394860"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3470948"},{"key":"e_1_3_3_39_2","unstructured":"Steffen Rendle Christoph Freudenthaler Zeno Gantner and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (Montreal Quebec Canada) (UAI\u201909). AUAI Press Arlington Virginia 452\u2013461."},{"key":"e_1_3_3_40_2","doi-asserted-by":"crossref","unstructured":"Yehezkel S. Resheff Yanai Elazar Moni Shahar and Oren Sar Shalom. 2018. Privacy and fairness in recommender systems via adversarial training of user representations. arXiv:1807.03521. Retrieved from https:\/\/arxiv.org\/abs\/1807.03521","DOI":"10.5220\/0007361204760482"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403121"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2833443"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220088"},{"key":"e_1_3_3_44_2","article-title":"Policy learning for fairness in ranking","volume":"32","author":"Singh Ashudeep","year":"2019","unstructured":"Ashudeep Singh and Thorsten Joachims. 
2019. Policy learning for fairness in ranking. Advances in Neural Information Processing Systems 32 (2019).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.1050.0451"},{"key":"e_1_3_3_46_2","first-page":"814","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Swaminathan Adith","year":"2015","unstructured":"Adith Swaminathan and Thorsten Joachims. 2015. Counterfactual risk minimization: Learning from logged bandit feedback. In Proceedings of the International Conference on Machine Learning. PMLR, 814\u2013823."},{"key":"e_1_3_3_47_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach California USA) (NIPS\u201917). Curran Associates Inc. Red Hook NY USA 6000\u20136010."},{"key":"e_1_3_3_48_2","doi-asserted-by":"crossref","unstructured":"Xiangmeng Wang Qian Li Dianer Yu Peng Cui Zhichao Wang and Guandong Xu. 2022. Causal disentanglement for semantics-aware intent learning in recommendation. IEEE Transactions on Knowledge and Data Engineering. 35 10 (oct 2023) 9836\u20139849.","DOI":"10.1109\/TKDE.2022.3159802"},{"key":"e_1_3_3_49_2","unstructured":"Xiangmeng Wang Qian Li Dianer Yu Wei Huang and Guandong Xu. 2023. Causal neural graph collaborative filtering. arXiv:2307.04384. Retrieved from https:\/\/arxiv.org\/abs\/2307.04384"},{"key":"e_1_3_3_50_2","unstructured":"Xiangmeng Wang Qian Li Dianer Yu Qing Li and Guandong Xu. 2023. Counterfactual explanation for fairness in recommendation. arXiv:2307.04386. 
Retrieved from https:\/\/arxiv.org\/abs\/2307.04386"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3532021"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","unstructured":"Xiangmeng Wang Qian Li Dianer Yu and Guandong Xu. 2022. Off-policy learning over heterogeneous information for recommendation. Proceedings of the ACM Web Conference. Association for Computing Machinery New York NY 2348\u20132359. DOI:10.1145\/3485447.3512072","DOI":"10.1145\/3485447.3512072"},{"key":"e_1_3_3_53_2","unstructured":"Xiangmeng Wang Qian Li Dianer Yu and Guandong Xu. 2022. Reinforced path reasoning for counterfactual explainable recommendation. arXiv:2207.06674. Retrieved from https:\/\/arxiv.org\/abs\/2207.06674"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-47426-3_14"},{"key":"e_1_3_3_55_2","doi-asserted-by":"crossref","unstructured":"Xiao Wang Yuanfu Lu Chuan Shi Ruijia Wang Peng Cui and Shuai Mou. 2022. Dynamic heterogeneous information network embedding With meta-path based proximity. IEEE Transactions on Knowledge and Data Engineering 34 3 (2022) 1117\u20131132.","DOI":"10.1109\/TKDE.2020.2993870"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.1111\/resp.12312"},{"key":"e_1_3_3_58_2","first-page":"1","article-title":"Wilcoxon signed-rank test","author":"Woolson Robert F.","year":"2007","unstructured":"Robert F. Woolson. 2007. Wilcoxon signed-rank test. Wiley Encyclopedia of Clinical Trials (2007), 1\u20133.","journal-title":"Wiley Encyclopedia of Clinical Trials"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401147"},{"key":"e_1_3_3_60_2","unstructured":"Guandong Xu Tri Dung Duong Qian Li Shaowu Liu and Xianzhi Wang. 2020. Causality learning: A new perspective for interpretable machine learning. arXiv:2006.16789. 
Retrieved from https:\/\/arxiv.org\/abs\/2006.16789"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i8.20855"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCC50000.2020.9219587"},{"key":"e_1_3_3_63_2","first-page":"5453","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Xu Keyulu","year":"2018","unstructured":"Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. 2018. Representation learning on graphs with jumping knowledge networks. In Proceedings of the International Conference on Machine Learning. PMLR, 5453\u20135462."},{"key":"e_1_3_3_64_2","article-title":"Beyond parity: Fairness objectives for collaborative filtering","volume":"30","author":"Yao Sirui","year":"2017","unstructured":"Sirui Yao and Bert Huang. 2017. Beyond parity: Fairness objectives for collaborative filtering. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_65_2","doi-asserted-by":"crossref","unstructured":"Dianer Yu Qian Li Xiangmeng Wang Zhichao Wang Yanan Cao and Guandong Xu. 2022. Semantics-guided disentangled learning for recommendation. In Advances in Knowledge Discovery and Data Mining Jo\u00e3o Gama Tianrui Li Yang Yu Enhong Chen Yu Zheng and Fei Teng (Eds.). 
Springer International Publishing Cham 249\u2013261.","DOI":"10.1007\/978-3-031-05933-9_20"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.01.089"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330961"},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00371-019-01691-w"},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462875"},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219886"}],"container-title":["ACM Transactions on Recommender Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3629172","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3629172","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:18Z","timestamp":1750178178000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3629172"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,31]]},"references-count":69,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3629172"],"URL":"https:\/\/doi.org\/10.1145\/3629172","relation":{},"ISSN":["2770-6699"],"issn-type":[{"type":"electronic","value":"2770-6699"}],"subject":[],"published":{"date-parts":[[2024,7,31]]},"assertion":[{"value":"2022-10-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-31","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}