{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T10:56:46Z","timestamp":1760785006550,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,7,21]],"date-time":"2021-07-21T00:00:00Z","timestamp":1626825600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["61832001"],"award-info":[{"award-number":["61832001"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Alibaba-PKU"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2022,4,30]]},"abstract":"<jats:p>\n            Impression regulation plays an important role in various online ranking systems,\n            <jats:italic>e.g.<\/jats:italic>\n            , e-commerce ranking systems always need to achieve local commercial demands on some pre-labeled target items like fresh item cultivation and fraudulent item counteracting while maximizing its global revenue. However, local impression regulation may cause \u201cbutterfly effects\u201d on the global scale,\n            <jats:italic>e.g.<\/jats:italic>\n            , in e-commerce, the price preference fluctuation in initial conditions (overpriced or underpriced items) may create a significantly different outcome, thus affecting shopping experience and bringing economic losses to platforms. 
To prevent \u201cbutterfly effects\u201d, some researchers define their regulation objectives with global constraints, by using a contextual bandit at the page-level, which requires all items on one page to share the same regulation action and thus fails to conduct impression regulation on individual items. To address this problem, in this article, we propose a personalized impression regulation method that can directly make regulation decisions for each user-item pair. Specifically, we model the regulation problem as a\n            <jats:underline>C<\/jats:underline>\n            onstrained\n            <jats:underline>D<\/jats:underline>\n            ual-level\n            <jats:underline>B<\/jats:underline>\n            andit (CDB) problem, where the local regulation action and reward signals are at the item-level, while the global effect constraint on the platform impression can be calculated at the page-level only. To handle the asynchronous signals, we first expand the page-level constraint to the item-level and then derive the policy updating as a second-order cone optimization problem. Our CDB approaches the optimal policy by iteratively solving the optimization problem.
Experiments are performed on both offline and online datasets, and the results, theoretically and empirically, demonstrate that CDB outperforms state-of-the-art algorithms.\n          <\/jats:p>","DOI":"10.1145\/3461340","type":"journal-article","created":{"date-parts":[[2021,7,21]],"date-time":"2021-07-21T21:25:55Z","timestamp":1626902755000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Constrained Dual-Level Bandit for Personalized Impression Regulation in Online Ranking Systems"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5056-0351","authenticated-orcid":false,"given":"Zhao","family":"Li","sequence":"first","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0218-4195","authenticated-orcid":false,"given":"Junshuai","family":"Song","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"given":"Zehong","family":"Hu","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"given":"Zhen","family":"Wang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"given":"Jun","family":"Gao","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2021,7,21]]},"reference":[
{"doi-asserted-by":"publisher","key":"e_1_2_1_1_1","DOI":"10.5555\/3305381.3305384"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.1145\/2600057.2602844"},
{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 29th Annual Conference on Learning Theory. 4\u201318","author":"Agrawal Shipra","year":"2016","unstructured":"Shipra Agrawal, Nikhil R. Devanur, and Lihong Li. 2016. An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. In Proceedings of the 29th Annual Conference on Learning Theory. 4\u201318."},
{"doi-asserted-by":"publisher","key":"e_1_2_1_4_1","DOI":"10.1609\/aaai.v33i01.33013"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.1145\/3178876.3186039"},
{"key":"e_1_2_1_6_1","volume-title":"32nd AAAI Conference on Artificial Intelligence. 957\u2013964","author":"Cai Qingpeng","year":"2018","unstructured":"Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, and Yiwei Zhang. 2018. Reinforcement mechanism design for fraudulent behaviour in e-commerce. In 32nd AAAI Conference on Artificial Intelligence. 957\u2013964."},
{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.5555\/3237383.3237925"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_8_1","DOI":"10.1145\/2988450.2988454"},
{"unstructured":"Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh. 2019. Lyapunov-based safe policy optimization for continuous control. arXiv:1901.10031. Retrieved from https:\/\/arxiv.org\/abs\/1901.10031.","key":"e_1_2_1_9_1"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1145\/3298689.3347031"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_11_1","DOI":"10.1145\/2959100.2959190"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_12_1","DOI":"10.1145\/3230667"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_13_1","DOI":"10.1145\/3308558.3313533"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.1145\/3038912.3052569"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_15_1","DOI":"10.1145\/2939672.2939747"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1145\/3219819.3219846"},
{"unstructured":"Yujing Hu, Weixun Wang, Hangtian Jia, Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, and Changjie Fan. 2020. Learning to utilize shaping rewards: A new approach of reward shaping. arXiv:2011.02669. Retrieved from https:\/\/arxiv.org\/abs\/2011.02669.","key":"e_1_2_1_17_1"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.5555\/3306127.3331846"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_19_1","DOI":"10.1145\/3109859.3109933"},
{"volume-title":"Online Controlled Experiments and A\/B Testing","author":"Kohavi Ron","unstructured":"Ron Kohavi and Roger Longbotham. 2017. Online Controlled Experiments and A\/B Testing. Springer, 922\u2013929.","key":"e_1_2_1_20_1"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_21_1","DOI":"10.1145\/3159652.3159729"},
{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. PMLR, 2645\u20132654","author":"Lee Hyun-Suk","year":"2020","unstructured":"Hyun-Suk Lee, Cong Shen, James Jordon, and Mihaela van der Schaar. 2020. Contextual constrained learning for dose-finding clinical trials. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. PMLR, 2645\u20132654."},
{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.1109\/ICDE.2019.00205"},
{"unstructured":"Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv:1509.02971. Retrieved from https:\/\/arxiv.org\/abs\/1509.02971.","key":"e_1_2_1_24_1"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.1145\/3097983.3098011"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_26_1","DOI":"10.5555\/3294996.3295085"},
{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning. 6651\u20136660","author":"Majumdar Somdeb","year":"2020","unstructured":"Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen McAleer, and Kagan Tumer. 2020. Evolutionary reinforcement learning for sample-efficient multiagent coordination. In Proceedings of the 37th International Conference on Machine Learning. 6651\u20136660."},
{"doi-asserted-by":"publisher","key":"e_1_2_1_28_1","DOI":"10.1145\/2806416.2806647"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.1145\/1278366.1278372"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.1145\/3399712"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_31_1","DOI":"10.1145\/3219819.3219828"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_32_1","DOI":"10.5555\/3172929"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_33_1","DOI":"10.5555\/3045118.3045319"},
{"doi-asserted-by":"crossref","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347.","key":"e_1_2_1_34_1","DOI":"10.1149\/MA2017-02\/39\/1707"},
{"volume-title":"Proceedings of the 27th International Florida Artificial Intelligence Research Society Conference. 81\u201386","author":"Carlos","unstructured":"Carlos E. Seminario and David C. Wilson. 2014. Assessing impacts of a power user attack on a matrix factorization collaborative recommender system. In Proceedings of the 27th International Florida Artificial Intelligence Research Society Conference. 81\u201386.","key":"e_1_2_1_35_1"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_36_1","DOI":"10.1145\/2645710.2645722"},
{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning. 5710\u20135718","author":"Shen Weiran","year":"2019","unstructured":"Weiran Shen, Sebastien Lahaie, and Renato Paes Leme. 2019. Learning to clear the market. In Proceedings of the 36th International Conference on Machine Learning. 5710\u20135718."},
{"doi-asserted-by":"publisher","key":"e_1_2_1_38_1","DOI":"10.1609\/aaai.v34i02.5600"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_39_1","DOI":"10.5555\/3398761.3398905"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_40_1","DOI":"10.5555\/3306127.3331696"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_41_1","DOI":"10.1007\/978-3-319-46128-1_17"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_42_1","DOI":"10.1145\/3018661.3018676"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_43_1","DOI":"10.1145\/3178876.3186079"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_44_1","DOI":"10.5555\/3305890.3306020"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_45_1","DOI":"10.5555\/3009657.3009806"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_46_1","DOI":"10.5555\/3171837.3172032"},
{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning. 9797\u20139806","author":"Wachi Akifumi","year":"2020","unstructured":"Akifumi Wachi and Yanan Sui. 2020. Safe reinforcement learning in constrained Markov decision processes. In Proceedings of the 37th International Conference on Machine Learning. 9797\u20139806."},
{"doi-asserted-by":"publisher","key":"e_1_2_1_48_1","DOI":"10.1109\/ICDE.2019.00203"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_49_1","DOI":"10.1109\/ICDE.2018.00162"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_50_1","DOI":"10.5555\/2969239.2969288"},
{"unstructured":"Jia-Qi Yang, Xiang Li, Shuguang Han, Tao Zhuang, De-Chuan Zhan, Xiaoyi Zeng, and Bin Tong. 2020. Capturing delayed feedback in conversion rate prediction via elapsed-time sampling. arXiv:2012.03245. Retrieved from https:\/\/arxiv.org\/abs\/2012.03245.","key":"e_1_2_1_51_1"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_52_1","DOI":"10.1145\/3234943"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_53_1","DOI":"10.5555\/3304222.3304317"},
{"doi-asserted-by":"publisher","key":"e_1_2_1_54_1","DOI":"10.5555\/3305890.3306115"}],
"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461340","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3461340","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:35Z","timestamp":1750195715000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461340"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,21]]},"references-count":54,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4,30]]}},"alternative-id":["10.1145\/3461340"],"URL":"https:\/\/doi.org\/10.1145\/3461340","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2021,7,21]]},"assertion":[{"value":"2020-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}