{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,25]],"date-time":"2026-06-25T10:07:30Z","timestamp":1782382050724,"version":"3.54.5"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,10,27]],"date-time":"2020-10-27T00:00:00Z","timestamp":1603756800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"GRF","award":["14201819"],"award-info":[{"award-number":["14201819"]}]},{"name":"Chongqing High-Technology Innovation and Application Development Funds","award":["cstc2019jscx-msxm0652","cstc2019jscx-fxyd0385"],"award-info":[{"award-number":["cstc2019jscx-msxm0652"]},{"award-number":["cstc2019jscx-fxyd0385"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61902042"],"award-info":[{"award-number":["61902042"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Internet Technol."],"published-print":{"date-parts":[[2020,11,30]]},"abstract":"<jats:p>Feedback-based reputation systems are widely deployed in E-commerce systems. Evidence shows that earning a reputable label (for sellers of such systems) may take a substantial amount of time, and this implies a reduction of profit. We propose to enhance sellers\u2019 reputation via price discounts. However, the challenges are as follows: (1) The demands from buyers depend on both the discount and reputation, and (2) the demands are unknown to the seller. To address these challenges, we first formulate a profit maximization problem via a semi-Markov decision process to explore the optimal tradeoffs in selecting price discounts. We prove the monotonicity of the optimal profit and optimal discount. Based on the monotonicity, we design a Q-learning with forward projection (QLFP) algorithm, which infers the optimal discount from historical transaction data. We prove that the QLFP algorithm converges to the optimal policy. We conduct trace-driven simulations using a dataset from eBay to evaluate the QLFP algorithm. Evaluation results show that QLFP improves the profit by up to 50% over both Q-learning and Speedy Q-learning. The QLFP algorithm also improves both the reputation and profit by up to a factor of two over the scheme of not providing any price discount.<\/jats:p>","DOI":"10.1145\/3400024","type":"journal-article","created":{"date-parts":[[2020,10,28]],"date-time":"2020-10-28T01:19:25Z","timestamp":1603847965000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["A Reinforcement Learning Approach to Optimize Discount and Reputation Tradeoffs in E-commerce Systems"],"prefix":"10.1145","volume":"20","author":[{"given":"Hong","family":"Xie","sequence":"first","affiliation":[{"name":"Chongqing University, Shazhengjie, Shapingba, Chongqing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yongkun","family":"Li","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"John C. 
S.","family":"Lui","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,10,27]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Mohammad Gheshlaghi Azar, Remi Munos, Mohammad Ghavamzadeh, and Hilbert Kappen. 2011. Speedy Q-learning. In Advances in Neural Information Processing Systems."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.2307\/4132332"},{"key":"e_1_2_1_3_1","volume-title":"Tsitsiklis","author":"Bertsekas Dimitri P.","year":"1996"},{"key":"e_1_2_1_4_1","volume-title":"Convex Optimization","author":"Boyd Stephen"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NIPS\u201994)","author":"Steven"},{"key":"e_1_2_1_6_1","volume-title":"Fundamental Methods of Mathematical Economics","author":"Chiang Alpha C."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/501158.501177"},{"key":"e_1_2_1_8_1","volume-title":"Devraj and Sean Meyn","author":"Adithya","year":"2017"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.45"},{"key":"e_1_2_1_10_1","unstructured":"eBay. 1995. eBay Classifies Sellers into Twelve Stars. Retrieved from http:\/\/pages.ebay.com\/help\/feedback\/scores-reputation.html."},{"key":"e_1_2_1_11_1","unstructured":"Fortune500. 2015. Retrieved from http:\/\/fortune.com\/fortune500\/."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988727"},{"key":"e_1_2_1_13_1","volume-title":"Article 1 (December","author":"Hoffman Kevin","year":"2009"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1530-9134.2006.00103.x"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.2015.1425"},{"key":"e_1_2_1_16_1","first-page":"983","article-title":"Price, quality, and reputation: Evidence from an online field experiment","volume":"37","author":"Jin Ginger Zhe","year":"2006","journal-title":"RAND J. Econ."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/775152.775242"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1064009.1064033"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.2307\/1060783"},{"key":"e_1_2_1_20_1","first-page":"9","article-title":"Eliciting informative feedback: The peer-prediction method","volume":"51","author":"Miller Nolan","year":"2005","journal-title":"Manage. Sci."},{"key":"e_1_2_1_21_1","volume-title":"Taylor","author":"Muchnik Lev","year":"2013"},{"key":"e_1_2_1_22_1","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman Martin L."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/355112.355122"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1566374.1566423"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177729586"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/PTP.2003.1231514"},{"key":"e_1_2_1_27_1","volume-title":"Barto","author":"Sutton Richard S.","year":"1998"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the IEEE International Conference on Data Mining (ICDM\u201915)","author":"Xie Hong"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.peva.2015.06.009"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2017.2730206"},{"key":"e_1_2_1_31_1","article-title":"Enhancing reputation via price discounts in E-commerce systems: A data-driven approach","volume":"20","author":"Xie Hong","year":"2018","journal-title":"ACM Trans. Knowl. Discov. 
Data"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2004.1318566"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2736277.2741650"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1159913.1159945"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2013.177"}],"container-title":["ACM Transactions on Internet Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3400024","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3400024","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:50Z","timestamp":1750195910000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3400024"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,27]]},"references-count":35,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,11,30]]}},"alternative-id":["10.1145\/3400024"],"URL":"https:\/\/doi.org\/10.1145\/3400024","relation":{},"ISSN":["1533-5399","1557-6051"],"issn-type":[{"value":"1533-5399","type":"print"},{"value":"1557-6051","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,27]]},"assertion":[{"value":"2019-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-10-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}