{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T13:56:01Z","timestamp":1776174961616,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":58,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T00:00:00Z","timestamp":1691107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"A*STAR, Singapore","award":["Grant No. H19\/01\/a0\/023 ? Diabetes Clinic of the Future"],"award-info":[{"award-number":["Grant No. H19\/01\/a0\/023 ? Diabetes Clinic of the Future"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,6]]},"DOI":"10.1145\/3580305.3599800","type":"proceedings-article","created":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T18:13:58Z","timestamp":1691172838000},"page":"4673-4684","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9445-6516","authenticated-orcid":false,"given":"Mila","family":"Nambiar","sequence":"first","affiliation":[{"name":"Institute for Infocomm Research (I2R), A*STAR, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7275-3296","authenticated-orcid":false,"given":"Supriyo","family":"Ghosh","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), A*STAR, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0598-8571","authenticated-orcid":false,"given":"Priscilla","family":"Ong","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), A*STAR, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7419-0327","authenticated-orcid":false,"given":"Yu En","family":"Chan","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), A*STAR, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5482-2646","authenticated-orcid":false,"given":"Yong Mong","family":"Bee","sequence":"additional","affiliation":[{"name":"Singapore General Hospital, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5893-4306","authenticated-orcid":false,"given":"Pavitra","family":"Krishnaswamy","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), A*STAR, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2023,8,4]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"International Conference on Machine Learning. PMLR, 104--114","author":"Agarwal Rishabh","year":"2020","unstructured":"Rishabh Agarwal , Dale Schuurmans , and Mohammad Norouzi . 2020 . An optimistic perspective on offline reinforcement learning . In International Conference on Machine Learning. PMLR, 104--114 . Rishabh Agarwal, Dale Schuurmans, and Mohammad Norouzi. 2020. An optimistic perspective on offline reinforcement learning. In International Conference on Machine Learning. PMLR, 104--114."},{"key":"e_1_3_2_2_2_1","volume-title":"Standards of Medical Care in Diabetes-2022 Abridged for Primary Care Providers. Clinical Diabetes 40, 1 (01","author":"American Diabetes Association","year":"2022","unstructured":"American Diabetes Association . 2022. Standards of Medical Care in Diabetes-2022 Abridged for Primary Care Providers. Clinical Diabetes 40, 1 (01 2022 ), 10--38. https:\/\/doi.org\/10. 2337\/cd22-as01 arXiv:https:\/\/diabetesjournals.org\/clinical\/article-pdf\/40\/1\/10\/684479\/diaclincd22as01.pdf American Diabetes Association. 2022. Standards of Medical Care in Diabetes-2022 Abridged for Primary Care Providers. Clinical Diabetes 40, 1 (01 2022), 10--38. https:\/\/doi.org\/10. 2337\/cd22-as01 arXiv:https:\/\/diabetesjournals.org\/clinical\/article-pdf\/40\/1\/10\/684479\/diaclincd22as01.pdf"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1175\/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622407.1622416"},{"key":"e_1_3_2_2_5_1","volume-title":"Lin (Eds.)","volume":"33","author":"Chen Xinyue","year":"2020","unstructured":"Xinyue Chen , Zijian Zhou , Zheng Wang , Che Wang , Yanqiu Wu , and Keith Ross . 2020 . BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H . Lin (Eds.) , Vol. 33 . Curran Associates, Inc. , 18353--18363. https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/ d55cbf210f175f4a37916eafe6c04f0d-Paper.pdf Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, and Keith Ross. 2020. BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 18353--18363. https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/ d55cbf210f175f4a37916eafe6c04f0d-Paper.pdf"},{"key":"e_1_3_2_2_6_1","volume-title":"Wortman Vaughan (Eds.)","volume":"34","author":"Fatemi Mehdi","year":"2021","unstructured":"Mehdi Fatemi , Taylor W Killian , Jayakumar Subramanian , and Marzyeh Ghassemi . 2021 . Medical Dead-ends and Learning to Identify High-Risk States and Treatments. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J . Wortman Vaughan (Eds.) , Vol. 34 . Curran Associates, Inc., 4856--4870. https:\/\/proceedings.neurips.cc\/paper\/ 2021\/ file\/26405399c51ad7b13b504e74eb7c696c-Paper.pdf Mehdi Fatemi, Taylor W Killian, Jayakumar Subramanian, and Marzyeh Ghassemi. 2021. Medical Dead-ends and Learning to Identify High-Risk States and Treatments. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 4856--4870. https:\/\/proceedings.neurips.cc\/paper\/2021\/ file\/26405399c51ad7b13b504e74eb7c696c-Paper.pdf"},{"key":"#cr-split#-e_1_3_2_2_7_1.1","unstructured":"Mehdi Fatemi Mary Wu Jeremy Petch Walter Nelson Stuart J. Connolly Alexander Benz Anthony Carnicelli and Marzyeh Ghassemi. 2022. Semi-Markov Offline Reinforcement Learning for Healthcare. https:\/\/doi.org\/10.48550\/ARXIV.2203. 09365 10.48550\/ARXIV.2203"},{"key":"#cr-split#-e_1_3_2_2_7_1.2","unstructured":"Mehdi Fatemi Mary Wu Jeremy Petch Walter Nelson Stuart J. Connolly Alexander Benz Anthony Carnicelli and Marzyeh Ghassemi. 2022. Semi-Markov Offline Reinforcement Learning for Healthcare. https:\/\/doi.org\/10.48550\/ARXIV.2203. 09365"},{"key":"e_1_3_2_2_8_1","volume-title":"International Conference on Machine Learning. PMLR","author":"Fu Justin","year":"2019","unstructured":"Justin Fu , Aviral Kumar , Matthew Soh , and Sergey Levine . 2019 . Diagnosing bottlenecks in deep q-learning algorithms . In International Conference on Machine Learning. PMLR , 2021--2030. Justin Fu, Aviral Kumar, Matthew Soh, and Sergey Levine. 2019. Diagnosing bottlenecks in deep q-learning algorithms. In International Conference on Machine Learning. PMLR, 2021--2030."},{"key":"e_1_3_2_2_9_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"2062","author":"Fujimoto Scott","year":"2019","unstructured":"Scott Fujimoto , David Meger , and Doina Precup . 2019 . Off-Policy Deep Reinforcement Learning without Exploration . In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research , Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 2052-- 2062 . https:\/\/proceedings.mlr.press\/v97\/fujimoto19a.html Scott Fujimoto, David Meger, and Doina Precup. 2019. Off-Policy Deep Reinforcement Learning without Exploration. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 2052--2062. https:\/\/proceedings.mlr.press\/v97\/fujimoto19a.html"},{"key":"e_1_3_2_2_10_1","volume-title":"2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). 1322--1328","author":"He Haibo","year":"2008","unstructured":"Haibo He , Yang Bai , Edwardo A. Garcia , and Shutao Li . 2008 . ADASYN: Adaptive synthetic sampling approach for imbalanced learning . In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). 1322--1328 . https:\/\/doi.org\/10.1109\/IJCNN.2008.4633969 10.1109\/IJCNN.2008.4633969 Haibo He, Yang Bai, Edwardo A. Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). 1322--1328. https:\/\/doi.org\/10.1109\/IJCNN.2008.4633969"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1080\/00401706.1995.10484303"},{"key":"e_1_3_2_2_12_1","volume-title":"Leo Anthony Celi, and Roger G Mark","author":"Johnson Alistair E.W.","year":"2016","unstructured":"Alistair E.W. Johnson , Tom J. Pollard , Lu Shen , Li-wei H. Lehman , Mengling Feng , Mohammad Ghassemi , Benjamin Moody , Peter Szolovits , Leo Anthony Celi, and Roger G Mark . 2016 . MIMIC-III , a freely accessible critical care database. 3 (May. 2016). https:\/\/doi.org\/10.1038\/sdata.2016.35 10.1038\/sdata.2016.35 Alistair E.W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. 3 (May. 2016). https:\/\/doi.org\/10.1038\/sdata.2016.35"},{"key":"#cr-split#-e_1_3_2_2_13_1.1","unstructured":"Taylor W. Killian Haoran Zhang Jayakumar Subramanian Mehdi Fatemi and Marzyeh Ghassemi. 2020. An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare. https:\/\/doi.org\/10.48550\/ARXIV.2011. 11235 10.48550\/ARXIV.2011"},{"key":"#cr-split#-e_1_3_2_2_13_1.2","unstructured":"Taylor W. Killian Haoran Zhang Jayakumar Subramanian Mehdi Fatemi and Marzyeh Ghassemi. 2020. An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare. https:\/\/doi.org\/10.48550\/ARXIV.2011. 11235"},{"key":"e_1_3_2_2_14_1","volume-title":"The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. 24 (Nov","author":"Komorowski Matthieu","year":"2018","unstructured":"Matthieu Komorowski , Leo A. Celi , Omar Badawi , Anthony C. Gordon , and A. Aldo Faisal . 2018. The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. 24 (Nov . 2018 ). https:\/\/doi.org\/10.1038\/ s41591-018-0213-5 Matthieu Komorowski, Leo A. Celi, Omar Badawi, Anthony C. Gordon, and A. Aldo Faisal. 2018. The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. 24 (Nov. 2018). https:\/\/doi.org\/10.1038\/ s41591-018-0213-5"},{"key":"e_1_3_2_2_15_1","volume-title":"Offline Reinforcement Learning with Fisher Divergence Critic Regularization. In International Conference on Machine Learning.","author":"Kostrikov Ilya","year":"2021","unstructured":"Ilya Kostrikov , Jonathan Tompson , Rob Fergus , and Ofir Nachum . 2021 . Offline Reinforcement Learning with Fisher Divergence Critic Regularization. In International Conference on Machine Learning. Ilya Kostrikov, Jonathan Tompson, Rob Fergus, and Ofir Nachum. 2021. Offline Reinforcement Learning with Fisher Divergence Critic Regularization. In International Conference on Machine Learning."},{"key":"e_1_3_2_2_16_1","volume-title":"Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufman, 179--186","author":"Kubat Miroslav","year":"1997","unstructured":"Miroslav Kubat and Stan Matwin . 1997 . Addressing the Curse of Imbalanced Training Sets: One Sided Selection . In Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufman, 179--186 . Miroslav Kubat and Stan Matwin. 1997. Addressing the Curse of Imbalanced Training Sets: One Sided Selection. In Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufman, 179--186."},{"key":"e_1_3_2_2_17_1","volume-title":"Stabilizing off-policy q-learning via bootstrapping error reduction. Advances in Neural Information Processing Systems 32","author":"Kumar Aviral","year":"2019","unstructured":"Aviral Kumar , Justin Fu , Matthew Soh , George Tucker , and Sergey Levine . 2019. Stabilizing off-policy q-learning via bootstrapping error reduction. Advances in Neural Information Processing Systems 32 ( 2019 ). Aviral Kumar, Justin Fu, Matthew Soh, George Tucker, and Sergey Levine. 2019. Stabilizing off-policy q-learning via bootstrapping error reduction. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_3_2_2_18_1","volume-title":"Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction","author":"Kumar Aviral","unstructured":"Aviral Kumar , Justin Fu , George Tucker , and Sergey Levine . 2019. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction . Curran Associates Inc., Red Hook, NY, USA. Aviral Kumar, Justin Fu, George Tucker, and Sergey Levine. 2019. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction. Curran Associates Inc., Red Hook, NY, USA."},{"key":"e_1_3_2_2_19_1","volume-title":"Lin (Eds.)","volume":"33","author":"Kumar Aviral","year":"2020","unstructured":"Aviral Kumar , Aurick Zhou , George Tucker , and Sergey Levine . 2020 . Conservative Q-Learning for Offline Reinforcement Learning. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H . Lin (Eds.) , Vol. 33 . Curran Associates, Inc., 1179--1191. https:\/\/proceedings. neurips.cc\/paper\/ 2020\/file\/0d2b2061826a5df3221116a5085a6052-Paper.pdf Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. 2020. Conservative Q-Learning for Offline Reinforcement Learning. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1179--1191. https:\/\/proceedings. neurips.cc\/paper\/2020\/file\/0d2b2061826a5df3221116a5085a6052-Paper.pdf"},{"key":"e_1_3_2_2_20_1","volume-title":"Batch reinforcement learning. Reinforcement learning: State-of-the-art","author":"Lange Sascha","year":"2012","unstructured":"Sascha Lange , Thomas Gabel , and Martin Riedmiller . 2012. Batch reinforcement learning. Reinforcement learning: State-of-the-art ( 2012 ), 45--73. Sascha Lange, Thomas Gabel, and Martin Riedmiller. 2012. Batch reinforcement learning. Reinforcement learning: State-of-the-art (2012), 45--73."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.2147\/CLEP.S300663"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/EMBC.2018.8513203"},{"key":"e_1_3_2_2_23_1","volume-title":"Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining","author":"Charles","unstructured":"Charles X. Ling and Chenghui Li. 1998. Data Mining for Direct Marketing: Problems and Solutions . In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining ( New York, NY) (KDD'98). AAAI Press, 73--79. Charles X. Ling and Chenghui Li. 1998. Data Mining for Direct Marketing: Problems and Solutions. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (New York, NY) (KDD'98). AAAI Press, 73--79."},{"key":"e_1_3_2_2_24_1","volume-title":"Proc. 2020","author":"Lu Mingyu","year":"2020","unstructured":"Mingyu Lu , Zachary Shahn , Daby Sow , Finale Doshi-Velez , and Li-Wei H Lehman . 2020 . Is deep Reinforcement Learning ready for practical applications in healthcare? A sensitivity analysis of duel-DDQN for hemodynamic management in sepsis patients. AMIA Annu. Symp . Proc. 2020 (2020), 773--782. Mingyu Lu, Zachary Shahn, Daby Sow, Finale Doshi-Velez, and Li-Wei H Lehman. 2020. Is deep Reinforcement Learning ready for practical applications in healthcare? A sensitivity analysis of duel-DDQN for hemodynamic management in sepsis patients. AMIA Annu. Symp. Proc. 2020 (2020), 773--782."},{"key":"#cr-split#-e_1_3_2_2_25_1.1","unstructured":"Jiafei Lyu Xiaoteng Ma Xiu Li and Zongqing Lu. 2022. Mildly Conservative Q-Learning for Offline Reinforcement Learning. https:\/\/doi.org\/10.48550\/ARXIV. 2206.04745 10.48550\/ARXIV"},{"key":"#cr-split#-e_1_3_2_2_25_1.2","unstructured":"Jiafei Lyu Xiaoteng Ma Xiu Li and Zongqing Lu. 2022. Mildly Conservative Q-Learning for Offline Reinforcement Learning. https:\/\/doi.org\/10.48550\/ARXIV. 2206.04745"},{"key":"e_1_3_2_2_26_1","volume-title":"Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Alex Graves , Ioannis Antonoglou , Daan Wierstra , and Martin Riedmiller . 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 ( 2013 ). Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)."},{"key":"e_1_3_2_2_27_1","volume-title":"38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). https:\/\/doi.org\/10","author":"Nemati S","year":"2016","unstructured":"S Nemati , M.M. Ghassemi , and G. Clifford . 2016. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach . In 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). https:\/\/doi.org\/10 .1109\/EMBC. 2016 .7591355 10.1109\/EMBC.2016.7591355 S Nemati, M.M. Ghassemi, and G. Clifford. 2016. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach. In 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). https:\/\/doi.org\/10.1109\/EMBC.2016.7591355"},{"key":"e_1_3_2_2_28_1","volume-title":"2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2978--2981","author":"Nemati Shamim","year":"2016","unstructured":"Shamim Nemati , Mohammad M. Ghassemi , and Gari D. Clifford . 2016. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach . In 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2978--2981 . https:\/\/doi.org\/ 10.1109\/EMBC. 2016 .7591355 10.1109\/EMBC.2016.7591355 Shamim Nemati, Mohammad M. Ghassemi, and Gari D. Clifford. 2016. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach. In 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2978--2981. https:\/\/doi.org\/ 10.1109\/EMBC.2016.7591355"},{"key":"e_1_3_2_2_29_1","volume-title":"Proc. 2018 (Dec.","author":"Peng Xuefeng","year":"2018","unstructured":"Xuefeng Peng , Yi Ding , David Wihl , Omer Gottesman , Matthieu Komorowski , Li-Wei H Lehman , Andrew Ross , Aldo Faisal , and Finale Doshi-Velez . 2018 . Improving sepsis treatment strategies by combining deep and kernel-based reinforcement learning. AMIA Annu. Symp . Proc. 2018 (Dec. 2018), 887--896. Xuefeng Peng, Yi Ding, David Wihl, Omer Gottesman, Matthieu Komorowski, Li-Wei H Lehman, Andrew Ross, Aldo Faisal, and Finale Doshi-Velez. 2018. Improving sepsis treatment strategies by combining deep and kernel-based reinforcement learning. AMIA Annu. Symp. Proc. 2018 (Dec. 2018), 887--896."},{"key":"#cr-split#-e_1_3_2_2_30_1.1","unstructured":"Aniruddh Raghu Matthieu Komorowski Imran Ahmed Leo Celi Peter Szolovits and Marzyeh Ghassemi. 2017. Deep Reinforcement Learning for Sepsis Treatment. https:\/\/doi.org\/10.48550\/ARXIV.1711.09602 10.48550\/ARXIV.1711.09602"},{"key":"#cr-split#-e_1_3_2_2_30_1.2","unstructured":"Aniruddh Raghu Matthieu Komorowski Imran Ahmed Leo Celi Peter Szolovits and Marzyeh Ghassemi. 2017. Deep Reinforcement Learning for Sepsis Treatment. https:\/\/doi.org\/10.48550\/ARXIV.1711.09602"},{"key":"e_1_3_2_2_31_1","volume-title":"Proceedings of the 2nd Machine Learning for Healthcare Conference (Proceedings of Machine Learning Research","volume":"163","author":"Raghu Aniruddh","year":"2017","unstructured":"Aniruddh Raghu , Matthieu Komorowski , Leo Anthony Celi , Peter Szolovits , and Marzyeh Ghassemi . 2017 . Continuous State-Space Models for Optimal Sepsis Treatment: a Deep Reinforcement Learning Approach . In Proceedings of the 2nd Machine Learning for Healthcare Conference (Proceedings of Machine Learning Research , Vol. 68), Finale Doshi-Velez, Jim Fackler, David Kale, Rajesh Ranganath, Byron Wallace, and Jenna Wiens (Eds.). PMLR, 147-- 163 . https: \/\/proceedings.mlr.press\/v68\/raghu17a.html Aniruddh Raghu, Matthieu Komorowski, Leo Anthony Celi, Peter Szolovits, and Marzyeh Ghassemi. 2017. Continuous State-Space Models for Optimal Sepsis Treatment: a Deep Reinforcement Learning Approach. In Proceedings of the 2nd Machine Learning for Healthcare Conference (Proceedings of Machine Learning Research, Vol. 68), Finale Doshi-Velez, Jim Fackler, David Kale, Rajesh Ranganath, Byron Wallace, and Jenna Wiens (Eds.). PMLR, 147--163. https: \/\/proceedings.mlr.press\/v68\/raghu17a.html"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2020.102003"},{"key":"e_1_3_2_2_33_1","volume-title":"Brown","author":"Schamberg Gabriel","year":"2020","unstructured":"Gabriel Schamberg , Marcus Badgeley , and Emery N . Brown . 2020 . Controlling Level of Unconsciousness by Titrating Propofol with Deep Reinforcement Learning. AI in Medicine ( 2020). Gabriel Schamberg, Marcus Badgeley, and Emery N. Brown. 2020. Controlling Level of Unconsciousness by Titrating Propofol with Deep Reinforcement Learning. AI in Medicine (2020)."},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/IGARSS.1996.516705"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.2337\/diacare.28.3.600"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"crossref","unstructured":"David Silver Julian Schrittwieser Karen Simonyan Ioannis Antonoglou Aja Huang Arthur Guez Thomas Hubert Lucas Baker Matthew Lai Adrian Bolton etal 2017. Mastering the game of go without human knowledge. nature 550 7676 (2017) 354--359.  David Silver Julian Schrittwieser Karen Simonyan Ioannis Antonoglou Aja Huang Arthur Guez Thomas Hubert Lucas Baker Matthew Lai Adrian Bolton et al. 2017. Mastering the game of go without human knowledge. nature 550 7676 (2017) 354--359.","DOI":"10.1038\/nature24270"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.2196\/27858"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1177\/1558689806292430"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1002\/mp.12625"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"e_1_3_2_2_41_1","volume-title":"Garnett (Eds.)","volume":"31","author":"Wang Qing","year":"2018","unstructured":"Qing Wang , Jiechao Xiong , Lei Han , peng sun, Han Liu , and Tong Zhang . 2018 . Exponentially Weighted Imitation Learning for Batched Historical Data. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R . Garnett (Eds.) , Vol. 31 . Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper\/ 2018\/file\/ 4aec1b3435c52abbdf8334ea0e7141e0-Paper.pdf Qing Wang, Jiechao Xiong, Lei Han, peng sun, Han Liu, and Tong Zhang. 2018. Exponentially Weighted Imitation Learning for Batched Historical Data. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/ 4aec1b3435c52abbdf8334ea0e7141e0-Paper.pdf"},{"key":"e_1_3_2_2_42_1","volume-title":"Learning from Delayed Rewards. Ph. D. Dissertation","author":"Watkins C.J.C.H.","unstructured":"C.J.C.H. Watkins . 1989. Learning from Delayed Rewards. Ph. D. Dissertation . University of Cambridge England. C.J.C.H. Watkins. 1989. Learning from Delayed Rewards. Ph. D. Dissertation. University of Cambridge England."},{"key":"#cr-split#-e_1_3_2_2_43_1.1","unstructured":"Yifan Wu George Tucker and Ofir Nachum. 2019. Behavior Regularized Offline Reinforcement Learning. https:\/\/doi.org\/10.48550\/ARXIV.1911.11361 10.48550\/ARXIV.1911.11361"},{"key":"#cr-split#-e_1_3_2_2_43_1.2","unstructured":"Yifan Wu George Tucker and Ofir Nachum. 2019. Behavior Regularized Offline Reinforcement Learning. https:\/\/doi.org\/10.48550\/ARXIV.1911.11361"},{"key":"e_1_3_2_2_44_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"11328","author":"Wu Yue","year":"2021","unstructured":"Yue Wu , Shuangfei Zhai , Nitish Srivastava , Joshua M Susskind , Jian Zhang , Ruslan Salakhutdinov , and Hanlin Goh . 2021 . Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning . In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research , Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 11319-- 11328 . https:\/\/proceedings.mlr.press\/ v139\/wu21i.html Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M Susskind, Jian Zhang, Ruslan Salakhutdinov, and Hanlin Goh. 2021. Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 11319--11328. https:\/\/proceedings.mlr.press\/ v139\/wu21i.html"},{"key":"e_1_3_2_2_45_1","unstructured":"Yiqin Yang Xiaoteng Ma Chenghao Li Zewu Zheng Qiyuan Zhang Gao Huang Jun Yang and Qianchuan Zhao. 2021. Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning. https:\/\/doi.org\/10. 48550\/ARXIV.2106.03400  Yiqin Yang Xiaoteng Ma Chenghao Li Zewu Zheng Qiyuan Zhang Gao Huang Jun Yang and Qianchuan Zhao. 2021. Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning. https:\/\/doi.org\/10. 48550\/ARXIV.2106.03400"},{"key":"e_1_3_2_2_46_1","volume-title":"Reinforcement learning in healthcare: a survey. Comput. Surveys 33(1)","author":"Yu Chao","year":"2021","unstructured":"Chao Yu , Jiming Liu , Shamim Nemati , and Guosheng Yin . 2021. Reinforcement learning in healthcare: a survey. Comput. Surveys 33(1) ( 2021 ). Chao Yu, Jiming Liu, Shamim Nemati, and Guosheng Yin. 2021. Reinforcement learning in healthcare: a survey. Comput. Surveys 33(1) (2021)."},{"key":"e_1_3_2_2_47_1","volume-title":"Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units. BMC medical informatics and decision making","author":"Yu Chao","year":"2020","unstructured":"Chao Yu , Guoqi Ren , and Yinzhao Dong . 2020. Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units. BMC medical informatics and decision making ( 2020 ). Chao Yu, Guoqi Ren, and Yinzhao Dong. 2020. Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units. BMC medical informatics and decision making (2020)."},{"key":"e_1_3_2_2_48_1","volume-title":"Deep Inverse Reinforcement Learning for Sepsis Treatment. In 2019 IEEE International Conference on Healthcare Informatics (ICHI). 1--3. https:\/\/doi.org\/10","author":"Yu Chao","year":"2019","unstructured":"Chao Yu , Guoqi Ren , and Jiming Liu . 2019 . Deep Inverse Reinforcement Learning for Sepsis Treatment. In 2019 IEEE International Conference on Healthcare Informatics (ICHI). 1--3. https:\/\/doi.org\/10 .1109\/ICHI.2019.8904645 10.1109\/ICHI.2019.8904645 Chao Yu, Guoqi Ren, and Jiming Liu. 2019. Deep Inverse Reinforcement Learning for Sepsis Treatment. In 2019 IEEE International Conference on Healthcare Informatics (ICHI). 1--3. https:\/\/doi.org\/10.1109\/ICHI.2019.8904645"},{"key":"e_1_3_2_2_49_1","volume-title":"Combo: Conservative offline model-based policy optimization. Advances in neural information processing systems 34","author":"Yu Tianhe","year":"2021","unstructured":"Tianhe Yu , Aviral Kumar , Rafael Rafailov , Aravind Rajeswaran , Sergey Levine , and Chelsea Finn . 2021 . Combo: Conservative offline model-based policy optimization. Advances in neural information processing systems 34 (2021), 28954--28967. Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, and Chelsea Finn. 2021. Combo: Conservative offline model-based policy optimization. Advances in neural information processing systems 34 (2021), 28954--28967."},{"key":"e_1_3_2_2_50_1","first-page":"14129","article-title":"Mopo: Model-based offline policy optimization","volume":"33","author":"Yu Tianhe","year":"2020","unstructured":"Tianhe Yu , Garrett Thomas , Lantao Yu , Stefano Ermon , James Y Zou , Sergey Levine , Chelsea Finn , and Tengyu Ma . 2020 . Mopo: Model-based offline policy optimization . Advances in Neural Information Processing Systems 33 (2020), 14129 -- 14142 . Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y Zou, Sergey Levine, Chelsea Finn, and Tengyu Ma. 2020. Mopo: Model-based offline policy optimization. Advances in Neural Information Processing Systems 33 (2020), 14129--14142.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_51_1","unstructured":"Daochen Zha Kwei-Herng Lai Qiaoyu Tan Sirui Ding Na Zou and Xia Hu. 2022. Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning. arXiv:2208.12433 [cs.LG]  Daochen Zha Kwei-Herng Lai Qiaoyu Tan Sirui Ding Na Zou and Xia Hu. 2022. Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning. arXiv:2208.12433 [cs.LG]"},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s40265-020-01435-4"},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2020.3014556"}],"event":{"name":"KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Long Beach CA USA","acronym":"KDD '23","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599800","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580305.3599800","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:23Z","timestamp":1750182563000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599800"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,4]]},"references-count":58,"alternative-id":["10.1145\/3580305.3599800","10.1145\/3580305"],"URL":"https:\/\/doi.org\/10.1145\/3580305.3599800","relation":{},"subject":[],"published":{"date-parts":[[2023,8,4]]},"assertion":[{"value":"2023-08-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}