{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T21:51:30Z","timestamp":1778709090179,"version":"3.51.4"},"reference-count":112,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,9,15]],"date-time":"2023-09-15T00:00:00Z","timestamp":1694736000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Science Alliance, The University of Tennessee, and the Laboratory Directed Research"},{"name":"Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2024,2,29]]},"abstract":"<jats:p>Reinforcement learning (RL) can assist in medical decision making using patient data collected in electronic health record (EHR) systems. RL, a type of machine learning, can use these data to develop treatment policies. However, RL models are typically trained using imperfect retrospective EHR data. Therefore, if care is not taken in training, RL policies can propagate existing bias in healthcare. Literature that considers and addresses the issues of bias and fairness in sequential decision making is reviewed. 
The major themes to mitigate bias that emerge relate to (1) data management; (2) algorithmic design; and (3) clinical understanding of the resulting policies.<\/jats:p>","DOI":"10.1145\/3609502","type":"journal-article","created":{"date-parts":[[2023,7,18]],"date-time":"2023-07-18T12:26:44Z","timestamp":1689683204000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":23,"title":["Bias in Reinforcement Learning: A Review in Healthcare Applications"],"prefix":"10.1145","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8944-5599","authenticated-orcid":false,"given":"Benjamin","family":"Smith","sequence":"first","affiliation":[{"name":"University of Tennessee: Bredesen Center for Interdisciplinary Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6818-2048","authenticated-orcid":false,"given":"Anahita","family":"Khojandi","sequence":"additional","affiliation":[{"name":"University of Tennessee: Department of Industrial and Systems Engineering"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4692-8579","authenticated-orcid":false,"given":"Rama","family":"Vasudevan","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory: Center for Nanophase Materials Sciences"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,9,15]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Shahriar Akter Grace McCarthy Shahriar Sajib Katina Michael Yogesh K. Dwivedi John D\u2019Ambra and K. N. Shen. 2021. Algorithmic bias in data-driven innovation in the age of AI. International Journal of Information Management 60 (2021). 
https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0268401221000803","DOI":"10.1016\/j.ijinfomgt.2021.102387"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1080\/19466315.2015.1077726"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1057\/hs.2012.11"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2020.3027443"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1287\/ijds.2022.0015"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Matt Baucum Anahita Khojandi Rama Vasudevan and Ritesh Ramdhani. 2023. Optimizing patient-specific medication regimen policies using wearable sensors in parkinson\u2019s disease. Management Science 0 0 (2023).","DOI":"10.1287\/mnsc.2023.4747"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1142\/9789813207813_0021"},{"key":"e_1_3_1_9_2","article-title":"Using decision making theory to inform clinical practice","author":"Bekker Hillary","year":"2015","unstructured":"Hillary Bekker. 2015. Using decision making theory to inform clinical practice. Shared Decision Making in Healthcare: Achieving Evidence-based Patient Choice (2015).","journal-title":"Shared Decision Making in Healthcare: Achieving Evidence-based Patient Choice"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1370\/afm.749"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1177\/135581960300800109"},{"key":"e_1_3_1_12_2","article-title":"Cost-sensitive learning for imbalanced classification","author":"Brownlee Jason","year":"2020","unstructured":"Jason Brownlee. 2020. Cost-sensitive learning for imbalanced classification. Machine Learning Mastery (Jan2020). 
https:\/\/machinelearningmastery.com\/cost-sensitive-learning-for-imbalanced-classification\/","journal-title":"Machine Learning Mastery"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.apergo.2013.04.023"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-020-19393-6"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_3_1_16_2","unstructured":"Irene Chen Fredrik D. Johansson and David Sontag. 2018. Why Is My Classifier Discriminatory? (Dec2018). https:\/\/arxiv.org\/abs\/1805.12002"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICTAI52525.2021.00123"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.5694\/j.1326-5377.2004.tb05928.x"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-03098-8_31"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.2308\/isys-10254"},{"key":"e_1_3_1_21_2","article-title":"An introduction to deep reinforcement learning","volume":"1811","author":"Fran\u00e7ois-Lavet Vincent","year":"2018","unstructured":"Vincent Fran\u00e7ois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, and Joelle Pineau. 2018. An introduction to deep reinforcement learning. CoRR abs\/1811.12560 (2018). arxiv:1811.12560http:\/\/arxiv.org\/abs\/1811.12560","journal-title":"CoRR"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1475-6773.2010.01110.x"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437963.3441824"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488560.3498487"},{"key":"e_1_3_1_25_2","volume-title":"Proceedings of the Decision Awareness in Reinforcement Learning Workshop at ICML 2022","author":"Geng Xinyang","unstructured":"Xinyang Geng, Kevin Li, Abhishek Gupta, Aviral Kumar, and Sergey Levine. [n.d.]. Effective offline RL needs going beyond pessimism: Representations and distributional shift. 
In Proceedings of the Decision Awareness in Reinforcement Learning Workshop at ICML 2022."},{"key":"e_1_3_1_26_2","unstructured":"Yue Geng and Xinyu Luo. 2018. Cost-Sensitive Convolution based Neural Networks for Imbalanced Time-Series Classification. (2018). arxiv:cs.LG\/1801.04396"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00134-014-3406-5"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.3122\/jabfm.2017.04.170046"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v38i3.2741"},{"key":"e_1_3_1_30_2","unstructured":"Marek Grze\u015b. 2017. Reward shaping in episodic reinforcement learning. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (AAMAS\u201917) . International Foundation for Autonomous Agents and Multiagent Systems Richland SC 565\u2013573."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1258\/jrsm.2010.100104"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICTAI50040.2020.00068"},{"key":"e_1_3_1_33_2","article-title":"Calibration for the (Computationally-Identifiable) Masses","volume":"1711","author":"H\u00e9bert-Johnson \u00darsula","year":"2017","unstructured":"\u00darsula H\u00e9bert-Johnson, Michael P. Kim, Omer Reingold, and Guy N. Rothblum. 2017. Calibration for the (Computationally-Identifiable) Masses. CoRR abs\/1711.08513 (2017). arxiv:1711.08513http:\/\/arxiv.org\/abs\/1711.08513","journal-title":"CoRR"},{"key":"e_1_3_1_34_2","article-title":"Explainability in deep reinforcement learning","volume":"2008","author":"Heuillet Alexandre","year":"2020","unstructured":"Alexandre Heuillet, Fabien Couthouis, and Natalia D\u00edaz Rodr\u00edguez. 2020. Explainability in deep reinforcement learning. CoRR abs\/2008.06693 (2020). 
arxiv:2008.06693https:\/\/arxiv.org\/abs\/2008.06693","journal-title":"CoRR"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.2471\/BLT.19.234732"},{"key":"e_1_3_1_36_2","doi-asserted-by":"crossref","unstructured":"Sara Hooker. 2021. Moving Beyond \u201cAlgorithmic Bias is a Data Problem\u201d. (Apr2021). https:\/\/www.sciencedirect.com\/science\/article\/pii\/S2666389921000611","DOI":"10.1016\/j.patter.2021.100241"},{"key":"e_1_3_1_37_2","article-title":"Learning to utilize shaping rewards: A new approach of reward shaping","volume":"2011","author":"Hu Yujing","year":"2020","unstructured":"Yujing Hu, Weixun Wang, Hangtian Jia, Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, and Changjie Fan. 2020. Learning to utilize shaping rewards: A new approach of reward shaping. CoRR abs\/2011.02669 (2020). arXiv:2011.02669https:\/\/arxiv.org\/abs\/2011.02669","journal-title":"CoRR"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2017.03.009"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12874-017-0442-1"},{"key":"e_1_3_1_40_2","unstructured":"Alistair Johnson Tom Pollard and Roger Mark. 2016. MIMIC-III Clinical Database. (Sept2016). https:\/\/physionet.org\/content\/mimiciii\/1.4\/"},{"issue":"11","key":"e_1_3_1_41_2","first-page":"S28\u2013S33","article-title":"Genetic variation, classification and \u2019race\u2019","volume":"36","author":"Jorde Lynn B.","year":"2004","unstructured":"Lynn B. Jorde and Stephen P. Wooding. 2004. Genetic variation, classification and \u2019race\u2019. Nature Genetics 36, 11 (2004), S28\u2013S33.","journal-title":"Nature Genetics"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","unstructured":"Christopher J. Kelly Alan Karthikesalingam Mustafa Suleyman Greg Corrado and Dominic King. 2019. Key Challenges for Delivering Clinical Impact with Artificial Intelligence. (Oct2019). 
10.1186\/s12916-019-1426-2","DOI":"10.1186\/s12916-019-1426-2"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1287\/ijoc.2013.0586"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2016.2621"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocaa127"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1177\/0272989X12448929"},{"key":"e_1_3_1_47_2","first-page":"88","volume-title":"ECAI","author":"Kukar Matjaz","year":"1998","unstructured":"Matjaz Kukar and Igor Kononenko. 1998. Cost-sensitive learning with neural networks. In ECAI, Vol. 15. Citeseer, 88\u201394."},{"key":"e_1_3_1_48_2","article-title":"Stabilizing off-policy q-learning via bootstrapping error reduction","volume":"32","author":"Kumar Aviral","year":"2019","unstructured":"Aviral Kumar, Justin Fu, Matthew Soh, George Tucker, and Sergey Levine. 2019. Stabilizing off-policy q-learning via bootstrapping error reduction. Advances in Neural Information Processing Systems 32 (2019).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_49_2","first-page":"1179","article-title":"Conservative q-learning for offline reinforcement learning","volume":"33","author":"Kumar Aviral","year":"2020","unstructured":"Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. 2020. Conservative q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems 33 (2020), 1179\u20131191.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_50_2","unstructured":"Kimmo K\u00e4rkk\u00e4inen and Jungseock Joo. 2019. FairFace: Face Attribute Dataset for Balanced Race Gender and Age. (2019). 
arxiv:cs.CV\/1908.04913"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-020-0301-z"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jclinepi.2018.04.005"},{"key":"e_1_3_1_53_2","article-title":"Offline reinforcement learning: Tutorial, review, and perspectives on open problems","volume":"2005","author":"Levine Sergey","year":"2020","unstructured":"Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. 2020. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. CoRR abs\/2005.01643 (2020). arxiv:2005.01643https:\/\/arxiv.org\/abs\/2005.01643","journal-title":"CoRR"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.2169\/internalmedicine.4664-20"},{"key":"e_1_3_1_55_2","article-title":"Simultaneous double Q-learning with conservative advantage learning for actor-critic methods","author":"Li Qing","year":"2022","unstructured":"Qing Li, Wengang Zhou, Zhenbo Lu, and Houqiang Li. 2022. Simultaneous double Q-learning with conservative advantage learning for actor-critic methods. arXiv preprint arXiv:2205.03819 (2022).","journal-title":"arXiv preprint arXiv:2205.03819"},{"key":"e_1_3_1_56_2","first-page":"101878","article-title":"Electronic health records based reinforcement learning for treatment optimizing","author":"Li Tian-Hao","year":"2021","unstructured":"Tian-Hao Li, Zhi-Shun Wang, Wei Lu, Qian Zhang, and Deng-Feng Li. 2021. Electronic health records based reinforcement learning for treatment optimizing. Information Systems (2021), 101878.","journal-title":"Information Systems"},{"key":"e_1_3_1_57_2","article-title":"Deep reinforcement learning: An overview.","volume":"1701","author":"Li Yuxi","year":"2017","unstructured":"Yuxi Li. 2017. Deep reinforcement learning: An overview. CoRR abs\/1701.07274 (2017). 
http:\/\/dblp.uni-trier.de\/db\/journals\/corr\/corr1701.html#Li17b","journal-title":"CoRR"},{"key":"e_1_3_1_58_2","unstructured":"Enlu Lin Qiong Chen and Xiaoming Qi. 2019. Deep Reinforcement Learning for Imbalanced Classification. (2019). arxiv:cs.LG\/1901.01379"},{"key":"e_1_3_1_59_2","article-title":"Deep reinforcement learning for imbalanced classification","volume":"1901","author":"Lin Enlu","year":"2019","unstructured":"Enlu Lin, Qiong Chen, and Xiaoming Qi. 2019. Deep reinforcement learning for imbalanced classification. CoRR abs\/1901.01379 (2019). arxiv:1901.01379http:\/\/arxiv.org\/abs\/1901.01379","journal-title":"CoRR"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.5555\/3091574.3091594"},{"key":"e_1_3_1_61_2","article-title":"Deep reinforcement learning for dynamic treatment regimes on medical registry data","volume":"1801","author":"Liu Ning","year":"2018","unstructured":"Ning Liu, Ying Liu, Brent Logan, Zhiyuan Xu, Jian Tang, and Yanzhi Wang. 2018. Deep reinforcement learning for dynamic treatment regimes on medical registry data. CoRR abs\/1801.09271 (2018). arxiv:1801.09271http:\/\/arxiv.org\/abs\/1801.09271","journal-title":"CoRR"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","unstructured":"S. Liu K. C. See K. Y. Ngiam L. A. Celi X. Sun and M. Feng. 2020. Reinforcement learning for clinical decision support in critical care: comprehensive review. Journal of Medical Internet Research 22 7 (2020) e18477. 10.2196\/18477","DOI":"10.2196\/18477"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","unstructured":"Zeyu Liu Anahita Khojandi Xueping Li Akram Mohammed Robert L. Davis and Rishikesan Kamaleswaran. 2022. A machine learning\u2013enabled partially observable markov decision process framework for early sepsis prediction. INFORMS J. on Computing 34 4 (July-August 2022) 2039\u20132057. 
10.1287\/ijoc.2022.1176","DOI":"10.1287\/ijoc.2022.1176"},{"key":"e_1_3_1_64_2","first-page":"773","volume-title":"AMIA Annual Symposium Proceedings","volume":"2020","author":"Lu MingYu","year":"2020","unstructured":"MingYu Lu, Zachary Shahn, Daby Sow, Finale Doshi-Velez, and Li-wei H Lehman. 2020. Is deep reinforcement learning ready for practical applications in healthcare? A sensitivity analysis of duel-DDQN for hemodynamic management in sepsis patients. In AMIA Annual Symposium Proceedings, Vol. 2020. American Medical Informatics Association, 773."},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0212665"},{"key":"e_1_3_1_66_2","article-title":"Playing Atari with deep reinforcement learning","volume":"1312","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. 2013. Playing Atari with deep reinforcement learning. CoRR abs\/1312.5602 (2013). arxiv:1312.5602http:\/\/arxiv.org\/abs\/1312.5602","journal-title":"CoRR"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1002\/sim.2022"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/EMBC.2016.7591355"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12982-021-00095-3"},{"key":"e_1_3_1_70_2","article-title":"Subgoal-based reward shaping to improve efficiency in reinforcement learning","volume":"2104","author":"Okudo Takato","year":"2021","unstructured":"Takato Okudo and Seiji Yamada. 2021. Subgoal-based reward shaping to improve efficiency in reinforcement learning. CoRR abs\/2104.06411 (2021). arxiv:2104.06411https:\/\/arxiv.org\/abs\/2104.06411","journal-title":"CoRR"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-27645-3_1"},{"key":"e_1_3_1_72_2","doi-asserted-by":"crossref","unstructured":"Trishan Panch Heather Mattie and Rifat Atun. 2019. 
Artificial Intelligence and Algorithmic Bias: Implications for Health Systems. (Dec2019). https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC6875681\/","DOI":"10.7189\/jogh.09.020318"},{"key":"e_1_3_1_73_2","unstructured":"Sonali Parbhoo Jasmina Bogojeska Maurizio Zazzi Volker Roth and Finale Doshi-Velez. 2017. Combining Kernel and Model Based Learning for HIV Therapy Selection. (Jul2017). https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5543338\/"},{"key":"e_1_3_1_74_2","first-page":"589","volume-title":"Proceedings of the Machine Learning for Healthcare Conference","author":"Parbhoo Sonali","year":"2020","unstructured":"Sonali Parbhoo, Mario Wieser, Volker Roth, and Finale Doshi-Velez. 2020. Transfer learning from well-curated to less-resourced populations with HIV. In Proceedings of the Machine Learning for Healthcare Conference. PMLR, 589\u2013609."},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1001\/jama.2019.18058"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3494672"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2018.178"},{"key":"e_1_3_1_78_2","article-title":"Explainable reinforcement learning: A survey","volume":"2005","author":"Puiutta Erika","year":"2020","unstructured":"Erika Puiutta and Eric M. S. P. Veith. 2020. Explainable reinforcement learning: A survey. CoRR abs\/2005.06247 (2020). arxiv:2005.06247https:\/\/arxiv.org\/abs\/2005.06247","journal-title":"CoRR"},{"key":"e_1_3_1_79_2","article-title":"Continuous state-space models for optimal sepsis treatment - A deep reinforcement learning approach","volume":"1705","author":"Raghu Aniruddh","year":"2017","unstructured":"Aniruddh Raghu, Matthieu Komorowski, Leo Anthony Celi, Peter Szolovits, and Marzyeh Ghassemi. 2017. Continuous state-space models for optimal sepsis treatment - A deep reinforcement learning approach. CoRR abs\/1705.08422 (2017). 
arxiv:1705.08422http:\/\/arxiv.org\/abs\/1705.08422","journal-title":"CoRR"},{"key":"e_1_3_1_80_2","article-title":"A novel policy for pre-trained deep reinforcement learning for speech emotion recognition","volume":"2101","author":"Rajapakshe Thejan","year":"2021","unstructured":"Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Bj\u00f6rn W. Schuller, and Jiajun Liu. 2021. A novel policy for pre-trained deep reinforcement learning for speech emotion recognition. CoRR abs\/2101.00738 (2021). arXiv:2101.00738https:\/\/arxiv.org\/abs\/2101.00738","journal-title":"CoRR"},{"key":"e_1_3_1_81_2","doi-asserted-by":"crossref","unstructured":"Alvin Rajkomar Michael Howell and Michaela Hardt. 2018. Ensuring Fairness in Machine Learning to Advance Health Equity. (2018). https:\/\/pubmed.ncbi.nlm.nih.gov\/30508424\/","DOI":"10.7326\/M18-1990"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2012.01.009"},{"key":"e_1_3_1_83_2","unstructured":"Elsa Riachi Muhammad Mamdani Michael Fralick and Frank Rudzicz. 2021. Challenges for Reinforcement Learning in Healthcare. (2021). arxiv:cs.LG\/2103.05612"},{"key":"e_1_3_1_84_2","first-page":"167","article-title":"Health disparities: Gaps in access, quality and affordability of medical care","volume":"123","author":"Riley Wayne J.","year":"2012","unstructured":"Wayne J. Riley. 2012. Health disparities: Gaps in access, quality and affordability of medical care. Transactions of the American Clinical and Climatological Association 123 (2012), 167.","journal-title":"Transactions of the American Clinical and Climatological Association"},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1177\/0272989X20985752"},{"key":"e_1_3_1_86_2","doi-asserted-by":"publisher","DOI":"10.4338\/ACI-2010-03-RA-0019"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.3390\/app9245287"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","unstructured":"Andrew Schaefer and Matthew Bailey. 2005. 
Modeling Medical Treatment using Markov Decision Processes. (2005). 10.10072F1-4020-8066-2_23.pdf","DOI":"10.10072F1-4020-8066-2_23.pdf"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1155\/2015\/560108"},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1136\/bmj.b2393"},{"key":"e_1_3_1_91_2","volume-title":"Reinforcement Learning: An Introduction (2nd ed.)","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). The MIT Press."},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2021.104366"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12916-018-1093-8"},{"key":"e_1_3_1_94_2","article-title":"Clinician-in-the-loop decision making: Reinforcement learning with near-optimal set-valued policies","volume":"2007","author":"Tang Shengpu","year":"2020","unstructured":"Shengpu Tang, Aditya Modi, Michael W. Sjoding, and Jenna Wiens. 2020. Clinician-in-the-loop decision making: Reinforcement learning with near-optimal set-valued policies. CoRR abs\/2007.12678 (2020). arxiv:2007.12678https:\/\/arxiv.org\/abs\/2007.12678","journal-title":"CoRR"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12874-016-0122-6"},{"key":"e_1_3_1_96_2","doi-asserted-by":"publisher","unstructured":"Alexandra Chouldechova and Aaron Roth. 2020. A Snapshot of the Frontiers of Fairness in Machine Learning. (May2020). 10.1145\/3376898","DOI":"10.1145\/3376898"},{"key":"e_1_3_1_97_2","unstructured":"R. Vincent. 2014. Reinforcement learning in models of adaptive medical treatment strategies. McGill University (Canada). 2014."},{"key":"e_1_3_1_98_2","doi-asserted-by":"crossref","unstructured":"Darshali A. Vyas Leo G. Eisenstein and David S. Jones. 2020. Hidden in plain sight\u2014reconsidering the use of race correction in clinical algorithms. 
New England Journal of Medicine 383 9 (2020) 874\u2013882.","DOI":"10.1056\/NEJMms2004740"},{"key":"e_1_3_1_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/EMBC44109.2020.9175311"},{"key":"e_1_3_1_100_2","first-page":"1144","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Wen Min","year":"2021","unstructured":"Min Wen, Osbert Bastani, and Ufuk Topcu. 2021. Algorithms for fairness in sequential decision making. In Proceedings of the International Conference on Artificial Intelligence and Statistics (PMLR), 1144\u20131152."},{"key":"e_1_3_1_101_2","article-title":"Representation and reinforcement learning for personalized glycemic control in septic patients","volume":"1712","author":"Weng Wei-Hung","year":"2017","unstructured":"Wei-Hung Weng, Mingwu Gao, Ze He, Susu Yan, and Peter Szolovits. 2017. Representation and reinforcement learning for personalized glycemic control in septic patients. CoRR abs\/1712.00654 (2017). arxiv:1712.00654http:\/\/arxiv.org\/abs\/1712.00654","journal-title":"CoRR"},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.1056\/NEJM199308263290907"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11606-018-4653-x"},{"key":"e_1_3_1_104_2","first-page":"S106\u2013S113","article-title":"Prediction modeling using EHR data: Challenges, strategies, and a comparison of machine learning approaches","author":"Wu Jionglin","year":"2010","unstructured":"Jionglin Wu, Jason Roy, and Walter F. Stewart. 2010. Prediction modeling using EHR data: Challenges, strategies, and a comparison of machine learning approaches. Medical Care (2010), S106\u2013S113.","journal-title":"Medical Care"},{"key":"e_1_3_1_105_2","article-title":"Single episode policy transfer in reinforcement learning","volume":"1910","author":"Yang Jiachen","year":"2019","unstructured":"Jiachen Yang, Brenden K. Petersen, Hongyuan Zha, and Daniel M. Faissol. 2019. 
Single episode policy transfer in reinforcement learning. CoRR abs\/1910.07719 (2019). arXiv:1910.07719http:\/\/arxiv.org\/abs\/1910.07719","journal-title":"CoRR"},{"key":"e_1_3_1_106_2","article-title":"Algorithmic fairness and bias mitigation for clinical machine learning: A new utility for deep reinforcement learning","author":"Yang Jenny","year":"2022","unstructured":"Jenny Yang, Andrew A. S. Soltan, and David A. Clifton. 2022. Algorithmic fairness and bias mitigation for clinical machine learning: A new utility for deep reinforcement learning. medRxiv (2022).","journal-title":"medRxiv"},{"key":"e_1_3_1_107_2","volume-title":"Proceedings of the LLARLA Workshop, FAIM","volume":"2018","author":"Yao Jiayu","year":"2018","unstructured":"Jiayu Yao, Taylor Killian, George Konidaris, and Finale Doshi-Velez. 2018. Direct policy transfer via hidden parameter markov decision processes. In Proceedings of the LLARLA Workshop, FAIM, Vol. 2018."},{"key":"e_1_3_1_108_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477600"},{"key":"e_1_3_1_109_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557474"},{"key":"e_1_3_1_110_2","article-title":"Quick learner automated vehicle adapting its roadmanship to varying traffic cultures with meta reinforcement learning","volume":"2104","author":"Zhang Songan","year":"2021","unstructured":"Songan Zhang, Lu Wen, Huei Peng, and H. Eric Tseng. 2021. Quick learner automated vehicle adapting its roadmanship to varying traffic cultures with meta reinforcement learning. CoRR abs\/2104.08876 (2021). arXiv:2104.08876https:\/\/arxiv.org\/abs\/2104.08876","journal-title":"CoRR"},{"key":"e_1_3_1_111_2","doi-asserted-by":"crossref","unstructured":"Yang Zhao Zoie Shui-Yee Wong and Kwok Leung Tsui. 2018. A Framework of Rebalancing Imbalanced Healthcare Data for Rare Events\u2019 Classification: A Case of Look-Alike Sound-Alike Mix-Up Incident Detection. (May2018). 
https:\/\/www.hindawi.com\/journals\/jhe\/2018\/6275435\/","DOI":"10.1155\/2018\/6275435"},{"key":"e_1_3_1_112_2","article-title":"Oversampling for imbalanced time series data","volume":"2004","author":"Zhu Tuanfei","year":"2020","unstructured":"Tuanfei Zhu, Yaping Lin, and Yonghe Liu. 2020. Oversampling for imbalanced time series data. CoRR abs\/2004.06373 (2020). arxiv:2004.06373https:\/\/arxiv.org\/abs\/2004.06373","journal-title":"CoRR"},{"key":"e_1_3_1_113_2","article-title":"Transfer learning in deep reinforcement learning: A survey","volume":"2009","author":"Zhu Zhuangdi","year":"2020","unstructured":"Zhuangdi Zhu, Kaixiang Lin, and Jiayu Zhou. 2020. Transfer learning in deep reinforcement learning: A survey. CoRR abs\/2009.07888 (2020). arXiv:2009.07888https:\/\/arxiv.org\/abs\/2009.07888","journal-title":"CoRR"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609502","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3609502","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:24Z","timestamp":1750178784000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609502"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,15]]},"references-count":112,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,2,29]]}},"alternative-id":["10.1145\/3609502"],"URL":"https:\/\/doi.org\/10.1145\/3609502","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,15]]},"assertion":[{"value":"2022-02-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-07-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-15","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}