{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T15:41:58Z","timestamp":1775576518720,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:00:00Z","timestamp":1675296000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:00:00Z","timestamp":1675296000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Deep Reinforcement Learning (DRL) has been increasingly attempted in assisting clinicians for real-time treatment of sepsis. While a value function quantifies the performance of policies in such decision-making processes, most value-based DRL algorithms cannot evaluate the target value function precisely and are not as safe as clinical experts. In this study, we propose a Weighted Dueling Double Deep Q-Network with embedded human Expertise (WD3QNE). A target Q value function with adaptive dynamic weight is designed to improve the estimate accuracy and human expertise in decision-making is leveraged. In addition, the random forest algorithm is employed for feature selection to improve model interpretability. We test our algorithm against state-of-the-art value function methods in terms of expected return, survival rate, action distribution and external validation. The results demonstrate that WD3QNE obtains the highest survival rate of 97.81% in MIMIC-III dataset. Our proposed method is capable of providing reliable treatment decisions with embedded clinician expertise.<\/jats:p>","DOI":"10.1038\/s41746-023-00755-5","type":"journal-article","created":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T11:10:32Z","timestamp":1675336232000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":52,"title":["A value-based deep reinforcement learning model with human expertise in optimal treatment of sepsis"],"prefix":"10.1038","volume":"6","author":[{"given":"XiaoDan","family":"Wu","sequence":"first","affiliation":[]},{"given":"RuiChang","family":"Li","sequence":"additional","affiliation":[]},{"given":"Zhen","family":"He","sequence":"additional","affiliation":[]},{"given":"TianZhi","family":"Yu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7810-7279","authenticated-orcid":false,"given":"ChangQing","family":"Cheng","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,2]]},"reference":[{"key":"755_CR1","doi-asserted-by":"publisher","first-page":"801","DOI":"10.1001\/jama.2016.0287","volume":"315","author":"M Singer","year":"2016","unstructured":"Singer, M. et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). Jama 315, 801\u2013810 (2016).","journal-title":"Jama"},{"key":"755_CR2","doi-asserted-by":"publisher","first-page":"e0000012","DOI":"10.1371\/journal.pdig.0000012","volume":"1","author":"T Nanayakkara","year":"2022","unstructured":"Nanayakkara, T. et al. Unifying cardiovascular modelling with deep reinforcement learning for uncertainty aware control of sepsis treatment. PLoS Digital Health 1, e0000012 (2022).","journal-title":"PLoS Digital Health"},{"key":"755_CR3","doi-asserted-by":"publisher","first-page":"1181","DOI":"10.1007\/s00134-021-06506-y","volume":"47","author":"L Evans","year":"2021","unstructured":"Evans, L. et al. Surviving sepsis campaign: international guidelines for management of sepsis and septic shock 2021. Intensive Care Med. 47, 1181\u20131247 (2021).","journal-title":"Intensive Care Med."},{"key":"755_CR4","doi-asserted-by":"publisher","first-page":"101820","DOI":"10.1016\/j.artmed.2020.101820","volume":"104","author":"SM Lauritsen","year":"2020","unstructured":"Lauritsen, S. M. et al. Early detection of sepsis utilizing deep learning on electronic health record event sequences. Artif. Intell. Med. 104, 101820 (2020).","journal-title":"Artif. Intell. Med."},{"key":"755_CR5","doi-asserted-by":"publisher","unstructured":"Kallfelz, M. et al. MIMIC-IV demo data in the OMOP Common Data Model (version 0.9). PhysioNet. https:\/\/doi.org\/10.13026\/p1f5-7x35 (2021).","DOI":"10.13026\/p1f5-7x35"},{"key":"755_CR6","first-page":"1","volume":"8","author":"AA Robles","year":"2021","unstructured":"Robles, A. A. et al. Data-driven curation process for describing the blood glucose management in the intensive care unit. Sci. Data 8, 1\u201313 (2021).","journal-title":"Sci. Data"},{"key":"755_CR7","doi-asserted-by":"publisher","first-page":"e5909","DOI":"10.2196\/medinform.5909","volume":"4","author":"T Desautels","year":"2016","unstructured":"Desautels, T. et al. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR Med. Inform. 4, e5909 (2016).","journal-title":"JMIR Med. Inform."},{"key":"755_CR8","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1111\/acem.12876","volume":"23","author":"RA Taylor","year":"2016","unstructured":"Taylor, R. A. et al. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data\u2013driven, machine learning approach. Acad. Emerg. Med. 23, 269\u2013278 (2016).","journal-title":"Acad. Emerg. Med."},{"key":"755_CR9","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1016\/j.ajem.2020.09.013","volume":"45","author":"A Rodr\u00edguez","year":"2021","unstructured":"Rodr\u00edguez, A. et al. Supervised classification techniques for prediction of mortality in adult patients with sepsis. Am. J. Emerg. Med. 45, 392\u2013397 (2021).","journal-title":"Am. J. Emerg. Med."},{"key":"755_CR10","doi-asserted-by":"publisher","first-page":"102227","DOI":"10.1016\/j.artmed.2021.102227","volume":"123","author":"G Schamberg","year":"2022","unstructured":"Schamberg, G. et al. Continuous action deep reinforcement learning for propofol dosing during general anesthesia. Artif. Intell. Med. 123, 102227 (2022).","journal-title":"Artif. Intell. Med."},{"key":"755_CR11","doi-asserted-by":"publisher","first-page":"101964","DOI":"10.1016\/j.artmed.2020.101964","volume":"109","author":"A Coronato","year":"2020","unstructured":"Coronato, A. et al. Reinforcement learning for intelligent healthcare applications: a survey. Artif. Intell. Med. 109, 101964 (2020).","journal-title":"Artif. Intell. Med."},{"key":"755_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3477600","volume":"55","author":"C Yu","year":"2021","unstructured":"Yu, C. et al. Reinforcement learning in healthcare: a survey. ACM Comput. Surv. (CSUR) 55, 1\u201336 (2021).","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"755_CR13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-56847-4","volume":"10","author":"Z Zhang","year":"2020","unstructured":"Zhang, Z., Zheng, B. & Liu, N. Individualized fluid administration for critically ill patients with sepsis with an interpretable dynamic treatment regimen model. Sci. Rep. 10, 1\u20139 (2020).","journal-title":"Sci. Rep."},{"key":"755_CR14","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1007\/s00134-019-05898-2","volume":"46","author":"M Komorowski","year":"2020","unstructured":"Komorowski, M. Clinical management of sepsis can be improved by artificial intelligence: yes. Intensive Care Med. 46, 375\u2013377 (2020).","journal-title":"Intensive Care Med."},{"key":"755_CR15","unstructured":"Rummery, G. A. & Mahesan, N. On-line Q-learning using connectionist systems. 37 (University of Cambridge Press, Cambridge, 1994)."},{"key":"755_CR16","unstructured":"Komorowski, M. et al. A markov decision process to suggest optimal treatment of severe infections in intensive care. Neural Information Processing Systems Workshop on Machine Learning for Health (2016)."},{"key":"755_CR17","unstructured":"Raghu, A. Komorowski, M. Celi, L. A. Szolovits, P. & Ghassemi, M. Continuous state-space models for optimal sepsis treatment: A deep reinforcement learning approach. In Proceedings of the 2nd Machine Learning for Healthcare Conference, 147\u2013163 (2017)."},{"key":"755_CR18","doi-asserted-by":"publisher","first-page":"102193","DOI":"10.1016\/j.artmed.2021.102193","volume":"121","author":"S Ebrahimi","year":"2021","unstructured":"Ebrahimi, S. & Lim, G. J. A reinforcement learning approach for finding optimal policy of adaptive radiation therapy considering uncertain tumor biological response. Artif. Intell. Med. 121, 102193 (2021).","journal-title":"Artif. Intell. Med."},{"key":"755_CR19","doi-asserted-by":"publisher","first-page":"e18477","DOI":"10.2196\/18477","volume":"22","author":"S Liu","year":"2020","unstructured":"Liu, S. et al. Reinforcement learning for clinical decision support in critical care: comprehensive review. J. Med. Internet Res. 22, e18477 (2020).","journal-title":"J. Med. Internet Res."},{"key":"755_CR20","doi-asserted-by":"publisher","first-page":"102003","DOI":"10.1016\/j.artmed.2020.102003","volume":"112","author":"L Roggeveen","year":"2021","unstructured":"Roggeveen, L. et al. Transatlantic transferability of a new reinforcement learning model for optimizing haemodynamic treatment for critically ill patients with sepsis. Artif. Intell. Med. 112, 102003 (2021).","journal-title":"Artif. Intell. Med."},{"key":"755_CR21","doi-asserted-by":"crossref","unstructured":"Mnih, V. et al. Human-level control through deep reinforcement learning. Nature. 518, 529\u2013533 (2015).","DOI":"10.1038\/nature14236"},{"key":"755_CR22","doi-asserted-by":"crossref","unstructured":"Van, H. H. Guez, A. & Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence, 30 (2016).","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"755_CR23","unstructured":"Lv, P. et al. Integrated double estimator architecture for reinforcement learning. IEEE Trans. Cybernetics. 52, 1\u201312 (2020)."},{"key":"755_CR24","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1007\/s00134-020-05947-1","volume":"46","author":"J Garnacho-Montero","year":"2020","unstructured":"Garnacho-Montero, J. & Mart\u00edn-Loeches, I. Clinical management of sepsis can be improved by artificial intelligence: no. Intensive Care Med. 46, 378\u2013380 (2020).","journal-title":"Intensive Care Med."},{"key":"755_CR25","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1109\/MSP.2017.2743240","volume":"34","author":"K Arulkumaran","year":"2017","unstructured":"Arulkumaran, K. et al. Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34, 26\u201338 (2017).","journal-title":"IEEE Signal Process. Mag."},{"key":"755_CR26","doi-asserted-by":"publisher","first-page":"102847","DOI":"10.1016\/j.bspc.2021.102847","volume":"69","author":"C Sun","year":"2021","unstructured":"Sun, C. et al. Personalized vital signs control based on continuous action-space reinforcement learning with supervised experience. Biomed. Signal Process. Control 69, 102847 (2021).","journal-title":"Biomed. Signal Process. Control"},{"key":"755_CR27","doi-asserted-by":"crossref","unstructured":"Wang, L. et al. Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2447\u20132456 (2018).","DOI":"10.1145\/3219819.3219961"},{"key":"755_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AE Johnson","year":"2016","unstructured":"Johnson, A. E. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 1\u20139 (2016).","journal-title":"Sci. Data"},{"key":"755_CR29","unstructured":"Brockman, G. et al. OpenAI Gym. Preprint at https:\/\/arxiv.org\/abs\/1606.01540 (2016)."},{"key":"755_CR30","unstructured":"Wang, Z. et al. Dueling network architectures for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, 1995\u20132003 (2016)."},{"key":"755_CR31","unstructured":"Raghu, A. et al. Deep reinforcement learning for sepsis treatment. Preprint at https:\/\/arxiv.org\/abs\/1711.09602 (2017)."},{"key":"755_CR32","doi-asserted-by":"publisher","first-page":"1716","DOI":"10.1038\/s41591-018-0213-5","volume":"24","author":"M Komorowski","year":"2018","unstructured":"Komorowski, M. et al. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 1716\u20131720 (2018).","journal-title":"Nat. Med."},{"key":"755_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2018.178","volume":"5","author":"T Pollard","year":"2018","unstructured":"Pollard, T. et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data 5, 1\u201313 (2018).","journal-title":"Sci. Data"},{"key":"755_CR34","doi-asserted-by":"publisher","first-page":"103762","DOI":"10.1016\/j.jbi.2021.103762","volume":"117","author":"Y Jia","year":"2021","unstructured":"Jia, Y. et al. Safety-driven design of machine learning for sepsis treatment. J. Biomed. Inform. 117, 103762 (2021).","journal-title":"J. Biomed. Inform."},{"key":"755_CR35","doi-asserted-by":"publisher","first-page":"101814","DOI":"10.1016\/j.artmed.2020.101814","volume":"103","author":"J Li","year":"2020","unstructured":"Li, J. et al. A multicenter random forest model for effective prognosis prediction in collaborative clinical research network. Artif. Intell. Med. 103, 101814 (2020).","journal-title":"Artif. Intell. Med."},{"key":"755_CR36","doi-asserted-by":"publisher","first-page":"925","DOI":"10.1007\/s00134-018-5085-0","volume":"44","author":"MM Levy","year":"2018","unstructured":"Levy, M. M., Evans, L. E. & Rhodes, A. The surviving sepsis campaign bundle: 2018 update. Intensive Care Med. 44, 925\u2013928 (2018).","journal-title":"Intensive Care Med."},{"key":"755_CR37","doi-asserted-by":"crossref","unstructured":"Yu, C. Ren, G. & Liu, J. Deep inverse reinforcement learning for sepsis treatment. In Proceedings of 2019 IEEE International Conference on Healthcare Informatics (ICHI), 1\u20133 (2019).","DOI":"10.1109\/ICHI.2019.8904645"},{"key":"755_CR38","doi-asserted-by":"publisher","first-page":"101896","DOI":"10.1016\/j.artmed.2020.101896","volume":"109","author":"X Wu","year":"2020","unstructured":"Wu, X. et al. Extracting deep features from short ECG signals for early atrial fibrillation detection. Artif. Intell. Med. 109, 101896 (2020).","journal-title":"Artif. Intell. Med."},{"key":"755_CR39","doi-asserted-by":"publisher","first-page":"102049","DOI":"10.1016\/j.artmed.2021.102049","volume":"114","author":"RK Bania","year":"2021","unstructured":"Bania, R. K. & Halder, A. R-HEFS: rough set based heterogeneous ensemble feature selection method for medical data classification. Artif. Intell. Med. 114, 102049 (2021).","journal-title":"Artif. Intell. Med."},{"key":"755_CR40","doi-asserted-by":"crossref","unstructured":"Zhang, Y. et al. LEAP: learning to prescribe effective and safe treatment combinations for multimorbidity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1315\u20131324 (2017).","DOI":"10.1145\/3097983.3098109"},{"key":"755_CR41","doi-asserted-by":"publisher","first-page":"101836","DOI":"10.1016\/j.artmed.2020.101836","volume":"104","author":"M Tejedor","year":"2020","unstructured":"Tejedor, M., Woldaregay, A. Z. & Godtliebsen, F. Reinforcement learning application in diabetes blood glucose control: a systematic review. Artif. Intell. Med. 104, 101836 (2020).","journal-title":"Artif. Intell. Med."},{"key":"755_CR42","unstructured":"Jiang, N. & Li, L. Doubly robust off-policy value evaluation for reinforcement learning. In Proceedings of International Conference on Machine Learning (PMLR), 652\u2013661 (2016)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00755-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00755-5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00755-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T11:26:31Z","timestamp":1675337191000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00755-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,2]]},"references-count":42,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["755"],"URL":"https:\/\/doi.org\/10.1038\/s41746-023-00755-5","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-1777822\/v1","asserted-by":"object"}]},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,2]]},"assertion":[{"value":"16 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 January 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"15"}}