{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:27:42Z","timestamp":1772908062403,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,14]],"date-time":"2021-08-14T00:00:00Z","timestamp":1628899200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,14]]},"DOI":"10.1145\/3447548.3467255","type":"proceedings-article","created":{"date-parts":[[2021,8,12]],"date-time":"2021-08-12T06:12:09Z","timestamp":1628748729000},"page":"1120-1128","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Dialogue Based Disease Screening Through Domain Customized Reinforcement Learning"],"prefix":"10.1145","author":[{"given":"Zhuo","family":"Liu","sequence":"first","affiliation":[{"name":"Ping An Healthcare Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanxuan","family":"Li","sequence":"additional","affiliation":[{"name":"Ping An Healthcare Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xingzhi","family":"Sun","sequence":"additional","affiliation":[{"name":"Ping An Healthcare Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fei","family":"Wang","sequence":"additional","affiliation":[{"name":"Weill Cornell Medicine, Cornell University, New York, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gang","family":"Hu","sequence":"additional","affiliation":[{"name":"Ping An Healthcare Technology, Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guotong","family":"Xie","sequence":"additional","affiliation":[{"name":"Ping An Healthcare Technology&Ping An Healthcare and Technology Co.,Ltd.&Ping An International Smart City Technology Co.,Ltd., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,8,14]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"2020. Classification of Diseases Functioning and Disability. https:\/\/www.cdc.gov\/nchs\/icd  2020. Classification of Diseases Functioning and Disability. https:\/\/www.cdc.gov\/nchs\/icd"},{"key":"e_1_3_2_2_2_1","first-page":"679","article-title":"A Markovian decision process","volume":"6","author":"Bellman Richard","year":"1957","unstructured":"Richard Bellman . 1957 . A Markovian decision process . Journal of mathematics and mechanics 6 , 5 (1957), 679 -- 684 . Richard Bellman. 1957. A Markovian decision process. Journal of mathematics and mechanics 6, 5 (1957), 679--684.","journal-title":"Journal of mathematics and mechanics"},{"key":"e_1_3_2_2_3_1","volume-title":"Stephanie Allassonniere, Julien Stirnemann, Emmanuel Spaggiari, and Antoine Neuraz.","author":"Besson Remi","year":"2018","unstructured":"Remi Besson , Erwan Le Pennec , Stephanie Allassonniere, Julien Stirnemann, Emmanuel Spaggiari, and Antoine Neuraz. 2018 . A model-based reinforcement learning approach for a rare disease diagnostic task. arXiv preprint arXiv:1811.10112 (2018). Remi Besson, Erwan Le Pennec, Stephanie Allassonniere, Julien Stirnemann, Emmanuel Spaggiari, and Antoine Neuraz. 2018. A model-based reinforcement learning approach for a rare disease diagnostic task. arXiv preprint arXiv:1811.10112 (2018)."},{"key":"e_1_3_2_2_4_1","volume-title":"Random forests. Machine learning 45, 1","author":"Breiman Leo","year":"2001","unstructured":"Leo Breiman . 2001. Random forests. Machine learning 45, 1 ( 2001 ), 5--32. Leo Breiman. 2001. 
Random forests. Machine learning 45, 1 (2001), 5--32."},{"key":"e_1_3_2_2_5_1","volume-title":"Hybrid actor-critic reinforcement learning in parameterized action space. arXiv preprint arXiv:1903.01344","author":"Fan Zhou","year":"2019","unstructured":"Zhou Fan , Rui Su , Weinan Zhang , and Yong Yu. 2019. Hybrid actor-critic reinforcement learning in parameterized action space. arXiv preprint arXiv:1903.01344 ( 2019 ). Zhou Fan, Rui Su, Weinan Zhang, and Yong Yu. 2019. Hybrid actor-critic reinforcement learning in parameterized action space. arXiv preprint arXiv:1903.01344 (2019)."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"e_1_3_2_2_7_1","unstructured":"Hao-Cheng Kao Kai-Fu Tang and Edward Y Chang. 2018. Context-Aware Symptom Checking for Disease Diagnosis Using Hierarchical Reinforcement Learning.. In AAAI. 2305--2313.  Hao-Cheng Kao Kai-Fu Tang and Edward Y Chang. 2018. Context-Aware Symptom Checking for Disease Diagnosis Using Hierarchical Reinforcement Learning.. In AAAI. 2305--2313."},{"key":"e_1_3_2_2_8_1","volume-title":"Lightgbm: A highly efficient gradient boosting decision tree. In Advances in neural information processing systems. 3146--3154.","author":"Ke Guolin","year":"2017","unstructured":"Guolin Ke , Qi Meng , Thomas Finley , Taifeng Wang , Wei Chen , Weidong Ma , Qiwei Ye , and Tie-Yan Liu . 2017 . Lightgbm: A highly efficient gradient boosting decision tree. In Advances in neural information processing systems. 3146--3154. Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. Lightgbm: A highly efficient gradient boosting decision tree. In Advances in neural information processing systems. 3146--3154."},{"key":"e_1_3_2_2_9_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . 
Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_2_10_1","volume-title":"Task-oriented Dialogue System for Automatic Disease Diagnosis via Hierarchical Reinforcement Learning. arXiv preprint arXiv:2004.14254","author":"Liao Kangenbei","year":"2020","unstructured":"Kangenbei Liao , Qianlong Liu , Zhongyu Wei , Baolin Peng , Qin Chen , Weijian Sun , and Xuanjing Huang . 2020. Task-oriented Dialogue System for Automatic Disease Diagnosis via Hierarchical Reinforcement Learning. arXiv preprint arXiv:2004.14254 ( 2020 ). Kangenbei Liao, Qianlong Liu, Zhongyu Wei, Baolin Peng, Qin Chen, Weijian Sun, and Xuanjing Huang. 2020. Task-oriented Dialogue System for Automatic Disease Diagnosis via Hierarchical Reinforcement Learning. arXiv preprint arXiv:2004.14254 (2020)."},{"key":"e_1_3_2_2_11_1","volume-title":"Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971","author":"Lillicrap Timothy P","year":"2015","unstructured":"Timothy P Lillicrap , Jonathan J Hunt , Alexander Pritzel , Nicolas Heess , Tom Erez , Yuval Tassa , David Silver , and Daan Wierstra . 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 ( 2015 ). Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)."},{"key":"e_1_3_2_2_12_1","volume-title":"Bbq-networks: Efficient exploration in deep reinforcement learning for task-oriented dialogue systems. arXiv preprint arXiv:1608.05081","author":"Lipton Zachary C","year":"2016","unstructured":"Zachary C Lipton , Xiujun Li , Jianfeng Gao , Lihong Li , Faisal Ahmed , and Li Deng . 2016 . 
Bbq-networks: Efficient exploration in deep reinforcement learning for task-oriented dialogue systems. arXiv preprint arXiv:1608.05081 (2016). Zachary C Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, and Li Deng. 2016. Bbq-networks: Efficient exploration in deep reinforcement learning for task-oriented dialogue systems. arXiv preprint arXiv:1608.05081 (2016)."},{"key":"e_1_3_2_2_13_1","volume-title":"The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. arXiv preprint arXiv:1506.08909","author":"Lowe Ryan","year":"2015","unstructured":"Ryan Lowe , Nissan Pow , Iulian Serban , and Joelle Pineau . 2015. The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. arXiv preprint arXiv:1506.08909 ( 2015 ). Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. 2015. The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. arXiv preprint arXiv:1506.08909 (2015)."},{"key":"e_1_3_2_2_14_1","volume-title":"International conference on machine learning. 1928--1937","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih , Adria Puigdomenech Badia , Mehdi Mirza , Alex Graves , Timothy Lillicrap , Tim Harley , David Silver , and Koray Kavukcuoglu . 2016 . Asynchronous methods for deep reinforcement learning . In International conference on machine learning. 1928--1937 . Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning. 1928--1937."},{"key":"e_1_3_2_2_15_1","volume-title":"Playing atari with deep reinforcement learning. 
arXiv preprint arXiv:1312.5602","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Alex Graves , Ioannis Antonoglou , Daan Wierstra , and Martin Riedmiller . 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 ( 2013 ). Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Andrei A Rusu Joel Veness Marc G Bellemare Alex Graves Martin Riedmiller Andreas K Fidjeland Georg Ostrovski etal 2015. Human-level control through deep reinforcement learning. nature 518 7540 (2015) 529--533.  Volodymyr Mnih Koray Kavukcuoglu David Silver Andrei A Rusu Joel Veness Marc G Bellemare Alex Graves Martin Riedmiller Andreas K Fidjeland Georg Ostrovski et al. 2015. Human-level control through deep reinforcement learning. nature 518 7540 (2015) 529--533.","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_2_17_1","volume-title":"Near-optimal representation learning for hierarchical reinforcement learning. arXiv preprint arXiv:1810.01257","author":"Nachum Ofir","year":"2018","unstructured":"Ofir Nachum , Shixiang Gu , Honglak Lee , and Sergey Levine . 2018. Near-optimal representation learning for hierarchical reinforcement learning. arXiv preprint arXiv:1810.01257 ( 2018 ). Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine. 2018. Near-optimal representation learning for hierarchical reinforcement learning. arXiv preprint arXiv:1810.01257 (2018)."},{"key":"e_1_3_2_2_18_1","volume-title":"Honglak Lee, and Sergey Levine.","author":"Nachum Ofir","year":"2018","unstructured":"Ofir Nachum , Shixiang Shane Gu , Honglak Lee, and Sergey Levine. 2018 . Dataefficient hierarchical reinforcement learning. 
In Advances in Neural Information Processing Systems . 3303--3313. Ofir Nachum, Shixiang Shane Gu, Honglak Lee, and Sergey Levine. 2018. Dataefficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems. 3303--3313."},{"key":"e_1_3_2_2_19_1","unstructured":"Ofir Nachum Mohammad Norouzi Kelvin Xu and Dale Schuurmans. 2017. Bridging the gap between value and policy based reinforcement learning. In Advances in Neural Information Processing Systems. 2775--2785.  Ofir Nachum Mohammad Norouzi Kelvin Xu and Dale Schuurmans. 2017. Bridging the gap between value and policy based reinforcement learning. In Advances in Neural Information Processing Systems. 2775--2785."},{"key":"e_1_3_2_2_20_1","unstructured":"Andrew Y Ng Michael I Jordan Yair Weiss etal 2002. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems 2 (2002) 849--856.  Andrew Y Ng Michael I Jordan Yair Weiss et al. 2002. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems 2 (2002) 849--856."},{"key":"e_1_3_2_2_21_1","volume-title":"Refuel: Exploring sparse features in deep reinforcement learning for fast disease diagnosis. In Advances in neural information processing systems. 7322--7331.","author":"Peng Yu-Shao","year":"2018","unstructured":"Yu-Shao Peng , Kai-Fu Tang , Hsuan-Tien Lin , and Edward Chang . 2018 . Refuel: Exploring sparse features in deep reinforcement learning for fast disease diagnosis. In Advances in neural information processing systems. 7322--7331. Yu-Shao Peng, Kai-Fu Tang, Hsuan-Tien Lin, and Edward Chang. 2018. Refuel: Exploring sparse features in deep reinforcement learning for fast disease diagnosis. In Advances in neural information processing systems. 7322--7331."},{"key":"e_1_3_2_2_22_1","volume-title":"International conference on machine learning. 
PMLR","author":"Schulman John","year":"2015","unstructured":"John Schulman , Sergey Levine , Pieter Abbeel , Michael Jordan , and Philipp Moritz . 2015 . Trust region policy optimization . In International conference on machine learning. PMLR , 1889--1897. John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In International conference on machine learning. PMLR, 1889--1897."},{"key":"e_1_3_2_2_23_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman , Filip Wolski , Prafulla Dhariwal , Alec Radford , and Oleg Klimov . 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 ( 2017 ). John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_3_2_2_24_1","unstructured":"Richard S Sutton David A McAllester Satinder P Singh and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems. 1057--1063.  Richard S Sutton David A McAllester Satinder P Singh and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems. 1057--1063."},{"key":"e_1_3_2_2_25_1","volume-title":"NIPS Workshop on Deep Reinforcement Learning.","author":"Tang Kai-Fu","year":"2016","unstructured":"Kai-Fu Tang , Hao-Cheng Kao , Chun-Nan Chou , and Edward Y Chang . 2016 . Inquire and diagnose: Neural symptom checking ensemble using deep reinforcement learning . In NIPS Workshop on Deep Reinforcement Learning. Kai-Fu Tang, Hao-Cheng Kao, Chun-Nan Chou, and Edward Y Chang. 2016. Inquire and diagnose: Neural symptom checking ensemble using deep reinforcement learning. 
In NIPS Workshop on Deep Reinforcement Learning."},{"key":"e_1_3_2_2_26_1","volume-title":"Feudal networks for hierarchical reinforcement learning. arXiv preprint arXiv:1703.01161","author":"Vezhnevets Alexander Sasha","year":"2017","unstructured":"Alexander Sasha Vezhnevets , Simon Osindero , Tom Schaul , Nicolas Heess , Max Jaderberg , David Silver , and Koray Kavukcuoglu . 2017. Feudal networks for hierarchical reinforcement learning. arXiv preprint arXiv:1703.01161 ( 2017 ). Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. 2017. Feudal networks for hierarchical reinforcement learning. arXiv preprint arXiv:1703.01161 (2017)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00443"},{"key":"e_1_3_2_2_28_1","volume-title":"Machine learning 8, 3--4","author":"Watkins Christopher JCH","year":"1992","unstructured":"Christopher JCH Watkins and Peter Dayan . 1992. Q-learning. Machine learning 8, 3--4 ( 1992 ), 279--292. Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning 8, 3--4 (1992), 279--292."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2033"},{"key":"e_1_3_2_2_30_1","volume-title":"A network-based endto-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562","author":"Wen Tsung-Hsien","year":"2016","unstructured":"Tsung-Hsien Wen , David Vandyke , Nikola Mrksic , Milica Gasic , Lina M RojasBarahona , Pei-Hao Su , Stefan Ultes , and Steve Young . 2016. A network-based endto-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562 ( 2016 ). Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M RojasBarahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2016. A network-based endto-end trainable task-oriented dialogue system. 
arXiv preprint arXiv:1604.04562 (2016)."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33017346"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.11182"}],"event":{"name":"KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Virtual Event Singapore","acronym":"KDD '21","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447548.3467255","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447548.3467255","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:28Z","timestamp":1750191508000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447548.3467255"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,14]]},"references-count":32,"alternative-id":["10.1145\/3447548.3467255","10.1145\/3447548"],"URL":"https:\/\/doi.org\/10.1145\/3447548.3467255","relation":{},"subject":[],"published":{"date-parts":[[2021,8,14]]},"assertion":[{"value":"2021-08-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}