{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T19:44:24Z","timestamp":1771703064678,"version":"3.50.1"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>Reinforcement learning (RL) holds promise for supporting personalised decision-making in healthcare, but existing approaches often struggle to incorporate patient expertise and individual preferences, key components for clinically viable AI systems. This work introduces PAINT (Preference Adaptation for Individualised Treatment), a general framework for preference-guided offline RL in safety-critical settings. PAINT combines sketch-based reward annotation with safety-constrained policy optimisation, enabling fine-grained preference capture from historical patient data without requiring action labels. A reward model trained on this feedback guides offline RL while supporting tunable sensitivity to preference signals and enforcing clinical safety constraints. Using type 1 diabetes (T1D) management as a case study, in-silico evaluation with the FDA-accepted T1D simulator demonstrates that PAINT can reduce patient risk by 15% over commercial baselines under guidance, while enabling preference-driven adaptations such as improved management during challenging mealtime events and enhanced robustness to dosing errors. The method further shows resilience to real-world challenges including sample size, annotation noise, and inter-patient variability. 
These findings suggest PAINT offers a practical pathway for integrating human feedback into offline RL in patient settings, with broader implications for developing trustworthy and adaptive AI systems in healthcare.<\/jats:p>","DOI":"10.3233\/faia251067","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:50:31Z","timestamp":1761126631000},"source":"Crossref","is-referenced-by-count":2,"title":["Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback"],"prefix":"10.3233","author":[{"given":"Harry","family":"Emerson","sequence":"first","affiliation":[{"name":"University of Bristol, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sam Gordon","family":"James","sequence":"additional","affiliation":[{"name":"University of Bristol, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthew","family":"Guy","sequence":"additional","affiliation":[{"name":"University of Bristol, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ryan","family":"McConville","sequence":"additional","affiliation":[{"name":"University of Bristol, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 
2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA251067","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:50:32Z","timestamp":1761126632000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA251067"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia251067","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}