{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T19:27:03Z","timestamp":1775503623529,"version":"3.50.1"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62276168 and 62176225"],"award-info":[{"award-number":["62276168 and 62176225"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Scientific Foundation for Youth Scholars of Shenzhen University","award":["868-000001032177"],"award-info":[{"award-number":["868-000001032177"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Auton. Adapt. Syst."],"published-print":{"date-parts":[[2026,3,31]]},"abstract":"<jats:p>\n                    When intelligent agents act in a stochastic environment, the principle of maximizing expected rewards is used to optimize their policies. The rationality of\n                    <jats:italic toggle=\"yes\">the maximum rewards<\/jats:italic>\n                    becomes a single objective when agents\u2019 decision problems are solved in most cases. This sometimes leads to the agents\u2019 behaviors (the optimal policies for solving the decision problems) that are not\n                    <jats:italic toggle=\"yes\">legible<\/jats:italic>\n                    . In other words, it is difficult for users (or other agents and even humans) to understand the agents\u2019 intentions when they are executing the optimal policies. Hence, it becomes pertinent to consider the legibility of agents\u2019 decision problems. The key challenge lies in formulating a proper legibility function in the problems. Using domain experts\u2019 inputs leans to be subjective and inconsistent in specifying legibility values, and the manual approach quickly becomes infeasible in a complex problem domain. In this article, we aim to learn such a legibility function parallel to developing a (conventional) reward function. We adopt inverse reinforcement learning techniques to automate a legibility function in agents\u2019 decision problems. We first demonstrate the effectiveness of the inverse reinforcement learning technique when legibility is solely considered in a decision problem. Things become complicated when both the reward and legibility functions are to be found. We develop a multi-objective inverse reinforcement learning method to automate the two functions in a good balance simultaneously. We vary problem domains in the performance study and provide empirical results in support.\n                  <\/jats:p>","DOI":"10.1145\/3736417","type":"journal-article","created":{"date-parts":[[2025,5,21]],"date-time":"2025-05-21T10:05:46Z","timestamp":1747821946000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Automate Legibility through Inverse Reinforcement Learning"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0546-9628","authenticated-orcid":false,"given":"Buxin","family":"Zeng","sequence":"first","affiliation":[{"name":"Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, United Kingdom of Great Britain and Northern Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5715-2855","authenticated-orcid":false,"given":"Yinghui","family":"Pan","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence &amp; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0821-4623","authenticated-orcid":false,"given":"Jing","family":"Tang","sequence":"additional","affiliation":[{"name":"Newcastle Business School, Northumbria University, Newcastle upon Tyne, United Kingdom of Great Britain and Northern Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5246-403X","authenticated-orcid":false,"given":"Yifeng","family":"Zeng","sequence":"additional","affiliation":[{"name":"Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, United Kingdom of Great Britain and Northern Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,3,10]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"e_1_3_1_3_2","unstructured":"Ankit Agrawal and Jane Cleland-Huang. 2021. RescueAR: Augmented reality supported collaboration for UAV driven emergency response systems. arXiv:2110.00180. Retrieved from https:\/\/arxiv.org\/abs\/2110.00180"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2021.103500"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.113816"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390162"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-021-05984-x"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.5555\/560669"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/RO-MAN47096.2020.9223338"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.asej.2020.11.005"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-017-0400-4"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1329125.1329387"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2013.IX.024"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/HRI.2013.6483603"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12559-023-10201-z"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3174264"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2007.01.004"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3068708"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3168446"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1186\/s13244-024-01660-5"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/1622737.1622748"},{"key":"e_1_3_1_22_2","unstructured":"Rishab Khincha Soundarya Krishnan Tirtharaj Dash Lovekesh Vig and Ashwin Srinivasan. 2020. Constructing and evaluating an explainable model for COVID-19 diagnosis from chest X-rays. arXiv:2012.10787. Retrieved from https:\/\/arxiv.org\/abs\/2012.10787"},{"key":"e_1_3_1_23_2","unstructured":"Jan Hendrik Kirchner Yining Chen Harri Edwards Jan Leike Nat McAleese and Yuri Burda. 2024. Prover-verifier games improve legibility of LLM outputs. arXiv:2407.13692. Retrieved from https:\/\/arxiv.org\/abs\/2407.13692"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13218-010-0043-1"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/IIAI-AAI53430.2021.00078"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3455780"},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Anagha Kulkarni Siddharth Srivastava and Subbarao Kambhampati. 2019. Signaling friends and head-faking enemies simultaneously: Balancing goal obfuscation and goal legibility. arXiv:1905.10672. Retrieved from https:\/\/arxiv.org\/abs\/1905.10672","DOI":"10.65109\/CQTO1916"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1038\/44565"},{"key":"e_1_3_1_29_2","volume-title":"Proceedings of the International Conference on Social Robotics (ICSR)","author":"Lichtenth\u00e4ler Christina","year":"2011","unstructured":"Christina Lichtenth\u00e4ler, Tamara Lorenz, and Alexandra Kirsch. 2011. Towards a legibility metric: How to measure the perceived value of a robot. In Proceedings of the International Conference on Social Robotics (ICSR)."},{"key":"e_1_3_1_30_2","first-page":"2442","article-title":"Signs of the time: Making AI legible","volume":"1","author":"Lindley Joseph Galen","year":"2020","unstructured":"Joseph Galen Lindley, Paul Coulton, Haider Ali Akmal, and Franziska Louise Pilling. 2020. Signs of the time: Making AI legible. DRS2020: Synergy 1 (2020), 2442\u20132459.","journal-title":"DRS2020: Synergy"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.5555\/3545946.3599167"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3171221.3171255"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/RO-MAN50785.2021.9515318"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.5555\/3398761.3399031"},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","unstructured":"Richard Mortier Hamed Haddadi Tristan Henderson Derek McAuley and Jon Crowcroft. 2014. Human-data interaction: The human face of the data-driven society. arXiv:1412.6159. Retrieved from https:\/\/arxiv.org\/abs\/1412.6159","DOI":"10.2139\/ssrn.2508051"},{"key":"e_1_3_1_36_2","first-page":"663","volume-title":"Proceedings of the 17th International Conference on Machine Learning","author":"Ng Andrew Y.","year":"2000","unstructured":"Andrew Y. Ng and Stuart Russell. 2000. Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, 663\u2013670."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/HRI.2016.7451762"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3334480.3381820"},{"key":"e_1_3_1_39_2","first-page":"548","article-title":"Legible AI by design: Design research to frame, design, empirically test and evaluate AI iconography","author":"Pilling Franziska","year":"2020","unstructured":"Franziska Pilling, Haider Ali Akmal, Adrian Gradinar, Joseph Lindley, and Paul Coulton. 2020. Legible AI by design: Design research to frame, design, empirically test and evaluate AI iconography. In Proceedings of Swiss Design Network Symposium 2021 Conference, 548\u2013565.","journal-title":"Proceedings of Swiss Design Network Symposium 2021 Conference,"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2024.120128"},{"key":"e_1_3_1_41_2","first-page":"2586","volume-title":"Proceedings of the 20th International Joint Conference on Artificial intelligence (IJCAI)","volume":"7","author":"Ramachandran Deepak","year":"2007","unstructured":"Deepak Ramachandran and Eyal Amir. 2007. Bayesian inverse reinforcement learning. In Proceedings of the 20th International Joint Conference on Artificial intelligence (IJCAI), Vol. 7, 2586\u20132591."},{"issue":"1","key":"e_1_3_1_42_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-031-01576-2","article-title":"Multi-objective decision making","volume":"11","author":"Roijers Diederik M.","year":"2017","unstructured":"Diederik M. Roijers and Shimon Whiteson. 2017. Multi-objective decision making. Synthesis Lectures on Artificial Intelligence and Machine Learning 11, 1 (2017), 1\u2013129.","journal-title":"Synthesis Lectures on Artificial Intelligence and Machine Learning"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/506443.506619"},{"key":"e_1_3_1_44_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ADPRL.2013.6615007"},{"key":"e_1_3_1_46_2","unstructured":"Sebastian Wallkotter Mohamed Chetouani and Ginevra Castellano. 2022. A new approach to evaluating legibility: Comparing legibility frameworks using framework-independent robot motion trajectories. arXiv:2201.05765. Retrieved from https:\/\/arxiv.org\/abs\/2201.05765"},{"issue":"3","key":"e_1_3_1_47_2","first-page":"279","article-title":"Q-learning","volume":"8","author":"Watkins Christopher J. C. H.","year":"1992","unstructured":"Christopher J. C. H. Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8, 3\u20134 (1992), 279\u2013292.","journal-title":"Machine Learning"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CAI59869.2024.00143"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v25i1.8017"},{"key":"e_1_3_1_50_2","first-page":"1433","volume-title":"Proceedings of the 23rd national conference on Artificial intelligence (AAAI)","volume":"8","author":"Ziebart Brian D.","year":"2008","unstructured":"Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum entropy inverse reinforcement learning. In Proceedings of the 23rd national conference on Artificial intelligence (AAAI), Vol. 8, 1433\u20131438."}],"container-title":["ACM Transactions on Autonomous and Adaptive Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3736417","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T14:10:35Z","timestamp":1773583835000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3736417"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,10]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,3,31]]}},"alternative-id":["10.1145\/3736417"],"URL":"https:\/\/doi.org\/10.1145\/3736417","relation":{},"ISSN":["1556-4665","1556-4703"],"issn-type":[{"value":"1556-4665","type":"print"},{"value":"1556-4703","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,10]]},"assertion":[{"value":"2024-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-03-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}