{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T17:19:56Z","timestamp":1776100796723,"version":"3.50.1"},"reference-count":115,"publisher":"Association for Computing Machinery (ACM)","issue":"EICS","license":[{"start":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T00:00:00Z","timestamp":1718582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2024,6,17]]},"abstract":"<jats:p>As the number of selectable items increases, point-and-click interfaces rapidly become complex, leading to a decrease in usability. Adaptive user interfaces can reduce this complexity by automatically adjusting an interface to only display the most relevant items. A core challenge for developing adaptive interfaces is to infer user intent and choose adaptations accordingly. Current methods rely on tediously hand-crafted rules or carefully collected user data. Furthermore, heuristics need to be recrafted and data regathered for every new task and interface. To address this issue, we formulate interface adaptation as a multi-agent reinforcement learning problem. Our approach learns adaptation policies without relying on heuristics or real user data, facilitating the development of adaptive interfaces across various tasks with minimal adjustments needed. In our formulation, a user agent mimics a real user and learns to interact with an interface via point-and-click actions. Simultaneously, an interface agent learns interface adaptations, to maximize the user agent's efficiency, by observing the user agent's behavior. For our evaluation, we substituted the simulated user agent with actual users. 
Our study involved twelve participants and concentrated on automatic toolbar item assignment. The results show that the policies we developed in simulation effectively apply to real users. These users were able to complete tasks with fewer actions and in similar times compared to methods trained with real data. Additionally, we demonstrated our method's efficiency and generalizability across four different interfaces and tasks.<\/jats:p>","DOI":"10.1145\/3661147","type":"journal-article","created":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T18:25:37Z","timestamp":1718648737000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["MARLUI: Multi-Agent Reinforcement Learning for Adaptive Point-and-Click UIs"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2536-0208","authenticated-orcid":false,"given":"Thomas","family":"Langerak","sequence":"first","affiliation":[{"name":"ETH Z\u00fcrich, Department of Computer Science, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3511-8565","authenticated-orcid":false,"given":"Sammy","family":"Christen","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Department of Computer Science, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3406-8412","authenticated-orcid":false,"given":"Mert","family":"Albaba","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Department of Computer Science, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7162-0133","authenticated-orcid":false,"given":"Christoph","family":"Gebhardt","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Department of Computer Science, 
Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9655-9519","authenticated-orcid":false,"given":"Christian","family":"Holz","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Department of Computer Science, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5068-3474","authenticated-orcid":false,"given":"Otmar","family":"Hilliges","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Department of Computer Science, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,6,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15327051hci1204_5"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-247X(65)90154-X"},{"key":"e_1_2_1_3_1","volume-title":"Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528","author":"Baker Bowen","year":"2019","unstructured":"Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. 2019. Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528 (2019)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989734.1989744"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/964442.964459"},{"key":"e_1_2_1_6_1","volume-title":"Hierarchical reinforcement learning and decision making. Current opinion in neurobiology 22, 6","author":"Botvinick Matthew Michael","year":"2012","unstructured":"Matthew Michael Botvinick. 2012. Hierarchical reinforcement learning and decision making. Current opinion in neurobiology 22, 6 (2012), 956--962."},{"key":"e_1_2_1_7_1","unstructured":"Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. OpenAI Gym. 
arXiv:1606.01540 [cs.LG]"},{"key":"e_1_2_1_8_1","volume-title":"Adaptive user interfaces","author":"Browne Dermot","unstructured":"Dermot Browne, Peter Totterdell, and Mike Norman. 2016. Adaptive user interfaces. Elsevier."},{"key":"e_1_2_1_9_1","unstructured":"S Card T Moran and A Newell. 1983. The Psychology of Human Computer Interaction."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/358886.358895"},{"key":"e_1_2_1_11_1","volume-title":"The model human processor- An engineering model of human performance. Handbook of perception and human performance. 2, 45-1","author":"Card K.","year":"1986","unstructured":"Stuart. K. Card, Thomas. P. Moran, and Allen Newell. 1986. The model human processor- An engineering model of human performance. Handbook of perception and human performance. 2, 45-1 (1986)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376701"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290999"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702483"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3586183.3606717"},{"key":"e_1_2_1_16_1","volume-title":"SynH2R: Synthesizing Hand-Object Motions for Learning Human-to-Robot Handovers. arXiv preprint arXiv:2311.05599","author":"Christen Sammy","year":"2023","unstructured":"Sammy Christen, Lan Feng, Wei Yang, Yu-Wei Chao, Otmar Hilliges, and Jie Song. 2023. SynH2R: Synthesizing Hand-Object Motions for Learning Human-to-Robot Handovers. 
arXiv preprint arXiv:2311.05599 (2023)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3060403"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794065"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00931"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1240624.1240723"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2013.06.002"},{"key":"e_1_2_1_22_1","volume-title":"St\u00e9phane Canu, and Christian Wolf.","author":"Debard Quentin","year":"2020","unstructured":"Quentin Debard, Jilles Steeve Dibangoye, St\u00e9phane Canu, and Christian Wolf. 2020. Learning 3D Navigation Protocols on Touch Interfaces with Cooperative Multi-agent Reinforcement Learning. In Machine Learning and Knowledge Discovery in Databases, Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, and C\u00e9line Robardet (Eds.). Springer International Publishing, Cham, 35--52."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cognition.2012.10.010"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173856"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1719970.1719980"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSACW.2014.45"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-021-93760-1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0055392"},{"key":"e_1_2_1_29_1","volume-title":"Nando De Freitas, and Shimon Whiteson.","author":"Foerster Jakob","year":"2016","unstructured":"Jakob Foerster, Ioannis Alexandros Assael, Nando De Freitas, and Shimon Whiteson. 2016. Learning to communicate with deep multi-agent reinforcement learning. 
Advances in neural information processing systems 29 (2016)."},{"key":"e_1_2_1_30_1","volume-title":"Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: computational analysis. Cerebral cortex 22, 3","author":"Frank Michael J","year":"2012","unstructured":"Michael J Frank and David Badre. 2012. Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: computational analysis. Cerebral cortex 22, 3 (2012), 509--526."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2013.2282190"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3593434.3593452"},{"key":"e_1_2_1_33_1","volume-title":"A Comparative Study on Reward Models for UI Adaptation with Reinforcement Learning. methods 13","author":"Gaspar-Figueiredo Daniel","year":"2023","unstructured":"Daniel Gaspar-Figueiredo, Marta Fern\u00e1ndez-Diego, Silvia Abrahao, and Emilio Insfran. 2023. A Comparative Study on Reward Models for UI Adaptation with Reinforcement Learning. methods 13 (2023), 14."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332165.3347933"},{"key":"e_1_2_1_35_1","volume-title":"Artificial Intelligence for Human Computer Interaction: A Modern Approach","author":"Gebhardt Christoph","unstructured":"Christoph Gebhardt and Otmar Hilliges. 2021. Optimal Control to Support High-Level User Goals in Human-Computer Interaction. In Artificial Intelligence for Human Computer Interaction: A Modern Approach. Springer, 33--72."},{"key":"e_1_2_1_36_1","volume-title":"Hierarchical Reinforcement Learning as a Model of Human Task Interleaving. Computational Brain and Behavior","author":"Gebhardt Christoph","year":"2021","unstructured":"Christoph Gebhardt, Antti Oulasvirta, and Otmar Hilliges. 2021. Hierarchical Reinforcement Learning as a Model of Human Task Interleaving. Computational Brain and Behavior (2021). 
https:\/\/arxiv.org\/pdf\/2001.02122.pdf"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-44355-8_1"},{"key":"e_1_2_1_38_1","volume-title":"Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science 349, 6245","author":"Gershman Samuel J","year":"2015","unstructured":"Samuel J Gershman, Eric J Horvitz, and Joshua B Tenenbaum. 2015. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science 349, 6245 (2015), 273--278."},{"key":"e_1_2_1_39_1","volume-title":"Multistability and perceptual inference. Neural computation 24, 1","author":"Gershman Samuel J","year":"2012","unstructured":"Samuel J Gershman, Edward Vul, and Joshua B Tenenbaum. 2012. Multistability and perceptual inference. Neural computation 24, 1 (2012), 1--24."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3298689.3346956"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783446.2783574"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544549.3585913"},{"key":"e_1_2_1_43_1","volume-title":"On the rate of gain of information. Quarterly Journal of experimental psychology 4, 1","author":"Hick William E","year":"1952","unstructured":"William E Hick. 1952. On the rate of gain of information. Quarterly Journal of experimental psychology 4, 1 (1952), 11--26."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence","author":"Horvitz Eric","year":"1998","unstructured":"Eric Horvitz, Jack Breese, David Heckerman, David Hovel, and Koos Rommelse. 1998. The Lumi\u00e8Re Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (Madison, Wisconsin) (UAI'98). 
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 256--265."},{"key":"e_1_2_1_45_1","unstructured":"Ronald A Howard. 1960. Dynamic programming and markov processes. (1960)."},{"key":"e_1_2_1_46_1","volume-title":"Interaction as an emergent property of a Partially Observable Markov Decision Process. Computational interaction","author":"Howes Andrew","year":"2018","unstructured":"Andrew Howes, Xiuli Chen, Aditya Acharya, and Richard L Lewis. 2018. Interaction as an emergent property of a Partially Observable Markov Decision Process. Computational interaction (2018), 287--310."},{"key":"e_1_2_1_47_1","volume-title":"Advances in Neural Information Processing Systems (NIPS '18)","author":"Hu Zehong","year":"2018","unstructured":"Zehong Hu, Yitao Liang, Jie Zhang, Zhao Li, and Yang Liu. 2018. Inference aided reinforcement learning for incentive mechanism design in crowdsourcing. In Advances in Neural Information Processing Systems (NIPS '18). 5508--5518. https:\/\/arxiv.org\/abs\/1806.00206"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","unstructured":"Zool Hilmi Ismail and Nohaidda Sariff. 2018. A Survey and Analysis of Cooperative Multi-Agent Robot Systems: Challenges and Directions. In Applications of Mobile Robots Efren Gorrostieta Hurtado (Ed.). IntechOpen Rijeka Chapter 1. https:\/\/doi.org\/10.5772\/intechopen.79337","DOI":"10.5772\/intechopen.79337"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.aau6249"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/MPRV.2005.80"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445483"},{"key":"e_1_2_1_52_1","volume-title":"Multitasking in driving as optimal adaptation under uncertainty. Human factors 63, 8","author":"Jokinen Jussi PP","year":"2021","unstructured":"Jussi PP Jokinen, Tuomo Kujala, and Antti Oulasvirta. 2021. Multitasking in driving as optimal adaptation under uncertainty. 
Human factors 63, 8 (2021), 1324--1341."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3536325"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/1349822.1349854"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15327051hci1204_4"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300863"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCIC.2010.5705768"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2642918.2647386"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858111"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3612783.3612810"},{"key":"e_1_2_1_61_1","volume-title":"Anna Maria Feit, and Otmar Hilliges","author":"Langerak Thomas","year":"2021","unstructured":"Thomas Langerak, Sammy Christen, Anna Maria Feit, and Otmar Hilliges. 2021. Generalizing User Models through Hybrid Hierarchical Control. (2021)."},{"key":"e_1_2_1_62_1","volume-title":"Collaborative interface agents. Readings in agents","author":"Lashkari Yezdi","year":"1997","unstructured":"Yezdi Lashkari, Max Metral, and Pattie Maes. 1997. Collaborative interface agents. Readings in agents (1997), 111--116."},{"key":"e_1_2_1_63_1","volume-title":"Multi-agent reinforcement learning in sequential social dilemmas. arXiv preprint arXiv:1702.03037","author":"Leibo Joel Z","year":"2017","unstructured":"Joel Z Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. 2017. Multi-agent reinforcement learning in sequential social dilemmas. arXiv preprint arXiv:1702.03037 (2017)."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308557.3308704"},{"key":"e_1_2_1_65_1","unstructured":"Eric Liang Richard Liaw Philipp Moritz Robert Nishihara Roy Fox Ken Goldberg Joseph E. Gonzalez Michael I. Jordan and Ion Stoica. 2018. RLlib: Abstractions for Distributed Reinforcement Learning. 
arXiv:1712.09381 [cs.AI]"},{"key":"e_1_2_1_66_1","volume-title":"Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS '15)","author":"Liebman Elad","year":"2015","unstructured":"Elad Liebman, Maytal Saar-Tsechansky, and Peter Stone. 2015. DJ-MC: A Reinforcement-Learning Agent for Music Playlist Recommendation. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS '15). 591--599. https:\/\/arxiv.org\/abs\/1401.1880"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332165.3347945"},{"key":"e_1_2_1_68_1","volume-title":"Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027","author":"Liu Feng","year":"2018","unstructured":"Feng Liu, Ruiming Tang, Xutao Li, Weinan Zhang, Yunming Ye, Haokun Chen, Huifeng Guo, and Yuzhou Zhang. 2018. Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027 (2018). https:\/\/arxiv.org\/abs\/1810.12027"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858425"},{"key":"e_1_2_1_70_1","volume-title":"Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning. In International Conference on Learning Representations.","author":"Long Qian","year":"2020","unstructured":"Qian Long, Zihan Zhou, Abhinav Gupta, Fei Fang, Yi Wu, and Xiaolong Wang. 2020. Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning. In International Conference on Learning Representations."},{"key":"e_1_2_1_71_1","volume-title":"OpenAI Pieter Abbeel, and Igor Mordatch","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi I Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. 
Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_1_72_1","first-page":"177","article-title":"Responding to cognitive overload: Co-adaptation between users and technology","volume":"30","author":"Mackay Wendy","year":"2000","unstructured":"Wendy Mackay. 2000. Responding to cognitive overload: Co-adaptation between users and technology. Intellectica 30, 1 (2000), 177--193.","journal-title":"Intellectica"},{"key":"e_1_2_1_73_1","volume-title":"Readings in human-computer interaction","author":"Maes Pattie","unstructured":"Pattie Maes. 1995. Agents that reduce work and information overload. In Readings in human-computer interaction. Elsevier, 811--821."},{"key":"e_1_2_1_74_1","first-page":"40","article-title":"IEMS-an approach that combines handcrafted rules with learnt instance based rules","volume":"9","author":"McCreath Eric","year":"2006","unstructured":"Eric McCreath, Judy Kay, and Elisabeth Crawford. 2006. IEMS-an approach that combines handcrafted rules with learnt instance based rules. Aust. J. Intell. Inf. Process. Syst. 9, 1 (2006), 40--53.","journal-title":"Aust. J. Intell. Inf. Process. Syst."},{"key":"e_1_2_1_75_1","doi-asserted-by":"crossref","unstructured":"Abhinav Mehrotra and Robert Hendley. 2015. Designing Content-driven Intelligent Notification Mechanisms for Mobile Applications. (2015) 813--824.","DOI":"10.1145\/2750858.2807544"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1080\/10447318.2020.1824742"},{"key":"e_1_2_1_77_1","volume-title":"Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. 
arXiv preprint arXiv:1312.5602 (2013)."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3564038"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2005.06.002"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.2969687"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/3266037.3266087"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/3131608"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3517739"},{"key":"e_1_2_1_84_1","volume-title":"Xiaojun Bi, and Andrew Howes.","author":"Oulasvirta Antti","year":"2018","unstructured":"Antti Oulasvirta, Per Ola Kristensson, Xiaojun Bi, and Andrew Howes. 2018. Computational interaction. Oxford University Press."},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173758"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1145\/2632048.2632062"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-017-0468-y"},{"key":"e_1_2_1_89_1","volume-title":"Proceedings of the 7th international conference on Human computer interaction with mobile devices & services. 239--242","author":"Reilly Derek","year":"2005","unstructured":"Derek Reilly, Michael Welsman-Dinelle, Colin Bate, and Kori Inkpen. 2005. Just point and click? Using handhelds to interact with paper maps. In Proceedings of the 7th international conference on Human computer interaction with mobile devices & services. 239--242."},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCSW.2005.58"},{"key":"e_1_2_1_91_1","volume-title":"COLLAGEN: A collaboration manager for software interface agents. In Computational Models of Mixed-Initiative Interaction","author":"Rich Charles","year":"1998","unstructured":"Charles Rich and Candace L Sidner. 1998. 
COLLAGEN: A collaboration manager for software interface agents. In Computational Models of Mixed-Initiative Interaction. Springer, 149--184."},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2021.01.009"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-0417(00)00015-2"},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300765"},{"key":"e_1_2_1_95_1","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. arXiv:1707.06347 [cs.LG]"},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1145\/325737.325859"},{"key":"e_1_2_1_97_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2494218"},{"key":"e_1_2_1_98_1","volume-title":"Proceedings of the national academy of sciences 39","author":"Shapley Lloyd S","year":"1953","unstructured":"Lloyd S Shapley. 1953. Stochastic games. Proceedings of the national academy of sciences 39, 10 (1953), 1095--1100."},{"key":"e_1_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.1145\/1502650.1502690"},{"key":"e_1_2_1_100_1","doi-asserted-by":"publisher","DOI":"10.1145\/1502650.1502670"},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICAC54203.2021.9671076"},{"key":"e_1_2_1_102_1","doi-asserted-by":"publisher","DOI":"10.1145\/1719970.1720035"},{"key":"e_1_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025171.3025207"},{"key":"e_1_2_1_104_1","doi-asserted-by":"publisher","DOI":"10.1145\/238218.238323"},{"key":"e_1_2_1_105_1","volume-title":"Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management. arXiv preprint arXiv:1707.00130","author":"Su Pei-Hao","year":"2017","unstructured":"Pei-Hao Su, Pawel Budzianowski, Stefan Ultes, Milica Gasic, and Steve Young. 2017. Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management. 
arXiv preprint arXiv:1707.00130 (2017). https:\/\/arxiv.org\/abs\/1707.00130"},{"key":"e_1_2_1_106_1","unstructured":"Richard S Sutton Andrew G Barto et al. 1998. Introduction to reinforcement learning. (1998)."},{"key":"e_1_2_1_107_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6217"},{"key":"e_1_2_1_108_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445497"},{"key":"e_1_2_1_109_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6221"},{"key":"e_1_2_1_110_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00027"},{"key":"e_1_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218213012500042"},{"key":"e_1_2_1_112_1","first-page":"24611","article-title":"The surprising effectiveness of ppo in cooperative multi-agent games","volume":"35","author":"Yu Chao","year":"2022","unstructured":"Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu. 2022. The surprising effectiveness of ppo in cooperative multi-agent games. Advances in Neural Information Processing Systems 35 (2022), 24611--24624.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_113_1","volume-title":"The surprising effectiveness of ppo in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955","author":"Yu Chao","year":"2021","unstructured":"Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, and Yi Wu. 2021. The surprising effectiveness of ppo in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)."},{"key":"e_1_2_1_114_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2015.2403394"},{"key":"e_1_2_1_115_1","volume-title":"Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control","author":"Zhang Kaiqing","year":"2021","unstructured":"Kaiqing Zhang, Zhuoran Yang, and Tamer Ba\u015far. 2021. 
Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control (2021), 321--384."}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661147","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3661147","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T23:20:15Z","timestamp":1755904815000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661147"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,17]]},"references-count":115,"journal-issue":{"issue":"EICS","published-print":{"date-parts":[[2024,6,17]]}},"alternative-id":["10.1145\/3661147"],"URL":"https:\/\/doi.org\/10.1145\/3661147","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,17]]},"assertion":[{"value":"2023-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}