{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T07:56:58Z","timestamp":1780473418931,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":53,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,4,25]],"date-time":"2025-04-25T00:00:00Z","timestamp":1745539200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,4,26]]},"DOI":"10.1145\/3706598.3713600","type":"proceedings-article","created":{"date-parts":[[2025,4,24]],"date-time":"2025-04-24T03:24:33Z","timestamp":1745465073000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":42,"title":["AppAgent: Multimodal Agents as Smartphone Users"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6344-2824","authenticated-orcid":false,"given":"Chi","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Engineering, Westlake University, Hangzhou, Zhejiang, China and Tencent, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4376-0109","authenticated-orcid":false,"given":"Zhao","family":"Yang","sequence":"additional","affiliation":[{"name":"Shanghai Supwisdom, Shanghai, China and Tencent, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5423-7284","authenticated-orcid":false,"given":"Jiaxuan","family":"Liu","sequence":"additional","affiliation":[{"name":"Tencent, Shanghai, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1123-9673","authenticated-orcid":false,"given":"Yanda","family":"Li","sequence":"additional","affiliation":[{"name":"University of Technology Sydney, Sydney, NSW, Australia and Tencent, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-5853-3117","authenticated-orcid":false,"given":"Yucheng","family":"Han","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore and Tencent, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9347-1367","authenticated-orcid":false,"given":"Xin","family":"Chen","sequence":"additional","affiliation":[{"name":"Tencent, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4637-0045","authenticated-orcid":false,"given":"Zebiao","family":"Huang","sequence":"additional","affiliation":[{"name":"Tencent, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5277-8709","authenticated-orcid":false,"given":"Bin","family":"Fu","sequence":"additional","affiliation":[{"name":"Tencent, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5570-2710","authenticated-orcid":false,"given":"Gang","family":"Yu","sequence":"additional","affiliation":[{"name":"Tencent, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,4,25]]},"reference":[{"key":"e_1_3_3_2_2_2","volume-title":"arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.01691","author":"Ahn Michael","year":"2022","unstructured":"Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex 
Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario\u00a0Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, and Andy Zeng. 2022. Do As I Can and Not As I Say: Grounding Language in Robotic Affordances. In arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.01691."},{"key":"e_1_3_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Aude Billard Sylvain Calinon Ruediger Dillmann and Stefan Schaal. 2008. Survey: Robot programming by demonstration. Springer handbook of robotics (2008) 1371\u20131394.","DOI":"10.1007\/978-3-540-30301-5_60"},{"key":"e_1_3_3_2_4_2","unstructured":"Anthony Brohan Noah Brown Justice Carbajal Yevgen Chebotar Xi Chen Krzysztof Choromanski Tianli Ding Danny Driess Avinava Dubey Chelsea Finn et\u00a0al. 2023. Rt-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.15818 (2023)."},{"key":"e_1_3_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Anthony Brohan Noah Brown Justice Carbajal Yevgen Chebotar Joseph Dabis Chelsea Finn Keerthana Gopalakrishnan Karol Hausman Alex Herzog Jasmine Hsu et\u00a0al. 2022. Rt-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2212.06817 (2022).","DOI":"10.15607\/RSS.2023.XIX.025"},{"key":"e_1_3_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.5555\/1795482"},{"key":"e_1_3_3_2_7_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de\u00a0Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman et\u00a0al. 2021. 
Evaluating large language models trained on code. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2107.03374 (2021)."},{"key":"e_1_3_3_2_8_2","volume-title":"Watch what I do: programming by demonstration","author":"Cypher Allen","year":"1993","unstructured":"Allen Cypher and Daniel\u00a0Conrad Halbert. 1993. Watch what I do: programming by demonstration. MIT press."},{"key":"e_1_3_3_2_9_2","unstructured":"Xiang Deng Yu Gu Boyuan Zheng Shijie Chen Sam Stevens Boshi Wang Huan Sun and Yu Su. 2024. Mind2web: Towards a generalist agent for the web. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_2_10_2","unstructured":"Alexandre Drouin Maxime Gasse Massimo Caccia Issam\u00a0H Laradji Manuel Del\u00a0Verme Tom Marty L\u00e9o Boisvert Megh Thakkar Quentin Cappart David Vazquez et\u00a0al. 2024. WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2403.07718 (2024)."},{"key":"e_1_3_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Meta FAIR Anton Bakhtin Noam Brown Emily Dinan Gabriele Farina Colin Flaherty Daniel Fried Andrew Goff Jonathan Gray Hengyuan Hu et\u00a0al. 2022. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science 378 6624 (2022) 1067\u20131074.","DOI":"10.1126\/science.ade9097"},{"key":"e_1_3_3_2_12_2","unstructured":"Hiroki Furuta Kuang-Huei Lee Ofir Nachum Yutaka Matsuo Aleksandra Faust Shixiang\u00a0Shane Gu and Izzeddin Gur. 2023. Multimodal Web Navigation with Instruction-Finetuned Foundation Models. arxiv:https:\/\/arXiv.org\/abs\/2305.11854\u00a0[cs.LG]"},{"key":"e_1_3_3_2_13_2","unstructured":"Izzeddin Gur Hiroki Furuta Austin Huang Mustafa Safdari Yutaka Matsuo Douglas Eck and Aleksandra Faust. 2023. A Real-World WebAgent with Planning Long Context Understanding and Program Synthesis. 
arxiv:https:\/\/arXiv.org\/abs\/2307.12856\u00a0[cs.LG]"},{"key":"e_1_3_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/911909"},{"key":"e_1_3_3_2_15_2","unstructured":"Yucheng Han Chi Zhang Xin Chen Xu Yang Zhibin Wang Gang Yu Bin Fu and Hanwang Zhang. 2023. ChartLlama: A Multimodal LLM for Chart Understanding and Generation. arxiv:https:\/\/arXiv.org\/abs\/2311.16483\u00a0[cs.CV]"},{"key":"e_1_3_3_2_16_2","unstructured":"Sirui Hong Mingchen Zhuge Jonathan Chen Xiawu Zheng Yuheng Cheng Ceyao Zhang Jinlin Wang Zili Wang Steven Ka\u00a0Shing Yau Zijuan Lin Liyang Zhou Chenyu Ran Lingfeng Xiao Chenglin Wu and J\u00fcrgen Schmidhuber. 2023. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. arxiv:https:\/\/arXiv.org\/abs\/2308.00352\u00a0[cs.AI]"},{"key":"e_1_3_3_2_17_2","unstructured":"Wenyi Hong Weihan Wang Qingsong Lv Jiazheng Xu Wenmeng Yu Junhui Ji Yan Wang Zihan Wang Yuxuan Zhang Juanzi Li Bin Xu Yuxiao Dong Ming Ding and Jie Tang. 2023. CogAgent: A Visual Language Model for GUI Agents. arXiv preprint arXiv: 2312.08914 (2023)."},{"key":"e_1_3_3_2_18_2","unstructured":"Zhiting Hu and Tianmin Shu. 2023. Language Models Agent Models and World Models: The LAW for Machine Reasoning and Planning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.05230 (2023)."},{"key":"e_1_3_3_2_19_2","unstructured":"Jing\u00a0Yu Koh Robert Lo Lawrence Jang Vikram Duvvur Ming\u00a0Chong Lim Po-Yu Huang Graham Neubig Shuyan Zhou Ruslan Salakhutdinov and Daniel Fried. 2024. Visualwebarena: Evaluating multimodal agents on realistic visual web tasks. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2401.13649 (2024)."},{"key":"e_1_3_3_2_20_2","unstructured":"Jing\u00a0Yu Koh Stephen McAleer Daniel Fried and Ruslan Salakhutdinov. 2024. Tree search for language model agents. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2407.01476 (2024)."},{"key":"e_1_3_3_2_21_2","unstructured":"Yanda Li Chi Zhang Gang Yu Zhibin Wang Bin Fu Guosheng Lin Chunhua Shen Ling Chen and Yunchao Wei. 2023. StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data. arxiv:https:\/\/arXiv.org\/abs\/2308.10253\u00a0[cs.CV]"},{"key":"e_1_3_3_2_22_2","unstructured":"Haotian Liu Chunyuan Li Yuheng Li and Yong\u00a0Jae Lee. 2023. Improved Baselines with Visual Instruction Tuning."},{"key":"e_1_3_3_2_23_2","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong\u00a0Jae Lee. 2023. Visual Instruction Tuning."},{"key":"e_1_3_3_2_24_2","unstructured":"Xiao Liu Hao Yu Hanchen Zhang Yifan Xu Xuanyu Lei Hanyu Lai Yu Gu Hangliang Ding Kaiwen Men Kejuan Yang Shudan Zhang Xiang Deng Aohan Zeng Zhengxiao Du Chenhui Zhang Sheng Shen Tianjun Zhang Yu Su Huan Sun Minlie Huang Yuxiao Dong and Jie Tang. 2023. AgentBench: Evaluating LLMs as Agents. arXiv preprint arXiv: 2308.03688 (2023)."},{"key":"e_1_3_3_2_25_2","unstructured":"OpenAI. 2021. ChatGPT. https:\/\/openai.com\/research\/chatgpt."},{"key":"e_1_3_3_2_26_2","unstructured":"OpenAI. 2023. GPT-4 Technical Report. arxiv:https:\/\/arXiv.org\/abs\/2303.08774\u00a0[cs.CL]"},{"key":"e_1_3_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3586183.3606763"},{"key":"e_1_3_3_2_28_2","unstructured":"Chen Qian Xin Cong Cheng Yang Weize Chen Yusheng Su Juyuan Xu Zhiyuan Liu and Maosong Sun. 2023. Communicative agents for software development. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.07924 (2023)."},{"key":"e_1_3_3_2_29_2","unstructured":"Christopher Rawles Alice Li Daniel Rodriguez Oriana Riva and Timothy Lillicrap. 2023. Android in the Wild: A Large-Scale Dataset for Android Device Control. 
arXiv preprint arXiv:2307.10088 (2023)."},{"key":"e_1_3_3_2_30_2","unstructured":"Scott Reed Konrad Zolna Emilio Parisotto Sergio\u00a0Gomez Colmenarejo Alexander Novikov Gabriel Barth-Maron Mai Gimenez Yury Sulsky Jackie Kay Jost\u00a0Tobias Springenberg et\u00a0al. 2022. A generalist agent. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2205.06175 (2022)."},{"key":"e_1_3_3_2_31_2","volume-title":"Advances in Neural Information Processing Systems","author":"Shen Yongliang","year":"2023","unstructured":"Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. 2023. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_3_2_32_2","unstructured":"Chunyi Sun Junlin Han Weijian Deng Xinlong Wang Zishan Qin and Stephen Gould. 2023. 3D-GPT: Procedural 3D Modeling with Large Language Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.12945 (2023)."},{"key":"e_1_3_3_2_33_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori\u00a0B. Hashimoto. 2023. Stanford Alpaca: An Instruction-following LLaMA model. https:\/\/github.com\/tatsu-lab\/stanford_alpaca."},{"key":"e_1_3_3_2_34_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. 
arxiv:https:\/\/arXiv.org\/abs\/2302.13971\u00a0[cs.CL]"},{"key":"e_1_3_3_2_35_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian\u00a0Canton Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit\u00a0Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric\u00a0Michael Smith Ranjan Subramanian Xiaoqing\u00a0Ellen Tan Binh Tang Ross Taylor Adina Williams Jian\u00a0Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arxiv:https:\/\/arXiv.org\/abs\/2307.09288\u00a0[cs.CL]"},{"key":"e_1_3_3_2_36_2","unstructured":"Junyang Wang Haiyang Xu Jiabo Ye Ming Yan Weizhou Shen Ji Zhang Fei Huang and Jitao Sang. 2024. Mobile-agent: Autonomous multi-modal mobile device agent with visual perception. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2401.16158 (2024)."},{"key":"e_1_3_3_2_37_2","unstructured":"Zihao Wang Shaofei Cai Anji Liu Yonggang Jin Jinbing Hou Bowei Zhang Haowei Lin Zhaofeng He Zilong Zheng Yaodong Yang Xiaojian Ma and Yitao Liang. 2023. JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models. arxiv:https:\/\/arXiv.org\/abs\/2311.05997\u00a0[cs.AI]"},{"key":"e_1_3_3_2_38_2","unstructured":"Zora\u00a0Zhiruo Wang Jiayuan Mao Daniel Fried and Graham Neubig. 2024. Agent workflow memory. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2409.07429 (2024)."},{"key":"e_1_3_3_2_39_2","unstructured":"Hao Wen Yuanchun Li Guohong Liu Shanhui Zhao Tao Yu Toby Jia-Jun Li Shiqi Jiang Yunhao Liu Yaqin Zhang and Yunxin Liu. 2023. Empowering LLM to use Smartphone for Intelligent Task Automation. arXiv preprint arXiv: 2308.15272 (2023)."},{"key":"e_1_3_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3586183.3606824"},{"key":"e_1_3_3_2_41_2","unstructured":"Zhiheng Xi Wenxiang Chen Xin Guo Wei He Yiwen Ding Boyang Hong Ming Zhang Junzhe Wang Senjie Jin Enyu Zhou et\u00a0al. 2023. The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.07864 (2023)."},{"key":"e_1_3_3_2_42_2","unstructured":"Tianbao Xie Danyang Zhang Jixuan Chen Xiaochuan Li Siheng Zhao Ruisheng Cao Toh\u00a0Jing Hua Zhoujun Cheng Dongchan Shin Fangyu Lei et\u00a0al. 2024. Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.07972 (2024)."},{"key":"e_1_3_3_2_43_2","unstructured":"Tianbao Xie Fan Zhou Zhoujun Cheng Peng Shi Luoxuan Weng Yitao Liu Toh\u00a0Jing Hua Junning Zhao Qian Liu Che Liu Leo\u00a0Z. Liu Yiheng Xu Hongjin Su Dongchan Shin Caiming Xiong and Tao Yu. 2023. OpenAgents: An Open Platform for Language Agents in the Wild. arxiv:https:\/\/arXiv.org\/abs\/2310.10634\u00a0[cs.CL]"},{"key":"e_1_3_3_2_44_2","unstructured":"Yuzhuang Xu Shuo Wang Peng Li Fuwen Luo Xiaolong Wang Weidong Liu and Yang Liu. 2023. Exploring large language models for communication games: An empirical study on werewolf. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.04658 (2023)."},{"key":"e_1_3_3_2_45_2","unstructured":"An Yan Zhengyuan Yang Wanrong Zhu Kevin Lin Linjie Li Jianfeng Wang Jianwei Yang Yiwu Zhong Julian McAuley Jianfeng Gao Zicheng Liu and Lijuan Wang. 2023. 
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation. arXiv preprint arXiv: 2311.07562 (2023)."},{"key":"e_1_3_3_2_46_2","unstructured":"Hui Yang Sifu Yue and Yunzhong He. 2023. Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions. arxiv:https:\/\/arXiv.org\/abs\/2306.02224\u00a0[cs.AI]"},{"key":"e_1_3_3_2_47_2","unstructured":"Zhengyuan Yang Linjie Li Kevin Lin Jianfeng Wang Chung-Ching Lin Zicheng Liu and Lijuan Wang. 2023. The dawn of lmms: Preliminary explorations with gpt-4v (ision). arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.17421 (2023)."},{"key":"e_1_3_3_2_48_2","unstructured":"Zhengyuan Yang Linjie Li Kevin Lin Jianfeng Wang Chung-Ching Lin Zicheng Liu and Lijuan Wang. 2023. The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision). arXiv preprint arXiv: 2309.17421 (2023)."},{"key":"e_1_3_3_2_49_2","volume-title":"ICLR","author":"Yao Shunyu","year":"2023","unstructured":"Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In ICLR."},{"key":"e_1_3_3_2_50_2","unstructured":"Aohan Zeng Xiao Liu Zhengxiao Du Zihan Wang Hanyu Lai Ming Ding Zhuoyi Yang Yifan Xu Wendi Zheng Xiao Xia et\u00a0al. 2022. Glm-130b: An open bilingual pre-trained model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2210.02414 (2022)."},{"key":"e_1_3_3_2_51_2","unstructured":"Zhuosheng Zhan and Aston Zhang. 2023. You Only Look at Screens: Multimodal Chain-of-Action Agents. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.11436 (2023)."},{"key":"e_1_3_3_2_52_2","unstructured":"Lianmin Zheng Wei-Lin Chiang Ying Sheng Siyuan Zhuang Zhanghao Wu Yonghao Zhuang Zi Lin Zhuohan Li Dacheng Li Eric.\u00a0P Xing Hao Zhang Joseph\u00a0E. Gonzalez and Ion Stoica. 2023. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. 
arxiv:https:\/\/arXiv.org\/abs\/2306.05685\u00a0[cs.CL]"},{"key":"e_1_3_3_2_53_2","unstructured":"Andy Zhou Kai Yan Michal Shlapentokh-Rothman Haohan Wang and Yu-Xiong Wang. 2023. Language agent tree search unifies reasoning acting and planning in language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.04406 (2023)."},{"key":"e_1_3_3_2_54_2","unstructured":"Shuyan Zhou Frank\u00a0F Xu Hao Zhu Xuhui Zhou Robert Lo Abishek Sridhar Xianyi Cheng Tianyue Ou Yonatan Bisk Daniel Fried et\u00a0al. 2023. Webarena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.13854 (2023)."}],"event":{"name":"CHI 2025: CHI Conference on Human Factors in Computing Systems","location":"Yokohama Japan","acronym":"CHI '25","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3706598.3713600","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3706598.3713600","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T05:09:09Z","timestamp":1751605749000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3706598.3713600"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,25]]},"references-count":53,"alternative-id":["10.1145\/3706598.3713600","10.1145\/3706598"],"URL":"https:\/\/doi.org\/10.1145\/3706598.3713600","relation":{},"subject":[],"published":{"date-parts":[[2025,4,25]]},"assertion":[{"value":"2025-04-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}