{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T10:55:27Z","timestamp":1777632927783,"version":"3.51.4"},"reference-count":90,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:p>Automating enterprise workflows could unlock $4 trillion\/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12--18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). 
We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques.<\/jats:p>","DOI":"10.14778\/3681954.3681964","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T16:23:36Z","timestamp":1725035016000},"page":"2805-2812","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Automating the Enterprise with Foundation Models"],"prefix":"10.14778","volume":"17","author":[{"given":"Michael","family":"Wornow","sequence":"first","affiliation":[{"name":"Stanford University"}]},{"given":"Avanika","family":"Narayan","sequence":"additional","affiliation":[{"name":"Stanford University"}]},{"given":"Krista","family":"Opsahl-Ong","sequence":"additional","affiliation":[{"name":"Stanford University"}]},{"given":"Quinn","family":"McIntyre","sequence":"additional","affiliation":[{"name":"Stanford University"}]},{"given":"Nigam","family":"Shah","sequence":"additional","affiliation":[{"name":"Stanford University"}]},{"given":"Christopher","family":"R\u00e9","sequence":"additional","affiliation":[{"name":"Stanford University"}]}],"member":"320","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Michael Ahn Anthony Brohan Noah Brown Yevgen Chebotar Omar Cortes Byron David Chelsea Finn Chuyuan Fu Keerthana Gopalakrishnan Karol Hausman et al. 2022. Do as i can not as i say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691 (2022)."},{"key":"e_1_2_1_2_1","unstructured":"Automation Anywhere. 2020. 
https:\/\/www.automationanywhere.com\/company\/press-room\/global-research-reveals-worlds-most-hated-office-tasks"},{"key":"e_1_2_1_3_1","volume-title":"NeurIPS 2023 Foundation Models for Decision Making Workshop.","author":"Assouel Rim","year":"2023","unstructured":"Rim Assouel, Tom Marty, Massimo Caccia, Issam H Laradji, Alexandre Drouin, Sai Rajeswar, Hector Palacios, Quentin Cappart, David Vazquez, Nicolas Chapados, et al. 2023. The Unsolved Challenges of LLMs as Generalist Web Agents: A Case Study. In NeurIPS 2023 Foundation Models for Decision Making Workshop."},{"key":"e_1_2_1_4_1","volume-title":"Fabrizio Maria Maggi, Andrea Marrella, Massimo Mecella, and Allar Soo.","author":"Augusto Adriano","year":"2018","unstructured":"Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, Andrea Marrella, Massimo Mecella, and Allar Soo. 2018. Automated discovery of process models from event logs: Review and benchmark. IEEE transactions on knowledge and data engineering 31, 4 (2018), 686--705."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-22655-7_23"},{"key":"e_1_2_1_7_1","unstructured":"Rohan Bavishi Erich Elsen Curtis Hawthorne Maxwell Nye Augustus Odena Arushi Somani and Sa\u011fnak Ta\u015f\u0131rlar. 2023. Introducing our Multimodal Models. https:\/\/www.adept.ai\/blog\/fuyu-8b"},{"key":"e_1_2_1_8_1","volume-title":"Levine","author":"Bayley Matthew","year":"2013","unstructured":"Matthew Bayley and Ed Levine. 2013. Hospital revenue cycle operations: opportunities created by the ACA. Management (2013)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735703.2735708"},{"key":"e_1_2_1_10_1","volume-title":"Closing the Digital\" Skill\" Divide: The Payoff for Workers, Business, and the Economy. National Skills Coalition","author":"Bergson-Shilcock Amanda","year":"2023","unstructured":"Amanda Bergson-Shilcock and Roderick Taylor. 2023. 
Closing the Digital\" Skill\" Divide: The Payoff for Workers, Business, and the Economy. National Skills Coalition (2023)."},{"key":"e_1_2_1_11_1","volume-title":"Leveraging Large Language Models (LLMs) for Process Mining (Technical Report). arXiv preprint arXiv:2307.12701","author":"Berti Alessandro","year":"2023","unstructured":"Alessandro Berti and Mahnaz Sadat Qafari. 2023. Leveraging Large Language Models (LLMs) for Process Mining (Technical Report). arXiv preprint arXiv:2307.12701 (2023)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824100"},{"key":"e_1_2_1_13_1","unstructured":"Rishi Bommasani Drew A Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)."},{"key":"e_1_2_1_15_1","unstructured":"Fabio Casati and Ming-Chien Shan. 2000. Process automation as the foundation for e-business. In VLDB. Citeseer 688--691."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58779-6_15"},{"key":"e_1_2_1_17_1","unstructured":"M Chui E Hazan R Roberts A Singla K Smaje A Sukharevsky L Yee and R Zemmel. 2023. The economic potential of generative AI: The next productivity frontier."},{"key":"e_1_2_1_18_1","volume-title":"Eduardo Souza dos Reis, Rodolfo Stoffel Antunes, Henrique Chaves Pacheco, Thayn\u00e3 da Silva Fran\u00e7a, Rodrigo da Rosa Righi, Jorge Luis Vict\u00f3ria Barbosa, Franklin Jebadoss, Jorge Montalvao, et al.","author":"da Costa Cristiano Andr\u00e9","year":"2023","unstructured":"Cristiano Andr\u00e9 da Costa, U\u00e9lison Jean Lopes dos Santos, Eduardo Souza dos Reis, Rodolfo Stoffel Antunes, Henrique Chaves Pacheco, Thayn\u00e3 da Silva Fran\u00e7a, Rodrigo da Rosa Righi, Jorge Luis Vict\u00f3ria Barbosa, Franklin Jebadoss, Jorge Montalvao, et al. 2023. 
Intelligent methods for business rule processing: State-of-the-art. arXiv preprint arXiv:2311.11775 (2023)."},{"key":"e_1_2_1_19_1","volume-title":"Challenges of using RPA in auditing: A socio-technical systems approach. Intelligent Systems in Accounting, Finance and Management","author":"Dahabiyeh Laila","year":"2023","unstructured":"Laila Dahabiyeh and Omar Mowafi. 2023. Challenges of using RPA in auditing: A socio-technical systems approach. Intelligent Systems in Accounting, Finance and Management (2023)."},{"key":"e_1_2_1_20_1","unstructured":"Xiang Deng Yu Gu Boyuan Zheng Shijie Chen Samuel Stevens Boshi Wang Huan Sun and Yu Su. 2023. Mind2Web: Towards a Generalist Agent for the Web. arXiv:2306.06070 [cs.CL]"},{"key":"e_1_2_1_21_1","volume-title":"Towards a unified agent with foundation models. arXiv preprint arXiv:2307.09668","author":"Palo Norman Di","year":"2023","unstructured":"Norman Di Palo, Arunkumar Byravan, Leonard Hasenclever, Markus Wulfmeier, Nicolas Heess, and Martin Riedmiller. 2023. Towards a unified agent with foundation models. arXiv preprint arXiv:2307.09668 (2023)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3576047"},{"key":"e_1_2_1_23_1","volume-title":"How well can large language models explain business processes? arXiv preprint arXiv:2401.12846","author":"Fahland Dirk","year":"2024","unstructured":"Dirk Fahland, Fabian Fournier, Lior Limonad, Inna Skarbovsky, and Ava JE Swevels. 2024. How well can large language models explain business processes? arXiv preprint arXiv:2401.12846 (2024)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.33736\/ijbs.4301.2021"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACVW60836.2024.00102"},{"key":"e_1_2_1_26_1","volume-title":"Shixiang Shane Gu, and Izzeddin Gur","author":"Furuta Hiroki","year":"2023","unstructured":"Hiroki Furuta, Ofir Nachum, Kuang-Huei Lee, Yutaka Matsuo, Shixiang Shane Gu, and Izzeddin Gur. 2023. 
Multimodal Web Navigation with Instruction-Finetuned Foundation Models. arXiv preprint arXiv:2305.11854 (2023)."},{"key":"e_1_2_1_27_1","volume-title":"An overview of workflow management: From process modeling to workflow automation infrastructure. Distributed and Parallel Databases 3","author":"Georgakopoulos Dimitrios","year":"1995","unstructured":"Dimitrios Georgakopoulos, Mark Hornick, and Amit Sheth. 1995. An overview of workflow management: From process modeling to workflow automation infrastructure. Distributed and Parallel Databases 3 (1995), 119--153."},{"key":"e_1_2_1_28_1","volume-title":"International Conference on Business Process Management. Springer, 453--465","author":"Grohs Michael","year":"2023","unstructured":"Michael Grohs, Luka Abb, Nourhan Elsayed, and Jana-Rebecca Rehse. 2023. Large Language Models can accomplish Business Process Management Tasks. In International Conference on Business Process Management. Springer, 453--465."},{"key":"e_1_2_1_29_1","volume-title":"Mustafa Safdari, Yutaka Matsuo, Douglas Eck, and Aleksandra Faust.","author":"Gur Izzeddin","year":"2023","unstructured":"Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, and Aleksandra Faust. 2023. A real-world webagent with planning, long context understanding, and program synthesis. arXiv preprint arXiv:2307.12856 (2023)."},{"key":"e_1_2_1_30_1","unstructured":"Hongliang He Wenlin Yao Kaixin Ma Wenhao Yu Yong Dai Hongming Zhang Zhenzhong Lan and Dong Yu. 2024. WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models. arXiv:2401.13919 [cs.CL]"},{"key":"e_1_2_1_31_1","unstructured":"Sarah Calkins Holloway Michael Peterson Andrew MacDonald and Bridget Scherbring Pollak. 2018. 
From revenue cycle management to revenue excellence."},{"key":"e_1_2_1_32_1","volume-title":"Zijuan Lin, Liyang Zhou, et al.","author":"Hong Sirui","year":"2023","unstructured":"Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al. 2023. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352 (2023)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Wenyi Hong Weihan Wang Qingsong Lv Jiazheng Xu Wenmeng Yu Junhui Ji Yan Wang Zihan Wang Yuxiao Dong Ming Ding et al. 2023. CogAgent: A Visual Language Model for GUI Agents. arXiv preprint arXiv:2312.08914 (2023).","DOI":"10.1109\/CVPR52733.2024.01354"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2467802"},{"key":"e_1_2_1_35_1","volume-title":"International Conference on Machine Learning. PMLR, 9466--9482","author":"Humphreys Peter C","year":"2022","unstructured":"Peter C Humphreys, David Raposo, Tobias Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Adam Santoro, and Timothy Lillicrap. 2022. A data-driven approach for learning to control computers. In International Conference on Machine Learning. PMLR, 9466--9482."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-30429-4_19"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/306101.306112"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00324"},{"key":"e_1_2_1_39_1","volume-title":"CHORUS: Foundation Models for Unified Data Discovery and Exploration. arXiv preprint arXiv:2306.09610","author":"Kayali Moe","year":"2023","unstructured":"Moe Kayali, Anton Lykov, Ilias Fountalis, Nikolaos Vasiloglou, Dan Olteanu, and Dan Suciu. 2023. CHORUS: Foundation Models for Unified Data Discovery and Exploration. 
arXiv preprint arXiv:2306.09610 (2023)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.51542\/ijscia.v4i4.3"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12599-020-00641-4"},{"key":"e_1_2_1_42_1","unstructured":"Xavier Lhuer. 2016. The next acronym you need to know about: RPA (robotic process automation). (2016)."},{"key":"e_1_2_1_43_1","volume-title":"More agents is all you need. arXiv preprint arXiv:2402.05120","author":"Li Junyou","year":"2024","unstructured":"Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, and Deheng Ye. 2024. More agents is all you need. arXiv preprint arXiv:2402.05120 (2024)."},{"key":"e_1_2_1_44_1","volume-title":"Interactive task and concept learning from natural language instructions and gui demonstrations. arXiv preprint arXiv:1909.00031","author":"Jia-Jun Li Toby","year":"2019","unstructured":"Toby Jia-Jun Li, Marissa Radensky, Justin Jia, Kirielle Singarajah, Tom M Mitchell, and Brad A Myers. 2019. Interactive task and concept learning from natural language instructions and gui demonstrations. arXiv preprint arXiv:1909.00031 (2019)."},{"key":"e_1_2_1_45_1","volume-title":"Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118","author":"Liang Tian","year":"2023","unstructured":"Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. 2023. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118 (2023)."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554888"},{"key":"e_1_2_1_47_1","volume-title":"Devansh Arpit, et al.","author":"Liu Zhiwei","year":"2023","unstructured":"Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, et al. 2023. Bolaa: Benchmarking and orchestrating llm-augmented autonomous agents. 
arXiv preprint arXiv:2308.05960 (2023)."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196908"},{"key":"e_1_2_1_49_1","volume-title":"Luis Eduardo Leyva-del Foyo, and Arnaldo Diaz-Ramirez","author":"Mejia-Alvarez Pedro","year":"2018","unstructured":"Pedro Mejia-Alvarez, Luis Eduardo Leyva-del Foyo, and Arnaldo Diaz-Ramirez. 2018. Interrupt Handling Schemes in Operating Systems. Springer."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2023.01.287"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.461"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/3574245.3574258"},{"key":"e_1_2_1_53_1","volume-title":"GPT-4 technical report. arXiv preprint arXiv:2303.08774","author":"OpenAI","year":"2023","unstructured":"OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)."},{"key":"e_1_2_1_54_1","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray et al. 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 (2022) 27730--27744."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3586183.3606763"},{"key":"e_1_2_1_56_1","first-page":"1","article-title":"Self-Driving Database Management Systems","volume":"4","author":"Pavlo Andrew","year":"2017","unstructured":"Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C Mowry, Matthew Perron, Ian Quah, et al. 2017. Self-Driving Database Management Systems. In CIDR, Vol. 4. 1.","journal-title":"CIDR"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476411"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.accinf.2023.100641"},{"key":"e_1_2_1_59_1","unstructured":"R1. 2022. Healthcare Financial Trends Report. 
https:\/\/www.r1rcm.com\/news\/healthcare-trends-and-data-show-clinical-shortage-tip-of-the-iceberg"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/2809974.2809977"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-40172-6"},{"key":"e_1_2_1_62_1","volume-title":"International Conference on Business Process Management. Springer, 44--56","author":"Rizk Yara","year":"2023","unstructured":"Yara Rizk, Praveen Venkateswaran, Vatche Isahagian, Austin Narcomey, and Vinod Muthusamy. 2023. A Case for Business Process-Specific Foundation Models. In International Conference on Business Process Management. Springer, 44--56."},{"key":"e_1_2_1_63_1","volume-title":"Relational world knowledge representation in contextual language models: A review. arXiv preprint arXiv:2104.05837","author":"Safavi Tara","year":"2021","unstructured":"Tara Safavi and Danai Koutra. 2021. Relational world knowledge representation in contextual language models: A review. arXiv preprint arXiv:2104.05837 (2021)."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.32628\/CSEIT2062106"},{"key":"e_1_2_1_65_1","volume-title":"Conference of the Italian Chapter of AIS. Springer, 201--216","author":"Sarilo-Kankaanranta Henriika","year":"2021","unstructured":"Henriika Sarilo-Kankaanranta and Lauri Frank. 2021. The Slow Adoption Rate of Software Robotics in Accounting and Payroll Services and the Role of Resistance to Change in Innovation-Decision Process. In Conference of the Italian Chapter of AIS. Springer, 201--216."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-155860869-6\/50086-X"},{"key":"e_1_2_1_67_1","volume-title":"Death by 1,000 clicks: Where electronic health records went wrong. Kaiser Health News 18","author":"Schulte Fred","year":"2019","unstructured":"Fred Schulte and Erika Fry. 2019. Death by 1,000 clicks: Where electronic health records went wrong. 
Kaiser Health News 18 (2019)."},{"key":"e_1_2_1_68_1","volume-title":"From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces. arXiv preprint arXiv:2306.00245","author":"Shaw Peter","year":"2023","unstructured":"Peter Shaw, Mandar Joshi, James Cohan, Jonathan Berant, Panupong Pasupat, Hexiang Hu, Urvashi Khandelwal, Kenton Lee, and Kristina Toutanova. 2023. From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces. arXiv preprint arXiv:2306.00245 (2023)."},{"key":"e_1_2_1_69_1","unstructured":"Yongliang Shen Kaitao Song Xu Tan Dongsheng Li Weiming Lu and Yueting Zhuang. 2023. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. arXiv:2303.17580 [cs.CL]"},{"key":"e_1_2_1_70_1","volume-title":"Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv preprint arXiv:2303.11366","author":"Shinn Noah","year":"2023","unstructured":"Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv preprint arXiv:2303.11366 (2023)."},{"key":"e_1_2_1_71_1","volume-title":"Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models finetuned with human feedback. arXiv preprint arXiv:2305.14975","author":"Tian Katherine","year":"2023","unstructured":"Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, and Christopher D Manning. 2023. Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models finetuned with human feedback. arXiv preprint arXiv:2305.14975 (2023)."},{"key":"e_1_2_1_72_1","unstructured":"UIPath. 2022. UiPath Certified RPA Associate v1.0 - EXAM Description.pdf. 
https:\/\/start.uipath.com\/rs\/995-XLT-886\/images\/UiPath%20Certified%20RPA%20Associate%20v1.0%20-%20EXAM%20Description.pdf"},{"key":"e_1_2_1_73_1","first-page":"33","article-title":"Process mining in the large: a tutorial. Business Intelligence: Third European Summer School, eBISS 2013, Dagstuhl Castle, Germany, July 7--12, 2013","volume":"3","author":"Van der Aalst Wil MP","year":"2014","unstructured":"Wil MP Van der Aalst. 2014. Process mining in the large: a tutorial. Business Intelligence: Third European Summer School, eBISS 2013, Dagstuhl Castle, Germany, July 7--12, 2013, Tutorial Lectures 3 (2014), 33--76.","journal-title":"Tutorial Lectures"},{"key":"e_1_2_1_74_1","volume-title":"Large Language Models for Business Process Management: Opportunities and Challenges. arXiv preprint arXiv:2304.04309","author":"Vidgof Maxim","year":"2023","unstructured":"Maxim Vidgof, Stefan Bachhofner, and Jan Mendling. 2023. Large Language Models for Business Process Management: Opportunities and Challenges. arXiv preprint arXiv:2304.04309 (2023)."},{"key":"e_1_2_1_75_1","volume-title":"Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291","author":"Wang Guanzhi","year":"2023","unstructured":"Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 (2023)."},{"key":"e_1_2_1_76_1","volume-title":"Cogvlm: Visual expert for pretrained language models. arXiv preprint arXiv:2311.03079","author":"Wang Weihan","year":"2023","unstructured":"Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, et al. 2023. Cogvlm: Visual expert for pretrained language models. 
arXiv preprint arXiv:2311.03079 (2023)."},{"key":"e_1_2_1_77_1","unstructured":"Zihao Wang Shaofei Cai Anji Liu Yonggang Jin Jinbing Hou Bowei Zhang Haowei Lin Zhaofeng He Zilong Zheng Yaodong Yang et al. 2023. Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. arXiv preprint arXiv:2311.05997 (2023)."},{"key":"e_1_2_1_78_1","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824--24837.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_79_1","volume-title":"Robotic Process Automation-A Systematic Literature Review and Assessment Framework. arXiv preprint arXiv:2012.11951","author":"Wewerka Judith","year":"2020","unstructured":"Judith Wewerka and Manfred Reichert. 2020. Robotic Process Automation-A Systematic Literature Review and Assessment Framework. arXiv preprint arXiv:2012.11951 (2020)."},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581158"},{"key":"e_1_2_1_81_1","volume-title":"Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155","author":"Wu Qingyun","year":"2023","unstructured":"Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. 2023. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155 (2023)."},{"key":"e_1_2_1_82_1","volume-title":"OS-Copilot: Towards Generalist Computer Agents with Self-Improvement. 
arXiv preprint arXiv:2402.07456","author":"Wu Zhiyong","year":"2024","unstructured":"Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, and Lingpeng Kong. 2024. OS-Copilot: Towards Generalist Computer Agents with Self-Improvement. arXiv preprint arXiv:2402.07456 (2024)."},{"key":"e_1_2_1_83_1","unstructured":"An Yan Zhengyuan Yang Wanrong Zhu Kevin Lin Linjie Li Jianfeng Wang Jianwei Yang Yiwu Zhong Julian McAuley Jianfeng Gao et al. 2023. Gpt-4v in wonderland: Large multimodal models for zero-shot smartphone gui navigation. arXiv preprint arXiv:2311.07562 (2023)."},{"key":"e_1_2_1_84_1","volume-title":"Set-of-mark prompting unleashes extraordinary visual grounding in gpt-4v. arXiv preprint arXiv:2310.11441","author":"Yang Jianwei","year":"2023","unstructured":"Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, and Jianfeng Gao. 2023. Set-of-mark prompting unleashes extraordinary visual grounding in gpt-4v. arXiv preprint arXiv:2310.11441 (2023)."},{"key":"e_1_2_1_85_1","volume-title":"AppAgent: Multimodal Agents as Smartphone Users. arXiv preprint arXiv:2312.13771","author":"Yang Zhao","year":"2023","unstructured":"Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, and Gang Yu. 2023. AppAgent: Multimodal Agents as Smartphone Users. arXiv preprint arXiv:2312.13771 (2023)."},{"key":"e_1_2_1_86_1","volume-title":"React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629","author":"Yao Shunyu","year":"2022","unstructured":"Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2022. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 (2022)."},{"key":"e_1_2_1_87_1","unstructured":"Yining Ye Xin Cong Shizuo Tian Jiannan Cao Hao Wang Yujia Qin Yaxi Lu Heyang Yu Huadong Wang Yankai Lin et al. 2023. ProAgent: From Robotic Process Automation to Agentic Process Automation. 
arXiv preprint arXiv:2311.10751 (2023)."},{"key":"e_1_2_1_88_1","volume-title":"Agflow: Agent-based cross-enterprise workflow management system. In VLDB. 697--698.","author":"Zeng Liangzhao","year":"2001","unstructured":"Liangzhao Zeng, Boualem Benatallah, Phuong Nguyen, and Anne HH Ngu. 2001. Agflow: Agent-based cross-enterprise workflow management system. In VLDB. 697--698."},{"key":"e_1_2_1_89_1","volume-title":"UFO: A UI-Focused Agent for Windows OS Interaction. arXiv preprint arXiv:2402.07939","author":"Zhang Chaoyun","year":"2024","unstructured":"Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, et al. 2024. UFO: A UI-Focused Agent for Windows OS Interaction. arXiv preprint arXiv:2402.07939 (2024)."},{"key":"e_1_2_1_90_1","unstructured":"Jingyi Zhang Jiaxing Huang Sheng Jin and Shijian Lu. 2023. Vision-Language Models for Vision Tasks: A Survey. arXiv:2304.00685 [cs.CV]"},{"key":"e_1_2_1_91_1","unstructured":"Boyuan Zheng Boyu Gou Jihyung Kil Huan Sun and Yu Su. 2024. GPT-4V(ision) is a Generalist Web Agent if Grounded. arXiv:2401.01614 [cs.IR]"},{"key":"e_1_2_1_92_1","volume-title":"Webarena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854","author":"Zhou Shuyan","year":"2023","unstructured":"Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, et al. 2023. Webarena: A realistic web environment for building autonomous agents. 
arXiv preprint arXiv:2307.13854 (2023)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3681954.3681964","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,27]],"date-time":"2024-11-27T15:16:33Z","timestamp":1732720593000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3681954.3681964"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":90,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10.14778\/3681954.3681964"],"URL":"https:\/\/doi.org\/10.14778\/3681954.3681964","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2024-08-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}