{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T00:13:20Z","timestamp":1777421600194,"version":"3.51.4"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>This paper introduces SagaLLM, a structured multi-agent architecture designed to address four foundational limitations of current LLM-based planning systems: unreliable self-validation, context loss, lack of transactional safeguards, and insufficient inter-agent coordination. While recent frameworks leverage LLMs for task decomposition and multi-agent communication, they often fail to ensure consistency, rollback, or constraint satisfaction across distributed workflows. SagaLLM bridges this gap by integrating the Saga transactional pattern with persistent memory, automated compensation, and independent validation agents. It leverages LLMs' generative reasoning to automate key tasks traditionally requiring hand-coded coordination logic, including state tracking, dependency analysis, log schema generation, and recovery orchestration. Although SagaLLM relaxes strict ACID guarantees, it ensures workflow-wide consistency and recovery through modular checkpointing and compensable execution. Empirical evaluations across planning domains demonstrate that standalone LLMs frequently violate interdependent constraints or fail to recover from disruptions. 
In contrast, SagaLLM achieves significant improvements in consistency, validation accuracy, and adaptive coordination under uncertainty\u2014establishing a robust foundation for real-world, scalable LLM-based multi-agent systems.<\/jats:p>","DOI":"10.14778\/3750601.3750611","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:05Z","timestamp":1758029885000},"page":"4874-4886","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning"],"prefix":"10.14778","volume":"18","author":[{"given":"Edward Y.","family":"Chang","sequence":"first","affiliation":[{"name":"Stanford University"}]},{"given":"Longling","family":"Geng","sequence":"additional","affiliation":[{"name":"Stanford University"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"https:\/\/aws.amazon.com\/step-functions\/","author":"Step Functions AWS","year":"2023","unstructured":"AWS Step Functions. https:\/\/aws.amazon.com\/step-functions\/, 2023. Accessed: 2025-03-04."},{"key":"e_1_2_1_2_1","volume-title":"https:\/\/azure.microsoft.com\/en-us\/products\/logic-apps\/","author":"Apps Azure Logic","year":"2023","unstructured":"Azure Logic Apps. https:\/\/azure.microsoft.com\/en-us\/products\/logic-apps\/, 2023. Accessed: 2025-03-04."},{"key":"e_1_2_1_3_1","unstructured":"Anthropic. Claude Technical Report. Technical report 2024. URL https:\/\/www.anthropic.com."},{"key":"e_1_2_1_4_1","volume-title":"International Conference on Learning Representations (ICLR)","author":"Brahman Faeze","year":"2024","unstructured":"Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, and Yejin Choi. PLASMA: Making small language models better procedural knowledge models for (counterfactual) planning. 
In International Conference on Learning Representations (ICLR), 2024."},{"key":"e_1_2_1_5_1","volume-title":"Why do multi-agent LLM systems fail? arXiv preprint arXiv:2503.13657","author":"Cemri Mert","year":"2025","unstructured":"Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, et al. Why do multi-agent LLM systems fail? arXiv preprint arXiv:2503.13657, 2025. URL https:\/\/arxiv.org\/abs\/2503.13657."},{"key":"e_1_2_1_6_1","volume-title":"IEEE 13th Computing and Communication Workshop and Conference","author":"Chang Edward Y.","year":"2023","unstructured":"Edward Y. Chang. Prompting large language models with the socratic method. In IEEE 13th Computing and Communication Workshop and Conference, 2023."},{"key":"e_1_2_1_7_1","volume-title":"The 10th International Conference on Computational Science and Computational Intelligence","author":"Chang Edward Y.","year":"2023","unstructured":"Edward Y. Chang. Examining GPT-4's capabilities and enhancement with SocraSynth. In The 10th International Conference on Computational Science and Computational Intelligence, December 2023."},{"key":"e_1_2_1_8_1","volume-title":"August","author":"Chang Edward Y.","year":"2024","unstructured":"Edward Y. Chang. EVINCE: Optimizing adversarial LLM dialogues via conditional statistics and information theory. arXiv preprint arXiv:2408.14575, August 2024."},{"key":"e_1_2_1_9_1","volume-title":"Multi-LLM Agent Collaborative Intelligence: The Path to Artificial General Intelligence","author":"Chang Edward Y.","year":"2025","unstructured":"Edward Y. Chang. Multi-LLM Agent Collaborative Intelligence: The Path to Artificial General Intelligence. ACM Books (accepted), 2025. Amazon (March 2024)."},{"key":"e_1_2_1_10_1","volume-title":"The unified cognitive consciousness theory (UCCT) for language models: Anchoring semantics, thresholds of activation, and emergent reasoning. arXiv preprint arXiv:2506.02139","author":"Chang Edward Y.","year":"2025","unstructured":"Edward Y. 
Chang. The unified cognitive consciousness theory (UCCT) for language models: Anchoring semantics, thresholds of activation, and emergent reasoning. arXiv preprint arXiv:2506.02139, 2025."},{"key":"e_1_2_1_11_1","volume-title":"Chang and Longling Geng. ALAS: A stateful multi-LLM agent framework for disruption-aware planning. arXiv preprint arXiv:2505.12501","author":"Edward","year":"2025","unstructured":"Edward Y. Chang and Longling Geng. ALAS: A stateful multi-LLM agent framework for disruption-aware planning. arXiv preprint arXiv:2505.12501, 2025. URL https:\/\/arxiv.org\/abs\/2505.12501."},{"issue":"3","key":"e_1_2_1_12_1","first-page":"2157","article-title":"A survey on evaluation of large language models","volume":"15","author":"Chang Yupeng","year":"2024","unstructured":"Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, et al. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 15(3), March 2024. ISSN 2157-6904.","journal-title":"ACM Transactions on Intelligent Systems and Technology"},{"key":"e_1_2_1_13_1","volume-title":"DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948","author":"Daya Guo AI","year":"2025","unstructured":"DeepSeek-AI, Daya Guo, Dejian Yang, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025. URL https:\/\/arxiv.org\/abs\/2501.12948."},{"key":"e_1_2_1_14_1","volume-title":"Improving factuality and reasoning in language models through multiagent debate. arXiv:2305.14325","author":"Du Yilun","year":"2023","unstructured":"Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. arXiv:2305.14325, 2023. 
URL https:\/\/arxiv.org\/abs\/2305.14325."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1894963"},{"key":"e_1_2_1_16_1","volume-title":"MIT Press","author":"Durfee Edmund H.","year":"1999","unstructured":"Edmund H. Durfee. Distributed problem solving and planning. In Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, pages 121\u2013164. MIT Press, Cambridge, MA, USA, 1999. ISBN 0262232030."},{"key":"e_1_2_1_17_1","volume-title":"AgentScope: A flexible yet robust multi-agent platform. arXiv preprint arXiv:2402.14034","author":"Gao Dawei","year":"2024","unstructured":"Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, et al. AgentScope: A flexible yet robust multi-agent platform. arXiv preprint arXiv:2402.14034, 2024. URL https:\/\/arxiv.org\/abs\/2402.14034."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/38713.38742"},{"key":"e_1_2_1_19_1","volume-title":"Source code for SagaLLM paper experiments. https:\/\/github.com\/genglongling\/SagaLLM","author":"Geng Longling","year":"2025","unstructured":"Longling Geng. Source code for SagaLLM paper experiments. https:\/\/github.com\/genglongling\/SagaLLM, 2025."},{"key":"e_1_2_1_20_1","volume-title":"Realm-Bench: A real-world planning benchmark for LLMs and multi-agent systems. arXiv:2502.18836","author":"Geng Longling","year":"2025","unstructured":"Longling Geng and Edward Y. Chang. Realm-Bench: A real-world planning benchmark for LLMs and multi-agent systems. arXiv:2502.18836, 2025."},{"key":"e_1_2_1_21_1","volume-title":"Harvard University Press","author":"G\u00f6del Kurt","year":"1967","unstructured":"Kurt G\u00f6del. On formally undecidable propositions of Principia Mathematica and related systems I. In Jean van Heijenoort, editor, From Frege to G\u00f6del: A Source Book in Mathematical Logic, 1879\u20131931, pages 596\u2013616. Harvard University Press, 1967. 
Translated by Jean van Heijenoort."},{"key":"e_1_2_1_22_1","first-page":"154","volume-title":"Proceedings of the Seventh International Conference on Very Large Data Bases","volume":"7","author":"Gray Jim","unstructured":"Jim Gray. The transaction concept: Virtues and limitations. In Proceedings of the Seventh International Conference on Very Large Data Bases, volume 7 of VLDB '81, pages 144\u2013154. VLDB Endowment, 1981."},{"key":"e_1_2_1_23_1","volume-title":"Found in the middle: Calibrating positional attention bias improves long context utilization. arXiv:2406.16008","author":"Hsieh Cheng-Yu","year":"2024","unstructured":"Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, et al. Found in the middle: Calibrating positional attention bias improves long context utilization. arXiv:2406.16008, 2024. URL https:\/\/arxiv.org\/abs\/2406.16008."},{"key":"e_1_2_1_24_1","volume-title":"International Conference on Learning Representations (ICLR)","author":"Huang Jie","year":"2024","unstructured":"Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, et al. Large language models cannot self-correct reasoning yet. In International Conference on Learning Representations (ICLR), 2024."},{"key":"e_1_2_1_25_1","volume-title":"Understanding the planning of llm agents: A survey. arXiv:2402.02716","author":"Huang Xu","year":"2024","unstructured":"Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, and more. Understanding the planning of llm agents: A survey. arXiv:2402.02716, 2024. URL https:\/\/arxiv.org\/abs\/2402.02716."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888900000205"},{"key":"e_1_2_1_27_1","volume-title":"CoRR","author":"Jiang Dongwei","year":"2024","unstructured":"Dongwei Jiang, Jingyu Zhang, Orion Weller, Nathaniel Weir, Benjamin Van Durme, et al. Self-[in]correct: LLMs struggle with refining self-generated responses. 
CoRR, 2024."},{"key":"e_1_2_1_28_1","volume-title":"LangGraph: Building structured applications with LLMs. https:\/\/github.com\/langchain-ai\/langgraph","author":"LangChain","year":"2024","unstructured":"LangChain AI. LangGraph: Building structured applications with LLMs. https:\/\/github.com\/langchain-ai\/langgraph, 2024."},{"key":"e_1_2_1_29_1","volume-title":"Distributed Sensor Networks: A Multiagent Perspective","author":"Lesser Victor","year":"2004","unstructured":"Victor Lesser, Charles L. Ortiz Jr., and Milind Tambe. Distributed Sensor Networks: A Multiagent Perspective, volume 9. Springer Science & Business Media, 2004."},{"key":"e_1_2_1_30_1","volume-title":"Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for \"mind\" exploration of large language model society. arXiv preprint arXiv:2303.17760","author":"Li Guohao","year":"2023","unstructured":"Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for \"mind\" exploration of large language model society. arXiv preprint arXiv:2303.17760, 2023. URL https:\/\/arxiv.org\/abs\/2303.17760."},{"key":"e_1_2_1_31_1","volume-title":"Dissecting chain-of-thought: Compositionality through in-context filtering and learning. arXiv preprint arXiv:2305.18869","author":"Li Yingcong","year":"2023","unstructured":"Yingcong Li, Kartik Sreenivasan, Angeliki Giannou, Dimitris Papailiopoulos, and Samet Oymak. Dissecting chain-of-thought: Compositionality through in-context filtering and learning. arXiv preprint arXiv:2305.18869, 2023. URL https:\/\/arxiv.org\/abs\/2305.18869."},{"key":"e_1_2_1_32_1","volume-title":"Large language models have intrinsic self-correction ability. arXiv preprint arXiv:2406.15673","author":"Liu Dancheng","year":"2024","unstructured":"Dancheng Liu, Amir Nassereldine, Ziming Yang, Chenhui Xu, Yuting Hu, et al. Large language models have intrinsic self-correction ability. 
arXiv preprint arXiv:2406.15673, 2024. URL https:\/\/arxiv.org\/abs\/2406.15673."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00638"},{"key":"e_1_2_1_34_1","volume-title":"Text and patterns: For effective chain of thought, it takes two to tango. arXiv preprint arXiv:2209.07686","author":"Madaan Aman","year":"2022","unstructured":"Aman Madaan and Amir Yazdanbakhsh. Text and patterns: For effective chain of thought, it takes two to tango. arXiv preprint arXiv:2209.07686, 2022. URL https:\/\/arxiv.org\/abs\/2209.07686."},{"key":"e_1_2_1_35_1","volume-title":"NoLiMa: Long-context evaluation beyond literal matching. arXiv preprint arXiv:2502.05167","author":"Modarressi Ali","year":"2025","unstructured":"Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A. Rossi, et al. NoLiMa: Long-context evaluation beyond literal matching. arXiv preprint arXiv:2502.05167, 2025. URL https:\/\/arxiv.org\/abs\/2502.05167."},{"key":"e_1_2_1_36_1","volume-title":"https:\/\/openai.com\/index\/hello-gpt-4o\/","author":"OpenAI","year":"2024","unstructured":"OpenAI. Hello GPT-4o. https:\/\/openai.com\/index\/hello-gpt-4o\/, 2024. Accessed: Jan. 30, 2025."},{"issue":"3","key":"e_1_2_1_37_1","first-page":"48","article-title":"BASE: An ACID alternative","volume":"6","author":"Pritchett Dan","year":"2008","unstructured":"Dan Pritchett. BASE: An ACID alternative. Queue, 6(3):48\u201355, 2008.","journal-title":"Queue"},{"key":"e_1_2_1_38_1","volume-title":"Microservices Patterns: With Examples in Java","author":"Richardson Chris","year":"2018","unstructured":"Chris Richardson. Microservices Patterns: With Examples in Java. Manning Publications, Shelter Island, NY, USA, 2018."},{"key":"e_1_2_1_39_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Stechly Kaya","year":"2024","unstructured":"Kaya Stechly, Karthik Valmeekam, and Subbarao Kambhampati. Chain of thoughtlessness? an analysis of CoT in planning. 
In Advances in Neural Information Processing Systems (NeurIPS), 2024. URL https:\/\/arxiv.org\/abs\/2405.04776."},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.is.2004.02.002","article-title":"YAWL: Yet another workflow language","volume":"30","author":"van der Aalst Wil M. P.","year":"2005","unstructured":"Wil M. P. van der Aalst and Arthur H. M. ter Hofstede. YAWL: Yet another workflow language. Information Systems, 30:245\u2013275, 2005. URL https:\/\/api.semanticscholar.org\/CorpusID:205487187.","journal-title":"Information Systems"},{"key":"e_1_2_1_41_1","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, et al. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, pages 5998\u20136008, 2017."},{"key":"e_1_2_1_42_1","volume-title":"PlanGenLLMs: A modern survey of LLM planning capabilities. arXiv preprint arXiv:2502.11221","author":"Wei Hui","year":"2025","unstructured":"Hui Wei, Zihao Zhang, Shenghua He, Tian Xia, Shijia Pan, et al. PlanGenLLMs: A modern survey of LLM planning capabilities. arXiv preprint arXiv:2502.11221, 2025. URL https:\/\/arxiv.org\/abs\/2502.11221."},{"key":"e_1_2_1_43_1","volume-title":"NIPS '22","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, et al. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems (NeurIPS), NIPS '22, 2022."},{"key":"e_1_2_1_44_1","volume-title":"Morgan Kaufmann","author":"Weikum Gerhard","year":"2001","unstructured":"Gerhard Weikum and Gottfried Vossen. Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery. 
Morgan Kaufmann, San Francisco, CA, USA, 2001."},{"key":"e_1_2_1_45_1","volume-title":"An Introduction to Multiagent Systems","author":"Wooldridge Michael","year":"2009","unstructured":"Michael Wooldridge. An Introduction to Multiagent Systems. John Wiley & Sons, 2009."},{"key":"e_1_2_1_46_1","volume-title":"Conference on Language Modeling (COLM)","author":"Wu Qingyun","year":"2024","unstructured":"Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. In Conference on Language Modeling (COLM), August 2024."},{"key":"e_1_2_1_47_1","volume-title":"Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453","author":"Xiao Guangxuan","year":"2024","unstructured":"Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, and Mike Lewis. Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453, 2024. URL https:\/\/arxiv.org\/abs\/2309.17453."},{"key":"e_1_2_1_48_1","volume-title":"Failure modes of LLMs for causal reasoning on narratives. arXiv preprint arXiv:2410.23884","author":"Yamin Khurram","year":"2024","unstructured":"Khurram Yamin, Shantanu Gupta, Gaurav R. Ghosal, Zachary C. Lipton, and Bryan Wilder. Failure modes of LLMs for causal reasoning on narratives. arXiv preprint arXiv:2410.23884, 2024. URL https:\/\/arxiv.org\/abs\/2410.23884."},{"key":"e_1_2_1_49_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","volume":"36","author":"Yao Shunyu","year":"2024","unstructured":"Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas Griffiths, et al. Tree of thoughts: Deliberate problem solving with large language models. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2024."},{"key":"e_1_2_1_50_1","volume-title":"AFlow: Automating agentic workflow generation. 
arXiv preprint arXiv:2410.10762","author":"Zhang Jiayi","year":"2024","unstructured":"Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, et al. AFlow: Automating agentic workflow generation. arXiv preprint arXiv:2410.10762, 2024. URL https:\/\/arxiv.org\/abs\/2410.10762."},{"key":"e_1_2_1_51_1","volume-title":"A survey of large language models. arXiv preprint arXiv:2303.18223","author":"Zhao Wayne Xin","year":"2025","unstructured":"Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2025. URL https:\/\/arxiv.org\/abs\/2303.18223."},{"key":"e_1_2_1_52_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Zhao Zirui","year":"2023","unstructured":"Zirui Zhao, Wee Sun Lee, and David Hsu. Large language models as common-sense knowledge for large-scale task planning. In Advances in Neural Information Processing Systems (NeurIPS), 2023."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750611","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:40:44Z","timestamp":1758030044000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750611"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":52,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750611"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750611","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}