{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T11:40:52Z","timestamp":1777462852291,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":29,"publisher":"ACM","funder":[{"name":"SNSF","award":["10.001.796"],"award-info":[{"award-number":["10.001.796"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,4,27]]},"DOI":"10.1145\/3805621.3807661","type":"proceedings-article","created":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T13:08:45Z","timestamp":1777381725000},"page":"386-396","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["O\n                    <scp>rbit<\/scp>\n                    : Efficient Agentic Inference using Priority Scheduling"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-6207-5905","authenticated-orcid":false,"given":"Sami","family":"Abuzakuk","sequence":"first","affiliation":[{"name":"Scalable Computing Systems Laboratory, EPFL, Lausanne, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8187-724X","authenticated-orcid":false,"given":"Anne-Marie","family":"Kermarrec","sequence":"additional","affiliation":[{"name":"Scalable Computing Systems Laboratory, EPFL, Lausanne, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2347-1585","authenticated-orcid":false,"family":"Palak","sequence":"additional","affiliation":[{"name":"Scalable Computing Systems Laboratory, EPFL, Lausanne, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7826-1599","authenticated-orcid":false,"given":"Rafael","family":"Pires","sequence":"additional","affiliation":[{"name":"Scalable Computing Systems Laboratory, EPFL, Lausanne, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1928-1549","authenticated-orcid":false,"given":"Rishi","family":"Sharma","sequence":"additional","affiliation":[{"name":"EPFL, Lausanne, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4157-4847","authenticated-orcid":false,"given":"Martijn","family":"de Vos","sequence":"additional","affiliation":[{"name":"EPFL, Lausanne, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,28]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"134","volume-title":"Taming throughput-latency tradeoff in llm inference with sarathi-serve","author":"Agrawal Amey","year":"2024","unstructured":"Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav Gulavani, Alexey Tumanov, and Ramachandran Ramjee. Taming throughput-latency tradeoff in llm inference with sarathi-serve. pages 117\u2013134, 2024."},{"key":"e_1_3_2_1_2_1","volume-title":"autogpt. https:\/\/github.com\/Significant-Gravitas\/AutoGPT","year":"2023","unstructured":"autogpt. autogpt. https:\/\/github.com\/Significant-Gravitas\/AutoGPT, 2023. Accessed: 2026-02-02."},{"key":"e_1_3_2_1_3_1","volume-title":"et al. Magentic-one: A generalist multi-agent system for solving complex tasks. arXiv preprint arXiv:2411.04468","author":"Fourney Adam","year":"2024","unstructured":"Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, Friederike Niedtner, Grace Proebsting, Griffin Bassman, Jack Gerrits, Jacob Alber, et al. Magentic-one: A generalist multi-agent system for solving complex tasks. arXiv preprint arXiv:2411.04468, 2024."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Seungju Han Kavel Rao Allyson Ettinger Liwei Jiang Bill Yuchen Lin Nathan Lambert Yejin Choi and Nouha Dziri. Wildguard: Open one-stop moderation tools for safety risks jailbreaks and refusals of llms. Advances in neural information processing systems 37:8093\u20138131 2024.","DOI":"10.52202\/079017-0261"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3712003"},{"key":"e_1_3_2_1_6_1","volume-title":"The Twelfth International Conference on Learning Representations","author":"Kim Seungone","year":"2023","unstructured":"Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, Hwaran Lee, Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, et al. Prometheus: Inducing fine-grained evaluation capability in language models. In The Twelfth International Conference on Learning Representations, 2023."},{"key":"e_1_3_2_1_7_1","first-page":"4353","volume-title":"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing","author":"Kim Seungone","year":"2024","unstructured":"Seungone Kim, Juyoung Suk, Shayne Longpre, Bill Yuchen Lin, Jamin Shin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, and Minjoon Seo. Prometheus 2: An open source language model specialized in evaluating other language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 4334\u20134353, 2024."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_3_2_1_9_1","volume-title":"https:\/\/github.com\/langchain-ai\/langchain","year":"2023","unstructured":"langchain. Langchain. https:\/\/github.com\/langchain-ai\/langchain, 2023. Accessed: 2026-02-02."},{"key":"e_1_3_2_1_10_1","volume-title":"Lost in the middle: How language models use long contexts. Transactions of the association for computational linguistics, 12:157\u2013173","author":"Liu Nelson F","year":"2024","unstructured":"Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the association for computational linguistics, 12:157\u2013173, 2024."},{"key":"e_1_3_2_1_11_1","first-page":"2522","volume-title":"Proceedings of the 2023 conference on empirical methods in natural language processing","author":"Liu Yang","year":"2023","unstructured":"Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu. G-eval: Nlg evaluation using gpt-4 with better human alignment. In Proceedings of the 2023 conference on empirical methods in natural language processing, pages 2511\u20132522, 2023."},{"key":"e_1_3_2_1_12_1","volume-title":"et al. Autellix: An efficient serving engine for llm agents as general programs. arXiv preprint arXiv:2502.13965","author":"Luo Michael","year":"2025","unstructured":"Michael Luo, Xiaoxiang Shi, Colin Cai, Tianjun Zhang, Justin Wong, Yichuan Wang, Chi Wang, Yanping Huang, Zhifeng Chen, Joseph E Gonzalez, et al. Autellix: An efficient serving engine for llm agents as general programs. arXiv preprint arXiv:2502.13965, 2025."},{"key":"e_1_3_2_1_13_1","volume-title":"Solving a million-step llm task with zero errors. arXiv preprint arXiv:2511.09030","author":"Meyerson Elliot","year":"2025","unstructured":"Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, Hormoz Shahrzad, Olivier Francon, Conor F Hayes, Xin Qiu, Babak Hodjat, and Risto Miikkulainen. Solving a million-step llm task with zero errors. arXiv preprint arXiv:2511.09030, 2025."},{"key":"e_1_3_2_1_14_1","volume-title":"The Twelfth International Conference on Learning Representations","author":"Mialon Gr\u00e9goire","year":"2023","unstructured":"Gr\u00e9goire Mialon, Cl\u00e9mentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. Gaia: a benchmark for general ai assistants. In The Twelfth International Conference on Learning Representations, 2023."},{"key":"e_1_3_2_1_15_1","first-page":"6150","volume-title":"Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2","author":"Ning Liangbo","year":"2025","unstructured":"Liangbo Ning, Ziran Liang, Zhuohang Jiang, Haohao Qu, Yujuan Ding, Wenqi Fan, Xiao-yong Wei, Shanru Lin, Hui Liu, Philip S Yu, et al. A survey of webagents: Towards next-generation ai agents for web automation with large foundation models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, pages 6140\u20136150, 2025."},{"key":"e_1_3_2_1_16_1","volume-title":"TensorRT-LLM: A tensorrt toolbox for optimized large language model inference. https:\/\/github.com\/NVIDIA\/TensorRT-LLM","author":"NVIDIA.","year":"2023","unstructured":"NVIDIA. TensorRT-LLM: A tensorrt toolbox for optimized large language model inference. https:\/\/github.com\/NVIDIA\/TensorRT-LLM, 2023. Accessed: 2026-02-02."},{"key":"e_1_3_2_1_17_1","volume-title":"https:\/\/huggingface.co\/openai\/gpt-oss-120b","author":"AI.","year":"2025","unstructured":"OpenAI. Gpt-oss-120b. https:\/\/huggingface.co\/openai\/gpt-oss-120b, 2025. Open-weight 120B parameter mixture-of-experts model. Accessed: 2026-02-24."},{"key":"e_1_3_2_1_18_1","volume-title":"ICLR 2025 Workshop on Building Trust in Language Models and Applications","author":"Pan Melissa Z","year":"2025","unstructured":"Melissa Z Pan, Mert Cemri, Lakshya A Agrawal, Shuyi Yang, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Kannan Ramchandran, Dan Klein, et al. Why do multiagent systems fail? In ICLR 2025 Workshop on Building Trust in Language Models and Applications, 2025."},{"key":"e_1_3_2_1_19_1","first-page":"125","volume-title":"Proceedings of the 4th Workshop on Machine Learning and Systems","author":"Santhanam Keshav","year":"2024","unstructured":"Keshav Santhanam, Deepti Raghavan, Muhammad Shahir Rahman, Thejas Venkatesh, Neha Kunjal, Pratiksha Thaker, Philip Levis, and Matei Zaharia. Alto: An efficient network orchestrator for compound ai systems. In Proceedings of the 4th Workshop on Machine Learning and Systems, pages 117\u2013125, 2024."},{"key":"e_1_3_2_1_20_1","first-page":"68539","article-title":"Language models can teach themselves to use tools","volume":"36","author":"Schick Timo","year":"2023","unstructured":"Timo Schick, Jane Dwivedi-Yu, Roberto Dess\u00ec, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36:68539\u201368551, 2023.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_21_1","volume-title":"The illusion of diminishing returns: Measuring long horizon execution in llms. arXiv preprint arXiv:2509.09677","author":"Sinha Akshit","year":"2025","unstructured":"Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, and Jonas Geiping. The illusion of diminishing returns: Measuring long horizon execution in llms. arXiv preprint arXiv:2509.09677, 2025."},{"key":"e_1_3_2_1_22_1","first-page":"284","volume-title":"Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track","author":"Wang Haoxin","year":"2025","unstructured":"Haoxin Wang, Xianhan Peng, Huang Cheng, Yizhe Huang, Ming Gong, Chenghan Yang, Yang Liu, and Jiang Lin. Ecom-bench: Can llm agent resolve real-world e-commerce customer support issues? In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 276\u2013284, 2025."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-024-40231-1"},{"key":"e_1_3_2_1_24_1","volume-title":"The eleventh international conference on learning representations","author":"Yao Shunyu","year":"2022","unstructured":"Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In The eleventh international conference on learning representations, 2022."},{"key":"e_1_3_2_1_25_1","first-page":"538","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Yu Gyeong-In","year":"2022","unstructured":"Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, and Byung-Gon Chun. Orca: A distributed serving system for {Transformer-Based} generative models. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), pages 521\u2013538, 2022."},{"key":"e_1_3_2_1_26_1","volume-title":"Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in neural information processing systems, 36:46595\u201346623","author":"Zheng Lianmin","year":"2023","unstructured":"Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in neural information processing systems, 36:46595\u201346623, 2023."},{"key":"e_1_3_2_1_27_1","unstructured":"Wenhao Zheng Xinyu Ye Peng Xia Fang Wu Linjie Li Weitong Zhang Lijuan Wang Yejin Choi Yun Li and Huaxiu Yao. The agent's marathon: Probing the limits of endurance in long-horizon tasks."},{"key":"e_1_3_2_1_28_1","first-page":"210","volume-title":"18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)","author":"Zhong Yinmin","year":"2024","unstructured":"Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. {DistServe}: Disaggregating prefill and decoding for goodput-optimized large language model serving. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 193\u2013210, 2024."},{"key":"e_1_3_2_1_29_1","volume-title":"et al. Webarena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854","author":"Zhou Shuyan","year":"2023","unstructured":"Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, et al. Webarena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854, 2023."}],"event":{"name":"EuroSys '26: 21st European Conference on Computer Systems","location":"Edinburgh Scotland Uk","acronym":"EuroMLSys '26","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the Sixth European Workshop on Machine Learning and Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3805621.3807661","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T13:18:39Z","timestamp":1777382319000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3805621.3807661"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,27]]},"references-count":29,"alternative-id":["10.1145\/3805621.3807661","10.1145\/3805621"],"URL":"https:\/\/doi.org\/10.1145\/3805621.3807661","relation":{},"subject":[],"published":{"date-parts":[[2026,4,27]]},"assertion":[{"value":"2026-04-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}