{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T22:27:11Z","timestamp":1776378431028,"version":"3.51.2"},"reference-count":171,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T00:00:00Z","timestamp":1750896000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T00:00:00Z","timestamp":1750896000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62406061"],"award-info":[{"award-number":["62406061"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62432002"],"award-info":[{"award-number":["62432002"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007129","name":"Natural Science Foundation of Shandong Province","doi-asserted-by":"publisher","award":["ZR2023QF159"],"award-info":[{"award-number":["ZR2023QF159"]}],"id":[{"id":"10.13039\/501100007129","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Sci. Eng."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Human beings capable of making and using tools can accomplish tasks far beyond their innate abilities, and this paradigm of integration with tools may not be limited to humans themselves. 
Recently, large language models (LLMs) have demonstrated immense potential across various fields with their unique planning and reasoning abilities. However, many challenges remain beyond their capabilities due to deficiencies in their training data and inherent hallucinations. Thus, integrating LLMs and tools into tool learning agents has become an emerging research direction. To this end, we present a systematic investigation and comprehensive review of tool-learning agents in this paper. We start by introducing the definition of the tool learning task for agents and then illustrating the typical\u00a0architecture of tool-learning models. Since these tools are all defined by users, the LLM does not know which tools are available or what their functions are. Thus, LLMs should first retrieve appropriate tools, and we split the tool retrieval methods into two categories: training-based and non-training-based. To accurately complete the user task, it is important to decompose the task into several sub-tasks and execute them in the correct order. Following that, we introduce the tool planning methods and organize these works by whether they rely on the model\u2019s inherent reasoning capabilities for planning or utilize external reasoning tools. Due to the rapid development of this field, we also introduce an emerging frontier direction: using multimodal tools for LLMs. In addition, we compile current open-source benchmarks and evaluation metrics, focusing on their scale, composition, calculation methods, and assessment dimensions. Next, we introduce several application scenarios for LLM-based tool learning methods. 
Finally, we discuss the safety and ethical issues involved in tool learning.<\/jats:p>","DOI":"10.1007\/s41019-025-00296-9","type":"journal-article","created":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T03:54:00Z","timestamp":1750910040000},"page":"533-563","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["LLM-Based Agents for Tool Learning: A Survey"],"prefix":"10.1007","volume":"10","author":[{"given":"Weikai","family":"Xu","sequence":"first","affiliation":[]},{"given":"Chengrui","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Shen","family":"Gao","sequence":"additional","affiliation":[]},{"given":"Shuo","family":"Shang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,26]]},"reference":[{"key":"296_CR1","unstructured":"Achiam J, Adler S, Agarwal S et\u00a0al (2023) Gpt-4 technical report. ArXiv preprint arXiv:2303.08774"},{"key":"296_CR2","unstructured":"Aeronautiques C, Howe A, Knoblock C et al (1998) Pddl| the planning domain definition language. Technical Report"},{"key":"296_CR3","unstructured":"Agarwal A, Chan A, Chandel S et\u00a0al (2024) Copilot evaluation harness: evaluating llm-guided software programming. ArXiv preprint arXiv:2402.14261"},{"issue":"5509","key":"296_CR4","doi-asserted-by":"publisher","first-page":"1748","DOI":"10.1126\/science.1059487","volume":"291","author":"SH Ambrose","year":"2001","unstructured":"Ambrose SH (2001) Paleolithic technology and human evolution. Science 291(5509):1748\u20131753","journal-title":"Science"},{"key":"296_CR5","unstructured":"Anantha R, Bandyopadhyay B, Kashi A et\u00a0al (2023) Protip: progressive tool retrieval improves planning. 
ArXiv preprint arXiv:2312.10332"},{"key":"296_CR6","first-page":"24639","volume":"35","author":"B Baker","year":"2022","unstructured":"Baker B, Akkaya I, Zhokov P et al (2022) Video pretraining (vpt): learning to act by watching unlabeled online videos. Adv Neural Inf Process Syst 35:24639\u201324654","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR7","doi-asserted-by":"crossref","unstructured":"Barwise J (1977) An introduction to first-order logic. In: Studies in logic and the foundations of mathematics, vol\u00a090. Elsevier, pp 5\u201346","DOI":"10.1016\/S0049-237X(08)71097-8"},{"key":"296_CR8","doi-asserted-by":"crossref","unstructured":"Besta M, Blach N, Kubicek A et\u00a0al (2024) Graph of thoughts: Solving elaborate problems with large language models. In: Proceedings of the AAAI conference on artificial intelligence, pp 17682\u201317690","DOI":"10.1609\/aaai.v38i16.29720"},{"key":"296_CR9","doi-asserted-by":"crossref","unstructured":"Bhat V, Kaypak AU, Krishnamurthy P et\u00a0al (2024) Grounding llms for robot task planning using closed-loop state feedback. CoRR","DOI":"10.1002\/adrr.202500072"},{"key":"296_CR10","doi-asserted-by":"crossref","unstructured":"Birr T, Pohl C, Younes A, et\u00a0al (2024) Autogpt+ p: affordance-based task planning with large language models. ArXiv preprint arXiv:2402.10778","DOI":"10.15607\/RSS.2024.XX.112"},{"issue":"1","key":"296_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TCIAIG.2012.2186810","volume":"4","author":"CB Browne","year":"2012","unstructured":"Browne CB, Powley E, Whitehouse D et al (2012) A survey of Monte Carlo tree search methods. IEEE Trans Comput Intell AI Games 4(1):1\u201343","journal-title":"IEEE Trans Comput Intell AI Games"},{"key":"296_CR12","unstructured":"Cai T, Wang X, Ma T et\u00a0al (2024) Large language models as tool makers. In: ICLR"},{"key":"296_CR13","unstructured":"Chaslot GMJBC (2010) Monte-Carlo tree search. 
Doctoral Thesis"},{"key":"296_CR14","doi-asserted-by":"crossref","unstructured":"Chen L, Shang S (2019) Region-based message exploration over spatio-temporal data streams. In: Proceedings of the AAAI conference on artificial intelligence, pp 873\u2013880","DOI":"10.1609\/aaai.v33i01.3301873"},{"key":"296_CR15","doi-asserted-by":"crossref","unstructured":"Chen L, Shang S, Jensen CS, et\u00a0al (2019) Effective and efficient reuse of past travel behavior for route recommendation. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 488\u2013498","DOI":"10.1145\/3292500.3330835"},{"key":"296_CR16","unstructured":"Chen L, Lu K, Rajeswaran A et\u00a0al (2021) Decision transformer: reinforcement learning via sequence modeling. In: Ranzato M, Yann N,\u00a0Dauphin AB, Liang P et\u00a0al (eds) Advances in neural information processing systems 34: annual conference on neural information processing systems 2021, NeurIPS 2021, December 6\u201314, 2021, virtual, pp 15084\u201315097. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/7f489f642a0ddb10272b5c31057f0663-Abstract.html"},{"key":"296_CR17","unstructured":"Chen W, Wang X, Wang WY (2023) A dataset for answering time-sensitive questions. In: Thirty-fifth conference on neural information processing systems datasets and benchmarks track (round 2)"},{"key":"296_CR18","doi-asserted-by":"crossref","unstructured":"Chen Y, Liu Z, Li J et al (2022) Intent contrastive learning for sequential recommendation. In: Proceedings of the ACM web conference, pp 2172\u20132182","DOI":"10.1145\/3485447.3512090"},{"key":"296_CR19","unstructured":"Chen Z, Du W, Zhang W et\u00a0al (2023) T-eval: Evaluating the tool utilization capability step by step. ArXiv preprint arXiv:2312.14033"},{"key":"296_CR20","doi-asserted-by":"crossref","unstructured":"Chen Z, Zhou K, Zhang B et\u00a0al (2023) Chatcot: tool-augmented chain-of-thought reasoning on chat-based large language models. 
ArXiv preprint arXiv:2305.14323","DOI":"10.18653\/v1\/2023.findings-emnlp.985"},{"key":"296_CR21","doi-asserted-by":"crossref","unstructured":"Chen Z, Zhang D, Feng S et\u00a0al (2024) Kgts: contrastive trajectory similarity learning over prompt knowledge graph embedding. In: Proceedings of the AAAI conference on artificial intelligence, pp 8311\u20138319","DOI":"10.1609\/aaai.v38i8.28672"},{"key":"296_CR22","first-page":"66","volume":"6","author":"M Cheong","year":"2024","unstructured":"Cheong M, Abedin E, Ferreira M et al (2024) Investigating gender and racial biases in dall-e mini images. ACM J Responsib Comput 6:66","journal-title":"ACM J Responsib Comput"},{"key":"296_CR23","unstructured":"Cobbe K, Kosaraju V, Bavarian M et\u00a0al (2021) Training verifiers to solve math word problems. ArXiv preprint arXiv:2110.14168"},{"key":"296_CR24","unstructured":"Dagan G, Keller F, Lascarides A (2023) Dynamic planning with a llm. ArXiv preprint arXiv:2308.06391"},{"key":"296_CR25","doi-asserted-by":"crossref","unstructured":"Deka B, Huang Z, Franzen C et\u00a0al (2017) Rico: A mobile app dataset for building data-driven design applications. In: Proceedings of the 30th annual ACM symposium on user interface software and technology, pp 845\u2013854","DOI":"10.1145\/3126594.3126651"},{"key":"296_CR26","doi-asserted-by":"crossref","unstructured":"Deng S, Xu W, Sun H et\u00a0al (2024) Mobile-bench: an evaluation benchmark for llm-based mobile agents. In: Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: long papers), pp 8813\u20138831","DOI":"10.18653\/v1\/2024.acl-long.478"},{"key":"296_CR27","unstructured":"Ding T (2024) Mobileagent: enhancing mobile control via human-machine interaction and sop integration. ArXiv preprint arXiv:2401.04124"},{"key":"296_CR28","first-page":"66","volume":"36","author":"C Du","year":"2024","unstructured":"Du C, Li Y, Qiu Z et al (2024) Stable diffusion is unstable. 
Adv Neural Inf Process Syst 36:66","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR29","unstructured":"Du Y, Wei F, Zhang H (2024) Anytool: Self-reflective, hierarchical agents for large-scale api calls. In: International conference on machine learning (PMLR), pp 11812\u201311829"},{"key":"296_CR30","unstructured":"Fan T, Kang Y, Ma G et\u00a0al (2023) Fate-llm: a industrial grade federated learning framework for large language models. arXiv preprint arXiv:2310.10049"},{"key":"296_CR31","unstructured":"Farn N, Shin R (2023) Tooltalk: evaluating tool-usage in a conversational setting. ArXiv preprint arXiv:2311.10775"},{"key":"296_CR32","unstructured":"Fu D, Huang J, Lu S et\u00a0al (2025) Preact: Prediction enhances agent\u2019s planning ability. In: Proceedings of the 31st international conference on computational linguistics, pp 1\u201316"},{"key":"296_CR33","doi-asserted-by":"crossref","unstructured":"Gao S, Shi Z, Zhu M et\u00a0al (2024) Confucius: Iterative tool learning from introspection feedback by easy-to-difficult curriculum. In: Proceedings of the AAAI conference on artificial intelligence, pp 18030\u201318038","DOI":"10.1609\/aaai.v38i16.29759"},{"key":"296_CR34","unstructured":"Gao S, Dwivedi-Yu J, Yu P et\u00a0al (2025) Efficient tool use with chain-of-abstraction reasoning. In: COLING"},{"key":"296_CR35","doi-asserted-by":"crossref","unstructured":"Gao Z, Du Y, Zhang X et\u00a0al (2024b) Clova: a closed-loop visual assistant with tool usage and update. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13258\u201313268","DOI":"10.1109\/CVPR52733.2024.01259"},{"key":"296_CR36","first-page":"66","volume":"36","author":"Y Ge","year":"2024","unstructured":"Ge Y, Hua W, Mei K et al (2024) Openagi: when llm meets domain experts. 
Adv Neural Inf Process Syst 36:66","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR37","volume-title":"Tools, language and cognition in human evolution","author":"KR Gibson","year":"1993","unstructured":"Gibson KR, Gibson KR, Ingold T (1993) Tools, language and cognition in human evolution. Cambridge University Press, Cambridge"},{"key":"296_CR38","doi-asserted-by":"crossref","unstructured":"Girdhar R, El-Nouby A, Liu Z et\u00a0al (2023) Imagebind: One embedding space to bind them all. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 15180\u201315190","DOI":"10.1109\/CVPR52729.2023.01457"},{"key":"296_CR39","unstructured":"Gou Z, Shao Z, Gong Y et\u00a0al (2023) Tora: a tool-integrated reasoning agent for mathematical problem solving. CoRR"},{"key":"296_CR40","doi-asserted-by":"crossref","unstructured":"Grande V, Kiesler N, Francisco\u00a0RMA (2024) Student perspectives on using a large language model (llm) for an assignment on professional ethics. In: Proceedings of the 2024 on innovation and technology in computer science education, vol 1, pp 478\u2013484","DOI":"10.1145\/3649217.3653624"},{"key":"296_CR41","doi-asserted-by":"crossref","unstructured":"Gu Y, Shu Y, Yu H et\u00a0al (2024) Middleware for llms: tools are instrumental for language agents in complex environments. In: Proceedings of the 2024 conference on empirical methods in natural language processing, pp 7646\u20137663","DOI":"10.18653\/v1\/2024.emnlp-main.436"},{"key":"296_CR42","unstructured":"Gui A, Li J, Dai Y et\u00a0al (2024) Look before you leap: towards decision-aware and generalizable tool-usage for large language models. ArXiv preprint arXiv:2402.16696"},{"key":"296_CR43","first-page":"11143","volume":"2024","author":"Z Guo","year":"2024","unstructured":"Guo Z, Cheng S, Wang H et al (2024) Stabletoolbench: towards stable large-scale benchmarking on tool learning of large language models. 
Find Assoc Comput Linguist ACL 2024:11143\u201311156","journal-title":"Find Assoc Comput Linguist ACL"},{"key":"296_CR44","doi-asserted-by":"crossref","unstructured":"Han P, Yang P, Zhao P et\u00a0al (2019) Gcn-mf: disease-gene association identification by graph convolutional networks and matrix factorization. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 705\u2013713","DOI":"10.1145\/3292500.3330912"},{"key":"296_CR45","unstructured":"Han S, Zhang Q, Yao Y et\u00a0al (2024) Llm multi-agent systems: challenges and open problems. arXiv preprint arXiv:2402.03578"},{"key":"296_CR46","first-page":"45870","volume":"36","author":"S Hao","year":"2023","unstructured":"Hao S, Liu T, Wang Z et al (2023) Toolkengpt: augmenting frozen language models with massive tools via tool embeddings. Adv Neural Inf Process Syst 36:45870\u201345894","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR47","doi-asserted-by":"publisher","unstructured":"He J, Chen J, He X et\u00a0al (2016) Deep reinforcement learning with a natural language action space. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers). Association for Computational Linguistics, Berlin, Germany, pp 1621\u20131630. https:\/\/doi.org\/10.18653\/v1\/P16-1153, https:\/\/aclanthology.org\/P16-1153","DOI":"10.18653\/v1\/P16-1153"},{"key":"296_CR48","doi-asserted-by":"crossref","unstructured":"Hong W, Wang W, Lv Q et\u00a0al (2024) Cogagent: a visual language model for gui agents. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 14281\u201314290","DOI":"10.1109\/CVPR52733.2024.01354"},{"key":"296_CR49","doi-asserted-by":"crossref","unstructured":"Hori K, Suzuki K, Ogata T (2024) Interactively robot action planning with uncertainty analysis and active questioning by large language model. In: 2024 IEEE\/SICE international symposium on system integration (SII). 
IEEE, pp 85\u201391","DOI":"10.1109\/SII58957.2024.10417267"},{"key":"296_CR50","unstructured":"Hsieh CY, Chen SA, Li CL et\u00a0al (2023) Tool documentation enables zero-shot tool-usage with large language models. ArXiv preprint arXiv:2308.00675"},{"key":"296_CR51","unstructured":"Hu EJ, Shen Y, Wallis P et\u00a0al (2022) Lora: low-rank adaptation of large language models. In: The tenth international conference on learning representations, ICLR 2022, Virtual Event, April 25\u201329, 2022. OpenReview.net. https:\/\/openreview.net\/forum?id=nZeVKeeFYf9"},{"key":"296_CR52","first-page":"4363","volume":"2024","author":"S Huang","year":"2024","unstructured":"Huang S, Zhong W, Lu J et al (2024) Planning, creation, usage: benchmarking llms for comprehensive tool utilization in real-world complex scenarios. Find Assoc Comput Linguist ACL 2024:4363\u20134400","journal-title":"Find Assoc Comput Linguist ACL"},{"key":"296_CR53","unstructured":"Huang X, Liu W, Chen X et\u00a0al (2024) Understanding the planning of llm agents: a survey. CoRR"},{"key":"296_CR54","unstructured":"Huang Y, Shi J, Li Y et\u00a0al (2024) Metatool benchmark for large language models: deciding whether to use tools and which to use. In: ICLR"},{"key":"296_CR55","unstructured":"Imrie F, Rauba P, van\u00a0der Schaar M (2023) Redefining digital health interfaces with large language models. ArXiv preprint arXiv:2310.03560"},{"key":"296_CR56","doi-asserted-by":"crossref","unstructured":"Jain S, van Zuylen M, Hajishirzi H et\u00a0al (2020) Scirex: A challenge dataset for document-level information extraction. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 7506\u20137516","DOI":"10.18653\/v1\/2020.acl-main.670"},{"key":"296_CR57","first-page":"66","volume":"6","author":"F Jiang","year":"2024","unstructured":"Jiang F, Peng Y, Dong L et al (2024) Large language model enhanced multi-agent systems for 6g communications. 
IEEE Wirel Commun 6:66","journal-title":"IEEE Wirel Commun"},{"key":"296_CR58","doi-asserted-by":"crossref","unstructured":"Jin Q, Yang Y, Chen Q et\u00a0al (2023) Genegpt: augmenting large language models with domain tools for improved access to biomedical information. arxiv","DOI":"10.1093\/bioinformatics\/btae075"},{"key":"296_CR59","doi-asserted-by":"crossref","unstructured":"Jin Q, Wang Z, Yang Y et\u00a0al (2024) Agentmd: empowering language agents for risk prediction with large-scale clinical tool learning. ArXiv preprint arXiv:2402.13225","DOI":"10.1038\/s41467-025-64430-x"},{"issue":"3","key":"296_CR60","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1109\/TBDATA.2019.2921572","volume":"7","author":"J Johnson","year":"2019","unstructured":"Johnson J, Douze M, J\u00e9gou H (2019) Billion-scale similarity search with gpus. IEEE Trans Big Data 7(3):535\u2013547","journal-title":"IEEE Trans Big Data"},{"key":"296_CR61","doi-asserted-by":"crossref","unstructured":"Kapania S, Wang R, Li TJJ et\u00a0al (2024) \u201c i\u2019m categorizing llm as a productivity tool\u201d: examining ethics of llm use in hci research practices. arXiv preprint arXiv:2403.19876","DOI":"10.1145\/3711000"},{"key":"296_CR62","doi-asserted-by":"crossref","unstructured":"Kong Y, Ruan J, Chen Y et\u00a0al (2024) Tptu-v2: boosting task planning and tool usage of large language model-based agents in real-world industry systems. In: Proceedings of the 2024 conference on empirical methods in natural language processing: industry track, pp 371\u2013385","DOI":"10.18653\/v1\/2024.emnlp-industry.27"},{"key":"296_CR63","unstructured":"Kumar A, Singh S, Murty SV et\u00a0al (2024) The ethics of interaction: Mitigating security threats in llms. arXiv preprint arXiv:2401.12273"},{"key":"296_CR64","unstructured":"Kumar SS, Jain D, Agarwal E et\u00a0al (2024) Swissnyf: tool grounded llm agents for black box setting. 
ArXiv preprint arXiv:2402.10051"},{"key":"296_CR65","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1162\/tacl_a_00276","volume":"7","author":"T Kwiatkowski","year":"2019","unstructured":"Kwiatkowski T, Palomaki J, Redfield O et al (2019) Natural questions: a benchmark for question answering research. Trans Assoc Comput Linguist 7:452\u2013466. https:\/\/doi.org\/10.1162\/tacl_a_00276","journal-title":"Trans Assoc Comput Linguist"},{"key":"296_CR66","doi-asserted-by":"crossref","unstructured":"Li J, Ye D, Shang S (2019) Adversarial transfer for named entity boundary detection with pointer networks. In: IJCAI, pp 5053\u20135059","DOI":"10.24963\/ijcai.2019\/702"},{"key":"296_CR67","doi-asserted-by":"crossref","unstructured":"Li M, Zhao Y, Yu B et\u00a0al (2023) Api-bank: a comprehensive benchmark for tool-augmented llms. In: Proceedings of the 2023 conference on empirical methods in natural language processing, pp 3102\u20133116","DOI":"10.18653\/v1\/2023.emnlp-main.187"},{"key":"296_CR68","first-page":"66","volume":"36","author":"BY Lin","year":"2024","unstructured":"Lin BY, Fu Y, Yang K et al (2024) Swiftsage: a generative agent with fast and slow thinking for complex interactive tasks. Adv Neural Inf Process Syst 36:66","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR69","unstructured":"Liu B, Jiang Y, Zhang X et\u00a0al (2023a) Llm+ p: empowering large language models with optimal planning proficiency. ArXiv preprint arXiv:2304.11477"},{"key":"296_CR70","unstructured":"Liu O, Fu D, Yogatama D et\u00a0al (2024) Dellma: a framework for decision making under uncertainty with large language models. ArXiv preprint arXiv:2402.02392"},{"key":"296_CR71","unstructured":"Liu S, Biswal A, Cheng A et\u00a0al (2024) Optimizing llm queries in relational workloads. CoRR"},{"key":"296_CR72","doi-asserted-by":"crossref","unstructured":"Liu S, Cheng H, Liu H et\u00a0al (2024) Llava-plus: learning to use tools for creating multimodal agents. 
In: European conference on computer vision. Springer, Berlin, pp 126\u2013142","DOI":"10.1007\/978-3-031-72970-6_8"},{"key":"296_CR73","unstructured":"Liu X, Peng Z, Yi X et\u00a0al (2024) Toolnet: connecting large language models with massive tools via tool graph. ArXiv preprint arXiv:2403.00839"},{"key":"296_CR74","unstructured":"Liu Y, Deng G, Xu Z et\u00a0al (2023) Jailbreaking chatgpt via prompt engineering: an empirical study. arXiv preprint arXiv:2305.13860"},{"key":"296_CR75","unstructured":"Liu Y, Tang X, Cai Z, et\u00a0al (2023) Ml-bench: large language models leverage open-source libraries for machine learning tasks. ArXiv preprint arXiv:2311.09835"},{"key":"296_CR76","unstructured":"Liu Y, Peng X, Zhang Y et\u00a0al (2024) Tool-planner: dynamic solution tree planning for large language model with tool clustering. arXiv preprint arXiv:2406.03807"},{"key":"296_CR77","unstructured":"Liu Y, Yuan Y, Wang C et\u00a0al (2024) From summary to action: Enhancing large language models for complex tasks with open world apis. ArXiv preprint arXiv:2402.18157"},{"key":"296_CR78","doi-asserted-by":"crossref","unstructured":"Liu Z, Lai Z, Gao Z et\u00a0al (2024) Controlllm: augment language models with tools by searching on graphs. In: European conference on computer vision. Springer, Berlin, pp 89\u2013105","DOI":"10.1007\/978-3-031-73254-6_6"},{"key":"296_CR79","unstructured":"Long J (2023) Large language model guided tree-of-thought. ArXiv preprint arXiv:2305.08291"},{"key":"296_CR80","first-page":"66","volume":"36","author":"P Lu","year":"2024","unstructured":"Lu P, Peng B, Cheng H et al (2024) Chameleon: plug-and-play compositional reasoning with large language models. Adv Neural Inf Process Syst 36:66","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR81","doi-asserted-by":"crossref","unstructured":"Lykov A, Konenkov M, Gbagbe KF et\u00a0al (2024) Cognitiveos: large multimodal model based system to endow any type of robot with generative ai. 
ArXiv preprint arXiv:2401.16205","DOI":"10.1109\/ICRA55743.2025.11128224"},{"key":"296_CR82","unstructured":"Lyu B, Cong X, Yu H et\u00a0al (2023) Gitagent: Facilitating autonomous agent with github by tool extension. ArXiv preprint arXiv:2312.17294"},{"key":"296_CR83","first-page":"9097","volume":"2024","author":"X Ma","year":"2024","unstructured":"Ma X, Zhang Z, Zhao H (2024) Coco-agent: a comprehensive cognitive mllm agent for smartphone gui automation. Find Assoc Comput Linguist ACL 2024:9097\u20139110","journal-title":"Find Assoc Comput Linguist ACL"},{"key":"296_CR84","doi-asserted-by":"crossref","unstructured":"Ma Y, Gou Z, Hao J et\u00a0al (2024) Sciagent: Tool-augmented language models for scientific reasoning. In: Proceedings of the 2024 conference on empirical methods in natural language processing, pp 15701\u201315736","DOI":"10.18653\/v1\/2024.emnlp-main.880"},{"key":"296_CR85","first-page":"5026","volume":"2024","author":"D Mekala","year":"2024","unstructured":"Mekala D, Weston J, Lanchantin J et al (2024) Toolverifier: generalization to new tools via self-verification. Find Assoc Comput Linguist EMNLP 2024:5026\u20135041","journal-title":"Find Assoc Comput Linguist EMNLP"},{"issue":"1","key":"296_CR86","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1038\/s41746-023-00873-0","volume":"6","author":"B Mesk\u00f3","year":"2023","unstructured":"Mesk\u00f3 B, Topol EJ (2023) The imperative for regulatory oversight of large language models (or generative ai) in healthcare. NPJ Digit Med 6(1):120","journal-title":"NPJ Digit Med"},{"key":"296_CR87","doi-asserted-by":"publisher","unstructured":"Miao Sy, Liang CC, Su KY (2020) A diverse corpus for evaluating and developing English math word problem solvers. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for computational linguistics, Online, pp 975\u2013984. 
https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.92","DOI":"10.18653\/v1\/2020.acl-main.92"},{"issue":"3","key":"296_CR88","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1109\/TTS.2020.3019595","volume":"1","author":"K Michael","year":"2020","unstructured":"Michael K, Abbas R, Roussos G et al (2020) Ethics in ai and autonomous system applications design. IEEE Trans Technol Soc 1(3):114\u2013127","journal-title":"IEEE Trans Technol Soc"},{"key":"296_CR89","unstructured":"Miller FP, Vandome AF, McBrewster J (2009) Levenshtein distance: information theory, computer science, string (computer science), string metric, damerau? Levenshtein distance, spell checker, hamming distance"},{"issue":"4","key":"296_CR90","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1111\/exsy.12062","volume":"32","author":"C Moreira","year":"2015","unstructured":"Moreira C, Calado P, Martins B (2015) Learning to rank academic experts in the dblp dataset. Expert Syst 32(4):477\u2013493","journal-title":"Expert Syst"},{"key":"296_CR91","unstructured":"Nakano R, Hilton J, Balaji S et\u00a0al (2021) Webgpt: Browser-assisted question-answering with human feedback. ArXiv preprint arXiv:2112.09332"},{"key":"296_CR92","doi-asserted-by":"crossref","unstructured":"Nam D, Macvean A, Hellendoorn V et\u00a0al (2024) Using an llm to help with code understanding. In: 2024 IEEE\/ACM 46th international conference on software engineering (ICSE). IEEE Computer Society, p 881","DOI":"10.1145\/3597503.3639187"},{"key":"296_CR93","doi-asserted-by":"publisher","DOI":"10.1016\/j.eml.2024.102131","volume":"67","author":"B Ni","year":"2024","unstructured":"Ni B, Buehler MJ (2024) Mechagents: large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge. 
Extreme Mech Lett 67:102131","journal-title":"Extreme Mech Lett"},{"key":"296_CR94","doi-asserted-by":"crossref","unstructured":"Pan L, Albalak A, Wang X et\u00a0al (2023) Logic-lm: empowering large language models with symbolic solvers for faithful logical reasoning. In: Proceedings of the 2023 conference on empirical methods in natural language processing (EMNLP)","DOI":"10.18653\/v1\/2023.findings-emnlp.248"},{"key":"296_CR95","unstructured":"Parisi A, Zhao Y, Fiedel N (2022) Talm: tool augmented language models. ArXiv preprint arXiv:2205.12255"},{"key":"296_CR96","doi-asserted-by":"crossref","unstructured":"Park JS, O\u2019Brien J, Cai CJ et\u00a0al (2023) Generative agents: Interactive simulacra of human behavior. In: Proceedings of the 36th annual ACM symposium on user interface software and technology, pp 1\u201322","DOI":"10.1145\/3586183.3606763"},{"key":"296_CR97","doi-asserted-by":"publisher","unstructured":"Patel A, Bhattamishra S, Goyal N (2021) Are NLP models really able to solve simple math word problems? In: Proceedings of the 2021 conference of the north american chapter of the Association for Computational Linguistics: human language technologies. Association for Computational Linguistics, Online, pp 2080\u20132094. https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.168","DOI":"10.18653\/v1\/2021.naacl-main.168"},{"key":"296_CR98","first-page":"126544","volume":"37","author":"SG Patil","year":"2024","unstructured":"Patil SG, Zhang T, Wang X et al (2024) Gorilla: large language model connected with massive apis. Adv Neural Inf Process Syst 37:126544\u2013126565","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR99","unstructured":"Pedro R, Castro D, Carreira P et\u00a0al (2023) From prompt injections to sql injection attacks: How protected is your llm-integrated web application? 
ArXiv preprint arXiv:2308.01990"},{"key":"296_CR100","doi-asserted-by":"publisher","unstructured":"Puig X, Ra K, Boben M et\u00a0al (2018) Virtualhome: Simulating household activities via programs. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, UT, USA, June 18\u201322, 2018. IEEE Computer Society, pp 8494\u20138502. https:\/\/doi.org\/10.1109\/CVPR.2018.00886","DOI":"10.1109\/CVPR.2018.00886"},{"key":"296_CR101","doi-asserted-by":"crossref","unstructured":"Qian C, Xiong C, Liu Z et\u00a0al (2024) Toolink: Linking toolkit creation and using through chain-of-solving on open-source model. In: Proceedings of the 2024 conference of the North American Chapter of the Association for Computational Linguistics: human language technologies (volume 1: long papers), pp 831\u2013854","DOI":"10.18653\/v1\/2024.naacl-long.48"},{"issue":"4","key":"296_CR102","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3704435","volume":"57","author":"Y Qin","year":"2024","unstructured":"Qin Y, Hu S, Lin Y et al (2024) Tool learning with foundation models. ACM Comput Surv 57(4):1\u201340","journal-title":"ACM Comput Surv"},{"key":"296_CR103","unstructured":"Qin Y, Liang S, Ye Y et\u00a0al (2024) Toolllm: facilitating large language models to master 16000+ real-world apis. In: ICLR"},{"key":"296_CR104","first-page":"59708","volume":"36","author":"C Rawles","year":"2023","unstructured":"Rawles C, Li A, Rodriguez D et al (2023) Androidinthewild: a large-scale dataset for android device control. Adv Neural Inf Process Syst 36:59708\u201359728","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR105","doi-asserted-by":"crossref","unstructured":"Roy D, Zhang X, Bhave R et\u00a0al (2024) Exploring llm-based agents for root cause analysis. 
In: Companion proceedings of the 32nd ACM international conference on the foundations of software engineering, pp 208\u2013219","DOI":"10.1145\/3663529.3663841"},{"key":"296_CR106","unstructured":"Ruan J, Chen Y, Zhang B et\u00a0al (2023) Tptu: task planning and tool usage of large language model-based ai agents. In: NeurIPS 2023 foundation models for decision making workshop"},{"key":"296_CR107","unstructured":"Ruan Y, Dong H, Wang A et\u00a0al (2023) Identifying the risks of lm agents with an lm-emulated sandbox. In: NeurIPS 2023 foundation models for decision making workshop"},{"key":"296_CR108","first-page":"68539","volume":"36","author":"T Schick","year":"2023","unstructured":"Schick T, Dwivedi-Yu J, Dess\u00ec R et al (2023) Toolformer: language models can teach themselves to use tools. Adv Neural Inf Process Syst 36:68539\u201368551","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR109","first-page":"66","volume":"36","author":"T Schick","year":"2024","unstructured":"Schick T, Dwivedi-Yu J, Dess\u00ec R et al (2024) Toolformer: language models can teach themselves to use tools. Adv Neural Inf Process Syst 36:66","journal-title":"Adv Neural Inf Process Syst"},{"issue":"6","key":"296_CR110","doi-asserted-by":"publisher","first-page":"1194","DOI":"10.1109\/TKDE.2018.2854705","volume":"31","author":"S Shang","year":"2018","unstructured":"Shang S, Chen L, Zheng K et al (2018) Parallel trajectory-to-location join. IEEE Trans Knowl Data Eng 31(6):1194\u20131207","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"296_CR111","doi-asserted-by":"crossref","unstructured":"Shen W, Li C, Chen H et\u00a0al (2024) Small llms are weak tool learners: a multi-llm agent. 
In: Proceedings of the 2024 conference on empirical methods in natural language processing, pp 16658\u201316680","DOI":"10.18653\/v1\/2024.emnlp-main.929"},{"key":"296_CR112","first-page":"66","volume":"36","author":"Y Shen","year":"2024","unstructured":"Shen Y, Song K, Tan X et al (2024) Hugginggpt: solving ai tasks with chatgpt and its friends in hugging face. Adv Neural Inf Process Syst 36:66","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR113","doi-asserted-by":"crossref","unstructured":"Shi W, Xu R, Zhuang Y et\u00a0al (2024) Ehragent: code empowers large language models for complex tabular reasoning on electronic health records. ArXiv preprint arXiv:2401.07128","DOI":"10.18653\/v1\/2024.emnlp-main.1245"},{"key":"296_CR114","first-page":"10642","volume":"2024","author":"Z Shi","year":"2024","unstructured":"Shi Z, Gao S, Chen X et al (2024) Learning to use tools via cooperative and interactive agents. Find Assoc Comput Linguist EMNLP 2024:10642\u201310657","journal-title":"Find Assoc Comput Linguist EMNLP"},{"key":"296_CR115","unstructured":"Silver T, Hariprasad V, Shuttleworth RS et\u00a0al (2022) Pddl planning with pretrained large language models. In: NeurIPS 2022 foundation models for decision making workshop"},{"key":"296_CR116","doi-asserted-by":"crossref","unstructured":"Silver T, Dan S, Srinivas K et\u00a0al (2024) Generalized planning in pddl domains with pretrained large language models. In: Proceedings of the AAAI conference on artificial intelligence, pp 20256\u201320264","DOI":"10.1609\/aaai.v38i18.30006"},{"key":"296_CR117","unstructured":"Skrynnik A, Andreychuk A, Borzilov A et\u00a0al (2024) Pogema: A benchmark platform for cooperative multi-agent navigation. arXiv preprint arXiv:2407.14931"},{"key":"296_CR118","unstructured":"Song Y, Xiong W, Zhu D et\u00a0al (2023) Restgpt: connecting large language models with real-world restful apis. 
ArXiv preprint arXiv:2306.06624"},{"key":"296_CR119","unstructured":"Tang Q, Deng Z, Lin H et\u00a0al (2023) Toolalpaca: generalized tool learning for language models with 3000 simulated cases. ArXiv preprint arXiv:2306.05301"},{"key":"296_CR120","doi-asserted-by":"crossref","unstructured":"Theuma A, Shareghi E (2024) Equipping language models with tool use capability for tabular data analysis in finance. In: Proceedings of the 18th conference of the european chapter of the Association for Computational Linguistics (volume 2: short papers), pp 90\u2013103","DOI":"10.18653\/v1\/2024.eacl-short.10"},{"key":"296_CR121","doi-asserted-by":"crossref","unstructured":"Topsakal O, Akinci TC (2023) Creating large language model applications utilizing langchain: a primer on developing llm apps fast. In: International conference on applied engineering and natural sciences, pp 1050\u20131056","DOI":"10.59287\/icaens.1127"},{"key":"296_CR122","doi-asserted-by":"crossref","unstructured":"Toubal IE, Avinash A, Alldrin NG et\u00a0al (2024) Modeling collaborator: Enabling subjective vision classification with minimal human effort via llm tool-use. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 17553\u201317563","DOI":"10.1109\/CVPR52733.2024.01662"},{"key":"296_CR123","unstructured":"Toyama D, Hamel P, Gergely A et\u00a0al (2021) Androidenv: a reinforcement learning platform for android. ArXiv preprint arXiv:2105.13231"},{"key":"296_CR124","doi-asserted-by":"crossref","unstructured":"Wang B, Li G, Li Y (2023) Enabling conversational interaction with mobile ui using large language models. In: Proceedings of the 2023 CHI conference on human factors in computing systems, pp 1\u201317","DOI":"10.1145\/3544548.3580895"},{"key":"296_CR125","doi-asserted-by":"crossref","unstructured":"Wang B, Fang H, Eisner J et\u00a0al (2024) Llms in the imaginarium: tool learning through simulated trial and error. 
In: Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: long papers), pp 10583\u201310604","DOI":"10.18653\/v1\/2024.acl-long.570"},{"key":"296_CR126","unstructured":"Wang C, Luo W, Chen Q et\u00a0al (2024) Tool-lmm: a large multi-modal model for tool agent learning. ArXiv preprint arXiv:2401.10727"},{"key":"296_CR127","doi-asserted-by":"crossref","unstructured":"Wang H, Wang H, Wang L et\u00a0al (2023) Tpe: towards better compositional reasoning over conceptual tools with multi-persona collaboration. ArXiv preprint arXiv:2309.16090","DOI":"10.1007\/978-981-97-9434-8_22"},{"key":"296_CR128","doi-asserted-by":"crossref","unstructured":"Wang H, Feng S, Chen L et\u00a0al (2024) Simulating individual infection risk over big trajectory data. In: International conference on database systems for advanced applications. Springer, Berlin, pp 136\u2013151","DOI":"10.1007\/978-981-97-5552-3_9"},{"key":"296_CR129","first-page":"66","volume":"6","author":"J Wang","year":"2024","unstructured":"Wang J, Shi E, Hu H et al (2024) Large language models for robotics: opportunities, challenges, and perspectives. J Autom Intell 6:66","journal-title":"J Autom Intell"},{"key":"296_CR130","unstructured":"Wang J, Xu H, Ye J et\u00a0al (2024) Mobile-agent: autonomous multi-modal mobile device agent with visual perception. In: ICLR 2024 workshop on large language model (LLM) agents"},{"key":"296_CR131","doi-asserted-by":"crossref","unstructured":"Wang W, Shi J, Wang C et\u00a0al (2024) Learning to ask: when llms meet unclear instruction. arXiv:2409.00557","DOI":"10.18653\/v1\/2025.emnlp-main.1104"},{"key":"296_CR132","unstructured":"Wang X, Wei J, Schuurmans D et\u00a0al (2022) Self-consistency improves chain of thought reasoning in language models. 
ArXiv preprint arXiv:2203.11171"},{"key":"296_CR133","doi-asserted-by":"publisher","first-page":"107200","DOI":"10.1016\/j.neunet.2025.107200","volume":"66","author":"Y Wang","year":"2025","unstructured":"Wang Y, Wu Z, Yao J et al (2025) Tdag: a multi-agent framework based on dynamic task decomposition and agent generation. Neural Netw 66:107200","journal-title":"Neural Netw"},{"key":"296_CR134","first-page":"24824","volume":"35","author":"J Wei","year":"2022","unstructured":"Wei J, Wang X, Schuurmans D et al (2022) Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst 35:24824\u201324837","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR135","doi-asserted-by":"crossref","unstructured":"Wei Y, Su Y, Ma H et al (2023) Menatqa: a new dataset for testing the temporal comprehension and reasoning abilities of large language models. In: Findings of the Association for Computational Linguistics: EMNLP 2023, pp 1434\u20131447","DOI":"10.18653\/v1\/2023.findings-emnlp.100"},{"key":"296_CR136","doi-asserted-by":"crossref","unstructured":"Wu Q, Xu W, Liu W et\u00a0al (2024) Mobilevlm: a vision-language model for better intra-and inter-ui understanding. arXiv preprint arXiv:2409.14818","DOI":"10.18653\/v1\/2024.findings-emnlp.599"},{"key":"296_CR137","unstructured":"Xie J, Chen Z, Zhang R et\u00a0al (2024) Large multimodal agents: a survey. ArXiv preprint arXiv:2402.15116"},{"key":"296_CR138","unstructured":"Xu Q, Hong F, Li B et\u00a0al (2023) On the tool manipulation capability of open-sourced large language models. In: NeurIPS 2023 foundation models for decision making workshop"},{"key":"296_CR139","doi-asserted-by":"crossref","unstructured":"Yang C, Chen L, Shang S et\u00a0al (2019) Toward efficient navigation of massive-scale geo-textual streams. 
In: IJCAI, pp 4838\u20134845","DOI":"10.24963\/ijcai.2019\/672"},{"key":"296_CR140","unstructured":"Yang H, Yue S, He Y (2023) Auto-gpt for online decision making: benchmarks and additional opinions. ArXiv preprint arXiv:2306.02224"},{"key":"296_CR141","first-page":"66","volume":"36","author":"R Yang","year":"2024","unstructured":"Yang R, Song L, Li Y et al (2024) Gpt4tools: teaching large language model to use tools via self-instruction. Adv Neural Inf Process Syst 36:66","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR142","doi-asserted-by":"crossref","unstructured":"Yang Z, Qi P, Zhang S et\u00a0al (2018) HotpotQA: a dataset for diverse, explainable multi-hop question answering. In: Proceedings of the 2018 conference on empirical methods in natural language processing. Association for Computational Linguistics, Brussels, Belgium, pp 2369\u20132380. https:\/\/doi.org\/10.18653\/v1\/D18-1259","DOI":"10.18653\/v1\/D18-1259"},{"key":"296_CR143","doi-asserted-by":"crossref","unstructured":"Yang Z, Ishay A, Lee J (2023) Coupling large language models with logic programming for robust and general reasoning from text. In: Findings of the Association for Computational Linguistics: ACL 2023, pp 5186\u20135219","DOI":"10.18653\/v1\/2023.findings-acl.321"},{"key":"296_CR144","unstructured":"Yang Z, Liu J, Han Y et\u00a0al (2023) Appagent: Multimodal agents as smartphone users. ArXiv preprint arXiv:2312.13771"},{"key":"296_CR145","first-page":"11809","volume":"36","author":"S Yao","year":"2023","unstructured":"Yao S, Yu D, Zhao J et al (2023) Tree of thoughts: deliberate problem solving with large language models. Adv Neural Inf Process Syst 36:11809\u201311822","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR146","unstructured":"Yao S, Zhao J, Yu D et\u00a0al (2023) React: synergizing reasoning and acting in language models. 
In: International conference on learning representations (ICLR)"},{"key":"296_CR147","doi-asserted-by":"crossref","unstructured":"Ye J, Li S, Li G et\u00a0al (2024) Toolsword: Unveiling safety issues of large language models in tool learning across three stages. In: Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: long papers), pp 2181\u20132211","DOI":"10.18653\/v1\/2024.acl-long.119"},{"key":"296_CR148","doi-asserted-by":"crossref","unstructured":"Ye J, Wu Y, Gao S et\u00a0al (2024) Rotbench: a multi-level benchmark for evaluating the robustness of large language models in tool learning. In: Proceedings of the 2024 conference on empirical methods in natural language processing, pp 313\u2013333","DOI":"10.18653\/v1\/2024.emnlp-main.19"},{"key":"296_CR149","unstructured":"Ye J, Li G, Gao S et\u00a0al (2025) Tooleyes: fine-grained evaluation for tool learning capabilities of large language models in real-world scenarios. In: Proceedings of the 31st international conference on computational linguistics, pp 156\u2013187"},{"key":"296_CR150","first-page":"66","volume":"36","author":"X Ye","year":"2024","unstructured":"Ye X, Chen Q, Dillig I et al (2024) Satlm: satisfiability-aided language models using declarative prompting. Adv Neural Inf Process Syst 36:66","journal-title":"Adv Neural Inf Process Syst"},{"key":"296_CR151","doi-asserted-by":"crossref","unstructured":"You K, Zhang H, Schoop E et\u00a0al (2024) Ferret-ui: Grounded mobile ui understanding with multimodal llms. In: European conference on computer vision. Springer, Berlin, pp 240\u2013255","DOI":"10.1007\/978-3-031-73039-9_14"},{"key":"296_CR152","unstructured":"Yuan S, Song K, Chen J et\u00a0al (2024) Easytool: enhancing llm-based agents with concise tool instruction. 
In: ICLR 2024 workshop on large language model (LLM) agents"},{"key":"296_CR153","doi-asserted-by":"crossref","unstructured":"Yuan T, He Z, Dong L et al (2024) R-judge: Benchmarking safety risk awareness for llm agents. In: Findings of the Association for Computational Linguistics: EMNLP 2024, pp 1467\u20131490","DOI":"10.18653\/v1\/2024.findings-emnlp.79"},{"key":"296_CR154","doi-asserted-by":"crossref","unstructured":"Zhan Q, Liang Z, Ying Z et al (2024) Injecagent: benchmarking indirect prompt injections in tool-integrated large language model agents. In: Findings of the Association for Computational Linguistics ACL 2024, pp 10471\u201310506","DOI":"10.18653\/v1\/2024.findings-acl.624"},{"key":"296_CR155","unstructured":"Zhang D, Chen L, Yu K (2023) Mobile-env: a universal platform for training and evaluation of mobile interaction. CoRR"},{"key":"296_CR156","doi-asserted-by":"crossref","unstructured":"Zhang D, Yu Y, Dong J et al (2024) Mm-llms: recent advances in multimodal large language models. In: Findings of the Association for Computational Linguistics ACL 2024, pp 12401\u201312430","DOI":"10.18653\/v1\/2024.findings-acl.738"},{"key":"296_CR157","unstructured":"Zhang J (2023) Graph-toolformer: to empower llms with graph reasoning ability via prompt augmented by chatgpt. ArXiv preprint arXiv:2304.11116"},{"key":"296_CR158","doi-asserted-by":"crossref","unstructured":"Zhang W, Zhao L, Xia H et\u00a0al (2024) Finagent: a multimodal foundation agent for financial trading: tool-augmented, diversified, and generalist. arXiv e-prints pp arXiv-2402","DOI":"10.1145\/3637528.3671801"},{"key":"296_CR159","doi-asserted-by":"crossref","unstructured":"Zhang Z, Zhang A (2024) You only look at screens: Multimodal chain-of-action agents. 
In: Findings of the Association for Computational Linguistics ACL 2024, pp 3132\u20133149","DOI":"10.18653\/v1\/2024.findings-acl.186"},{"key":"296_CR160","unstructured":"Zhao H, Liu Z, Wu Z et\u00a0al (2024) Revolutionizing finance with llms: an overview of applications and insights. arXiv preprint arXiv:2401.11641"},{"key":"296_CR161","doi-asserted-by":"crossref","unstructured":"Zhao L, Yang Y, Zhang K et\u00a0al (2024) Diffagent: fast and accurate text-to-image api selection with large language model. In: 2024 IEEE\/CVF conference on computer vision and pattern recognition (CVPR). IEEE Computer Society, pp 6390\u20136399","DOI":"10.1109\/CVPR52733.2024.00611"},{"issue":"2","key":"296_CR162","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s11427-023-2509-x","volume":"67","author":"Q Zhao","year":"2024","unstructured":"Zhao Q, Zhou X, Wu J et al (2024) Biotreasury: a community-based repository enabling indexing and rating of bioinformatics tools. Sci China Life Sci 67(2):221\u2013229","journal-title":"Sci China Life Sci"},{"key":"296_CR163","doi-asserted-by":"crossref","unstructured":"Zhao Y, Xia J, Liu G et\u00a0al (2019) Preference-aware task assignment in spatial crowdsourcing. In: Proceedings of the AAAI conference on artificial intelligence, pp 2629\u20132636","DOI":"10.1609\/aaai.v33i01.33012629"},{"key":"296_CR164","unstructured":"Zheng B, Gou B, Kil J et\u00a0al (2024) Gpt-4v (ision) is a generalist web agent, if grounded. In: International conference on machine learning, PMLR, pp 61349\u201361385"},{"key":"296_CR165","unstructured":"Zheng Y, Li P, Liu W et\u00a0al (2024) Toolrerank: adaptive and hierarchy-aware reranking for tool retrieval. 
In: Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024), pp 16263\u201316273"},{"key":"296_CR166","doi-asserted-by":"crossref","unstructured":"Zheng Y, Li P, Yan M et al (2024) Budget-constrained tool learning with planning. In: Findings of the Association for Computational Linguistics ACL 2024, pp 9039\u20139052","DOI":"10.18653\/v1\/2024.findings-acl.536"},{"key":"296_CR167","unstructured":"Zhou X, Zhao X, Li G (2024) Llm-enhanced data management. ArXiv preprint arXiv:2402.02643"},{"key":"296_CR168","unstructured":"Zhu JY, Cano CG, Bermudez DV et\u00a0al (2024) Incoro: in-context learning for robotics control with feedback loops. ArXiv preprint arXiv:2402.05188"},{"key":"296_CR169","unstructured":"Zhuang Y, Yu Y, Wang K et\u00a0al (2023) Toolqa: a dataset for llm question answering with external tools. In: Proceedings of the 37th international conference on neural information processing systems, pp 50117\u201350143"},{"key":"296_CR170","unstructured":"Zhuang Y, Chen X, Yu T et\u00a0al (2024) Toolchain*: efficient action space navigation in large language models with a* search. In: The twelfth international conference on learning representations"},{"key":"296_CR171","unstructured":"Zou A, Wang Z, Carlini N et\u00a0al (2023) Universal and transferable adversarial attacks on aligned language models. 
arXiv preprint arXiv:2307.15043"}],"container-title":["Data Science and Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41019-025-00296-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41019-025-00296-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41019-025-00296-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T08:43:21Z","timestamp":1765529001000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41019-025-00296-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,26]]},"references-count":171,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["296"],"URL":"https:\/\/doi.org\/10.1007\/s41019-025-00296-9","relation":{},"ISSN":["2364-1185","2364-1541"],"issn-type":[{"value":"2364-1185","type":"print"},{"value":"2364-1541","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,26]]},"assertion":[{"value":"9 October 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 April 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 April 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 June 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The corresponding author, Shuo 
Shang, is the associate editor of Data Science and Engineering and was not involved in the peer review of, and decision related to, this manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}