{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T12:57:48Z","timestamp":1770382668044,"version":"3.49.0"},"reference-count":28,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,12,7]],"date-time":"2024-12-07T00:00:00Z","timestamp":1733529600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2023YFC2605703"],"award-info":[{"award-number":["2023YFC2605703"]}]},{"name":"National Key Research and Development Program of China","award":["2023YFC2206402"],"award-info":[{"award-number":["2023YFC2206402"]}]},{"name":"Cybersecurity Joint Defense System for Large Scientific Facilities Construction Project of Chinese Academy of Sciences","award":["2023YFC2605703"],"award-info":[{"award-number":["2023YFC2605703"]}]},{"name":"Cybersecurity Joint Defense System for Large Scientific Facilities Construction Project of Chinese Academy of Sciences","award":["2023YFC2206402"],"award-info":[{"award-number":["2023YFC2206402"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>The acquisition of cybersecurity threat intelligence is a critical task in the implementation of effective security defense strategies. Recently, advancements in large language model (LLM) technology have led to remarkable capabilities in natural language processing and understanding. In this paper, we introduce an LLM-based approach for open-source intelligence (OSINT) acquisition. This approach autonomously obtains OSINT based on user requirements, eliminating the need for manual scanning or querying, thus saving significant time and effort. 
To further address the knowledge limitations and timeliness challenges inherent in LLMs when handling threat intelligence, we propose a framework that integrates chain-of-thought techniques to assist LLMs in utilizing tools to acquire OSINT. Based on this framework, we have developed a threat intelligence acquisition agent capable of decomposing logical reasoning problems into multiple steps and gradually solving them using appropriate tools, along with a toolkit for the agent to dynamically access during the problem-solving process. To validate the effectiveness of our approach, we have designed four evaluation metrics to assess the agent\u2019s performance and constructed a test set. Experimental results indicate that the agent achieves high accuracy rates in OSINT acquisition tasks, with a substantial improvement noted over its baseline large language model counterpart in specific intelligence acquisition scenarios.<\/jats:p>","DOI":"10.3390\/fi16120461","type":"journal-article","created":{"date-parts":[[2024,12,9]],"date-time":"2024-12-09T10:11:47Z","timestamp":1733739107000},"page":"461","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Empowering LLMs with Toolkits: An Open-Source Intelligence Acquisition Method"],"prefix":"10.3390","volume":"16","author":[{"given":"Xinyang","family":"Yuan","sequence":"first","affiliation":[{"name":"Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, 19B Yuquan Road, Beijing 100049, China"},{"name":"School of Nuclear Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5164-5023","authenticated-orcid":false,"given":"Jiarong","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Nuclear Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, 
China"}]},{"given":"Haozhi","family":"Zhao","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Tian","family":"Yan","sequence":"additional","affiliation":[{"name":"Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, 19B Yuquan Road, Beijing 100049, China"}]},{"given":"Fazhi","family":"Qi","sequence":"additional","affiliation":[{"name":"Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, 19B Yuquan Road, Beijing 100049, China"},{"name":"China Spallation Neutron Source Science Center, Dongguan 523803, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,12,7]]},"reference":[{"key":"ref_1","unstructured":"Cui, L., Yang, L., He, Q., Wang, M., and Ma, J. (2022). Survey of Cyber Threat Intelligence Mining Based on Open Source Information Platform. J. Cyber Secur., 7."},{"key":"ref_2","first-page":"131","article-title":"A Framework for Proactive Acquisition of Threat Intelligence Based on Darknet","volume":"6","author":"Huang","year":"2020","journal-title":"J. Inf. Secur. Res."},{"key":"ref_3","first-page":"23","article-title":"A Multi-Source Cybersecurity Threat Intelligence Collection and Packaging Technology","volume":"10","author":"Xu","year":"2018","journal-title":"Netw. Secur. Technol. Appl."},{"key":"ref_4","unstructured":"Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., and Dong, Z. (2023). A survey of large language models. arXiv."},{"key":"ref_5","unstructured":"DDOSI (2024, July 11). Spiderfoot Automation for OSINT. Available online: https:\/\/www.ddosi.org\/spiderfoot\/."},{"key":"ref_6","unstructured":"Lanmaster53 (2024, August 01). Recon-ng: Open Source Intelligence Gathering Tool. 
Available online: https:\/\/gitpiper.com\/resources\/pentest\/opensourcesintelligenceosint\/lanmaster53-recon-ng."},{"key":"ref_7","unstructured":"Chiasmod0n (2024, August 01). Chiasmodon: OSINT Tool for Gathering Domain-Related Information. Available online: https:\/\/github.com\/chiasmod0n\/chiasmodon-mobile."},{"key":"ref_8","unstructured":"Sindiramutty, S.R. (2023). Autonomous Threat Hunting: A Future Paradigm for AI-Driven Threat Intelligence. arXiv."},{"key":"ref_9","unstructured":"Puzis, R., Zilberman, P., and Elovici, Y. (2020). ATHAFI: Agile threat hunting and forensic investigation. arXiv."},{"key":"ref_10","unstructured":"Guan, W. (2023). Research on Extracting Threat Intelligence Information Based on Pre-Trained Language Models. [Master\u2019s Thesis, Jiangsu University of Science and Technology]."},{"key":"ref_11","unstructured":"Khare, A., Dutta, S., Li, Z., Solko-Breslin, A., Alur, R., and Naik, M. (2023). Understanding the effectiveness of large language models in detecting security vulnerabilities. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/s10922-024-09831-x","article-title":"Benchmarking Large Language Models for Log Analysis, Security, and Interpretation","volume":"32","author":"Karlsen","year":"2024","journal-title":"J. Netw. Syst. Manag."},{"key":"ref_13","unstructured":"Fayyazi, R., and Yang, S.J. (2023). On the uses of large language models to interpret ambiguous cyberattack descriptions. arXiv."},{"key":"ref_14","unstructured":"Yuan, S., Song, K., Chen, J., Tan, X., Shen, Y., Kan, R., Li, D., and Yang, D. (2024). Easytool: Enhancing llm-based agents with concise tool instruction. arXiv."},{"key":"ref_15","unstructured":"Bouzenia, I., Devanbu, P., and Pradel, M. (2024). Repairagent: An autonomous, llm-based agent for program repair. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zeng, A., Liu, M., Lu, R., Wang, B., Liu, X., Dong, Y., and Tang, J. (2023). 
Agenttuning: Enabling generalized agent abilities for llms. arXiv.","DOI":"10.18653\/v1\/2024.findings-acl.181"},{"key":"ref_17","first-page":"50117","article-title":"ToolQA: A Dataset for LLM Question Answering with External Tools","volume":"Volume 36","author":"Zhuang","year":"2023","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3580","DOI":"10.1109\/TKDE.2024.3352100","article-title":"Unifying Large Language Models and Knowledge Graphs: A Roadmap","volume":"36","author":"Pan","year":"2024","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_19","unstructured":"Giarelis, N., Mastrokostas, C., and Karacapilidis, N. (2024, January 20). A Unified LLM-KG Framework to Assist Fact-Checking in Public Deliberation. Proceedings of the First Workshop on Language-Driven Deliberation Technology (DELITE) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia. Available online: https:\/\/aclanthology.org\/2024.delite-1.2."},{"key":"ref_20","unstructured":"Wang, C., Long, Q., Xiao, M., Cai, X., Wu, C., Meng, Z., Wang, X., and Zhou, Y. (2024). BioRAG: A RAG-LLM Framework for Biological Question Reasoning. arXiv."},{"key":"ref_21","unstructured":"G\u00fcnther, M., Ong, J., Mohr, I., Abdessalem, A., Abel, T., Akram, M.K., Guzman, S., Mastrapas, G., Sturua, S., and Wang, B. (2023). Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents. arXiv."},{"key":"ref_22","unstructured":"Prasad, A., and Chandra, S. (2024, November 04). PhiUSIIL Phishing URL. In UCI Machine Learning Repository. Available online: https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0167404823004558?via%3Dihub."},{"key":"ref_23","unstructured":"Xu, L., Zhang, X., and Dong, Q. (2020). CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model. arXiv."},{"key":"ref_24","unstructured":"(2023, July 10). OpenAI. ChatGPT-3.5. 
Available online: https:\/\/chatgpt.com\/g\/g-F00faAwkE-open-a-i-gpt-3-5."},{"key":"ref_25","unstructured":"Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., and Huang, F. (2023). Qwen Technical Report. arXiv."},{"key":"ref_26","unstructured":"Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., and Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding. arXiv."},{"key":"ref_27","unstructured":"Huang, Y., Bai, Y., Zhu, Z., Zhang, J., Zhang, J., Su, T., Liu, J., Lv, C., Zhang, Y., and Lei, J. (2023). C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models. arXiv."},{"key":"ref_28","unstructured":"Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., and Nakano, R. (2021). Training Verifiers to Solve Math Word Problems. arXiv."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/16\/12\/461\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:49:20Z","timestamp":1760114960000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/16\/12\/461"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,7]]},"references-count":28,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["fi16120461"],"URL":"https:\/\/doi.org\/10.3390\/fi16120461","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,7]]}}}