{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T05:50:58Z","timestamp":1747893058624,"version":"3.40.3"},"publisher-location":"Cham","reference-count":44,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031657931"},{"type":"electronic","value":"9783031657948"}],"license":[{"start":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T00:00:00Z","timestamp":1704067200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T00:00:00Z","timestamp":1723680000000},"content-version":"vor","delay-in-days":227,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Research software code projects are typically described with README files, which often contain the steps to set up, test, and run the code contained in them. Installation instructions are written in a human-readable manner and are therefore difficult to interpret by intelligent assistants designed to help other researchers set up a code repository. In this paper, we explore this gap by assessing whether Large Language Models (LLMs) are able to extract installation instruction plans from README files. In particular, we define a methodology to extract alternate installation plans, an evaluation framework to assess the effectiveness of each result, and an initial quantitative evaluation based on state-of-the-art LLM models ( and ). 
Our results show that while LLMs are a promising approach for finding installation instructions, they present important limitations when these instructions are not sequential or mandatory.<\/jats:p>","DOI":"10.1007\/978-3-031-65794-8_8","type":"book-chapter","created":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:02:44Z","timestamp":1723615364000},"page":"114-133","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Automated Extraction of\u00a0Research Software Installation Instructions from\u00a0README Files: An Initial Analysis"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9994-1462","authenticated-orcid":false,"given":"Carlos","family":"Utrilla Guerrero","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9260-0753","authenticated-orcid":false,"given":"Oscar","family":"Corcho","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0454-7145","authenticated-orcid":false,"given":"Daniel","family":"Garijo","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,8,15]]},"reference":[{"key":"8_CR1","unstructured":"Constructions Aeronautiques et al.: PDDL\u2014the planning domain definition language. Technical report (1998)"},{"key":"8_CR2","unstructured":"Microsoft Research AI4Science and Microsoft Azure Quantum. \u201cThe impact of large language models on scientific discovery: a preliminary study using GPT-4\u201d. arXiv:2311.07361 (2023)"},{"key":"8_CR3","doi-asserted-by":"crossref","unstructured":"Blagec, K., et al.: A global analysis of metrics used for measuring performance in natural language processing. arXiv:2204.11574 (2022)","DOI":"10.18653\/v1\/2022.nlppower-1.6"},{"key":"8_CR4","doi-asserted-by":"publisher","unstructured":"Boiko, D.A., et al.: Autonomous chemical research with large language models. Nature 624(7992), 570\u2013578 (2023). 
https:\/\/doi.org\/10.1038\/s41586-023-06792-0. https:\/\/www.nature.com\/articles\/s41586-023-06792-0. ISSN 1476-4687. Accessed 31 Dec 2023","DOI":"10.1038\/s41586-023-06792-0"},{"key":"8_CR5","doi-asserted-by":"publisher","unstructured":"Hong, N.P.C., et al.: FAIR Principles for Research Software (FAIR4RS Principles). Version 1.0 (2022). https:\/\/doi.org\/10.15497\/RDA00068","DOI":"10.15497\/RDA00068"},{"key":"8_CR6","doi-asserted-by":"publisher","unstructured":"Du, C., et al.: Softcite dataset: a dataset of software mentions in biomedical and economic research publications. J. Assoc. Inf. Sci. Technol. 72(7), 870\u2013884 (2021). https:\/\/doi.org\/10.1002\/asi.24454. ISSN 2330-1635","DOI":"10.1002\/asi.24454"},{"key":"8_CR7","unstructured":"Garijo, D., Gil, Y.: Augmenting PROV with plans in PPLAN: scientific processes as linked data. In: Second International Workshop on Linked Science: Tackling Big Data (LISC), Held in Conjunction with the International Semantic Web Conference (ISWC), Boston, MA (2012)"},{"key":"8_CR8","unstructured":"Hirsch, E., Uziel, G., Anaby-Tavor, A.: What\u2019s the plan? Evaluating and developing planning-aware techniques for LLMs. arXiv:2402.11489 (2024)"},{"key":"8_CR9","unstructured":"Hirsch, E., Uziel, G., Anaby-Tavor, A.: What\u2019s the plan? Evaluating and developing planning-aware techniques for LLMs (2024). arXiv:2402.11489 [cs]. Accessed 14 Mar 2024"},{"key":"8_CR10","unstructured":"Hou, X., et al.: Large language models for software engineering: a systematic literature review (2023). http:\/\/arxiv.org\/abs\/2308.10620. Accessed 05 Sept 2023"},{"key":"8_CR11","unstructured":"Huang, X., et al.: Understanding the planning of LLM agents: a survey. arXiv:2402.02716 (2024)"},{"key":"8_CR12","unstructured":"Jiang, A.Q., et al.: Mixtral of experts. 
arXiv:2401.04088 (2024)"},{"key":"8_CR13","doi-asserted-by":"crossref","unstructured":"Jin, Q., et al.: GeneGPT: augmenting large language models with domain tools for improved access to biomedical information (2023). arXiv:2304.09667 [cs, q- bio]. Accessed 14 Mar 2024","DOI":"10.1093\/bioinformatics\/btae075"},{"key":"8_CR14","unstructured":"Kambhampati, S., et al.: LLMs can\u2019t plan, but can help planning in LLM-modulo frameworks. arXiv:2402.01817 (2024)"},{"key":"8_CR15","doi-asserted-by":"publisher","unstructured":"Kelley, A., Garijo, D.: A framework for creating knowledge graphs of scientific software metadata. Quant. Sci. Stud. 1\u201337 (2021). https:\/\/doi.org\/10.1162\/qss_a_00167. ISSN 2641-3337","DOI":"10.1162\/qss_a_00167"},{"key":"8_CR16","unstructured":"Lin, C.-Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, Barcelona, Spain, pp. 74\u201381. Association for Computational Linguistics (2004). https:\/\/www.aclweb.org\/anthology\/W04-1013"},{"key":"8_CR17","doi-asserted-by":"publisher","unstructured":"Mao, A., Garijo, D., Fakhraei, S.: SoMEF: a framework for capturing scientific software metadata from its documentation. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 3032\u20133037 (2019). https:\/\/doi.org\/10.1109\/BigData47090.2019.9006447","DOI":"10.1109\/BigData47090.2019.9006447"},{"key":"8_CR18","unstructured":"Miglani, S., Yorke-Smith, N.: NLtoPDDL: one-shot learning of PDDL models from natural language process manuals. In: ICAPS 2020 Workshop on Knowledge Engineering for Planning and Scheduling (KEPS 2020) (2020)"},{"key":"8_CR19","unstructured":"Mondorf, P., Plank, B.: Beyond accuracy: evaluating the reasoning behavior of large language models\u2013a survey. arXiv:2404.01869 (2024)"},{"key":"8_CR20","unstructured":"Olmo, A., Sreedharan, S., Kambhampati, S.: GPT3- to-plan: extracting plans from text using GPT-3 (2021). arXiv:2106.07131 [cs]. 
Accessed 17 Jan 2024"},{"key":"8_CR21","unstructured":"OpenAI. GPT-4 Technical Report (2023). arXiv:2303.08774 [cs]. Accessed 24 Sept 2023"},{"key":"8_CR22","unstructured":"Qin, Y., et al.: InFoBench: evaluating instruction following ability in large language models (2024). arXiv:2401.03601 [cs]. Accessed 16 Feb 2024"},{"key":"8_CR23","unstructured":"Qin, Y., et al.: ToolLLM: facilitating large language models to master 16000+ real-world APIs (2023). arXiv:2307.16789 [cs]. Accessed 16 Feb 2024"},{"key":"8_CR24","doi-asserted-by":"publisher","unstructured":"Rula, A., D\u2019Souza, J.: Procedural text mining with large language models. In: Proceedings of the 12th Knowledge Capture Conference 2023, K-CAP 2023, pp. 9\u201316. Association for Computing Machinery, New York (2023). https:\/\/doi.org\/10.1145\/3587259.3627572","DOI":"10.1145\/3587259.3627572"},{"key":"8_CR25","unstructured":"Schick, T., et al.: ToolFormer: language models can teach themselves to use tools (2023). arXiv:2302.04761 [cs]. Accessed 21 Sept 2023"},{"key":"8_CR26","unstructured":"Shen, Y., et al.: TaskBench: benchmarking large language models for task automation (2023). arXiv:2311.18760 [cs]. Accessed 14 Mar 2024"},{"key":"8_CR27","doi-asserted-by":"crossref","unstructured":"Shridhar, M., et al.: ALFRED: a benchmark for interpreting grounded instructions for everyday tasks (2020). arXiv:1912.01734 [cs]. Accessed 16 Jan 2024","DOI":"10.1109\/CVPR42600.2020.01075"},{"key":"8_CR28","unstructured":"Silver, T., et al.: PDDL planning with pretrained large language models. In: NeurIPS 2022 Foundation Models for Decision Making Workshop (2022)"},{"key":"8_CR29","unstructured":"Stechly, K., Valmeekam, K., Kambhampati, S.: On the self-verification limitations of large language models on reasoning and planning tasks. 
arXiv:2402.08115 (2024)"},{"key":"8_CR30","doi-asserted-by":"publisher","unstructured":"Tenorth, M., Nyga, D., Beetz, M.: Understanding and executing instructions for everyday manipulation tasks from the World Wide Web. In: 2010 IEEE International Conference on Robotics and Automation (ICRA 2010), Anchorage, AK, pp. 1486\u20131491. IEEE (2010). https:\/\/doi.org\/10.1109\/ROBOT.2010.5509955. ISBN 978-1-4244-5038-1. Accessed 02 Feb 2024","DOI":"10.1109\/ROBOT.2010.5509955"},{"key":"8_CR31","unstructured":"Touvron, H., et al.: Llama 2: open foundation and fine-tuned chat models. arXiv:2307.09288 (2023)"},{"key":"8_CR32","doi-asserted-by":"publisher","unstructured":"Tsay, J., et al.: AIMMX: artificial intelligence model metadata extractor. In: Proceedings of the 17th International Conference on Mining Software Repositories, MSR 2020, pp. 81\u201392. Association for Computing Machinery, New York (2020). https:\/\/doi.org\/10.1145\/3379597.3387448. ISBN 978-1-4503-7517-7. Accessed 20 Sept 2023","DOI":"10.1145\/3379597.3387448"},{"key":"8_CR33","unstructured":"Valmeekam, K., et al.: Large language models still can\u2019t plan (a benchmark for LLMs on planning and reasoning about change)"},{"key":"8_CR34","unstructured":"Valmeekam, K., et al.: On the planning abilities of large language models-a critical investigation. In: Advances in Neural Information Processing Systems, vol. 36 (2024)"},{"key":"8_CR35","unstructured":"Valmeekam, K., et al.: PlanBench: an extensible benchmark for evaluating large language models on planning and reasoning about change (2023). arXiv:2206.10498 [cs]. Accessed 18 Jan 2024"},{"key":"8_CR36","unstructured":"Valmeekam, K., et al.: PlanBench: an extensible benchmark for evaluating large language models on planning and reasoning about change. In: Advances in Neural Information Processing Systems, vol. 36 (2024)"},{"key":"8_CR37","unstructured":"Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) 
Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)"},{"key":"8_CR38","doi-asserted-by":"publisher","unstructured":"Wang, H., et al.: Scientific discovery in the age of artificial intelligence. Nature 620(7972), 47\u201360 (2023). https:\/\/doi.org\/10.1038\/s41586-023-06221-2. ISSN 1476-4687. Accessed 07 Sept 2023","DOI":"10.1038\/s41586-023-06221-2"},{"key":"8_CR39","doi-asserted-by":"crossref","unstructured":"Wang, J., et al.: Software testing with large language models: survey, landscape, and vision. IEEE Trans. Softw. Eng. (2024)","DOI":"10.1109\/TSE.2024.3368208"},{"key":"8_CR40","doi-asserted-by":"crossref","unstructured":"Wang, L., et al.: Plan-and-solve prompting: improving zero-shot chain-of- thought reasoning by large language models (2023). arXiv:2305.04091 [cs]. Accessed 14 Mar 2024","DOI":"10.18653\/v1\/2023.acl-long.147"},{"key":"8_CR41","unstructured":"Wei, J., et al.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems, vol. 35, pp. 24824\u201324837 (2022)"},{"key":"8_CR42","unstructured":"Xie, D., et al.: Impact of large language models on generating software specifications (2023). arXiv:2306.03324 [cs]. Accessed 11 Sept 2023"},{"key":"8_CR43","unstructured":"Yuan, S., et al.: EASYTOOL: enhancing LLM-based agents with concise tool instruction (2024). arXiv:2401.06201 [cs]. Accessed 06 Feb 2024"},{"key":"8_CR44","doi-asserted-by":"publisher","unstructured":"Carlos, Z.: Carlosug\/READMEtoP-PLAN: READMEtoP-PLAN First Release (2024). 
https:\/\/doi.org\/10.5281\/zenodo.10991890","DOI":"10.5281\/zenodo.10991890"}],"container-title":["Lecture Notes in Computer Science","Natural Scientific Language Processing and Research Knowledge Graphs"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-65794-8_8","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:04:14Z","timestamp":1723615454000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-65794-8_8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"ISBN":["9783031657931","9783031657948"],"references-count":44,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-65794-8_8","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"15 August 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"NSLP","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Workshop on Natural Scientific Language Processing and Research Knowledge Graphs","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Hersonissos, Crete","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Greece","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference 
Information"}},{"value":"26 May 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 May 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"1","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"nslp2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/nfdi4ds.github.io\/nslp2024\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}