{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T16:42:04Z","timestamp":1778776924487,"version":"3.51.4"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T00:00:00Z","timestamp":1751328000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,11]],"date-time":"2025-07-11T00:00:00Z","timestamp":1752192000000},"content-version":"vor","delay-in-days":10,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002911","name":"Universidad Complutense de Madrid","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002911","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Scientometrics"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Bibliographic catalogues store millions of records. Computer techniques such as web-scraping allow these data to be extracted efficiently and accurately. The recent emergence of ChatGPT makes it easier to develop suitable prompts that configure a scraper to identify and extract information from databases. The aim of this article is to define how to use prompt engineering efficiently to build a suitable data entry model that generates, in a single interaction with ChatGPT-4o, a fully functional web-scraper written in PHP and adapted to bibliographic catalogues. As a demonstration, the bibliographic catalogue of the National Library of Spain, with a dataset of thousands of records, is used. 
The findings present an effective model for developing AI-assisted web-scraping programs with the minimum possible interaction. The results obtained with the model indicate that prompting large language models (LLMs) can improve the quality of scraping, because the models understand specific contexts and patterns and adapt to the different formats and presentation styles of bibliographic information.<\/jats:p>","DOI":"10.1007\/s11192-025-05372-5","type":"journal-article","created":{"date-parts":[[2025,7,11]],"date-time":"2025-07-11T19:23:04Z","timestamp":1752261784000},"page":"3433-3453","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Prompt engineering for bibliographic web-scraping"],"prefix":"10.1007","volume":"130","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4108-7531","authenticated-orcid":false,"given":"Manuel","family":"Bl\u00e1zquez-Ochando","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1730-8621","authenticated-orcid":false,"given":"Juan Jos\u00e9","family":"Prieto-Guti\u00e9rrez","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6149-4724","authenticated-orcid":false,"given":"Mar\u00eda Antonia","family":"Ovalle-Perandones","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,11]]},"reference":[{"key":"5372_CR1","unstructured":"Atlas, S. (2023). ChatGPT for higher education and professional development: A guide to conversational AI. https:\/\/digitalcommons.uri.edu\/cba_facpubs\/548"},{"key":"5372_CR2","doi-asserted-by":"publisher","first-page":"1877","DOI":"10.5555\/3495724.3495883","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877\u20131901. 
https:\/\/doi.org\/10.5555\/3495724.3495883","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"3","key":"5372_CR3","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1017\/S1351324922000213","volume":"29","author":"CP Chai","year":"2023","unstructured":"Chai, C. P. (2023). Comparison of text preprocessing methods. Natural Language Engineering, 29(3), 509\u2013553. https:\/\/doi.org\/10.1017\/S1351324922000213","journal-title":"Natural Language Engineering"},{"key":"5372_CR4","unstructured":"Chen, S., Wong, S., Chen, L., & Tian, Y. (2023b). Extending context window of large language models via positional interpolation. Preprint retrieved from https:\/\/arxiv.org\/abs\/2306.15595"},{"key":"5372_CR5","unstructured":"Chen, B., Zhang, Z., Langren\u00e9, N., & Zhu, S. (2023a). Unleashing the potential of prompt engineering in large language models: A comprehensive review. Preprint retrieved from https:\/\/arxiv.org\/abs\/2310.14735"},{"key":"5372_CR6","unstructured":"Dong, Z., Li, J., Men, X., Zhao, W. X., Wang, B., Tian, Z., & Wen, J. R. (2024). Exploring context window of large language models via decomposed positional vectors. Preprint retrieved from https:\/\/arxiv.org\/abs\/2405.18009"},{"key":"5372_CR7","unstructured":"Duan, H., Yang, Y., & Tam, K. Y. (2024). Do LLMs know about hallucination? An empirical investigation of LLM's hidden states. Preprint retrieved from https:\/\/arxiv.org\/abs\/2402.09733"},{"issue":"2","key":"5372_CR8","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1080\/19322909.2012.677296","volume":"6","author":"MW Dula","year":"2012","unstructured":"Dula, M. W., & Ye, G. (2012). Case study: Pepperdine University libraries\u2019 migration to OCLC\u2019s WorldShare. Journal of Web Librarianship, 6(2), 125\u2013132. 
https:\/\/doi.org\/10.1080\/19322909.2012.677296","journal-title":"Journal of Web Librarianship"},{"issue":"5","key":"5372_CR9","doi-asserted-by":"publisher","first-page":"195","DOI":"10.3390\/fi17050195","volume":"17","author":"TM Fahrudin","year":"2025","unstructured":"Fahrudin, T. M., Funabiki, N., Brata, K. C., Naing, I., Aung, S. T., Muhaimin, A., & Prasetya, D. A. (2025). An improved reference paper collection system using web scraping with three enhancements. Future Internet, 17(5), 195. https:\/\/doi.org\/10.3390\/fi17050195","journal-title":"Future Internet"},{"key":"5372_CR10","unstructured":"Gao, J., Zhao, H., Yu, C., & Xu, R. (2023). Exploring the feasibility of ChatGPT for event extraction. Preprint retrieved from https:\/\/arxiv.org\/abs\/2303.03836"},{"issue":"12","key":"5372_CR11","doi-asserted-by":"publisher","first-page":"2629","DOI":"10.1007\/s10439-023-03272-4","volume":"51","author":"L Giray","year":"2023","unstructured":"Giray, L. (2023). Prompt engineering with ChatGPT: A guide for academic writers. Annals of Biomedical Engineering, 51(12), 2629\u20132633. https:\/\/doi.org\/10.1007\/s10439-023-03272-4","journal-title":"Annals of Biomedical Engineering"},{"key":"5372_CR12","doi-asserted-by":"publisher","DOI":"10.1145\/3605764.3623985","author":"K Greshake","year":"2023","unstructured":"Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you\u2019ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. Proceedings of the ACM Workshop on Artificial Intelligence and Security. https:\/\/doi.org\/10.1145\/3605764.3623985","journal-title":"Proceedings of the ACM Workshop on Artificial Intelligence and Security"},{"issue":"4","key":"5372_CR13","doi-asserted-by":"publisher","first-page":"37","DOI":"10.3390\/asi2040037","volume":"2","author":"HE-D Hassanien","year":"2019","unstructured":"Hassanien, H.E.-D. (2019). 
Web scraping scientific repositories for augmented relevant literature search using CRISP-DM. Applied System Innovation, 2(4), 37. https:\/\/doi.org\/10.3390\/asi2040037","journal-title":"Applied System Innovation"},{"key":"5372_CR14","unstructured":"Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., & Liu, T. (2023). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. Preprint retrieved from https:\/\/arxiv.org\/abs\/2311.05232"},{"key":"5372_CR15","volume-title":"Neural information processing. ICONIP 2023. Communications in computer and information science","author":"F Huang","year":"2024","unstructured":"Huang, F., et al. (2024). A three-stage framework for event-event relation extraction with large language model. In B. Luo, L. Cheng, Z. G. Wu, H. Li, & C. Li (Eds.), Neural information processing. ICONIP 2023. Communications in computer and information science (Vol. 1968). Singapore: Springer."},{"key":"5372_CR16","doi-asserted-by":"publisher","DOI":"10.1145\/3660788","author":"R Khojah","year":"2024","unstructured":"Khojah, R., Mohamad, M., Leitner, P., & Neto, F. G. D. O. (2024). Beyond code generation: An observational study of ChatGPT usage in software engineering practice. Proceedings of the ACM on Software Engineering. https:\/\/doi.org\/10.1145\/3660788","journal-title":"Proceedings of the ACM on Software Engineering"},{"key":"5372_CR17","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2205.11916","author":"T Kojima","year":"2022","unstructured":"Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems. https:\/\/doi.org\/10.48550\/arXiv.2205.11916","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5372_CR18","unstructured":"Kong, A., Zhao, S., Chen, H., Li, Q., Qin, Y., Sun, R., & Zhou, X. (2023). 
Better zero-shot reasoning with role-play prompting. Preprint retrieved from https:\/\/arxiv.org\/abs\/2308.07702"},{"key":"5372_CR19","doi-asserted-by":"publisher","unstructured":"Krotov, V., Johnson, L., & Silva, L. (2020). Tutorial: Legality and ethics of web scraping. https:\/\/doi.org\/10.17705\/1CAIS.04724","DOI":"10.17705\/1CAIS.04724"},{"key":"5372_CR20","doi-asserted-by":"publisher","DOI":"10.3145\/infonomy.24.042","author":"P L\u00e1zaro-Rodr\u00edguez","year":"2024","unstructured":"L\u00e1zaro-Rodr\u00edguez, P. (2024). PyDataBibPub: script en Python para automatizar la descarga de datos de bibliotecas p\u00fablicas de Espa\u00f1a desarrollado con ChatGPT 3.5 [PyDataBibPub: A Python script, developed with ChatGPT 3.5, to automate the download of data from public libraries in Spain]. Infonomy. https:\/\/doi.org\/10.3145\/infonomy.24.042","journal-title":"Infonomy"},{"key":"5372_CR21","unstructured":"Liu, Y., Deng, G., Xu, Z., Li, Y., Zheng, Y., Zhang, Y., & Liu, Y. (2023). Jailbreaking ChatGPT via prompt engineering: An empirical study. Preprint retrieved from https:\/\/arxiv.org\/abs\/2305.13860"},{"key":"5372_CR22","unstructured":"National Library of Spain (2021). National Library of Spain Report 2021. https:\/\/www.bne.es\/sites\/default\/files\/repositorio-archivos\/memoria_BNE_2021_0.pdf"},{"key":"5372_CR23","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.4622517","author":"A Nguyen Duc","year":"2023","unstructured":"Nguyen Duc, A., Cabrero-Daniel, B., Przybylek, A., Arora, C., Khanna, D., Herda, T., & Rafiq, U. (2023). Generative artificial intelligence for software engineering\u2014A research agenda. SSRN. https:\/\/doi.org\/10.2139\/ssrn.4622517","journal-title":"SSRN"},{"key":"5372_CR24","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2107.02794","author":"M Nye","year":"2021","unstructured":"Nye, M., Tessler, M., Tenenbaum, J., & Lake, B. M. (2021). Improving coherence and consistency in neural sequence models with dual-system, neuro-symbolic reasoning. Advances in Neural Information Processing Systems. 
https:\/\/doi.org\/10.48550\/arXiv.2107.02794","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5372_CR25","doi-asserted-by":"publisher","DOI":"10.1101\/2023.01.21.525030","author":"M Pividori","year":"2023","unstructured":"Pividori, M., & Greene, C. S. (2023). A publishing infrastructure for AI-assisted academic authoring. bioRxiv. https:\/\/doi.org\/10.1101\/2023.01.21.525030","journal-title":"BioRxiv"},{"issue":"6","key":"5372_CR26","doi-asserted-by":"publisher","first-page":"103510","DOI":"10.1016\/j.ipm.2023.103510","volume":"60","author":"S Qi","year":"2023","unstructured":"Qi, S., Cao, Z., Rao, J., Wang, L., Xiao, J., & Wang, X. (2023). What is the limitation of multimodal LLMs? A deeper look into multimodal LLMs through prompt probing. Information Processing & Management, 60(6), 103510. https:\/\/doi.org\/10.1016\/j.ipm.2023.103510","journal-title":"Information Processing & Management"},{"key":"5372_CR27","doi-asserted-by":"publisher","unstructured":"Reynolds, L., & McDonell, K. (2021). Prompt programming for large language models: Beyond the few-shot paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1\u20137). https:\/\/doi.org\/10.1145\/3411763.3451760","DOI":"10.1145\/3411763.3451760"},{"issue":"3","key":"5372_CR28","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1016\/j.joi.2017.07.003","volume":"11","author":"N Robinson-Garcia","year":"2017","unstructured":"Robinson-Garcia, N., Mongeon, P., Jeng, W., & Costas, R. (2017). DataCite as a novel bibliometric source: Coverage, strengths and limitations. Journal of Informetrics, 11(3), 841\u2013854. https:\/\/doi.org\/10.1016\/j.joi.2017.07.003","journal-title":"Journal of Informetrics"},{"key":"5372_CR29","unstructured":"Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. 
Preprint retrieved from https:\/\/arxiv.org\/abs\/2402.07927"},{"key":"5372_CR30","doi-asserted-by":"publisher","DOI":"10.14569\/IJACSA.2024.0150442","author":"N Ul Huda","year":"2024","unstructured":"Ul Huda, N., Sahito, S. F., Gilal, A. R., Abro, A., Alshanqiti, A., Alsughayyir, A., & Palli, A. S. (2024). Impact of contradicting subtle emotion cues on large language models with various prompting techniques. International Journal of Advanced Computer Science & Applications. https:\/\/doi.org\/10.14569\/IJACSA.2024.0150442","journal-title":"International Journal of Advanced Computer Science & Applications"},{"key":"5372_CR31","doi-asserted-by":"crossref","unstructured":"Vaillant, T. S., de Almeida, F. D., Neto, P. A., Gao, C., Bosch, J., & de Almeida, E. S. (2024). Developers' perceptions on the impact of ChatGPT in software development: A survey. Preprint retrieved from https:\/\/arxiv.org\/abs\/2405.12195","DOI":"10.2139\/ssrn.5023364"},{"key":"5372_CR32","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1706.03762","author":"A Vaswani","year":"2017","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems. https:\/\/doi.org\/10.48550\/arXiv.1706.03762","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5372_CR33","unstructured":"Verma, S., Tran, K., Ali, Y., & Min, G. (2023). Reducing LLM hallucinations using epistemic neural networks. Preprint retrieved from https:\/\/arxiv.org\/abs\/2312.15576"},{"key":"5372_CR34","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2201.11903","author":"J Wei","year":"2022","unstructured":"Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems. 
https:\/\/doi.org\/10.48550\/arXiv.2201.11903","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5372_CR35","doi-asserted-by":"crossref","unstructured":"Xia, C. S., & Zhang, L. (2023). Keep the conversation going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. Preprint retrieved from https:\/\/arxiv.org\/abs\/2304.00385","DOI":"10.1145\/3650212.3680323"},{"key":"5372_CR36","doi-asserted-by":"publisher","unstructured":"Yang, Z., Chen, S., Gao, C., Li, Z., Li, G., & Lv, R. (2023). Deep learning based code generation methods: A literature review. https:\/\/doi.org\/10.48550\/arXiv.2303.01056","DOI":"10.48550\/arXiv.2303.01056"},{"key":"5372_CR37","unstructured":"Ye, Q., Axmed, M., Pryzant, R., & Khani, F. (2023). Prompt engineering a prompt engineer. Preprint retrieved from https:\/\/arxiv.org\/abs\/2311.05661"},{"key":"5372_CR38","unstructured":"Yehuda, Y., Malkiel, I., Barkan, O., Weill, J., Ronen, R., & Koenigstein, N. (2024). In search of truth: An interrogation approach to hallucination detection. Preprint retrieved from https:\/\/arxiv.org\/abs\/2403.02889"},{"key":"5372_CR39","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Song, S., Duah, B., Macbeth, J., Carter, S., Van, M. P., et al. (2023, June). More human than human: LLM-generated narratives outperform human-LLM interleaved narratives. In Proceedings of the 15th Conference on Creativity and Cognition (pp. 368\u2013370).","DOI":"10.1145\/3591196.3596612"},{"issue":"5","key":"5372_CR40","doi-asserted-by":"publisher","first-page":"103462","DOI":"10.1016\/j.ipm.2023.103462","volume":"60","author":"X Zhu","year":"2023","unstructured":"Zhu, X., Kuang, Z., & Zhang, L. (2023). A prompt model with combined semantic refinement for aspect sentiment analysis. Information Processing & Management, 60(5), 103462. 
https:\/\/doi.org\/10.1016\/j.ipm.2023.103462","journal-title":"Information Processing & Management"}],"container-title":["Scientometrics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11192-025-05372-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11192-025-05372-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11192-025-05372-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T07:58:17Z","timestamp":1755935897000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11192-025-05372-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":40,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["5372"],"URL":"https:\/\/doi.org\/10.1007\/s11192-025-05372-5","relation":{},"ISSN":["0138-9130","1588-2861"],"issn-type":[{"value":"0138-9130","type":"print"},{"value":"1588-2861","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7]]},"assertion":[{"value":"11 July 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 June 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 July 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}