{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T07:07:39Z","timestamp":1778742459488,"version":"3.51.4"},"reference-count":114,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T00:00:00Z","timestamp":1778716800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T00:00:00Z","timestamp":1778716800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100009042","name":"Universidad de Sevilla","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100009042","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Computing"],"published-print":{"date-parts":[[2026,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The integration of Large Language Models (LLMs) with web scraping and crawling techniques is transforming automated web data extraction by enabling semantic understanding and adaptability. This Systematic Literature Review (SLR) synthesizes evidence regarding this integration, focusing on tools, models, challenges, evaluation methods, trends, and applications. Following PRISMA guidelines, we conducted a rigorous search across Scopus, Web of Science, ACM, and IEEE databases (2021\u20132025). From 976 screened records, 91 high-quality studies (53 conference papers and 38 journal articles) were selected after duplicate removal, screening, and AI-powered quality assessment. The field has experienced explosive growth, with 84% of publications appearing in 2024\u20132025 alone (36 in 2024, 40 in 2025). 
Key tools include Scrapy, BeautifulSoup, and Selenium, with emerging LLM-augmented tools like Scrapeghost, Crawl4AI, and ScrapeGraphAI. While transformer-based models dominate (86 of 91 papers), the landscape is diversifying: the BERT family appears in 23 studies, the GPT family in 34, and other LLMs (Llama, Mistral, Claude, Gemini) in 44. Major challenges involve HTML complexity, computational costs, token limits, data biases, and legal risks. Evaluation relies on hybrid frameworks combining task-specific metrics (F1, BLEU, RAGAS), human validation, and operational efficiency measures. Applications span Cybersecurity, Healthcare, Education, E-commerce, Media, Technology, and Finance\/Legal, with high thematic specialization. A notable trend is the shift toward efficient Small Language Models (SLMs) for resource-constrained, domain-specific tasks. The findings suggest that LLMs are enabling a decisive transition from rule-based to semantic, agentic approaches in web extraction. Challenges in robustness and efficiency persist, but trends point toward intelligent, domain-specialized, and ethically aware systems. 
Future work should explore SLM implementation, hybrid pipelines, and standardized evaluation benchmarks.<\/jats:p>","DOI":"10.1007\/s00607-026-01666-5","type":"journal-article","created":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T06:11:12Z","timestamp":1778739072000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["LLMs applied to web scraping and web crawling: a systematic review"],"prefix":"10.1007","volume":"108","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2914-8696","authenticated-orcid":false,"given":"Pablo","family":"Landeta-L\u00f3pez","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0303-2740","authenticated-orcid":false,"given":"Jos\u00e9 Mar\u00eda","family":"Garc\u00eda","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2470-8287","authenticated-orcid":false,"given":"Cathy","family":"Guevara-Vega","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9827-1834","authenticated-orcid":false,"given":"Antonio","family":"Ruiz-Cort\u00e9s","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,5,14]]},"reference":[{"key":"1666_CR1","doi-asserted-by":"publisher","unstructured":"Acharya A, Singh B, Onoe N (eds) (2023) Llm based generation of item-description for recommendation system. Proceedings of the 17th ACM conference on recommender systems, RecSys \u201923, pp 1204\u20131207. 
https:\/\/doi.org\/10.1145\/3604915.3610647","DOI":"10.1145\/3604915.3610647"},{"key":"1666_CR2","doi-asserted-by":"publisher","unstructured":"Adhikari M, Joshi P, Ramos G\u00a0V, Doulat A\u00a0A, Shaik S (2025) Aide: Leveraging retrieval-augmented generation for context-aware educational data retrieval and dialogue. https:\/\/doi.org\/10.1109\/SmartNets65254.2025.11106900","DOI":"10.1109\/SmartNets65254.2025.11106900"},{"key":"1666_CR3","doi-asserted-by":"publisher","unstructured":"Adnan K, Akbar R (2019) An analytical study of information extraction from unstructured and multidimensional big data. J Big Data 6. https:\/\/doi.org\/10.1186\/s40537-019-0254-8","DOI":"10.1186\/s40537-019-0254-8"},{"key":"1666_CR4","doi-asserted-by":"publisher","first-page":"2251","DOI":"10.13053\/CyS-28-4-5292","volume":"28","author":"T Alc\u00e1ntara","year":"2024","unstructured":"Alc\u00e1ntara T, Garc\u00eda-V\u00e1zquez O, Hernandez M, Calvo H, Desiderio A (2024) Lyricscraper: a dataset of spanish song lyrics created via web scraping and dual-labeling for llm classification. Computacion y Sistemas 28:2251\u20132260. https:\/\/doi.org\/10.13053\/CyS-28-4-5292","journal-title":"Computacion y Sistemas"},{"key":"1666_CR5","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1080\/10580530801941058","volume":"25","author":"H Baars","year":"2008","unstructured":"Baars H, Kemper H-G (2008) Management support with structured and unstructured data - an integrated business intelligence framework. Inf Syst Manag 25:132\u2013148. https:\/\/doi.org\/10.1080\/10580530801941058","journal-title":"Inf Syst Manag"},{"key":"1666_CR6","doi-asserted-by":"publisher","unstructured":"Balasubramanian P, Seby J, Kostakos P (2023) Semantic-driven focused crawling using laser and faiss: a novel approach for threat detection and improved information retrieval, pp 1598 \u2013 1605. 
https:\/\/doi.org\/10.1109\/TrustCom60117.2023.00218","DOI":"10.1109\/TrustCom60117.2023.00218"},{"key":"1666_CR7","doi-asserted-by":"publisher","unstructured":"Bhatt C et\u00a0al (eds) (2023) Web scraping techniques and its applications: a review. https:\/\/doi.org\/10.1109\/CISCT57197.2023.10351298","DOI":"10.1109\/CISCT57197.2023.10351298"},{"key":"1666_CR8","doi-asserted-by":"publisher","unstructured":"Bin Zaman\u00a0Chowdhury M\u00a0T, Islam M\u00a0R, Hossain M (2024) Durghotona gpt: a web scraping and large language model based framework to generate road accident dataset automatically in bangladesh, pp 50 \u2013 55. https:\/\/doi.org\/10.1109\/ICCIT64611.2024.11021969","DOI":"10.1109\/ICCIT64611.2024.11021969"},{"key":"1666_CR9","doi-asserted-by":"publisher","first-page":"3433","DOI":"10.1007\/s11192-025-05372-5","volume":"130","author":"M Bl\u00e1zquez-Ochando","year":"2025","unstructured":"Bl\u00e1zquez-Ochando M, Prieto-Guti\u00e9rrez JJ, Ovalle-Perandones MA (2025) Prompt engineering for bibliographic web-scraping. Scientometrics 130:3433\u20133453. https:\/\/doi.org\/10.1007\/s11192-025-05372-5","journal-title":"Scientometrics"},{"key":"1666_CR10","doi-asserted-by":"publisher","unstructured":"Brach W, Petrik M, Ko\u0161t\u2019\u00e1l K, Ries M (2025) Ghosts in the markup: Techniques to fight large language model-powered web scrapers, pp 37 \u2013 46. https:\/\/doi.org\/10.23919\/FRUCT65909.2025.11008269","DOI":"10.23919\/FRUCT65909.2025.11008269"},{"key":"1666_CR11","doi-asserted-by":"publisher","unstructured":"Cavero F\u00a0J, Alonso J\u00a0C, Ruiz-Cort\u00e9s A (eds) (2026) From static to intelligent: evolving saas pricing with llms. Service-oriented computing - ICSOC 2024 workshops, pp 136\u2013147. 
https:\/\/doi.org\/10.1007\/978-981-96-7423-7_12","DOI":"10.1007\/978-981-96-7423-7_12"},{"key":"1666_CR12","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1016\/j.neucom.2020.07.073","volume":"418","author":"L Chai","year":"2020","unstructured":"Chai L, Xu H, Luo Z, Li S (2020) A multi-source heterogeneous data analytic method for future price fluctuation prediction. Neurocomputing 418:11\u201320. https:\/\/doi.org\/10.1016\/j.neucom.2020.07.073","journal-title":"Neurocomputing"},{"key":"1666_CR13","doi-asserted-by":"publisher","unstructured":"Chen F-K, Liu C-H, You SD (2025) Using large language model to fill in web forms to support automated web application testing. Information (Switzerland) 16. https:\/\/doi.org\/10.3390\/info16020102","DOI":"10.3390\/info16020102"},{"key":"1666_CR14","doi-asserted-by":"publisher","unstructured":"Chung S, Kim J, Baik J, Chi S, Kim DY (2024) Identifying issues in international construction projects from news text using pre-trained models and clustering. Autom Constr 168. https:\/\/doi.org\/10.1016\/j.autcon.2024.105875","DOI":"10.1016\/j.autcon.2024.105875"},{"key":"1666_CR15","doi-asserted-by":"publisher","unstructured":"Corradin A, Ciccarese F, Raimondi V, Urso L, Silic-Benussi M (2025) Paperparser: a bioinformatic tool that synthesizes scientific literature through advanced ai techniques, enhancing scholarly insights while mimicking human-like expression. 
https:\/\/doi.org\/10.11159\/icbb25.161","DOI":"10.11159\/icbb25.161"},{"key":"1666_CR16","doi-asserted-by":"crossref","unstructured":"Cui J, Zha M, Wang X, Liao X (2025) The odyssey of robots.txt governance: Measuring convention implications of web bots in large language model services, pp 21 \u2013 35","DOI":"10.1145\/3719027.3765063"},{"key":"1666_CR17","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1109\/MIE.2024.3382962","volume":"19","author":"D de Silva","year":"2025","unstructured":"de Silva D et al (2025) Opportunities and challenges of generative artificial intelligence: research, education, industry engagement, and social impact. IEEE Ind Electron Mag 19:30\u201345. https:\/\/doi.org\/10.1109\/MIE.2024.3382962","journal-title":"IEEE Ind Electron Mag"},{"key":"1666_CR18","doi-asserted-by":"publisher","unstructured":"De\u00a0Silva S, Chang M (2025) Chatbotllm - training educational chatbots on the materials uploaded by teachers, pp 39 \u2013 43. https:\/\/doi.org\/10.1109\/ICIET66371.2025.11046253","DOI":"10.1109\/ICIET66371.2025.11046253"},{"key":"1666_CR19","unstructured":"Dobriy D (2024) Employing rag to create a conference knowledge graph from text, vol. 3747, pp 18"},{"key":"1666_CR20","doi-asserted-by":"publisher","unstructured":"Dodge J et\u00a0al (2021) Documenting large webtext corpora: a case study on the colossal clean crawled corpus, 1286 \u2013 1305. https:\/\/doi.org\/10.18653\/v1\/2021.emnlp-main.98","DOI":"10.18653\/v1\/2021.emnlp-main.98"},{"key":"1666_CR21","doi-asserted-by":"publisher","unstructured":"El\u00a0Ouadi A, Knowlton W, Pimentel A, Beskow D (2025) Operationalizing common crawl news: Ai-enabled data pipeline for large-scale news analysis, 1\u20133. 
https:\/\/doi.org\/10.1109\/SysCon64521.2025.11014869","DOI":"10.1109\/SysCon64521.2025.11014869"},{"key":"1666_CR22","doi-asserted-by":"publisher","first-page":"7.1","DOI":"10.4230\/OASIcs.SLATE.2025.7","volume":"135","author":"DS Eleuterio","year":"2025","unstructured":"Eleuterio DS, Oliveira PF, Matos P, Alves JMA (2025) A chatbot to help promoting financial literacy. OpenAccess Series in Informatics 135:7.1-7.9. https:\/\/doi.org\/10.4230\/OASIcs.SLATE.2025.7","journal-title":"OpenAccess Series in Informatics"},{"key":"1666_CR23","doi-asserted-by":"publisher","unstructured":"Fahrudin TM et al (2025) An improved reference paper collection system using web scraping with three enhancements. Future Internet 17. https:\/\/doi.org\/10.3390\/fi17050195","DOI":"10.3390\/fi17050195"},{"key":"1666_CR24","doi-asserted-by":"publisher","unstructured":"Ferrari E et al (2024) Search engine for open geospatial consortium web services improving discoverability through natural language processing-based processing and ranking. ISPRS Int J Geo Inf 13. https:\/\/doi.org\/10.3390\/ijgi13040128","DOI":"10.3390\/ijgi13040128"},{"key":"1666_CR25","doi-asserted-by":"publisher","unstructured":"Fijacko N et al (2024) Using generative artificial intelligence in bibliometric analysis: 10 years of research trends from the european resuscitation congresses. Resusc Plus 18. https:\/\/doi.org\/10.1016\/j.resplu.2024.100584","DOI":"10.1016\/j.resplu.2024.100584"},{"key":"1666_CR26","doi-asserted-by":"publisher","unstructured":"Gadiraju S\u00a0S, Liao D, Kudupudi A, Kasula S, Chalasani C (2024) Infotech assistant: A multimodal conversational agent for infotechnology web portal queries, pp 3264 \u2013 3272. 
https:\/\/doi.org\/10.1109\/BigData62323.2024.10825668","DOI":"10.1109\/BigData62323.2024.10825668"},{"key":"1666_CR27","doi-asserted-by":"publisher","unstructured":"Garad T, Waghmare O, Powar S, Bhatsangave S\u00a0P, Naikwade M (2025) Job-candidate suitability score using llms a linkedin profile analysis system, pp 1\u20136. https:\/\/doi.org\/10.1109\/ICCUBEA65967.2025.11284175","DOI":"10.1109\/ICCUBEA65967.2025.11284175"},{"key":"1666_CR28","doi-asserted-by":"publisher","unstructured":"George A et\u00a0al (2025) A personalized ai assistant for analyzing and authoring product reviews using groq\/llama3-70b. https:\/\/doi.org\/10.1109\/ICCTDC64446.2025.11158767","DOI":"10.1109\/ICCTDC64446.2025.11158767"},{"key":"1666_CR29","doi-asserted-by":"publisher","unstructured":"Guillen\u00a0Hernandez R\u00a0E, Armando Canizales\u00a0Turcios R, Erazo\u00a0Alfaro M\u00a0E (2024) Generative ia-based tool for efficient processing of human rights violations-related news. https:\/\/doi.org\/10.1109\/CONCAPAN63470.2024.10933836","DOI":"10.1109\/CONCAPAN63470.2024.10933836"},{"key":"1666_CR30","doi-asserted-by":"publisher","first-page":"1041","DOI":"10.3171\/2023.7.JNS23573","volume":"140","author":"E Guo","year":"2024","unstructured":"Guo E et al (2024) neurogpt-x: toward a clinic-ready large language model. J Neurosurg 140:1041\u20131053. https:\/\/doi.org\/10.3171\/2023.7.JNS23573","journal-title":"J Neurosurg"},{"key":"1666_CR31","doi-asserted-by":"publisher","unstructured":"Guo K, Diefenbach D, Gourru A, Gravier C (2023) Wikidata as a seed for web extraction, pp 2402 \u2013 2411. https:\/\/doi.org\/10.1145\/3543507.3583236","DOI":"10.1145\/3543507.3583236"},{"key":"1666_CR32","doi-asserted-by":"publisher","unstructured":"Gupta D et\u00a0al (2024) Enhanced research on legal data extraction and document analysis tool, pp 756 \u2013 763. 
https:\/\/doi.org\/10.1109\/I-SMAC61858.2024.10714867","DOI":"10.1109\/I-SMAC61858.2024.10714867"},{"key":"1666_CR33","doi-asserted-by":"publisher","unstructured":"Guti\u00e9rrez-Fandi\u00f1o A et al (2023) escorpius-m: a massive multilingual crawling corpus with a focus on spanish. Appl Sci (Switzerland) 13. https:\/\/doi.org\/10.3390\/app132212155","DOI":"10.3390\/app132212155"},{"key":"1666_CR34","doi-asserted-by":"publisher","first-page":"e1230","DOI":"10.1002\/cl2.1230","volume":"18","author":"NR Haddaway","year":"2022","unstructured":"Haddaway NR, Page MJ, Pritchard CC, McGuinness LA (2022) Prisma 2020: an r package and shiny app for producing prisma 2020-compliant flow diagrams, with interactivity for optimised digital transparency and open synthesis. Campbell Syst Rev 18:e1230. https:\/\/doi.org\/10.1002\/cl2.1230","journal-title":"Campbell Syst Rev"},{"key":"1666_CR35","doi-asserted-by":"publisher","first-page":"11921","DOI":"10.1007\/s11042-019-08373-8","volume":"79","author":"M Hashemi","year":"2020","unstructured":"Hashemi M (2020) Web page classification: a survey of perspectives, gaps, and future directions. Multim Tools Appl 79:11921\u201311945. https:\/\/doi.org\/10.1007\/s11042-019-08373-8","journal-title":"Multim Tools Appl"},{"key":"1666_CR36","doi-asserted-by":"publisher","unstructured":"H\u00e4tty A et\u00a0al (2024) A cost-efficient modular sieve for extracting product information from company websites, pp 1444 \u2013 1456. https:\/\/doi.org\/10.18653\/v1\/2024.emnlp-industry.106","DOI":"10.18653\/v1\/2024.emnlp-industry.106"},{"key":"1666_CR37","doi-asserted-by":"publisher","first-page":"1577","DOI":"10.1007\/s11280-018-0602-1","volume":"22","author":"I Hern\u00e1ndez","year":"2019","unstructured":"Hern\u00e1ndez I, Rivero CR, Ruiz D (2019) Deep web crawling: a survey. World Wide Web 22:1577\u20131610. 
https:\/\/doi.org\/10.1007\/s11280-018-0602-1","journal-title":"World Wide Web"},{"key":"1666_CR38","doi-asserted-by":"publisher","unstructured":"Huang W et\u00a0al (2024) Autoscraper: a progressive understanding web agent for web scraper generation, pp 2371 \u2013 2389. https:\/\/doi.org\/10.18653\/v1\/2024.emnlp-main.141","DOI":"10.18653\/v1\/2024.emnlp-main.141"},{"key":"1666_CR39","doi-asserted-by":"publisher","unstructured":"Huang J, Song J (2025) Automatic xpath generation agents for vertical websites by llms. Journal of King Saud University - Computer and Information Sciences 37. https:\/\/doi.org\/10.1007\/s44443-025-00071-w","DOI":"10.1007\/s44443-025-00071-w"},{"key":"1666_CR40","doi-asserted-by":"publisher","unstructured":"Huang Z, Tang J, Karir M, Liu M, Sarabi A (2024) Analyzing corporate privacy policies using ai chatbots, pp 505 \u2013 515. https:\/\/doi.org\/10.1145\/3646547.3689015","DOI":"10.1145\/3646547.3689015"},{"key":"1666_CR41","doi-asserted-by":"publisher","unstructured":"Ilyankou I, Wang M, Cavazzi S, Haworth J (2024) Cc-gpx: extracting high-quality annotated geospatial data from common crawl, pp 693 \u2013 696. https:\/\/doi.org\/10.1145\/3678717.3691215","DOI":"10.1145\/3678717.3691215"},{"key":"1666_CR42","doi-asserted-by":"publisher","unstructured":"Isaac\u00a0Ritharson P, Sujitha\u00a0Juliet D, Anitha J, Immanuel Alex\u00a0Pandian S (2023) Multi-document summarization made easy: an abstractive query-focused system using web scraping and transformer models, pp 1\u20136. https:\/\/doi.org\/10.1109\/CONIT59222.2023.10205946","DOI":"10.1109\/CONIT59222.2023.10205946"},{"key":"1666_CR43","doi-asserted-by":"publisher","unstructured":"Jamil H, Noorosmawie AB, Rabu HW, Razak LA (2025) From archives to ai: residential property data across three decades in brunei darussalam. Data Brief 60. 
https:\/\/doi.org\/10.1016\/j.dib.2025.111505","DOI":"10.1016\/j.dib.2025.111505"},{"key":"1666_CR44","doi-asserted-by":"publisher","DOI":"10.1080\/12265934.2025.2547792","author":"J Jiao","year":"2025","unstructured":"Jiao J, Chang A (2025) Evaluating sentiment and spatial patterns of ev charging station user experience with ai-agents. Int J Urban Sci. https:\/\/doi.org\/10.1080\/12265934.2025.2547792","journal-title":"Int J Urban Sci"},{"key":"1666_CR45","doi-asserted-by":"publisher","unstructured":"Jin B, Sun Q, Chen L (2025) Enhancing supply chain transparency in emerging economies using online contents and llms, pp 487 \u2013 492. https:\/\/doi.org\/10.1109\/ICOIN63865.2025.10993099","DOI":"10.1109\/ICOIN63865.2025.10993099"},{"key":"1666_CR46","doi-asserted-by":"publisher","unstructured":"Kaur A (2025) Automating xpath query generation using nlp for streamlined web crawling and gui testing 1:1\u20136. https:\/\/doi.org\/10.1109\/ICTEST64710.2025.11042798","DOI":"10.1109\/ICTEST64710.2025.11042798"},{"key":"1666_CR47","doi-asserted-by":"publisher","unstructured":"Khan FQ et al (2021) Smart algorithmic based web crawling and scraping with template autoupdate capabilities. Concurr Computat Pract Exper 33. https:\/\/doi.org\/10.1002\/cpe.6042","DOI":"10.1002\/cpe.6042"},{"key":"1666_CR48","doi-asserted-by":"publisher","first-page":"144","DOI":"10.15849\/ijasca.211128.11","volume":"13","author":"MA Khder","year":"2021","unstructured":"Khder MA (2021) Web scraping or web crawling: state of art, techniques, approaches and application. Int J Adv Soft Comput Appl 13:144\u2013168. https:\/\/doi.org\/10.15849\/ijasca.211128.11","journal-title":"Int J Adv Soft Comput Appl"},{"key":"1666_CR49","doi-asserted-by":"publisher","unstructured":"Kim M, Kim D, Park Y, Jeong D (2024) Development of an expert chatbot for digital forensics using rag model implementation, pp 182\u2013187. 
https:\/\/doi.org\/10.1109\/PlatCon63925.2024.10830748","DOI":"10.1109\/PlatCon63925.2024.10830748"},{"key":"1666_CR50","unstructured":"Kitchenham B (2004) Procedures for performing systematic reviews. Keele, UK, Keele Univ 33"},{"key":"1666_CR51","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1016\/j.infsof.2008.09.009","volume":"51","author":"B Kitchenham","year":"2009","unstructured":"Kitchenham B et al (2009) Systematic literature reviews in software engineering - a systematic literature review. Inf Softw Technol 51:7\u201315. https:\/\/doi.org\/10.1016\/j.infsof.2008.09.009","journal-title":"Inf Softw Technol"},{"key":"1666_CR52","doi-asserted-by":"publisher","first-page":"154381","DOI":"10.1109\/ACCESS.2024.3483905","volume":"12","author":"T Koide","year":"2024","unstructured":"Koide T, Nakano H, Chiba D (2024) Chatphishdetector: detecting phishing sites using large language models. IEEE Access 12:154381\u2013154400. https:\/\/doi.org\/10.1109\/ACCESS.2024.3483905","journal-title":"IEEE Access"},{"key":"1666_CR53","doi-asserted-by":"publisher","unstructured":"Koloveas P, Chantzios T, Alevizopoulou S, Skiadopoulos S, Tryfonopoulos C (2021) Intime: a machine learning-based framework for gathering and leveraging web data to cyber-threat intelligence. Electronics (Switzerland) 10. https:\/\/doi.org\/10.3390\/electronics10070818","DOI":"10.3390\/electronics10070818"},{"key":"1666_CR54","doi-asserted-by":"publisher","unstructured":"Kriesch L, Losacker S (2025) A geolocated dataset of german news articles. Scientific Data 12. https:\/\/doi.org\/10.1038\/s41597-025-05422-w","DOI":"10.1038\/s41597-025-05422-w"},{"key":"1666_CR55","doi-asserted-by":"publisher","unstructured":"Kumar N, Gupta M, Sharma D, Ofori I (2022) Technical job recommendation system using apis and web crawling. Comput Intell Neurosci 2022. 
https:\/\/doi.org\/10.1155\/2022\/7797548","DOI":"10.1155\/2022\/7797548"},{"key":"1666_CR56","doi-asserted-by":"publisher","unstructured":"Landeta-L\u00f3pez P, Garc\u00eda JM, Guevara Vega C, Ruiz-Cort\u00e9s A (2025). Supplementary material Llms applied to web scraping and web crawling A systematic review. https:\/\/doi.org\/10.5281\/zenodo.18284382","DOI":"10.5281\/zenodo.18284382"},{"key":"1666_CR57","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","volume":"33","author":"JR Landis","year":"1977","unstructured":"Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159\u2013174. https:\/\/doi.org\/10.2307\/2529310","journal-title":"Biometrics"},{"key":"1666_CR58","doi-asserted-by":"publisher","first-page":"74872","DOI":"10.1109\/ACCESS.2024.3405583","volume":"12","author":"Y Li Chong","year":"2024","unstructured":"Li Chong Y, Poo Lee C, Zen Muhd-Yassin S, Ming Lim K, Kamsani Samingan A (2024) Transkgqa: enhanced knowledge graph question answering with sentence transformers. IEEE Access 12:74872\u201374887. https:\/\/doi.org\/10.1109\/ACCESS.2024.3405583","journal-title":"IEEE Access"},{"key":"1666_CR59","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1017\/S1351324923000049","volume":"30","author":"DE Losada","year":"2024","unstructured":"Losada DE, Pichel JC, Gamallo P (2024) An unsupervised perplexity-based method for boilerplate removal. Nat Lang Eng 30:132\u2013149. https:\/\/doi.org\/10.1017\/S1351324923000049","journal-title":"Nat Lang Eng"},{"key":"1666_CR60","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1007\/s10579-021-09536-6","volume":"56","author":"L Lowphansirikul","year":"2022","unstructured":"Lowphansirikul L, Polpanumas C, Rutherford AT, Nutanong S (2022) A large english\u2013thai parallel corpus from the web and machine-generated text. Lang Resour Eval 56:477\u2013499. 
https:\/\/doi.org\/10.1007\/s10579-021-09536-6","journal-title":"Lang Resour Eval"},{"key":"1666_CR61","doi-asserted-by":"publisher","unstructured":"Lumpp F, Braga D, Fummi F, Bombieri N (2024) Automating finops in cloud computing: An integrated solution for efficient data collection with dynamic scraper generation, pp 79 \u2013 86. https:\/\/doi.org\/10.1109\/CloudCom62794.2024.00025","DOI":"10.1109\/CloudCom62794.2024.00025"},{"key":"1666_CR62","doi-asserted-by":"publisher","unstructured":"Meddeb P, Ruseti S, Dascalu M, Terian S-M, Travadel S (2022) Counteracting french fake news on climate change using language models. Sustainability (Switzerland) 14. https:\/\/doi.org\/10.3390\/su141811724","DOI":"10.3390\/su141811724"},{"key":"1666_CR63","doi-asserted-by":"publisher","unstructured":"Miano J et\u00a0al (2021) Using event-based web-scraping methods and bidirectional transformers to characterize covid-19 outbreaks in food production and retail settings 12721 LNAI, pp 187 \u2013 198. https:\/\/doi.org\/10.1007\/978-3-030-77211-6_21","DOI":"10.1007\/978-3-030-77211-6_21"},{"key":"1666_CR64","doi-asserted-by":"publisher","unstructured":"Min B et al (2024) Recent advances in natural language processing via large pre-trained language models: a survey. ACM Comput Surv 56. https:\/\/doi.org\/10.1145\/3605943","DOI":"10.1145\/3605943"},{"key":"1666_CR65","doi-asserted-by":"publisher","unstructured":"Min Q et al (2024) Synergetic event understanding a collaborative approach to cross-document event coreference resolution with large language models vol. 1, pp 2985\u20133002. https:\/\/doi.org\/10.18653\/v1\/2024.acl-long.164","DOI":"10.18653\/v1\/2024.acl-long.164"},{"key":"1666_CR66","doi-asserted-by":"publisher","unstructured":"Mo G et al (2025) Social media data-based rapid hazard assessment of urban waterlogging event: A case study of guilin 6.19 waterlogging. Water (Switzerland) 17. 
https:\/\/doi.org\/10.3390\/w17030354","DOI":"10.3390\/w17030354"},{"key":"1666_CR67","doi-asserted-by":"publisher","unstructured":"Moenck K, Thieu D\u00a0T, Koch J, Sch\u00fcppstuhl T (2024) Industrial language-image dataset (ilid): Adapting vision foundation models for industrial settings, vol.130, pp 250 \u2013 263. https:\/\/doi.org\/10.1016\/j.procir.2024.10.084","DOI":"10.1016\/j.procir.2024.10.084"},{"key":"1666_CR68","unstructured":"Molino-Pe\u00f1a E, Cruz-Lorite JM, Garc\u00eda JM, Ruiz-Cort\u00e9s A (2025) TOSL: an ontology to detect abusive services. CEUR Workshop Proceedings 3977"},{"key":"1666_CR69","doi-asserted-by":"publisher","unstructured":"Naushad R, Gupta R, Bhutiyal T, Prajapati V (2024) A novel approach to rental market analysis for property management firms using large language models and machine learning 14840 LNAI, pp 247 \u2013 261. https:\/\/doi.org\/10.1007\/978-3-031-65668-2_17","DOI":"10.1007\/978-3-031-65668-2_17"},{"key":"1666_CR70","doi-asserted-by":"publisher","unstructured":"Neupane S et\u00a0al (2024) From questions to insightful answers: Building an informed chatbot for university resources, pp 1\u20139. https:\/\/doi.org\/10.1109\/FIE61694.2024.10892994","DOI":"10.1109\/FIE61694.2024.10892994"},{"key":"1666_CR71","doi-asserted-by":"publisher","unstructured":"Ngai EW, Lee MC, Luo M, Chan PS, Liang T (2021) An intelligent knowledge-based chatbot for customer service. Electron Commer Res Appl 50. https:\/\/doi.org\/10.1016\/j.elerap.2021.101098","DOI":"10.1016\/j.elerap.2021.101098"},{"key":"1666_CR72","doi-asserted-by":"publisher","unstructured":"Oussaleh Taoufik A, Azmani A (2024). Ai-enhanced techniques for extracting structured data from unstructured public procurement documents. 
https:\/\/doi.org\/10.1109\/ISAS64331.2024.10845583","DOI":"10.1109\/ISAS64331.2024.10845583"},{"key":"1666_CR73","doi-asserted-by":"publisher","unstructured":"Page MJ et al (2021) Prisma 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ 372. https:\/\/doi.org\/10.1136\/bmj.n160","DOI":"10.1136\/bmj.n160"},{"key":"1666_CR74","doi-asserted-by":"publisher","unstructured":"Page MJ et al (2021) The prisma 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372. https:\/\/doi.org\/10.1136\/bmj.n71","DOI":"10.1136\/bmj.n71"},{"key":"1666_CR75","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.infsof.2015.03.007","volume":"64","author":"K Petersen","year":"2015","unstructured":"Petersen K, Vakkalanka S, Kuzniarz L (2015) Guidelines for conducting systematic mapping studies in software engineering: an update. Inf Softw Technol 64:1\u201318. https:\/\/doi.org\/10.1016\/j.infsof.2015.03.007","journal-title":"Inf Softw Technol"},{"key":"1666_CR76","doi-asserted-by":"publisher","unstructured":"Plaskowski DA, Skwarek S, Grajewska D, Niemir M, \u0141awrynowicz A (2024) Automating opinion extraction from semi-structured webpages Leveraging language models and instruction finetuning on synthetic data 3:681\u2013688. https:\/\/doi.org\/10.5220\/0012384900003636","DOI":"10.5220\/0012384900003636"},{"key":"1666_CR77","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1515\/commun-2022-0098","volume":"48","author":"J Pohlmann","year":"2023","unstructured":"Pohlmann J, Barbaresi A, Leinen P (2023) Platform regulation and \u201coverblocking\u201d-the netzdg discourse in germany. Communications 48:395\u2013419. 
https:\/\/doi.org\/10.1515\/commun-2022-0098","journal-title":"Communications"},{"key":"1666_CR78","doi-asserted-by":"publisher","unstructured":"Porlou C et\u00a0al (2024) Optimizing an llm prompt for accurate data extraction from firearm-related listings in dark web marketplaces, pp 2821 \u2013 2830. https:\/\/doi.org\/10.1109\/BigData62323.2024.10825446","DOI":"10.1109\/BigData62323.2024.10825446"},{"key":"1666_CR79","doi-asserted-by":"publisher","first-page":"143","DOI":"10.24818\/18423264\/58.3.24.09","volume":"58","author":"B-S Posedaru","year":"2024","unstructured":"Posedaru B-S, Batagan L, Bologa R, Placinta D-D, Mirea C-M (2024) Software architecture for improving scraping systems using artificial intelligence. Econom Comput Econom Cybernet Stud Res 58:143\u2013160. https:\/\/doi.org\/10.24818\/18423264\/58.3.24.09","journal-title":"Econom Comput Econom Cybernet Stud Res"},{"key":"1666_CR80","unstructured":"Prabhong T, Kertkeidkachorn N, Trongratsameethong A (2024) Kgc-rag: knowledge graph construction from large language model using retrieval-augmented generation 3853"},{"key":"1666_CR81","doi-asserted-by":"publisher","unstructured":"Pshenova A, Ahn J (2025) Towards a curatorial agent for heritage institutions Web source credibility verification for grounding domain-specific llms, vol. 48, pp 1213\u20131219. https:\/\/doi.org\/10.5194\/isprs-archives-XLVIII-M-9-2025-1213-2025","DOI":"10.5194\/isprs-archives-XLVIII-M-9-2025-1213-2025"},{"key":"1666_CR82","doi-asserted-by":"publisher","unstructured":"Pushpalatha M, Aravindan MS (2025). Comparative analysis of web scraping methodologies using generative ai. https:\/\/doi.org\/10.1109\/RAIT65068.2025.11088928","DOI":"10.1109\/RAIT65068.2025.11088928"},{"key":"1666_CR83","doi-asserted-by":"publisher","unstructured":"R R et\u00a0al (2025) Cyberscribe- scalable autonomous web crawler, pp 1\u20135. 
https:\/\/doi.org\/10.1109\/CSITSS67709.2025.11294606","DOI":"10.1109\/CSITSS67709.2025.11294606"},{"key":"1666_CR84","doi-asserted-by":"publisher","unstructured":"Rodriguez-Sarmiento L\u00a0F, Galpin I, Sanchez-Ria\u00f1o V (2024) Mapping brand territories using chatgpt 1874 CCIS, pp 31 \u2013 46. https:\/\/doi.org\/10.1007\/978-3-031-46813-1_3","DOI":"10.1007\/978-3-031-46813-1_3"},{"key":"1666_CR85","doi-asserted-by":"crossref","unstructured":"Romanyshyn N, Chaplynskyi D, Romanyshyn M (2024) Automated extraction of hypo-hypernym relations for the ukrainian wordnet, pp 51 \u2013 60","DOI":"10.63317\/4ekekaazwpeq"},{"key":"1666_CR86","doi-asserted-by":"publisher","first-page":"1047","DOI":"10.1007\/s10579-021-09551-7","volume":"55","author":"S Roziewski","year":"2021","unstructured":"Roziewski S, Koz\u0142owski M (2021) Languagecrawl: a generic tool for building language models upon common crawl. Lang Resour Eval 55:1047\u20131075. https:\/\/doi.org\/10.1007\/s10579-021-09551-7","journal-title":"Lang Resour Eval"},{"key":"1666_CR87","doi-asserted-by":"publisher","first-page":"185401","DOI":"10.1109\/ACCESS.2024.3513155","volume":"12","author":"B Saha","year":"2024","unstructured":"Saha B, Saha U, Zubair Malik M (2024) Quim-rag: advancing retrievalaugmented generation with inverted question matching for enhanced qa performance. IEEE Access 12:185401\u2013185410. https:\/\/doi.org\/10.1109\/ACCESS.2024.3513155","journal-title":"IEEE Access"},{"key":"1666_CR88","doi-asserted-by":"publisher","unstructured":"Sakib SN, Rubaiat SY, Naha K, Rahman HH, Jamil HM (2025) A fair resource recommender system for smart open scientific inquiries. Appl Sci (Switzerland) 15. https:\/\/doi.org\/10.3390\/app15158334","DOI":"10.3390\/app15158334"},{"key":"1666_CR89","doi-asserted-by":"publisher","unstructured":"Sampaio T, Oliveira P\u00a0F, Matos P (2024) Development of a chatbot to support an university institutional website, pp 1\u20136. 
https:\/\/doi.org\/10.1109\/ICEET65156.2024.10913604","DOI":"10.1109\/ICEET65156.2024.10913604"},{"key":"1666_CR90","doi-asserted-by":"publisher","unstructured":"Senapati A, Ananya J, Jha A, Mahishi A, Srinivas K (2024) Driving innovation: Creating a dataset for automotive ad campaign analysis, pp 273 \u2013 278. https:\/\/doi.org\/10.1109\/IC2SDT62152.2024.10696037","DOI":"10.1109\/IC2SDT62152.2024.10696037"},{"key":"1666_CR91","doi-asserted-by":"publisher","unstructured":"Shah A, Shah H, Bafna V, Khandor C, Nair S (2025) Validation and extraction of reliable information through automated scraping and natural language inference. Eng Appl Artif Intell 147. https:\/\/doi.org\/10.1016\/j.engappai.2025.110284","DOI":"10.1016\/j.engappai.2025.110284"},{"key":"1666_CR92","doi-asserted-by":"publisher","unstructured":"Sharma G (eds) (2024) Web crawling and scraping: a survey, pp 190\u2013192. https:\/\/doi.org\/10.1109\/HISET61796.2024.00063","DOI":"10.1109\/HISET61796.2024.00063"},{"key":"1666_CR93","doi-asserted-by":"publisher","unstructured":"Singh A, Mithun S (2025) Enhancing large language models for real-time, seo-optimized article generation 1567 LNNS, pp 128 \u2013 141. https:\/\/doi.org\/10.1007\/978-3-032-00071-2_8","DOI":"10.1007\/978-3-032-00071-2_8"},{"key":"1666_CR94","doi-asserted-by":"publisher","unstructured":"Smith E, Peters J, Reiter N (2024) Automatic detection of problem-gambling signs from online texts using large language models. PLOS Digital Health 3. https:\/\/doi.org\/10.1371\/journal.pdig.0000605","DOI":"10.1371\/journal.pdig.0000605"},{"key":"1666_CR95","doi-asserted-by":"publisher","unstructured":"Soni TC, Manoj M, Verma M, Tripathi MK (2025) Supercapacitor materials database generated using web scrapping and natural language processing. J Mol Graph Model 136. 
https:\/\/doi.org\/10.1016\/j.jmgm.2025.108980","DOI":"10.1016\/j.jmgm.2025.108980"},{"key":"1666_CR96","doi-asserted-by":"publisher","unstructured":"Tan H\u00a0H, Erol B, Erten H\u00a0Y, Ba\u011fdatl\u0131 A\u00a0A, Karakaya K\u00a0M (2025) Ai-based domestic relocation assistant for t\u00fcrkiye, pp 125\u2013130. https:\/\/doi.org\/10.1109\/UBMK67458.2025.11206933","DOI":"10.1109\/UBMK67458.2025.11206933"},{"key":"1666_CR97","doi-asserted-by":"publisher","unstructured":"Tawil Y, Alqaraleh S (2021) Bert based topic-specific crawler, pp 1\u20135. https:\/\/doi.org\/10.1109\/ASYU52992.2021.9599076","DOI":"10.1109\/ASYU52992.2021.9599076"},{"key":"1666_CR98","doi-asserted-by":"publisher","unstructured":"Tiwari S, Verma R, Jaiswal J, Rai B\u00a0K (eds) (2020) Open source intelligence initiating efficient investigation and reliable web searching. Communications in computer and information science, vol. 1244, pp 151\u2013163. https:\/\/doi.org\/10.1007\/978-981-15-6634-9_15","DOI":"10.1007\/978-981-15-6634-9_15"},{"key":"1666_CR99","doi-asserted-by":"publisher","first-page":"826","DOI":"10.1162\/tacl_a_00577","volume":"11","author":"M Treviso","year":"2023","unstructured":"Treviso M et al (2023) Efficient methods for natural language processing: a survey. Trans Assoc Comput Linguist 11:826\u2013860. https:\/\/doi.org\/10.1162\/tacl_a_00577","journal-title":"Trans Assoc Comput Linguist"},{"key":"1666_CR100","doi-asserted-by":"publisher","unstructured":"Truong B-V, Nguyen LT, Pham P, Vo B (2025) Htmldownloader: an open-source tool for dynamic web scraping and archiving using webview2. SoftwareX 32. https:\/\/doi.org\/10.1016\/j.softx.2025.102373","DOI":"10.1016\/j.softx.2025.102373"},{"key":"1666_CR101","doi-asserted-by":"publisher","first-page":"61726","DOI":"10.1109\/ACCESS.2020.2984503","volume":"8","author":"E Uzun","year":"2020","unstructured":"Uzun E (2020) A novel web scraping approach using the additional information obtained from web pages. 
IEEE Access 8:61726\u201361740. https:\/\/doi.org\/10.1109\/ACCESS.2020.2984503","journal-title":"IEEE Access"},{"key":"1666_CR102","doi-asserted-by":"publisher","unstructured":"Vibhute M, Gutierrez N, Radivojevic K, Brenner P (2025) Multimodal web agents for automated (dark) web navigation, vol. 1, pp 437 \u2013 444. https:\/\/doi.org\/10.5220\/0013171600003890","DOI":"10.5220\/0013171600003890"},{"key":"1666_CR103","doi-asserted-by":"publisher","unstructured":"Wu H, Cho H, Davies A\u00a0R, Jones G J\u00a0F (2024) Llm-based automated web retrieval and text classification of food sharing initiatives, pp 4983 \u2013 4990. https:\/\/doi.org\/10.1145\/3627673.3680090","DOI":"10.1145\/3627673.3680090"},{"key":"1666_CR104","doi-asserted-by":"publisher","unstructured":"Xiong J, Wei M, Lu Z, Liu Y (2025) Assessing the effectiveness of crawlers and large language models in detecting adversarial hidden link threats in meta computing. High-Confidence Computing 5. https:\/\/doi.org\/10.1016\/j.hcc.2024.100292","DOI":"10.1016\/j.hcc.2024.100292"},{"key":"1666_CR105","doi-asserted-by":"publisher","unstructured":"Xu Z et\u00a0al (2024) Cleaner pretraining corpus curation with neural web scraping, vol.2, pp 802 \u2013 812. https:\/\/doi.org\/10.18653\/v1\/2024.acl-short.71","DOI":"10.18653\/v1\/2024.acl-short.71"},{"key":"1666_CR106","doi-asserted-by":"publisher","unstructured":"Yadav AK, Ranvijay, Yadav RS, Maurya AK (2023) State-of-the-art approach to extractive text summarization: a comprehensive review. Multim Tools Appl 82:29135\u201329197. https:\/\/doi.org\/10.1007\/s11042-023-14613-9","DOI":"10.1007\/s11042-023-14613-9"},{"key":"1666_CR107","doi-asserted-by":"publisher","unstructured":"Yang W et al (2025) An mllm-assisted web crawler approach for web application fuzzing. Appl Sci (Switzerland) 15. 
https:\/\/doi.org\/10.3390\/app15020962","DOI":"10.3390\/app15020962"},{"key":"1666_CR108","doi-asserted-by":"publisher","first-page":"71456","DOI":"10.1109\/ACCESS.2025.3562718","volume":"13","author":"G Yurtalan","year":"2025","unstructured":"Yurtalan G, Arslan S (2025) Redefining osint software architecture with system-centric architecture design: a framework shaped by qaw, add, and atam. IEEE Access 13:71456\u201371480. https:\/\/doi.org\/10.1109\/ACCESS.2025.3562718","journal-title":"IEEE Access"},{"key":"1666_CR109","doi-asserted-by":"publisher","unstructured":"Zakarija I, \u0160kopljanac Ma\u010dina F, Maru\u0161i\u0107 H, Bla\u0161kovi\u0107 B (2024) A sentiment analysis model based on user experiences of dubrovnik on the tripadvisor platform. Appl Sci (Switzerland) 14. https:\/\/doi.org\/10.3390\/app14188304","DOI":"10.3390\/app14188304"},{"key":"1666_CR110","doi-asserted-by":"publisher","unstructured":"Zewail A, Abdulghany Y, Samy M (2025) Reducing mean time to respond using large language model-driven incident response with the aid of reactively retrieved threat intelligence, pp 322 \u2013 327. https:\/\/doi.org\/10.1109\/IMSA65733.2025.11167573","DOI":"10.1109\/IMSA65733.2025.11167573"},{"key":"1666_CR111","doi-asserted-by":"publisher","unstructured":"Zhang R, El-Gohary N (2021) A deep neural network-based method for deep information extraction using transfer learning strategies to support automated compliance checking. Autom Constr 132. https:\/\/doi.org\/10.1016\/j.autcon.2021.103834","DOI":"10.1016\/j.autcon.2021.103834"},{"key":"1666_CR112","doi-asserted-by":"publisher","unstructured":"Zhou H, Zhou K, Xiang X, Gu Z (2025) Adaptive web crawling for threat intelligence using a reinforcement learning-enhanced large language model 2421 CCIS, pp 348 \u2013 362. 
https:\/\/doi.org\/10.1007\/978-981-96-4506-0_22","DOI":"10.1007\/978-981-96-4506-0_22"},{"key":"1666_CR113","doi-asserted-by":"publisher","unstructured":"Zia A et al (2022) Artificial intelligence-based medical data mining. J Pers Med 12. https:\/\/doi.org\/10.3390\/jpm12091359","DOI":"10.3390\/jpm12091359"},{"key":"1666_CR114","doi-asserted-by":"publisher","unstructured":"Zuehlke S, Nitu J, Sandler S, Krauss O, Stockl A (2024). Self-repairing data scraping for websites. https:\/\/doi.org\/10.1109\/ICECCME62383.2024.10796733","DOI":"10.1109\/ICECCME62383.2024.10796733"}],"container-title":["Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00607-026-01666-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00607-026-01666-5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00607-026-01666-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T06:11:22Z","timestamp":1778739082000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00607-026-01666-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,14]]},"references-count":114,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2026,6]]}},"alternative-id":["1666"],"URL":"https:\/\/doi.org\/10.1007\/s00607-026-01666-5","relation":{},"ISSN":["0010-485X","1436-5057"],"issn-type":[{"value":"0010-485X","type":"print"},{"value":"1436-5057","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,14]]},"assertion":[{"value":"20 October 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 April 
2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 May 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"78"}}