{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,5]],"date-time":"2026-05-05T12:05:00Z","timestamp":1777982700055,"version":"3.51.4"},"reference-count":65,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T00:00:00Z","timestamp":1775433600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T00:00:00Z","timestamp":1775433600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005765","name":"Universidade de Lisboa","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005765","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Data Sci Anal"],"published-print":{"date-parts":[[2026,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Document classification serves as a foundational step in critical tasks such as information extraction, analysis and decision-making. However, existing approaches often struggle with the variability, volume, and complexity of real-world documents. These methods are further limited by a lack of configurability and explainability, requiring specialized technical expertise to accommodate diverse user needs and often producing results that are difficult to interpret. To address the complexities of modern document processing, this paper introduces a novel zero-shot document classification framework that leverages Large Language Models (LLMs), designed for accessibility and configurability by both technical and non-technical users. 
Unlike traditional methods, which require extensive labeled data, the zero-shot configuration enables our framework to perform the classification task without any prior exposure to labeled examples of the target categories, relying instead on semantic understanding derived from user-provided label descriptions and document content. Developed and validated using a real-world banking dataset, our framework leverages different strategies for providing context to LLMs during classification. Experimental results demonstrate substantial improvements in both accuracy and efficiency, outperforming current zero-shot methods while also reducing operating costs.<\/jats:p>","DOI":"10.1007\/s41060-026-01077-x","type":"journal-article","created":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T15:48:49Z","timestamp":1775490529000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Leveraging large language models for document classification in the banking sector"],"prefix":"10.1007","volume":"22","author":[{"given":"R\u00f3mulo","family":"Nogueira","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8538-1727","authenticated-orcid":false,"given":"Hugo","family":"Mentzingen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6371-3310","authenticated-orcid":false,"given":"Nuno","family":"Garcia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,4,6]]},"reference":[{"issue":"2","key":"1077_CR1","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1145\/321160.321165","volume":"10","author":"H Borko","year":"1963","unstructured":"Borko, H., Bernick, M.: Automatic document classification. J. ACM 10(2), 151\u2013162 (1963). 
https:\/\/doi.org\/10.1145\/321160.321165","journal-title":"J. ACM"},{"key":"1077_CR2","doi-asserted-by":"crossref","unstructured":"Sun, C., Shrivastava, A., Singh, S., Gupta, A.: Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. CoRR (2017) arXiv: 1707.02968","DOI":"10.1109\/ICCV.2017.97"},{"issue":"1","key":"1077_CR3","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1108\/eb026526","volume":"28","author":"K Sparck Jones","year":"1972","unstructured":"Sparck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11\u201321 (1972). https:\/\/doi.org\/10.1108\/eb026526","journal-title":"J. Doc."},{"key":"1077_CR4","doi-asserted-by":"publisher","DOI":"10.1017\/9781108684163","volume-title":"Mining of Massive Datasets","author":"J Leskovec","year":"2020","unstructured":"Leskovec, J., Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets, 3rd edn. Cambridge University Press, USA (2020)","edition":"3"},{"key":"1077_CR5","volume-title":"A Brief History of Artificial Intelligence: What It Is, Where We Are, and Where We Are Going","author":"M Wooldridge","year":"2021","unstructured":"Wooldridge, M.: A Brief History of Artificial Intelligence: What It Is, Where We Are, and Where We Are Going, 1st edn. Flatiron Books, New York (2021)","edition":"1"},{"key":"1077_CR6","doi-asserted-by":"publisher","DOI":"10.2760\/801580","author":"E Commission","year":"2020","unstructured":"Commission, E., Centre, J.R., Delipetrev, B., Tsinaraki, C., Kosti\u0107, U.: AI watch, historical evolution of artificial intelligence\u2014analysis of the three main paradigm shifts in AI. Tech. Rep. (2020). https:\/\/doi.org\/10.2760\/801580","journal-title":"Tech. Rep."},{"issue":"1","key":"1077_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/505282.505283","volume":"34","author":"F Sebastiani","year":"2002","unstructured":"Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. 
Surv. (CSUR) 34(1), 1\u201347 (2002). https:\/\/doi.org\/10.1145\/505282.505283","journal-title":"ACM Comput. Surv. (CSUR)"},{"issue":"4","key":"1077_CR8","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1016\/j.cpet.2021.07.001","volume":"16","author":"A Toosi","year":"2021","unstructured":"Toosi, A., Bottino, A.G., Saboury, B., Siegel, E., Rahmim, A.: A brief history of AI: How to prevent another winter (a critical review). PET Clin. 16(4), 449\u2013469 (2021). https:\/\/doi.org\/10.1016\/j.cpet.2021.07.001","journal-title":"PET Clin."},{"key":"1077_CR9","doi-asserted-by":"publisher","DOI":"10.4304\/jcp.4.3.230-237","author":"Z Yong","year":"2009","unstructured":"Yong, Z., Youwen, L., Shixiong, X.: An improved KNN text classification algorithm based on clustering. J. Comput. (2009). https:\/\/doi.org\/10.4304\/jcp.4.3.230-237","journal-title":"J. Comput."},{"key":"1077_CR10","doi-asserted-by":"publisher","unstructured":"Shi, K., Li, L., Liu, H., He, J., Zhang, N., Song, W.: An improved KNN text classification algorithm based on density. In: 2011 IEEE International Conference on Cloud Computing and Intelligence Systems, pp. 113\u2013117 (2011). https:\/\/doi.org\/10.1109\/CCIS.2011.6045043","DOI":"10.1109\/CCIS.2011.6045043"},{"key":"1077_CR11","doi-asserted-by":"publisher","unstructured":"Chang, Y.-H., Huang, H.-Y.: An automatic document classifier system based on na\u00edve bayes classifier and ontology. In: 2008 International Conference on Machine Learning and Cybernetics, 6, 3144\u20133149 (2008). https:\/\/doi.org\/10.1109\/ICMLC.2008.4620948","DOI":"10.1109\/ICMLC.2008.4620948"},{"key":"1077_CR12","doi-asserted-by":"publisher","unstructured":"Wang, Y., Hodges, J., Tang, B.: Classification of web documents using a naive bayes method. In: Proceedings 15th IEEE International Conference on Tools with Artificial Intelligence, pp. 560\u2013564 (2003). 
https:\/\/doi.org\/10.1109\/TAI.2003.1250241","DOI":"10.1109\/TAI.2003.1250241"},{"key":"1077_CR13","doi-asserted-by":"publisher","unstructured":"Nguyen, L.: A proposal of discovering user interest by support vector machine and decision tree on document classification. In: 2009 International Conference on Computational Science and Engineering, 4, 809\u2013814 (2009). https:\/\/doi.org\/10.1109\/CSE.2009.112","DOI":"10.1109\/CSE.2009.112"},{"key":"1077_CR14","doi-asserted-by":"publisher","unstructured":"Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1. AAAI\u201906, pp. 500\u2013505. AAAI Press, Boston, Massachusetts (2006). https:\/\/doi.org\/10.5555\/1597538.1597619","DOI":"10.5555\/1597538.1597619"},{"key":"1077_CR15","doi-asserted-by":"publisher","first-page":"95","DOI":"10.4156\/jdcta.vol4.issue3.9","volume":"4","author":"H Gao","year":"2010","unstructured":"Gao, H., Jiang, J., She, L., Fu, Y.: A new agglomerative hierarchical clustering algorithm implementation based on the map reduce framework. JDCTA 4, 95\u2013100 (2010). https:\/\/doi.org\/10.4156\/jdcta.vol4.issue3.9","journal-title":"JDCTA"},{"issue":"2 part 2","key":"1077_CR16","doi-asserted-by":"publisher","first-page":"3208","DOI":"10.1016\/j.eswa.2008.01.014","volume":"36","author":"CH Li","year":"2009","unstructured":"Li, C.H., Park, S.C.: An efficient document classification model using an improved back propagation neural network and singular value decomposition. Expert Syst. Appl. 36(2 part 2), 3208\u20133215 (2009). https:\/\/doi.org\/10.1016\/j.eswa.2008.01.014","journal-title":"Expert Syst. 
Appl."},{"issue":"7","key":"1077_CR17","doi-asserted-by":"publisher","first-page":"1466","DOI":"10.1016\/j.neucom.2006.05.013","volume":"70","author":"L Manevitz","year":"2007","unstructured":"Manevitz, L., Yousef, M.: One-class document classification via neural networks (Advances in Computational Intelligence and Learning). Neurocomputing 70(7), 1466\u20131481 (2007). https:\/\/doi.org\/10.1016\/j.neucom.2006.05.013","journal-title":"Neurocomputing"},{"key":"1077_CR18","doi-asserted-by":"publisher","unstructured":"Trappey, A.J.C., Hsu, F.-C., Trappey, C.V., Lin, C.-I.: Development of a patent document classification and search platform using a back-propagation network. Expert Systems with Applications 31(4), 755\u2013765 (2006). Computer Supported Cooperative Work in Design and Manufacturing https:\/\/doi.org\/10.1016\/j.eswa.2006.01.013","DOI":"10.1016\/j.eswa.2006.01.013"},{"key":"1077_CR19","doi-asserted-by":"publisher","first-page":"27","DOI":"10.5121\/ijcses.2012.3403","volume":"3","author":"CP Sumathi","year":"2012","unstructured":"Sumathi, C.P., Santhanam, T., Devi, G.: A survey on various approaches of text extraction in images. Int. J. Comput. Sci. Eng. Surv. 3, 27 (2012)","journal-title":"Int. J. Comput. Sci. Eng. Surv."},{"key":"1077_CR20","doi-asserted-by":"publisher","unstructured":"Mittal, R., Garg, A.: Text extraction using OCR: A systematic review. In: 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 357\u2013362 (2020). https:\/\/doi.org\/10.1109\/ICIRCA48905.2020.9183326","DOI":"10.1109\/ICIRCA48905.2020.9183326"},{"key":"1077_CR21","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1710.05703","author":"N Islam","year":"2016","unstructured":"Islam, N., Islam, Z., Noor, N.: A survey on optical character recognition system. ITB J. Inf. Commun. Technol. (2016). https:\/\/doi.org\/10.48550\/arXiv.1710.05703","journal-title":"ITB J. Inf. Commun. 
Technol."},{"key":"1077_CR22","unstructured":"Alom, M.Z., Taha, T.M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M.H., Van Esesn, B.C., Awwal, A.A.S., Asari, V.K.: The history began from alexnet: A comprehensive survey on deep learning approaches. arXiv (2018) arXiv:1803.01164 [cs.LG]"},{"key":"1077_CR23","doi-asserted-by":"publisher","unstructured":"Zhong, G., Yao, H., Liu, Y., Hong, C., Pham, T.: Classification of photographed document images based on deep-learning features. In: Wang, Y., Pham, T.D., Vozenilek, V., Zhang, D., Xie, Y. (eds.) Eighth International Conference on Graphic and Image Processing (ICGIP 2016), vol. 10225, p. 102250. SPIE, China (2017). International Society for Optics and Photonics https:\/\/doi.org\/10.1117\/12.2266984","DOI":"10.1117\/12.2266984"},{"key":"1077_CR24","doi-asserted-by":"publisher","unstructured":"Mikolov, T., Chen, K., Corrado, G.s., Dean, J.: Efficient estimation of word representations in vector space. Proceedings of Workshop at ICLR 2013 (2013) https:\/\/doi.org\/10.48550\/arXiv.1301.3781","DOI":"10.48550\/arXiv.1301.3781"},{"key":"1077_CR25","doi-asserted-by":"publisher","unstructured":"Hiemstra, D.. In: LIU, L., \u00d6ZSU, M.T. (eds.) N-Gram Models, pp. 1910\u20131910. Springer, Boston, MA (2009). https:\/\/doi.org\/10.1007\/978-0-387-39940-9_935","DOI":"10.1007\/978-0-387-39940-9_935"},{"key":"1077_CR26","doi-asserted-by":"publisher","unstructured":"Usman Naseem, S.K.K. Imran Razzak, Prasad, M.: A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models. ACM Symposium on Neural Gaze Detection, 46 (2018) https:\/\/doi.org\/10.1145\/3434237","DOI":"10.1145\/3434237"},{"key":"1077_CR27","unstructured":"Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. 
CoRR (2014) arXiv: 1405.4053"},{"key":"1077_CR28","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511973000","volume-title":"Machine Learning: The Art and Science of Algorithms That Make Sense of Data","author":"P Flach","year":"2012","unstructured":"Flach, P.: Machine Learning: The Art and Science of Algorithms That Make Sense of Data. Cambridge University Press, Cambridge, UK (2012)"},{"key":"1077_CR29","unstructured":"Prince, S.J.D.: Understanding Deep Learning. MIT Press, Massachusetts, US (2024). http:\/\/udlbook.com"},{"key":"1077_CR30","doi-asserted-by":"publisher","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017). https:\/\/doi.org\/10.48550\/arXiv.1706.03762","DOI":"10.48550\/arXiv.1706.03762"},{"key":"1077_CR31","unstructured":"Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J.-Y., Wen, J.-R.: A Survey of Large Language Models (2024). arXiv: 2303.18223"},{"key":"1077_CR32","doi-asserted-by":"publisher","unstructured":"Fan, L., Li, L., Ma, Z., Lee, S., Yu, H., Hemphill, L.: A bibliometric review of large language models research from 2017 to 2023 (2023) https:\/\/doi.org\/10.48550\/arXiv.2304.02020","DOI":"10.48550\/arXiv.2304.02020"},{"key":"1077_CR33","unstructured":"Touvron, H., et al.: Llama: Open and efficient foundation language models. arXiv:2302.13971 (2023)"},{"key":"1077_CR34","unstructured":"Anthropic: Introducing Claude. https:\/\/www.anthropic.com\/index\/introducing-claude. (2023). Accessed 13 Jun 2024"},{"key":"1077_CR35","unstructured":"Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training. OpenAI Blog 1, 8 (2018). 
https:\/\/cdn.openai.com\/research-covers\/language-unsupervised\/language_understanding_paper.pdf"},{"key":"1077_CR36","doi-asserted-by":"publisher","first-page":"26839","DOI":"10.1109\/ACCESS.2024.3365742","volume":"12","author":"MAK Raiaan","year":"2024","unstructured":"Raiaan, M.A.K., Mukta, M.S.H., Fatema, K., Fahad, N.M., Sakib, S., Mim, M.M.J., Ahmad, J., Ali, M.E., Azam, S.: A review on large language models: architectures, applications, taxonomies, open issues and challenges. IEEE Access 12, 26839\u201326874 (2024). https:\/\/doi.org\/10.1109\/ACCESS.2024.3365742","journal-title":"IEEE Access"},{"key":"1077_CR37","doi-asserted-by":"publisher","unstructured":"Trautmann, D.: Large language model prompt chaining for long legal document classification. In: SwissText 2023 Late Breaking Work (Generative AI & LLM) (2023). https:\/\/doi.org\/10.48550\/arXiv.2308.04138","DOI":"10.48550\/arXiv.2308.04138"},{"key":"1077_CR38","doi-asserted-by":"crossref","unstructured":"Reimers, N., Gurevych, I.: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (2019). arXiv: 1908.10084","DOI":"10.18653\/v1\/D19-1410"},{"key":"1077_CR39","doi-asserted-by":"crossref","unstructured":"Zheng, L., Guha, N., Anderson, B.R., Henderson, P., Ho, D.E.: When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset (2021). arXiv: 2104.08671","DOI":"10.1145\/3462757.3466088"},{"key":"1077_CR40","doi-asserted-by":"publisher","unstructured":"Menon, S., Vondrick, C.: Visual Classification via Description from Large Language Models. arXiv (2022) https:\/\/doi.org\/10.48550\/arXiv.2210.07183","DOI":"10.48550\/arXiv.2210.07183"},{"key":"1077_CR41","doi-asserted-by":"publisher","unstructured":"Loukas, L., Stogiannidis, I., Diamantopoulos, O., Malakasiotis, P., Vassos, S.: Making llms worth every penny: Resource-limited text classification in banking. In: Proceedings of the Fourth ACM International Conference on AI in Finance. ICAIF \u201923, pp. 
392\u2013400. Association for Computing Machinery, New York, NY, USA (2023). https:\/\/doi.org\/10.1145\/3604237.3626891","DOI":"10.1145\/3604237.3626891"},{"key":"1077_CR42","doi-asserted-by":"crossref","unstructured":"Sun, X., Li, X., Li, J., Wu, F., Guo, S., Zhang, T., Wang, G.: Text Classification via Large Language Models (2023). arXiv: 2305.08377","DOI":"10.18653\/v1\/2023.findings-emnlp.603"},{"key":"1077_CR43","doi-asserted-by":"publisher","DOI":"10.3390\/info15120792","author":"KI Roumeliotis","year":"2024","unstructured":"Roumeliotis, K.I., Tselikas, N.D., Nasiopoulos, D.K.: Leveraging large language models in tourism: a comparative study of the latest GPT omni models and Bert NLP for customer review classification and sentiment analysis. Information (2024). https:\/\/doi.org\/10.3390\/info15120792","journal-title":"Information"},{"key":"1077_CR44","doi-asserted-by":"publisher","unstructured":"Prasad, N., Boughanem, M., Dkaki, T.: Exploring large language models and hierarchical frameworks for classification of large unstructured legal documents. In: Advances in Information Retrieval: Proceedings of the 46th European Conference on Information Retrieval (ECIR 2024), Part II, pp. 221\u2013237. Springer (2024). https:\/\/doi.org\/10.1007\/978-3-031-56060-6_15","DOI":"10.1007\/978-3-031-56060-6_15"},{"key":"1077_CR45","doi-asserted-by":"publisher","unstructured":"Wei, F., Keeling, R., Huber-Fliflet, N., Zhang, J., Dabrowski, A., Yang, J., Mao, Q., Qin, H.: Empirical study of llm fine-tuning for text classification in legal document review. In: 2023 IEEE International Conference on Big Data (BigData), pp. 2786\u20132792 (2023). 
https:\/\/doi.org\/10.1109\/BigData59044.2023.10386911","DOI":"10.1109\/BigData59044.2023.10386911"},{"key":"1077_CR46","doi-asserted-by":"publisher","unstructured":"Alonso-Rocha, J.L., Mart\u00ednez-Rojas, A., Enr\u00edquez, J.G., S\u00e1nchez-Oliva, J.M.: From manual to automated: a state-of-the-art review to examine the impact of intelligent document processing in banking automation. Expert Syst. Appl. 285, 127958 (2025). https:\/\/doi.org\/10.1016\/j.eswa.2025.127958","DOI":"10.1016\/j.eswa.2025.127958"},{"key":"1077_CR47","unstructured":"Inc., A.S., contributors: PyMuPDF: Python Bindings for MuPDF. Version 1.24.14 (2024). https:\/\/pymupdf.readthedocs.io\/"},{"key":"1077_CR48","unstructured":"O\u2019Riordan, M.D.: Python-tesseract is an optical character recognition (OCR) tool for python. (2024). Accessed Nov 2024. https:\/\/github.com\/madmaze\/pytesseract"},{"key":"1077_CR49","unstructured":"Smith, R.: Tesseract: An Open-Source OCR Engine. Accessed: November 2024 (2007). https:\/\/github.com\/tesseract-ocr\/tesseract"},{"key":"1077_CR50","doi-asserted-by":"publisher","unstructured":"Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., Wang, H.: Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv preprint arXiv:2312.10997 (2023) https:\/\/doi.org\/10.48550\/arXiv.2312.10997. Ongoing Work","DOI":"10.48550\/arXiv.2312.10997"},{"key":"1077_CR51","doi-asserted-by":"publisher","unstructured":"Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., K\u00fcttler, H., Lewis, M., Yih, W.-T., Rockt\u00e4schel, T., Riedel, S., Kiela, D.: Retrieval-augmented generation for knowledge-intensive nlp tasks. In: Advances in Neural Information Processing Systems (NeurIPS) (2020). https:\/\/doi.org\/10.48550\/arXiv.2005.11401. Accepted at NeurIPS 2020","DOI":"10.48550\/arXiv.2005.11401"},{"key":"1077_CR52","unstructured":"Martineau, K.: What is retrieval-augmented generation? 
https:\/\/research.ibm.com\/blog\/retrieval-augmented-generation-RAG (2023). Accessed 26 Dec 2024"},{"key":"1077_CR53","unstructured":"OpenAI: Embeddings - OpenAI Documentation (2024). Accessed 23 Dec 2024. https:\/\/platform.openai.com\/docs\/guides\/embeddings\/"},{"key":"1077_CR54","unstructured":"LangChain: InMemoryVectorStore - LangChain Documentation (2024). https:\/\/python.langchain.com\/api_reference\/core\/vectorstores\/langchain_core.vectorstores.in_memory.InMemoryVectorStore.html"},{"key":"1077_CR55","doi-asserted-by":"crossref","unstructured":"Kukreja, S., Kumar, T., Bharate, V., Purohit, A., Dasgupta, A., Guha, D.: Vector databases and vector embeddings-review. In: 2023 International Workshop on Artificial Intelligence and Image Processing (IWAIIP), pp. 231\u2013236 (2023). DOI: 10.1109\/IWAIIP58158.2023.10462847","DOI":"10.1109\/IWAIIP58158.2023.10462847"},{"key":"1077_CR56","unstructured":"Han, Y., Liu, C., Wang, P.: A comprehensive survey on vector database: Storage and retrieval technique, challenge. arXiv:2310.11703 (2023)"},{"key":"1077_CR57","unstructured":"He, J., Rungta, M., Koleczek, D., Sekhon, A., Wang, F.X., Hasan, S.: Does prompt formatting have any impact on llm performance? arXiv:2411.10541 (2024). Submitted to NAACL 2025"},{"key":"1077_CR58","unstructured":"OpenAI: OpenAI Guide to Prompt Engineering (2024). https:\/\/platform.openai.com\/docs\/guides\/prompt-engineering"},{"issue":"3","key":"1077_CR59","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1177\/01626434241298954","volume":"40","author":"J Park","year":"2025","unstructured":"Park, J., Choo, S.: Generative AI prompt engineering for educators: practical strategies. J. Spec. Educ. Technol. 40(3), 411\u2013417 (2025). https:\/\/doi.org\/10.1177\/01626434241298954","journal-title":"J. Spec. Educ. 
Technol."},{"key":"1077_CR60","unstructured":"Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E.H., Le, Q., Zhou, D.: Chain of Thought Prompting Elicits Reasoning in Large Language Models. CoRR arXiv: 2201.11903 (2022)"},{"key":"1077_CR61","doi-asserted-by":"publisher","unstructured":"Nogueira, R.: Portuguese Financial, Legal, and Property Documents Dataset. https:\/\/doi.org\/10.17632\/pvvkv2j6dx.1","DOI":"10.17632\/pvvkv2j6dx.1"},{"key":"1077_CR62","doi-asserted-by":"crossref","unstructured":"Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., Zettlemoyer, L.: Bart: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. CoRR arXiv: 1910.13461 (2019) [cs.CL]","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"1077_CR63","doi-asserted-by":"publisher","unstructured":"Yin, W., Hay, J., Roth, D.: Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP) and IJCNLP, pp. 3914\u20133923. Association for Computational Linguistics, Hong Kong, China (2019). https:\/\/doi.org\/10.18653\/v1\/D19-1404","DOI":"10.18653\/v1\/D19-1404"},{"key":"1077_CR64","unstructured":"Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., Zhou, M.: MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. In: Advances in Neural Information Processing Systems (NeurIPS) (2020). https:\/\/www.microsoft.com\/en-us\/research\/publication\/minilm-deep-self-attention-distillation-for-task-agnostic-compression-of-pre-trained-transformers\/"},{"key":"1077_CR65","doi-asserted-by":"crossref","unstructured":"Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., Zhou, M.: MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained transformers. 
arXiv:2012.15828 (2021)","DOI":"10.18653\/v1\/2021.findings-acl.188"}],"container-title":["International Journal of Data Science and Analytics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-026-01077-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41060-026-01077-x","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-026-01077-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T15:49:03Z","timestamp":1775490543000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41060-026-01077-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,6]]},"references-count":65,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,12]]}},"alternative-id":["1077"],"URL":"https:\/\/doi.org\/10.1007\/s41060-026-01077-x","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-7511605\/v1","asserted-by":"object"}]},"ISSN":["2364-415X","2364-4168"],"issn-type":[{"value":"2364-415X","type":"print"},{"value":"2364-4168","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,6]]},"assertion":[{"value":"1 September 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 February 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 April 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no 
conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"126"}}