{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T05:16:16Z","timestamp":1766553376420,"version":"3.48.0"},"reference-count":32,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T00:00:00Z","timestamp":1766016000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Thailand Science Research and Innovation Fund","award":["2253\/2568"],"award-info":[{"award-number":["2253\/2568"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],
"abstract":"<jats:p>The rising deployment of artificial intelligence in public services is constrained by computational costs and limited domain-specific data, particularly in multilingual contexts. This study proposes a generalizable Agentic AI pipeline for developing question\u2013answer chatbot systems using small language models (SLMs), demonstrated through a case study on the Thai Student Loan Fund (TSLF). The pipeline integrates four stages: OCR-based document digitization using Typhoon2-3B, agentic question\u2013answer dataset construction via a clean\u2013check\u2013plan\u2013generate (CCPG) workflow, parameter-efficient fine-tuning with QLoRA on Typhoon2-1B and Typhoon2-3B models, and retrieval-augmented generation (RAG) for source-grounded responses. Evaluation using BERTScore and CondBERT confirmed high semantic consistency (FBERT = 0.9807) and stylistic reliability (FBERT = 0.9839) of the generated QA corpus. Fine-tuning improved the 1B model\u2019s domain alignment (FBERT: 0.8593 \u2192 0.8641), while RAG integration further enhanced factual grounding (FBERT = 0.8707) and citation transparency. Cross-validation with GPT-5 and Gemini 2.5 Pro demonstrated dataset transferability and reliability. The results establish that Agentic AI combined with SLMs offers a cost-effective, interpretable, and scalable framework for automating bilingual advisory services in resource-constrained government and educational institutions.<\/jats:p>",
"DOI":"10.3390\/computation13120297","type":"journal-article","created":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T14:32:03Z","timestamp":1766068323000},"page":"297","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Generalizable Agentic AI Pipeline for Developing Chatbots Using Small Language Models: A Case Study on Thai Student Loan Fund Services"],"prefix":"10.3390","volume":"13",
"author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-1031-1324","authenticated-orcid":false,"given":"Jakkaphong","family":"Inpun","sequence":"first","affiliation":[{"name":"School of Information and Communication Technology, University of Phayao, Phayao 56000, Thailand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8563-017X","authenticated-orcid":false,"given":"Watcharaporn","family":"Cholamjiak","sequence":"additional","affiliation":[{"name":"School of Science, University of Phayao, Phayao 56000, Thailand"}]},{"given":"Piyada","family":"Phrueksawatnon","sequence":"additional","affiliation":[{"name":"School of Information and Communication Technology, University of Phayao, Phayao 56000, Thailand"}]},{"given":"Kanokwatt","family":"Shiangjen","sequence":"additional","affiliation":[{"name":"School of Information and Communication Technology, University of Phayao, Phayao 56000, Thailand"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,18]]},
"reference":[{"key":"ref_1","unstructured":"Belcak, P., Heinrich, G., Diao, S., Fu, Y., Dong, X., Muralidharan, S., Lin, Y.C., and Molchanov, P. (2025). Small Language Models are the Future of Agentic AI. arXiv."},
{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Smith, R. (2007, January 23\u201326). An Overview of the Tesseract OCR Engine. Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil.","DOI":"10.1109\/ICDAR.2007.4376991"},
{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Baek, Y., Lee, B., Han, D., Yun, S., and Lee, H. (2019). Character Region Awareness for Text Detection. arXiv.","DOI":"10.1109\/CVPR.2019.00959"},
{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Feng, H., Wang, Y., Zhou, W., Deng, J., and Li, H. (2022). DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction. arXiv.","DOI":"10.1145\/3474085.3475388"},
{"key":"ref_5","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2023). Attention Is All You Need. arXiv."},
{"key":"ref_6","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},
{"key":"ref_7","unstructured":"Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P.J. (2023). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv."},
{"key":"ref_8","unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020). Language Models Are Few-Shot Learners. arXiv."},
{"key":"ref_9","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozi\u00e8re, B., Goyal, N., Hambro, E., and Azhar, F. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv."},
{"key":"ref_10","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},
{"key":"ref_11","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv."},
{"key":"ref_12","unstructured":"Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv."},
{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gururangan, S., Marasovi\u0107, A., Swayamdipta, S., Lo, K., Beltagy, I., Downey, D., and Smith, N.A. (2020). Don\u2019t Stop Pretraining: Adapt Language Models to Domains and Tasks. arXiv.","DOI":"10.18653\/v1\/2020.acl-main.740"},
{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: A Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining","volume":"36","author":"Lee","year":"2019","journal-title":"Bioinformatics"},
{"key":"ref_15","unstructured":"Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., and Ray, A. (2022). Training Language Models to Follow Instructions with Human Feedback. arXiv."},
{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N.A., Khashabi, D., and Hajishirzi, H. (2023). Self-Instruct: Aligning Language Models with Self-Generated Instructions. arXiv.","DOI":"10.18653\/v1\/2023.acl-long.754"},
{"key":"ref_17","unstructured":"Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., and Artzi, Y. (2020). BERTScore: Evaluating Text Generation with BERT. arXiv."},
{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Dale, D., Voronov, A., Dementieva, D., Logacheva, V., Kozlova, O., Semenov, N., and Panchenko, A. (2021). Text Detoxification Using Large Pre-Trained Neural Models. arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.629"},
{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Adamopoulou, E., and Moussiades, L. (2020, January 5\u20137). An Overview of Chatbot Technology. Proceedings of the Artificial Intelligence Applications and Innovations (AIAI 2020), Neos Marmaras, Greece.","DOI":"10.1007\/978-3-030-49186-4_31"},
{"key":"ref_20","unstructured":"Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., K\u00fcttler, H., Lewis, M., Yih, W.-t., and Rockt\u00e4schel, T. (2020, January 6\u201312). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada."},
{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Karpukhin, V., O\u011fuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., and Yih, W.-T. (2020). Dense Passage Retrieval for Open-Domain Question Answering. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-main.550"},
{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, D., Fisch, A., Weston, J., and Bordes, A. (2017). Reading Wikipedia to Answer Open-Domain Questions. arXiv.","DOI":"10.18653\/v1\/P17-1171"},
{"key":"ref_23","unstructured":"Johnson, J., Douze, M., and J\u00e9gou, H. (2017). Billion-Scale Similarity Search with GPUs. arXiv."},
{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv.","DOI":"10.18653\/v1\/D19-1410"},
{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., and Liu, Z. (2025). BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. arXiv.","DOI":"10.18653\/v1\/2024.findings-acl.137"},
{"key":"ref_26","unstructured":"Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv."},
{"key":"ref_27","unstructured":"Shen, Y., Song, K., Tan, X., Li, D., Lu, W., and Zhuang, Y. (2023). HuggingGPT: Solving AI Tasks with ChatGPT and Its Friends in Hugging Face. arXiv."},
{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Leslie, D., and Perini, A.M. (2024). Future Shock: Generative AI and the International AI Policy and Governance Crisis. Harv. Data Sci. Rev., Available online: https:\/\/hdsr.mitpress.mit.edu\/pub\/yixt9mqu\/release\/3.","DOI":"10.1162\/99608f92.88b4cc98"},
{"key":"ref_29","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1162\/coli_a_00446","article-title":"Survey of Low-Resource Machine Translation","volume":"48","author":"Haddow","year":"2022","journal-title":"Comput. Linguist."},
{"key":"ref_30","unstructured":"Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K.R., and Cao, Y. (2023, January 1\u20135). ReAct: Synergizing Reasoning and Acting in Language Models. Proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda."},
{"key":"ref_31","first-page":"1","article-title":"On-Device Large Language Models: A Survey of Model Compression and System Optimization","volume":"13","author":"Chen","year":"2025","journal-title":"Computation"},
{"key":"ref_32","first-page":"68539","article-title":"Toolformer: Language Models Can Teach Themselves to Use Tools","volume":"36","author":"Schick","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."}],
"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/12\/297\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T05:12:44Z","timestamp":1766553164000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/12\/297"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,18]]},"references-count":32,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["computation13120297"],"URL":"https:\/\/doi.org\/10.3390\/computation13120297","relation":{},"ISSN":["2079-3197"],"issn-type":[{"type":"electronic","value":"2079-3197"}],"subject":[],"published":{"date-parts":[[2025,12,18]]}}}