{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T19:12:59Z","timestamp":1776107579087,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2024,9,9]],"date-time":"2024-09-09T00:00:00Z","timestamp":1725840000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Science and Higher Education of the Republic of Kazakhstan","award":["AP19677756"],"award-info":[{"award-number":["AP19677756"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Large language models (LLMs) can store factual knowledge within their parameters and have achieved superior results in question-answering tasks. However, challenges persist in providing provenance for their decisions and keeping their knowledge up to date. Some approaches aim to address these challenges by combining external knowledge with parametric memory. In contrast, our proposed QA-RAG solution relies solely on the data stored within an external knowledge base, specifically a dense vector index database. In this paper, we compare RAG configurations using two LLMs\u2014Llama 2 7b and 13b\u2014systematically examining their performance in three key RAG capabilities: noise robustness, knowledge gap detection, and external truth integration. 
These findings suggest that considerable work is still required to fully leverage RAG in question-answering tasks.<\/jats:p>","DOI":"10.3390\/bdcc8090115","type":"journal-article","created":{"date-parts":[[2024,9,9]],"date-time":"2024-09-09T03:04:18Z","timestamp":1725851058000},"page":"115","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["QA-RAG: Exploring LLM Reliance on External Knowledge"],"prefix":"10.3390","volume":"8","author":[{"given":"Aigerim","family":"Mansurova","sequence":"first","affiliation":[{"name":"Big Data and Blockchain Technologies Science and Innovation Center, Astana IT University, 020000 Astana, Kazakhstan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9076-0722","authenticated-orcid":false,"given":"Aiganym","family":"Mansurova","sequence":"additional","affiliation":[{"name":"Big Data and Blockchain Technologies Science and Innovation Center, Astana IT University, 020000 Astana, Kazakhstan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5522-4421","authenticated-orcid":false,"given":"Aliya","family":"Nugumanova","sequence":"additional","affiliation":[{"name":"Big Data and Blockchain Technologies Science and Innovation Center, Astana IT University, 020000 Astana, Kazakhstan"}]}],"member":"1968","published-online":{"date-parts":[[2024,9,9]]},"reference":[{"key":"ref_1","unstructured":"OpenAI (2024, June 17). ChatGPT (Mar 14 Version) [Large Language Model]. Available online: https:\/\/chat.openai.com\/chat."},{"key":"ref_2","unstructured":"Chase, H. (2022). LangChain, GitHub. Available online: https:\/\/github.com\/langchain-ai\/langchain."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., and Chung, W. (2023). A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. 
arXiv.","DOI":"10.18653\/v1\/2023.ijcnlp-main.45"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1109\/JAS.2023.123555","article-title":"Can ChatGPT boost artistic creation: The need of imaginative intelligence for parallel art","volume":"10","author":"Guo","year":"2023","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3571730","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_6","unstructured":"He, H., Zhang, H., and Roth, D. (2022). Rethinking with retrieval: Faithful large language model inference. arXiv."},{"key":"ref_7","unstructured":"Shen, X., Chen, Z., Backes, M., and Zhang, Y. (2023). In ChatGPT we trust? measuring and characterizing the reliability of ChatGPT. arXiv."},{"key":"ref_8","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive nlp tasks","volume":"33","author":"Lewis","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_9","unstructured":"Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv."},{"key":"ref_10","unstructured":"Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., Van Den Driessche, G.B., Lespiau, J.-B., Damoc, B., and Sifre, L. (2022, January 17\u201323). Improving language models by retrieving from trillions of tokens. Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA. PMLR."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Roberts, A., Raffel, C., and Shazeer, N. (2020). How much knowledge can you pack into the parameters of a language model?. 
arXiv.","DOI":"10.18653\/v1\/2020.emnlp-main.437"},{"key":"ref_12","unstructured":"Elaraby, M., Lu, M., Dunn, J., Zhang, X., Wang, Y., and Liu, S. (2023). Halo: Estimation and reduction of hallucinations in open-source weak large language models. arXiv."},{"key":"ref_13","unstructured":"Stechly, K., Marquez, M., and Kambhampati, S. (2023). GPT-4 Doesn\u2019t Know It\u2019s Wrong: An Analysis of Iterative Prompting for Reasoning Problems. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Huang, R., Li, M., Yang, D., Shi, J., Chang, X., Ye, Z., Wu, Y., Hong, Z., Huang, J., and Watanabe, S. (2024, January 20\u201327). Audiogpt: Understanding and generating speech, music, sound, and talking head. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada.","DOI":"10.1609\/aaai.v38i21.30570"},{"key":"ref_15","first-page":"1","article-title":"Atlas: Few-shot learning with retrieval augmented language models","volume":"24","author":"Izacard","year":"2023","journal-title":"J. Mach. Learn. Res."},{"key":"ref_16","unstructured":"Guu, K., Lee, K., Tung, Z., Pasupat, P., and Chang, M. (2020, January 13\u201318). Retrieval augmented language model pre-training. Proceedings of the International Conference on Machine Learning, Vienna, Austria. PMLR."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Trivedi, H., Balasubramanian, N., Khot, T., and Sabharwal, A. (2022). Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. arXiv.","DOI":"10.18653\/v1\/2023.acl-long.557"},{"key":"ref_18","unstructured":"Chen, J., Lin, H., Han, X., and Sun, L. (2024, January 20\u201327). Benchmarking large language models in retrieval-augmented generation. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_19","unstructured":"Wang, H., Huang, W., Deng, Y., Wang, R., Wang, Z., Wang, Y., Mi, F., Pan, J.Z., and Wong, K.F. (2024). 
UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems. arXiv."},{"key":"ref_20","first-page":"27","article-title":"Development of a question answering chatbot for blockchain domain","volume":"15","author":"Mansurova","year":"2023","journal-title":"Sci. J. Astana IT Univ."},{"key":"ref_21","unstructured":"Jacob, T.P., Bizotto BL, S., and Sathiyanarayanan, M. (2024, January 16\u201317). Constructing the ChatGPT for PDF Files with Langchain\u2013AI. Proceedings of the 2024 International Conference on Inventive Computation Technologies (ICICT), Bangkok, Thailand."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Topsakal, O., and Akinci, T.C. (2023, January 10\u201312). Creating large language model applications utilizing langchain: A primer on developing llm apps fast. Proceedings of the International Conference on Applied Engineering and Natural Sciences, Konya, Turkey.","DOI":"10.59287\/icaens.1127"},{"key":"ref_23","unstructured":"Pandya, K., and Holia, M. (2023). Automating Customer Service using LangChain: Building custom open-source GPT Chatbot for organizations. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Singh, A., Ehtesham, A., Mahmud, S., and Kim, J.H. (2024, January 8\u201310). Revolutionizing Mental Health Care through LangChain: A Journey with a Large Language Model. Proceedings of the 2024 IEEE 14th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA.","DOI":"10.1109\/CCWC60891.2024.10427865"},{"key":"ref_25","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lyu, Y., Li, Z., Niu, S., Xiong, F., Tang, B., Wang, W., Wu, H., Liu, H., Xu, T., and Chen, E. (2024). CRUD-RAG: A comprehensive chinese benchmark for retrieval-augmented generation of large language models. 
arXiv.","DOI":"10.1145\/3701228"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Cormack, G.V., Clarke, C.L., and Buettcher, S. (2009, January 19\u201323). Reciprocal rank fusion outperforms condorcet and individual rank learning methods. Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA.","DOI":"10.1145\/1571941.1572114"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ni, J., Qu, C., Lu, J., Dai, Z., Abrego, G.H., Ma, J., Zhao, V., Luan, Y., Hall, K., and Chang, M.-W. (2021). Large dual encoders are generalizable retrievers. arXiv.","DOI":"10.18653\/v1\/2022.emnlp-main.669"},{"key":"ref_29","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., and Scialom, T. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Joshi, M., Choi, E., Weld, D.S., and Zettlemoyer, L. (2017). Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension. arXiv.","DOI":"10.18653\/v1\/P17-1147"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). Squad: 100,000+ questions for machine comprehension of text. arXiv.","DOI":"10.18653\/v1\/D16-1264"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Reddy, R.G., Small, K., Zhang, T., and Ji, H. (2024). Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization. arXiv.","DOI":"10.18653\/v1\/2024.findings-naacl.48"},{"key":"ref_33","unstructured":"Liu, Y., Huang, L., Li, S., Chen, S., Zhou, H., Meng, F., Zhou, J., and Sun, X. (2023). Recall: A benchmark for llms robustness against external counterfactual knowledge. 
arXiv."},{"key":"ref_34","unstructured":"Nakano, R., Hilton, J., Balaji, S., Wu, J., Ouyang, L., Kim, C., Hesse, C., Jain, S., Kosaraju, V., and Schulman, J. (2021). Webgpt: Browser-assisted question-answering with human feedback. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1162\/coli_a_00322","article-title":"A structured review of the validity of BLEU","volume":"44","author":"Reiter","year":"2018","journal-title":"Comput. Linguist."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chiang, C.H., and Lee, H.Y. (2023). Can large language models be an alternative to human evaluations?. arXiv.","DOI":"10.18653\/v1\/2023.acl-long.870"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., and Zhu, C. (2023). Gpteval: Nlg evaluation using gpt-4 with better human alignment. arXiv.","DOI":"10.18653\/v1\/2023.emnlp-main.153"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Svikhnushina, E., and Pu, P. (2023, January 11\u201315). Approximating online human evaluation of social chatbots with prompting. Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Prague, Czechia.","DOI":"10.18653\/v1\/2023.sigdial-1.25"},{"key":"ref_39","unstructured":"Es, S., James, J., Espinosa-Anke, L., and Schockaert, S. (2023). Ragas: Automated evaluation of retrieval augmented generation. 
arXiv."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/9\/115\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:51:44Z","timestamp":1760111504000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/9\/115"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,9]]},"references-count":39,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["bdcc8090115"],"URL":"https:\/\/doi.org\/10.3390\/bdcc8090115","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,9]]}}}