{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T13:04:11Z","timestamp":1781787851396,"version":"3.54.5"},"reference-count":70,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T00:00:00Z","timestamp":1761782400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan","award":["AP22787410"],"award-info":[{"award-number":["AP22787410"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>This paper presents a systematic evaluation of large language models (LLMs) and retrieval-augmented generation (RAG) approaches for question answering (QA) in the low-resource Kazakh language. We assess the performance of existing proprietary (GPT-4o, Gemini 2.5-flash) and open-source Kazakh-oriented models (KazLLM-8B, Sherkala-8B, Irbis-7B) across closed-book and RAG settings. Within a three-stage evaluation framework, we benchmark retriever quality, examine LLM abilities such as knowledge-gap detection, external truth integration and context grounding, and measure gains from realistic end-to-end RAG pipelines. Our results show a clear pattern: proprietary models lead in closed-book QA, but RAG narrows the gap substantially. Under the Ideal RAG setting, KazLLM-8B improves from its closed-book baseline of 0.427 to reach answer correctness of 0.867, closely matching GPT-4o\u2019s score of 0.869. In the end-to-end RAG setup, KazLLM-8B paired with Snowflake retriever achieved answer correctness up to 0.754, surpassing GPT-4o\u2019s best score of 0.632. Despite improvements, RAG outcomes show an inconsistency: high retrieval metrics do not guarantee high QA system accuracy. 
The findings highlight the importance of retrievers and context grounding strategies in enabling open-source Kazakh models to deliver competitive QA performance in a low-resource setting.<\/jats:p>","DOI":"10.3390\/info16110943","type":"journal-article","created":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T05:28:43Z","timestamp":1761888523000},"page":"943","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-1978-9574","authenticated-orcid":false,"given":"Aigerim","family":"Mansurova","sequence":"first","affiliation":[{"name":"Big Data and Blockchain Technologies Research and Innovation Center, Astana IT University, Astana 020000, Kazakhstan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9560-9756","authenticated-orcid":false,"given":"Arailym","family":"Tleubayeva","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Data Science, Astana IT University, Astana 020000, Kazakhstan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5522-4421","authenticated-orcid":false,"given":"Aliya","family":"Nugumanova","sequence":"additional","affiliation":[{"name":"Big Data and Blockchain Technologies Research and Innovation Center, Astana IT University, Astana 020000, Kazakhstan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Adai","family":"Shomanov","sequence":"additional","affiliation":[{"name":"Computer Science Department, Nazarbayev University, Astana 020000, Kazakhstan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7323-3695","authenticated-orcid":false,"given":"Sadi 
Evren","family":"Seker","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Faculty of Computer and Information Technologies, Istanbul University, 34320 Istanbul, Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Jiang, S., Xie, X., Tang, R., Wang, X., Sun, K., Li, G., Xu, Z., Xue, P., Li, Z., and Fu, X. (2025). ARGUS: Retrieval-Augmented QA System for Government Services. Electronics, 14.","DOI":"10.3390\/electronics14122445"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Jiang, F., Qin, C., Yao, K., Fang, C., Zhuang, F., Zhu, H., and Xiong, H. (2024, January 8\u201311). Enhancing Question Answering for Enterprise Knowledge Bases Using Large Language Models. Proceedings of the International Conference on Database Systems for Advanced Applications, Singapore.","DOI":"10.1007\/978-981-97-5562-2_18"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1126\/science.adh2586","article-title":"Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence","volume":"381","author":"Noy","year":"2023","journal-title":"Science"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2528","DOI":"10.1007\/s12559-024-10285-1","article-title":"ChatGPT Needs SPADE (Sustainability, Privacy, Digital Divide, and Ethics) Evaluation: A Review","volume":"16","author":"Khowaja","year":"2024","journal-title":"Cogn. Comput."},{"key":"ref_5","unstructured":"Veitsman, Y., and Hartmann, M. (2024). Recent Advancements and Challenges of Turkic Central Asian Language Processing. arXiv."},{"key":"ref_6","unstructured":"(2025, August 12). WorldData.info. Spread of the Kazakh Language. Total Native Speakers: Approximately 15.3 Million, Including 13.3 Million in Kazakhstan. 
Available online: https:\/\/www.worlddata.info\/languages\/kazakh.php."},{"key":"ref_7","unstructured":"Koto, F., Joshi, R., Mukhituly, N., Wang, Y., Xie, Z., Pal, R., Orel, D., Mullah, P., Turmakhan, D., and Goloburda, M. (2025). Llama-3.1-Sherkala-8B-Chat: An Open Large Language Model for Kazakh. arXiv."},{"key":"ref_8","unstructured":"Institute of Smart Systems and Artificial Intelligence (ISSAI), Nazarbayev University (2025, August 10). (n.d.). *LLama-3.1-KazLLM-1.0-8B* [Large Language Model]. Hugging Face. Available online: https:\/\/huggingface.co\/issai\/LLama-3.1-KazLLM-1.0-8B."},{"key":"ref_9","unstructured":"Astana Hub (2025, August 10). AlemLLM [Large Language Model]. Hugging Face. Available online: https:\/\/huggingface.co\/astanahub\/alemllm."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kadyrbek, N., Tuimebayev, Z., Mansurova, M., and Viegas, V. (2025). The Development of Small-Scale Language Models for Low-Resource Languages, with a Focus on Kazakh and Direct Preference Optimization. Big Data Cogn. Comput., 9.","DOI":"10.3390\/bdcc9050137"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3151","DOI":"10.1007\/s10115-022-01744-y","article-title":"Conversational question answering: A survey","volume":"64","author":"Zaib","year":"2022","journal-title":"Knowl. Inf. Syst."},{"key":"ref_12","unstructured":"Alkhaldi, T.Y.S. (2023). Studies on Question Answering in Open-Book and Closed-Book Settings. [Ph.D. Thesis, Kyoto University]."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, C., Liu, P., and Zhang, Y. (2021). Can generative pre-trained language models serve as knowledge bases for closed-book QA?. arXiv.","DOI":"10.18653\/v1\/2021.acl-long.251"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3571730","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji","year":"2023","journal-title":"ACM Comput. 
Surv."},{"key":"ref_15","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive nlp tasks","volume":"33","author":"Lewis","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_16","unstructured":"Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Mansurova, A., Mansurova, A., and Nugumanova, A. (2024). QA-RAG: Exploring LLM Reliance on External Knowledge. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8090115"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yu, S., Kim, G., and Kang, S. (2025). Context and Layers in Harmony: A Unified Strategy for Mitigating LLM Hallucinations. Mathematics, 13.","DOI":"10.20944\/preprints202504.1749.v1"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lee, M. (2023). A mathematical investigation of hallucination and creativity in GPT models. Mathematics, 11.","DOI":"10.3390\/math11102320"},{"key":"ref_20","first-page":"197","article-title":"A comparative analysis of artificial hallucinations in GPT-3.5 and GPT-4: Insights into AI progress and challenges","volume":"Volume 2","author":"Mohammed","year":"2024","journal-title":"Business Sustainability with Artificial Intelligence (AI): Challenges and Opportunities"},{"key":"ref_21","unstructured":"Li, J., Yuan, Y., and Zhang, Z. (2024). Enhancing llm factual accuracy with rag to counter hallucinations: A case study on domain-specific queries in private knowledge-bases. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Patel, N., Mouratidis, H., and Zhi, K.N.K. (2025, January 26\u201329). LLM-Based Automated Hallucination Detection in Multilingual Customer Service RAG Applications. 
Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Limassol, Cyprus.","DOI":"10.1007\/978-3-031-96235-6_26"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Pingua, B., Sahoo, A., Kandpal, M., Murmu, D., Rautaray, J., Barik, R.K., and Saikia, M.J. (2025). Medical LLMs: Fine-Tuning vs. Retrieval-Augmented Generation. Bioengineering, 12.","DOI":"10.3390\/bioengineering12070687"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lakatos, R., Pollner, P., Hajdu, A., and Jo\u00f3, T. (2025). Investigating the Performance of Retrieval-Augmented Generation and Domain-Specific Fine-Tuning for the Development of AI-Driven Knowledge-Based Systems. Mach. Learn. Knowl. Extr., 7.","DOI":"10.3390\/make7010015"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Gu\u021bu, B.M., and Popescu, N. (2024). Exploring Data Analysis Methods in Generative Models: From Fine-Tuning to RAG Implementation. Computers, 13.","DOI":"10.3390\/computers13120327"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Papageorgiou, G., Sarlis, V., Maragoudakis, M., and Tjortjis, C. (2025). Hybrid Multi-Agent GraphRAG for E-Government: Towards a Trustworthy AI Assistant. Appl. Sci., 15.","DOI":"10.3390\/app15116315"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Darwish, A.M., Rashed, E.A., and Khoriba, G. (2025). Mitigating LLM Hallucinations Using a Multi-Agent Framework. Information, 16.","DOI":"10.3390\/info16070517"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Knollmeyer, S., Caymazer, O., and Grossmann, D. (2025). Document GraphRAG: Knowledge Graph Enhanced Retrieval Augmented Generation for Document Question Answering Within the Manufacturing Domain. Electronics, 14.","DOI":"10.3390\/electronics14112102"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wagenpfeil, S. (2025). Multimedia Graph Codes for Fast and Semantic Retrieval-Augmented Generation. 
Electronics, 14.","DOI":"10.3390\/electronics14122472"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1038\/s41746-025-01651-w","article-title":"Leveraging long context in retrieval augmented language models for medical question answering","volume":"8","author":"Zhang","year":"2025","journal-title":"npj Digit. Med."},{"key":"ref_31","unstructured":"Jiang, Z., Ma, X., and Chen, W. (2024). Longrag: Enhancing retrieval-augmented generation with long-context llms. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"105846","DOI":"10.1016\/j.autcon.2024.105846","article-title":"Performance comparison of retrieval-augmented generation and fine-tuned large language models for construction safety management knowledge retrieval","volume":"168","author":"Lee","year":"2024","journal-title":"Autom. Constr."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Erak, O., Alabbasi, N., Alhussein, O., Lotfi, I., Hussein, A., Muhaidat, S., and Debbah, M. (2024). Leveraging fine-tuned retrieval-augmented generation with long-context support: For 3GPP standards. arXiv.","DOI":"10.1109\/GCWkshp64532.2024.11100917"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"111332","DOI":"10.1016\/j.patcog.2024.111332","article-title":"Enhancing textual textbook question answering with large language models and retrieval augmented generation","volume":"162","author":"Alawwad","year":"2025","journal-title":"Pattern Recognit."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Soudani, H., Kanoulas, E., and Hasibi, F. (2024, January 6\u20139). Fine tuning vs. retrieval augmented generation for less popular knowledge. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, Tokyo, Japan.","DOI":"10.1145\/3673791.3698415"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Byun, J., Kim, B., Cha, K.-A., and Lee, E. (2024). 
Design and Implementation of an Interactive Question-Answering System with Retrieval-Augmented Generation for Personalized Databases. Appl. Sci., 14.","DOI":"10.3390\/app14177995"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Shymbayev, M., and Alimzhanov, Y. (2023, January 4\u20136). Extractive question answering for Kazakh language. Proceedings of the 2023 IEEE International Conference on Smart Information Systems and Technologies (SIST), Astana, Kazakhstan.","DOI":"10.1109\/SIST58284.2023.10223508"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"105460","DOI":"10.1109\/ACCESS.2024.3433426","article-title":"Development of a Geographical Question-Answering System in the Kazakh Language","volume":"12","author":"Mukanova","year":"2024","journal-title":"IEEE Access"},{"key":"ref_39","first-page":"89","article-title":"Comparative analysis of multilingual QA models and their adaptation to the Kazakh language","volume":"19","author":"Tleubayeva","year":"2024","journal-title":"Sci. J. Astana IT Univ."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Nugumanova, A., Apayev, K., Saken, A., Quandyq, S., Mansurova, A., and Kamiluly, A. (2024, January 15\u201317). Developing a Kazakh question-answering model: Standing on the shoulders of multilingual giants. Proceedings of the 2024 IEEE 4th International Conference on Smart Information Systems and Technologies (SIST), Astana, Kazakhstan.","DOI":"10.1109\/SIST61555.2024.10629326"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1038\/s42256-024-00961-0","article-title":"Learning from models beyond fine-tuning","volume":"7","author":"Zheng","year":"2025","journal-title":"Nat. Mach. Intell."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Maxutov, A., Myrzakhmet, A., and Braslavski, P. (2024, January 15). Do LLMs speak Kazakh? A pilot evaluation of seven models. 
Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024), Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.sigturk-1.8"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Nugumanova, A., Rakhimzhanov, D., and Mansurova, A. (2025). Global Embeddings, Local Signals: Zero-Shot Sentiment Analysis of Transport Complaints. Informatics, 12.","DOI":"10.3390\/informatics12030082"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Rakhimzhanov, D., Belginova, S., and Yedilkhan, D. (2025). Automated Classification of Public Transport Complaints via Text Mining Using LLMs and Embeddings. Information, 16.","DOI":"10.3390\/info16080644"},{"key":"ref_45","unstructured":"Chase, H. (2022). LangChain, GitHub. Available online: https:\/\/github.com\/langchain-ai\/langchain."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazar\u00e9, P.E., Lomeli, M., Hosseini, L., and J\u00e9gou, H. (2024). The faiss library. arXiv.","DOI":"10.1109\/TBDATA.2025.3618474"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"100124","DOI":"10.1016\/j.nlp.2024.100124","article-title":"Evaluation of open and closed-source LLMs for low-resource language with zero-shot, few-shot, and chain-of-thought prompting","volume":"10","author":"Hossain","year":"2025","journal-title":"Nat. Lang. Process. J."},{"key":"ref_48","unstructured":"Lin, C.Y. (2004, January 25\u201326). ROUGE: A Package for Automatic Evaluation of Summaries. Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain."},{"key":"ref_49","unstructured":"Banerjee, S., and Lavie, A. (2005, January 29). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. 
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization, Ann Arbor, MI, USA."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Es, S., James, J., Anke, L.E., and Schockaert, S. (2024, January 17\u201322). Ragas: Automated evaluation of retrieval augmented generation. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, St. Julians, Malta.","DOI":"10.18653\/v1\/2024.eacl-demo.16"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"126680","DOI":"10.1016\/j.neucom.2023.126680","article-title":"Information retrieval algorithms and neural ranking models to detect previously fact-checked information","volume":"557","author":"Chakraborty","year":"2023","journal-title":"Neurocomputing"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., and Liu, Z. (2024). BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. arXiv.","DOI":"10.18653\/v1\/2024.findings-acl.137"},{"key":"ref_53","unstructured":"Wang, L., Yang, N., Huang, X., Yang, L., Majumder, R., and Wei, F. (2024). Multilingual E5 text embeddings: A technical report. arXiv."},{"key":"ref_54","unstructured":"Yu, P., Merrick, L., Nuti, G., and Campos, D. (2024). Arctic-Embed 2.0: Multilingual retrieval without compromise. arXiv."},{"key":"ref_55","unstructured":"Feng, F., Yang, Y., Cer, D., Arivazhagan, N., and Wang, W. (2020). Language-agnostic BERT sentence embedding. arXiv."},{"key":"ref_56","unstructured":"OpenAI (2025, September 04). Text-Embedding-3-Large Model. Available online: https:\/\/platform.openai.com\/docs\/guides\/embeddings."},{"key":"ref_57","unstructured":"Gen2B (2025, August 08). Irbis-7b-Instruct LoRA. Hugging Face. 
Available online: https:\/\/huggingface.co\/Gen2B\/Irbis-7b-Instruct_lora."},{"key":"ref_58","unstructured":"OpenAI (2025, August 08). GPT-4\u2014Proprietary Model Accessed via the OpenAI API (Exact Model Version). OpenAI. Available online: https:\/\/platform.openai.com."},{"key":"ref_59","unstructured":"Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., Blistein, M., Ram, O., Zhang, D., and Rosen, E. (2025). Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv."},{"key":"ref_60","unstructured":"Yeshpanov, R., Efimov, P., Boytsov, L., Shalkarbayuli, A., and Braslavski, P. (2024, January 20\u201325). KazQAD: Kazakh open-domain question answering dataset. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Tleubayeva, A., Aubakirov, S., Tabuldin, A., and Shomanov, A. (2025, January 14\u201316). Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments. Proceedings of the 2025 IEEE 5th International Conference on Smart Information Systems and Technologies (SIST), Astana, Kazakhstan.","DOI":"10.1109\/SIST61657.2025.11139363"},{"key":"ref_62","unstructured":"Mbzuai (2025, August 08). Kazmmlu: Kazakh_History. Hugging Face. Available online: https:\/\/huggingface.co\/datasets\/MBZUAI\/KazMMLU."},{"key":"ref_63","unstructured":"Simple Kazakh Question Answering Dataset (sKQuAD) (2025, September 14). Hugging Face Datasets. 
Available online: https:\/\/huggingface.co\/datasets\/Kyrmasch\/sKQuAD."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1016\/j.jksuci.2014.10.007","article-title":"A survey on question answering systems with classification","volume":"28","author":"Mishra","year":"2016","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Hawthorne, J., Radcliffe, F., and Whitaker, L. (2024). Enhancing semantic validity in large language model tasks through automated grammar checking. arXiv.","DOI":"10.31219\/osf.io\/7xp6s"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Duan, X., and Cai, Z. (2024, January 15). Evaluating grammatical well-formedness in large language models: A comparative study with human judgments. Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.cmcl-1.16"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"e101570","DOI":"10.1136\/bmjhci-2025-101570","article-title":"Development and evaluation of an agentic LLM based RAG framework for evidence-based patient education","volume":"32","author":"AlSammarraie","year":"2025","journal-title":"BMJ Health Care Inform."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Fanous, A., Goldberg, J., Agarwal, A., Lin, J., Zhou, A., Xu, S., Bikia, V., Daneshjou, R., and Koyejo, S. (2025, January 20\u201322). Syceval: Evaluating LLM sycophancy. Proceedings of the AAAI\/ACM Conference on AI, Ethics, and Society, Madrid, Spain.","DOI":"10.1609\/aies.v8i1.36598"},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1140\/epjds\/s13688-025-00579-1","article-title":"Selective agreement, not sycophancy: Investigating opinion dynamics in LLM interactions","volume":"14","author":"Cau","year":"2025","journal-title":"EPJ Data Sci."},{"key":"ref_70","unstructured":"Ranaldi, L., and Pucci, G. (2023). 
When large language models contradict humans? Large language models\u2019 sycophantic behaviour. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/11\/943\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T05:44:19Z","timestamp":1761889459000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/11\/943"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,30]]},"references-count":70,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["info16110943"],"URL":"https:\/\/doi.org\/10.3390\/info16110943","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,30]]}}}