{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T23:38:07Z","timestamp":1776209887718,"version":"3.50.1"},"reference-count":32,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2025,5,26]],"date-time":"2025-05-26T00:00:00Z","timestamp":1748217600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100013209","name":"European Union\u2014NextGenerationEU","doi-asserted-by":"publisher","award":["15010"],"award-info":[{"award-number":["15010"]}],"id":[{"id":"10.13039\/501100013209","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>The integration of Large Language Models (LLMs) in chatbot applications gains momentum. However, to successfully deploy such systems, the underlying capabilities of LLMs must be carefully considered, especially when dealing with low-resource languages and specialized fields. This paper presents the results of a comprehensive evaluation of several LLMs conducted in the context of a chatbot agent designed to assist migrants in their integration process. Our aim is to identify the optimal LLM that can effectively process and generate text in Greek and provide accurate information, addressing the specific needs of migrant populations. The design of the evaluation methodology leverages input from experts on social assistance initiatives, social impact and technological solutions, as well as from automated LLM self-evaluations. Given the linguistic challenges specific to the Greek language and the application domain, research findings indicate that Claude 3.7 Sonnet and Gemini 2.0 Flash demonstrate superior performance across all criteria, with Claude 3.7 Sonnet emerging as the leading candidate for the chatbot. 
Moreover, the results suggest that automated custom evaluations of LLMs can align with human assessments, offering a viable option for preliminary low-cost analysis to assist stakeholders in selecting the optimal LLM based on user and application domain requirements.<\/jats:p>","DOI":"10.3390\/fi17060235","type":"journal-article","created":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T05:52:52Z","timestamp":1748325172000},"page":"235","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["LLM Performance in Low-Resource Languages: Selecting an Optimal Model for Migrant Integration Support in Greek"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-7922-339X","authenticated-orcid":false,"given":"Alexandros","family":"Tassios","sequence":"first","affiliation":[{"name":"School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3857-1552","authenticated-orcid":false,"given":"Stergios","family":"Tegos","sequence":"additional","affiliation":[{"name":"School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8809-3303","authenticated-orcid":false,"given":"Christos","family":"Bouas","sequence":"additional","affiliation":[{"name":"School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0557-2542","authenticated-orcid":false,"given":"Konstantinos","family":"Manousaridis","sequence":"additional","affiliation":[{"name":"School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, 
Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0658-5065","authenticated-orcid":false,"given":"Maria","family":"Papoutsoglou","sequence":"additional","affiliation":[{"name":"School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2422-7889","authenticated-orcid":false,"given":"Maria","family":"Kaltsa","sequence":"additional","affiliation":[{"name":"Department of Theoretical & Applied Linguistics, School of English, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece"},{"name":"Information Technologies Institute, Center for Research and Technology Hellas, 57001 Thessaloniki, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7166-9559","authenticated-orcid":false,"given":"Eleni","family":"Dimopoulou","sequence":"additional","affiliation":[{"name":"PRAKSIS, 10432 Athens, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7326-5910","authenticated-orcid":false,"given":"Thanassis","family":"Mavropoulos","sequence":"additional","affiliation":[{"name":"Information Technologies Institute, Center for Research and Technology Hellas, 57001 Thessaloniki, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2505-9178","authenticated-orcid":false,"given":"Stefanos","family":"Vrochidis","sequence":"additional","affiliation":[{"name":"Information Technologies Institute, Center for Research and Technology Hellas, 57001 Thessaloniki, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4242-5245","authenticated-orcid":false,"given":"Georgios","family":"Meditskos","sequence":"additional","affiliation":[{"name":"School of Informatics, Aristotle University of Thessaloniki, 54124 
Thessaloniki, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,5,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"26839","DOI":"10.1109\/ACCESS.2024.3365742","article-title":"A review on large language models: Architectures, applications, taxonomies, open issues and challenges","volume":"12","author":"Raiaan","year":"2024","journal-title":"IEEE Access"},{"key":"ref_2","unstructured":"Dong, G., Wang, H., Sun, J., and Wang, X. (2024). Evaluating and Mitigating Linguistic Discrimination in Large Language Models. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Meditskos, G., Tegos, S., Bouas, C., Tassios, A., Manousaridis, K., Papoutsoglou, M., Mavropoulos, T., and Vrochidis, S. (2024). Towards Semantically Conscious, Conversation-Based Chatbot Services for Migrants. Artificial Intelligence Applications and Innovations, Proceedings of the 20th IFIP WG 12.5 International Conference, AIAI 2024, Corfu, Greece, 27\u201330 June 2024, Springer.","DOI":"10.1007\/978-3-031-63219-8_11"},{"key":"ref_4","first-page":"e78196","article-title":"Health Needs and Access to Healthcare Services in Migrant Populations in Greece: Data From the Hprolipsis Study","volume":"17","author":"Anagnostou","year":"2025","journal-title":"Cureus"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"100124","DOI":"10.1016\/j.nlp.2024.100124","article-title":"Evaluation of open and closed-source LLMs for low-resource language with zero-shot, few-shot, and chain-of-thought prompting","volume":"10","author":"Nazi","year":"2025","journal-title":"Nat. Lang. Process. J."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1007\/s11023-024-09694-w","article-title":"Mapping the ethics of generative AI: A comprehensive scoping review","volume":"34","author":"Hagendorff","year":"2024","journal-title":"Minds Mach."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/s10676-024-09745-x","article-title":"Ethics of generative AI and manipulation: A design-oriented research agenda","volume":"26","author":"Klenk","year":"2024","journal-title":"Ethics Inf. Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"104103","DOI":"10.1016\/j.im.2025.104103","article-title":"Addressing bias in generative AI: Challenges and research opportunities in information management","volume":"62","author":"Wei","year":"2025","journal-title":"Inf. Manag."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mattheoudakis, M., Fotiadou, G., and Papadopoulou, D. (2025). CLIL on the spot: Migrant education in Greece. Front. Educ., 9.","DOI":"10.3389\/feduc.2024.1504257"},{"key":"ref_10","unstructured":"Chiang, W.L., Zheng, L., Sheng, Y., Angelopoulos, A.N., Li, T., Li, D., Zhang, H., Zhu, B., Jordan, M., and Gonzalez, J.E. (2024). Chatbot Arena: An open platform for evaluating LLMs by human preference. arXiv."},{"key":"ref_11","first-page":"58478","article-title":"Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations","volume":"36","author":"Yuan","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_12","unstructured":"Agarwal, V., Garg, M.K., Dharmavaram, S., and Kumar, D. (2024). \u201cWhich LLM should I use?\u201d: Evaluating LLMs for tasks performed by Undergraduate Computer Science Students in India. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Oniani, D., and Wang, Y. (2020, September 21\u201324). A qualitative evaluation of language models on automatic question-answering for COVID-19. Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Virtual.","DOI":"10.1145\/3388440.3412413"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Xiao, C., Xu, S.X., Zhang, K., Wang, Y., and Xia, L. (2023, July 13\u201314). Evaluating reading comprehension exercises generated by LLMs: A showcase of ChatGPT in education applications. Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.bea-1.52"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Murugadoss, B., Poelitz, C., Drosos, I., Le, V., McKenna, N., Negreanu, C.S., Parnin, C., and Sarkar, A. (2024). Evaluating the Evaluator: Measuring LLMs\u2019 Adherence to Task Evaluation Instructions. arXiv.","DOI":"10.1609\/aaai.v39i18.34157"},{"key":"ref_16","unstructured":"Srivastava, A., Rastogi, A., Rao, A., Shoeb, A.A.M., Abid, A., Fisch, A., Brown, A.R., Santoro, A., Gupta, A., and Garriga-Alonso, A. (2022). Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv."},{"key":"ref_17","first-page":"91","article-title":"Comparative Evaluation of Topic Detection: Humans vs. LLMs","volume":"13","author":"Kosar","year":"2024","journal-title":"Comput. Linguist. Neth. J."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Hu, X., Gao, M., Hu, S., Zhang, Y., Chen, Y., Xu, T., and Wan, X. (2024). Are LLM-based Evaluators Confusing NLG Quality Criteria? arXiv.","DOI":"10.18653\/v1\/2024.acl-long.516"},{"key":"ref_19","unstructured":"Panickssery, A., Bowman, S.R., and Feng, S. (2024). LLM evaluators recognize and favor their own generations. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Desmond, M., Ashktorab, Z., Pan, Q., Dugan, C., and Johnson, J.M. (2024, March 18\u201321). EvaluLLM: LLM assisted evaluation of generative outputs. Companion Proceedings of the 29th International Conference on Intelligent User Interfaces, Greenville, SC, USA.","DOI":"10.1145\/3640544.3645216"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Shankar, S., Zamfirescu-Pereira, J., Hartmann, B., Parameswaran, A.G., and Arawjo, I. (2024). Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences. arXiv.","DOI":"10.1145\/3654777.3676450"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1145\/3687034","article-title":"Enhancing Conversations in Migrant Counseling Services: Designing for Trustworthy Human-AI Collaboration","volume":"8","author":"Truong","year":"2024","journal-title":"Proc. ACM Hum.-Comput. Interact."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Fazzinga, B., Palmieri, E., Vestoso, M., Bolognini, L., Galassi, A., Furfaro, F., and Torroni, P. (2024, October 28\u201330). A Chatbot for Asylum-Seeking Migrants in Europe. Proceedings of the 2024 IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI), Herndon, VA, USA.","DOI":"10.1109\/ICTAI62512.2024.00104"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lee, S., Choi, D., Truong, L., Sawhney, N., and Paakki, H. (2025, April 26\u2013May 1). Into the Unknown: Leveraging Conversational AI in Supporting Young Migrants\u2019 Journeys Towards Cultural Adaptation. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan.","DOI":"10.1145\/3706598.3713091"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Buba\u0161, G., \u010ci\u017eme\u0161ija, A., and Kova\u010di\u0107, A. (2023). Development of an assessment scale for measurement of usability and user experience characteristics of Bing Chat conversational AI. Future Internet, 16.","DOI":"10.3390\/fi16010004"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lynch, C.J., Jensen, E.J., Zamponi, V., O\u2019Brien, K., Frydenlund, E., and Gore, R. (2023). A structured narrative prompt for prompting narratives from large language models: Sentiment assessment of ChatGPT-generated narratives and real tweets. Future Internet, 15.","DOI":"10.3390\/fi15120375"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Roumeliotis, K.I., and Tselikas, N.D. (2023). ChatGPT and Open-AI models: A preliminary review. Future Internet, 15.","DOI":"10.3390\/fi15060192"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Branda, F., Stella, M., Ceccarelli, C., Cabitza, F., Ceccarelli, G., Maruotti, A., Ciccozzi, M., and Scarpa, F. (2025). The Role of AI-Based Chatbots in Public Health Emergencies: A Narrative Review. Future Internet, 17.","DOI":"10.3390\/fi17040145"},{"key":"ref_29","unstructured":"Jung, D., Butler, A., Park, J., and Saperstein, Y. (2024). Evaluating the Impact of a Specialized LLM on Physician Experience in Clinical Decision Support: A Comparison of Ask Avo and ChatGPT-4. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Nallur, V. (2023). Anxiety among migrants-questions for agent simulation. Autonomous Agents and Multiagent Systems. Best and Visionary Papers, Proceedings of the AAMAS 2023 Workshops, London, UK, 29 May\u20132 June 2023, Springer.","DOI":"10.1007\/978-3-031-56255-6_8"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Coen, E., Del Fiol, G., Kaphingst, K.A., Borsato, E., Shannon, J., Smith, H.S., Masino, A., and Allen, C.G. (2024). Chatbot for the Return of Positive Genetic Screening Results for Hereditary Cancer Syndromes: A Prompt Engineering Study. Res. Sq.","DOI":"10.2196\/preprints.65848"},{"key":"ref_32","unstructured":"Kamalloo, E., Jafari, A., Zhang, X., Thakur, N., and Lin, J. (2023). HAGRID: A human-LLM collaborative dataset for generative information-seeking with attribution. arXiv."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/6\/235\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:40:50Z","timestamp":1760031650000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/6\/235"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,26]]},"references-count":32,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["fi17060235"],"URL":"https:\/\/doi.org\/10.3390\/fi17060235","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,26]]}}}