{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T19:17:15Z","timestamp":1776107835342,"version":"3.50.1"},"reference-count":97,"publisher":"Elsevier BV","issue":"2","license":[{"start":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T00:00:00Z","timestamp":1719273600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T00:00:00Z","timestamp":1719273600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Artif Intell Educ"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>This paper assesses the potential for the large language models (LLMs) GPT-4 and GPT-3.5 to aid in deriving insight from education feedback surveys. Exploration of LLM use cases in education has focused on teaching and learning, with less exploration of capabilities in education feedback analysis. Survey analysis in education involves goals such as finding gaps in curricula or evaluating teachers, often requiring time-consuming manual processing of textual responses. LLMs have the potential to provide a flexible means of achieving these goals without specialized machine learning models or fine-tuning. We demonstrate a versatile approach to such goals by treating them as sequences of natural language processing (NLP) tasks including classification (multi-label, multi-class, and binary), extraction, thematic analysis, and sentiment analysis, each performed by LLM. 
We apply these workflows to a real-world dataset of 2500 end-of-course survey comments from biomedical science courses, and evaluate a zero-shot approach (i.e., requiring no examples or labeled training data) across all tasks, reflecting education settings, where labeled data is often scarce. By applying effective prompting practices, we achieve human-level performance on multiple tasks with GPT-4, enabling workflows necessary to achieve typical goals. We also show the potential of inspecting LLMs\u2019 chain-of-thought (CoT) reasoning for providing insight that may foster confidence in practice. Moreover, this study features development of a versatile set of classification categories, suitable for various course types (online, hybrid, or in-person) and amenable to customization. Our results suggest that LLMs can be used to derive a range of insights from survey text.<\/jats:p>","DOI":"10.1007\/s40593-024-00414-0","type":"journal-article","created":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T16:15:34Z","timestamp":1719332134000},"page":"444-481","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["A Large Language Model Approach to Educational Survey Feedback Analysis"],"prefix":"10.1007","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4739-5217","authenticated-orcid":false,"given":"Michael J.","family":"Parker","sequence":"first","affiliation":[]},{"given":"Caitlin","family":"Anderson","sequence":"additional","affiliation":[]},{"given":"Claire","family":"Stone","sequence":"additional","affiliation":[]},{"given":"YeaRim","family":"Oh","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,6,25]]},"reference":[{"key":"414_CR1","unstructured":"Abdali, S., Parikh, A., Lim, S. & Kiciman, E. (2023). Extracting self-consistent causal insights from users feedback with LLMs and in-context learning. In arXiv [cs.AI]. arXiv. 
Retrieved April 5, 2024, from http:\/\/arxiv.org\/abs\/2312.06820"},{"key":"414_CR2","doi-asserted-by":"crossref","unstructured":"Aldeman, M., & Branoff, T. J. (2021). Impact of course modality on student course evaluations. Paper presented at 2021 ASEE Virtual Annual Conference Content Access, Virtual Online. Retrieved August 21, 2023, from https:\/\/peer.asee.org\/37275.pdf","DOI":"10.18260\/1-2--37275"},{"issue":"1","key":"414_CR3","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1016\/j.stueduc.2009.01.002","volume":"35","author":"FN-A Alhija","year":"2009","unstructured":"Alhija, F.N.-A., & Fresko, B. (2009). Student evaluation of instruction: What can be learned from students\u2019 written comments? Studies in Educational Evaluation, 35(1), 37\u201344. https:\/\/doi.org\/10.1016\/j.stueduc.2009.01.002","journal-title":"Studies in Educational Evaluation"},{"issue":"2","key":"414_CR4","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1191\/1478088706qp063oa","volume":"3","author":"V Braun","year":"2006","unstructured":"Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77\u2013101. https:\/\/doi.org\/10.1191\/1478088706qp063oa","journal-title":"Qualitative Research in Psychology"},{"key":"414_CR5","unstructured":"Brennan, J., & Williams, R. (2004). Collecting and using student feedback. A guide to good practice. Learning and Teaching Support Network. Retrieved August 21, 2023, from https:\/\/www.advance-he.ac.uk\/knowledge-hub\/collecting-and-using-student-feedback-guide-good-practice"},{"key":"414_CR6","unstructured":"cardiffnlp\/twitter-roberta-base-sentiment-latest. (2022). Retrieved August 21, 2023, from https:\/\/huggingface.co\/cardiffnlp\/twitter-roberta-base-sentiment-latest."},{"key":"414_CR7","unstructured":"Chen, L., Zaharia, M., & Zou, J. (2023). How is ChatGPT\u2019s behavior changing over time? arXiv [cs.CL]. 
Retrieved August 21, 2023, from https:\/\/arxiv.org\/abs\/2307.09009"},{"issue":"4","key":"414_CR8","doi-asserted-by":"publisher","first-page":"305","DOI":"10.1109\/TE.2019.2924385","volume":"62","author":"S Cunningham-Nelson","year":"2019","unstructured":"Cunningham-Nelson, S., Baktashmotlagh, M., & Boles, W. (2019). Visualizing student opinion through text analysis. IEEE Transactions on Education, 62(4), 305\u2013311. https:\/\/doi.org\/10.1109\/TE.2019.2924385","journal-title":"IEEE Transactions on Education"},{"key":"414_CR9","doi-asserted-by":"publisher","unstructured":"Deepa, D., Raaji, & Tamilarasi, A. (2019). Sentiment analysis using feature extraction and dictionary-based approaches. In 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 786\u2013790. https:\/\/doi.org\/10.1109\/I-SMAC47947.2019.9032456","DOI":"10.1109\/I-SMAC47947.2019.9032456"},{"key":"414_CR10","unstructured":"Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv [cs.CL]. Retrieved August 21, 2023, from http:\/\/arxiv.org\/abs\/1810.04805"},{"issue":"3","key":"414_CR11","first-page":"285","volume":"33","author":"NP Diaz","year":"2022","unstructured":"Diaz, N. P., Walker, J. P., Rocconi, L. M., Morrow, J. A., Skolits, G. J., Osborne, J. D., & Parlier, T. R. (2022). Faculty use of end-of-course evaluations. International Journal of Teaching and Learning in Higher Education, 33(3), 285\u2013297.","journal-title":"International Journal of Teaching and Learning in Higher Education"},{"issue":"5","key":"414_CR12","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1080\/02602930410001689171","volume":"29","author":"CJ Dommeyer","year":"2004","unstructured":"Dommeyer, C. J., Baum, P., Hanna, R. W., & Chapman, K. S. (2004). Gathering faculty teaching evaluations by in-class and online surveys: Their effects on response rates and evaluations. 
Assessment & Evaluation in Higher Education, 29(5), 611\u2013623. https:\/\/doi.org\/10.1080\/02602930410001689171","journal-title":"Assessment & Evaluation in Higher Education"},{"key":"414_CR13","doi-asserted-by":"publisher","unstructured":"Edalati, M., Imran, A. S., Kastrati, Z., & Daudpota, S. M. (2022). The potential of machine learning algorithms for sentiment classification of students\u2019 feedback on MOOC. In Intelligent Systems and Applications (pp. 11\u201322). Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-030-82199-9_2","DOI":"10.1007\/978-3-030-82199-9_2"},{"key":"414_CR14","doi-asserted-by":"publisher","unstructured":"Fan, X., Luo, W., Menekse, M., Litman, D. & Wang, J. (2015). CourseMIRROR: Enhancing large classroom instructor-student interactions via mobile interfaces and natural language processing. Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, 1473\u20131478. https:\/\/doi.org\/10.1145\/2702613.2732853","DOI":"10.1145\/2702613.2732853"},{"issue":"112","key":"414_CR15","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1002\/ir.29","volume":"2001","author":"AS Ferren","year":"2001","unstructured":"Ferren, A. S., & Aylesworth, M. S. (2001). Using qualitative and quantitative information in academic decision making. New Directions for Institutional Research, 2001(112), 67\u201383. https:\/\/doi.org\/10.1002\/ir.29","journal-title":"New Directions for Institutional Research"},{"issue":"7","key":"414_CR16","doi-asserted-by":"publisher","first-page":"1054","DOI":"10.1080\/02602938.2016.1224997","volume":"42","author":"J Flod\u00e9n","year":"2017","unstructured":"Flod\u00e9n, J. (2017). The impact of student feedback on teaching in higher education. Assessment & Evaluation in Higher Education, 42(7), 1054\u20131068. 
https:\/\/doi.org\/10.1080\/02602938.2016.1224997","journal-title":"Assessment & Evaluation in Higher Education"},{"key":"414_CR17","doi-asserted-by":"crossref","unstructured":"Gilardi, F., Alizadeh, M., & Kubli, M. (2023). ChatGPT outperforms crowd-workers for text-annotation tasks. arXiv [cs.CL]. Retrieved August 21, 2023, from http:\/\/arxiv.org\/abs\/2303.15056","DOI":"10.1073\/pnas.2305016120"},{"issue":"11","key":"414_CR18","doi-asserted-by":"publisher","first-page":"5396","DOI":"10.3390\/app12115396","volume":"12","author":"D Go\u0161tautait\u0117","year":"2022","unstructured":"Go\u0161tautait\u0117, D., & Sakalauskas, L. (2022). Multi-label classification and explanation methods for students\u2019 learning style prediction and interpretation. NATO Advanced Science Institutes Series e: Applied Sciences, 12(11), 5396. https:\/\/doi.org\/10.3390\/app12115396","journal-title":"NATO Advanced Science Institutes Series e: Applied Sciences"},{"issue":"1","key":"414_CR19","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1186\/s41039-018-0073-0","volume":"13","author":"S Gottipati","year":"2018","unstructured":"Gottipati, S., Shankararaman, V., & Lin, J. R. (2018). Text analytics approach to extract course improvement suggestions from students\u2019 feedback. Research and Practice in Technology Enhanced Learning, 13(1), 6. https:\/\/doi.org\/10.1186\/s41039-018-0073-0","journal-title":"Research and Practice in Technology Enhanced Learning"},{"key":"414_CR20","doi-asserted-by":"publisher","unstructured":"Gottipati, S., Shankararaman, V. & Gan, S. (2017). A conceptual framework for analyzing students\u2019 feedback. 2017 IEEE Frontiers in Education Conference (FIE), 1\u20138. 
https:\/\/doi.org\/10.1109\/FIE.2017.8190703","DOI":"10.1109\/FIE.2017.8190703"},{"issue":"1","key":"414_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3458754","volume":"3","author":"Y Gu","year":"2021","unstructured":"Gu, Y., Tinn, R., Cheng, H., Lucas, M., Usuyama, N., Liu, X., Naumann, T., Gao, J., & Poon, H. (2021). Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare, 3(1), 1\u201323. https:\/\/doi.org\/10.1145\/3458754","journal-title":"ACM Transactions on Computing for Healthcare"},{"issue":"09","key":"414_CR22","doi-asserted-by":"publisher","first-page":"4","DOI":"10.3991\/ijim.v14i09.11069","volume":"14","author":"A Hamzah","year":"2020","unstructured":"Hamzah, A., Hidayatullah, A. F., & Persada, A. G. (2020). Discovering trends of mobile learning research using topic modelling approach. International Journal of Interactive Mobile Technologies (iJIM), 14(09), 4. https:\/\/doi.org\/10.3991\/ijim.v14i09.11069","journal-title":"International Journal of Interactive Mobile Technologies (iJIM)"},{"issue":"1","key":"414_CR23","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1007\/s12559-023-10179-8","volume":"16","author":"V Hassija","year":"2024","unstructured":"Hassija, V., Chamola, V., Mahapatra, A., Singal, A., Goel, D., Huang, K., Scardapane, S., Spinelli, I., Mahmud, M., & Hussain, A. (2024). Interpreting black-box models: A review on explainable artificial intelligence. Cognitive Computation, 16(1), 45\u201374. https:\/\/doi.org\/10.1007\/s12559-023-10179-8","journal-title":"Cognitive Computation"},{"key":"414_CR24","doi-asserted-by":"crossref","unstructured":"Huang, F., Kwak, H., & An, J. (2023). Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech. arXiv [cs.CL]. 
Retrieved August 21, 2023, from http:\/\/arxiv.org\/abs\/2302.07736","DOI":"10.1145\/3543873.3587368"},{"key":"414_CR25","unstructured":"Huang, H., Qu, Y., Liu, J., Yang, M., Zhao, T. (2024). An empirical study of LLM-as-a-judge for LLM evaluation: Fine-tuned judge models are task-specific classifiers. arXiv [cs.CL]. Retrieved April 12, 2024, from http:\/\/arxiv.org\/abs\/2403.02839"},{"key":"414_CR26","unstructured":"Hugging Face \u2013 The AI community building the future. (n.d.). Retrieved August 21, 2023, from https:\/\/huggingface.co\/datasets?task_categories=task_categories:zero-shot-classification&sort=trending."},{"key":"414_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.nlp.2023.100020","volume":"4","author":"BJ Jansen","year":"2023","unstructured":"Jansen, B. J., Jung, S.-G., & Salminen, J. (2023). Employing large language models in survey research. Natural Language Processing Journal, 4, 100020. https:\/\/doi.org\/10.1016\/j.nlp.2023.100020","journal-title":"Natural Language Processing Journal"},{"issue":"1","key":"414_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0192-5","volume":"6","author":"JM Johnson","year":"2019","unstructured":"Johnson, J. M., & Khoshgoftaar, T. M. (2019). Survey on deep learning with class imbalance. Journal of Big Data, 6(1), 1\u201354. https:\/\/doi.org\/10.1186\/s40537-019-0192-5","journal-title":"Journal of Big Data"},{"issue":"7","key":"414_CR29","doi-asserted-by":"publisher","first-page":"14","DOI":"10.3102\/0013189X033007014","volume":"33","author":"RB Johnson","year":"2004","unstructured":"Johnson, R. B., & Onwuegbuzie, A. J. (2004). Mixed methods research: A research paradigm whose time has come. Educational Researcher, 33(7), 14\u201326. https:\/\/doi.org\/10.3102\/0013189X033007014","journal-title":"Educational Researcher"},{"key":"414_CR30","unstructured":"Kane, T. J., McCaffrey, D., Miller, T. & Staiger, D. (2013). Have we identified effective teachers? 
Validating measures of effective teaching using random assignment. Research paper. MET project. Bill & Melinda Gates Foundation. Retrieved April 9, 2024, from https:\/\/eric.ed.gov\/?id=ED540959"},{"key":"414_CR31","doi-asserted-by":"publisher","first-page":"106799","DOI":"10.1109\/ACCESS.2020.3000739","volume":"8","author":"Z Kastrati","year":"2020","unstructured":"Kastrati, Z., Imran, A. S., & Kurti, A. (2020b). Weakly supervised framework for aspect-based sentiment analysis on students\u2019 reviews of MOOCs. IEEE Access, 8, 106799\u2013106810. https:\/\/doi.org\/10.1109\/ACCESS.2020.3000739","journal-title":"IEEE Access"},{"key":"414_CR32","doi-asserted-by":"publisher","unstructured":"Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., & Nishliu, E. (2020a). Aspect-based opinion mining of students\u2019 reviews on online courses. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (ICCAI \u201920) (pp. 510\u2013514). Association for Computing Machinery. https:\/\/doi.org\/10.1145\/3404555.3404633","DOI":"10.1145\/3404555.3404633"},{"issue":"1","key":"414_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-024-00897-7","volume":"11","author":"RKL Kennedy","year":"2024","unstructured":"Kennedy, R. K. L., Villanustre, F., Khoshgoftaar, T. M., & Salekshahrezaee, Z. (2024). Synthesizing class labels for highly imbalanced credit card fraud detection data. Journal of Big Data, 11(1), 1\u201322. https:\/\/doi.org\/10.1186\/s40537-024-00897-7","journal-title":"Journal of Big Data"},{"key":"414_CR34","unstructured":"K\u0131c\u0131man, E., Ness, R., Sharma, A., & Tan, C. (2023). Causal reasoning and large language models: Opening a new frontier for causality. arXiv [cs.AI]. Retrieved August 21, 2023, from http:\/\/arxiv.org\/abs\/2305.00050"},{"key":"414_CR35","unstructured":"Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. 
In arXiv [cs.CL]. arXiv. http:\/\/arxiv.org\/abs\/2205.11916."},{"key":"414_CR36","doi-asserted-by":"publisher","DOI":"10.1002\/ir.233","author":"LR Lattuca","year":"2007","unstructured":"Lattuca, L. R., & Domagal-Goldman, J. M. (2007). Using qualitative methods to assess teaching effectiveness. New Directions for Institutional Research. https:\/\/doi.org\/10.1002\/ir.233","journal-title":"New Directions for Institutional Research"},{"issue":"4","key":"414_CR37","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2020). BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234\u20131240. https:\/\/doi.org\/10.1093\/bioinformatics\/btz682","journal-title":"Bioinformatics"},{"key":"414_CR38","doi-asserted-by":"publisher","unstructured":"Loureiro, D., Barbieri, F., Neves, L., et al. (2022). TimeLMs: Diachronic language models from Twitter. arXiv [cs.CL]. https:\/\/doi.org\/10.48550\/arXiv.2202.03829","DOI":"10.48550\/arXiv.2202.03829"},{"key":"414_CR39","doi-asserted-by":"publisher","unstructured":"Madaan, A., Tandon, N., Gupta, P., et al. (2023). Self-refine: Iterative refinement with self-feedback. arXiv [cs.CL]. https:\/\/doi.org\/10.48550\/arXiv.2303.17651","DOI":"10.48550\/arXiv.2303.17651"},{"key":"414_CR40","unstructured":"Marginson, S. & Considine, M. (2000). The enterprise university: Power, governance and reinvention in Australia. Cambridge University Press. Retrieved April 5, 2024, from https:\/\/play.google.com\/store\/books\/details?id=SLljlFVJVOsC"},{"issue":"11","key":"414_CR41","doi-asserted-by":"publisher","first-page":"4","DOI":"10.3991\/ijet.v12.i11.6987","volume":"12","author":"A Marks","year":"2017","unstructured":"Marks, A., Al-Ali, M., Majdalawieh, M., & Bani-Hani, A. (2017). 
Improving academic decision-making through course evaluation technology. International Journal of Emerging Technologies in Learning, 12(11), 4. https:\/\/doi.org\/10.3991\/ijet.v12.i11.6987","journal-title":"International Journal of Emerging Technologies in Learning"},{"issue":"1","key":"414_CR42","doi-asserted-by":"publisher","first-page":"217","DOI":"10.3102\/00028312030001217","volume":"30","author":"HW Marsh","year":"1993","unstructured":"Marsh, H. W., & Roche, L. (1993). The use of students\u2019 evaluations and an individually structured intervention to enhance university teaching effectiveness. American Educational Research Journal, 30(1), 217\u2013251. https:\/\/doi.org\/10.3102\/00028312030001217","journal-title":"American Educational Research Journal"},{"key":"414_CR43","doi-asserted-by":"publisher","unstructured":"Masala, M., Ruseti, S., Dascalu, M., & Dobre, C. (2021). Extracting and clustering main ideas from student feedback using language models. In Artificial Intelligence in Education (pp. 282\u2013292). Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-030-78292-4_23","DOI":"10.1007\/978-3-030-78292-4_23"},{"key":"414_CR44","doi-asserted-by":"publisher","unstructured":"Mattimoe, R., Hayden, M. T., Murphy, B. & Ballantine, J. (2021). Approaches to analysis of qualitative research data: A reflection on the manual and technological approaches. In Accounting, Finance & Governance Review. https:\/\/doi.org\/10.52399\/001c.22026","DOI":"10.52399\/001c.22026"},{"issue":"3","key":"414_CR45","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1108\/09513540310467778","volume":"17","author":"T Mazzarol","year":"2003","unstructured":"Mazzarol, T., Geoffrey, N. S., & Michael, S. Y. S. (2003). The third wave: Future trends in international education. International Journal of Educational Management, 17(3), 90\u201399. 
https:\/\/doi.org\/10.1108\/09513540310467778","journal-title":"International Journal of Educational Management"},{"key":"414_CR46","unstructured":"McGourty, J., Scoles, K., & Thorpe, S. (2002). Web-based student evaluation of instruction: Promises and pitfalls. In 42nd Annual Forum of the Association for Institutional Research, Toronto, Ontario. Retrieved April 5, 2024, from http:\/\/web.augsburg.edu\/~krajewsk\/educause2004\/webeval.pdf"},{"issue":"11","key":"414_CR47","doi-asserted-by":"publisher","first-page":"1218","DOI":"10.1037\/0003-066X.52.11.1218","volume":"52","author":"WJ McKeachie","year":"1997","unstructured":"McKeachie, W. J. (1997). Student ratings: The validity of use. The American Psychologist, 52(11), 1218\u20131225. https:\/\/doi.org\/10.1037\/0003-066X.52.11.1218","journal-title":"The American Psychologist"},{"key":"414_CR48","doi-asserted-by":"publisher","first-page":"7177","DOI":"10.5688\/ajpe7177","volume":"83","author":"MS Medina","year":"2019","unstructured":"Medina, M. S., Smith, W. T., Kolluru, S., et al. (2019). A review of strategies for designing, administering, and using student ratings of instruction. American Journal of Pharmaceutical Education, 83, 7177. https:\/\/doi.org\/10.5688\/ajpe7177","journal-title":"American Journal of Pharmaceutical Education"},{"key":"414_CR49","doi-asserted-by":"publisher","unstructured":"Meidinger, M., & A\u00dfenmacher, M. (2021). A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (pp. 866\u2013873). SCITEPRESS - Science and Technology Publications. https:\/\/doi.org\/10.5220\/0010255108660873","DOI":"10.5220\/0010255108660873"},{"key":"414_CR50","unstructured":"Mentkowski, M. (1991). Creating a context where institutional assessment yields educational improvement. The Journal of General Education, 40, 255\u2013283. 
Retrieved April 5, 2024, from http:\/\/www.jstor.org\/stable\/27797140"},{"key":"414_CR51","doi-asserted-by":"publisher","unstructured":"Morbidoni, C. (2023). Poster: LLMs for online customer reviews analysis: oracles or tools? Experiments with GPT 3.5. Proceedings of the 15th Biannual Conference of the Italian SIGCHI Chapter, 1\u20134. https:\/\/doi.org\/10.1145\/3605390.3610810","DOI":"10.1145\/3605390.3610810"},{"issue":"5","key":"414_CR52","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1111\/1467-8535.00293","volume":"33","author":"J Moss","year":"2002","unstructured":"Moss, J., & Hendry, G. (2002). Use of electronic surveys in course evaluation. British Journal of Educational Technology: Journal of the Council for Educational Technology, 33(5), 583\u2013592. https:\/\/doi.org\/10.1111\/1467-8535.00293","journal-title":"British Journal of Educational Technology: Journal of the Council for Educational Technology"},{"issue":"2","key":"414_CR53","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1109\/TLT.2021.3064798","volume":"14","author":"G Nanda","year":"2021","unstructured":"Nanda, G., Douglas, K. A., Waller, D. R., Merzdorf, H. E., & Goldwasser, D. (2021). Analyzing large collections of open-ended feedback from MOOC learners using LDA topic modeling and qualitative analysis. IEEE Transactions on Learning Technologies, 14(2), 146\u2013160. https:\/\/doi.org\/10.1109\/TLT.2021.3064798","journal-title":"IEEE Transactions on Learning Technologies"},{"key":"414_CR54","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/FIE.2015.7344296","volume":"2015","author":"GI Nitin","year":"2015","unstructured":"Nitin, G. I., Swapna, G., & Shankararaman, V. (2015). Analyzing educational comments for topics and sentiments: A text analytics approach. IEEE Frontiers in Education Conference (FIE), 2015, 1\u20139. 
https:\/\/doi.org\/10.1109\/FIE.2015.7344296","journal-title":"IEEE Frontiers in Education Conference (FIE)"},{"issue":"3","key":"414_CR55","doi-asserted-by":"publisher","first-page":"572","DOI":"10.1002\/cae.22253","volume":"29","author":"A Onan","year":"2021","unstructured":"Onan, A. (2021a). Sentiment analysis on massive open online course evaluations: A text mining and deep learning approach. Computer Applications in Engineering Education, 29(3), 572\u2013589. https:\/\/doi.org\/10.1002\/cae.22253","journal-title":"Computer Applications in Engineering Education"},{"key":"414_CR56","doi-asserted-by":"publisher","unstructured":"Onan, A. (2021b). Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks. Concurrency and Computation: Practice & Experience, 33(23). https:\/\/doi.org\/10.1002\/cpe.5909","DOI":"10.1002\/cpe.5909"},{"key":"414_CR57","doi-asserted-by":"publisher","first-page":"1213419","DOI":"10.3389\/frai.2023.1213419","volume":"6","author":"M Orescanin","year":"2023","unstructured":"Orescanin, M., Smith, L. N., Sahu, S., Goyal, P., & Chhetri, S. R. (2023). Editorial: Deep learning with limited labeled data for vision, audio, and text. Frontiers in Artificial Intelligence, 6, 1213419. https:\/\/doi.org\/10.3389\/frai.2023.1213419","journal-title":"Frontiers in Artificial Intelligence"},{"key":"414_CR58","unstructured":"Pangakis, N., Wolken, S., & Fasching, N. (2023). Automated annotation with generative AI requires validation. arXiv [cs.CL]. Retrieved April 5, 2024, from http:\/\/arxiv.org\/abs\/2306.00176"},{"key":"414_CR59","unstructured":"Papers with Code - Machine Learning Datasets. (n.d.). Retrieved August 21, 2023, from https:\/\/paperswithcode.com\/datasets?task=text-classification."},{"key":"414_CR60","doi-asserted-by":"publisher","unstructured":"Patil, P. P., Phansalkar, S. & Kryssanov, V. V. (2019). Topic modelling for aspect-level sentiment analysis. 
Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, 221\u2013229. https:\/\/doi.org\/10.1007\/978-981-13-1610-4_23","DOI":"10.1007\/978-981-13-1610-4_23"},{"key":"414_CR61","doi-asserted-by":"publisher","unstructured":"Peng, B., Li, C., He, P., et al. (2023). Instruction tuning with GPT-4. arXiv [cs.CL]. https:\/\/doi.org\/10.48550\/arXiv.2304.03277","DOI":"10.48550\/arXiv.2304.03277"},{"issue":"1","key":"414_CR62","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1177\/1028315317724556","volume":"22","author":"A Perez-Encinas","year":"2018","unstructured":"Perez-Encinas, A., & Rodriguez-Pomeda, J. (2018). International students\u2019 perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education, 22(1), 20\u201336. https:\/\/doi.org\/10.1177\/1028315317724556","journal-title":"Journal of Studies in International Education"},{"key":"414_CR63","doi-asserted-by":"publisher","DOI":"10.3389\/frai.2022.828187","volume":"5","author":"VK Pradhan","year":"2022","unstructured":"Pradhan, V. K., Schaekermann, M., & Lease, M. (2022). In search of ambiguity: A three-stage workflow design to clarify annotation guidelines for crowd workers. Frontiers in Artificial Intelligence, 5, 828187. https:\/\/doi.org\/10.3389\/frai.2022.828187","journal-title":"Frontiers in Artificial Intelligence"},{"key":"414_CR64","doi-asserted-by":"crossref","unstructured":"Pyasi, S., Gottipati, S. & Shankararaman, V. (2018). SUFAT: An analytics tool for gaining insights from student feedback comments. (2018). 2018 Frontiers in Education Conference 48th FIE: San Jose, CA, October 3\u20136: Proceedings, 1\u20139. Retrieved April 5, 2024, from https:\/\/core.ac.uk\/download\/pdf\/200254353.pdf","DOI":"10.1109\/FIE.2018.8658457"},{"key":"414_CR65","doi-asserted-by":"publisher","unstructured":"Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv [cs.CL]. 
https:\/\/doi.org\/10.48550\/arxiv.1908.10084","DOI":"10.48550\/arxiv.1908.10084"},{"key":"414_CR66","doi-asserted-by":"crossref","unstructured":"Reiss, M. V. (2023). Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark. arXiv [cs.CL]. Retrieved August 21, 2023, from http:\/\/arxiv.org\/abs\/2304.11085","DOI":"10.31219\/osf.io\/rvy5p"},{"key":"414_CR67","doi-asserted-by":"publisher","unstructured":"Reynolds, L., & McDonell, K. (2021). Prompt programming for large language models: Beyond the few-shot paradigm. arXiv [cs.CL]. https:\/\/doi.org\/10.48550\/arXiv.2102.07350","DOI":"10.48550\/arXiv.2102.07350"},{"issue":"4","key":"414_CR68","doi-asserted-by":"publisher","first-page":"387","DOI":"10.1080\/02602930500099193","volume":"30","author":"JTE Richardson","year":"2005","unstructured":"Richardson, J. T. E. (2005). Instruments for obtaining student feedback: A review of the literature. Assessment & Evaluation in Higher Education, 30(4), 387\u2013415. https:\/\/doi.org\/10.1080\/02602930500099193","journal-title":"Assessment & Evaluation in Higher Education"},{"key":"414_CR69","doi-asserted-by":"crossref","unstructured":"Riger, S. & Sigurvinsdottir, R. (2016). Thematic Analysis. In Jason, L., & Glenwick, D. (Eds.),\u00a0Handbook of methodological approaches to community-based research: Qualitative, quantitative, and mixed methods (pp. 33\u201341). Oxford university press.","DOI":"10.1093\/med:psych\/9780190243654.003.0004"},{"issue":"7","key":"414_CR70","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0254764","volume":"16","author":"A Rother","year":"2021","unstructured":"Rother, A., Niemann, U., Hielscher, T., V\u00f6lzke, H., Ittermann, T., & Spiliopoulou, M. (2021). Assessing the difficulty of annotating medical data in crowdworking with help of experiments. PLoS ONE, 16(7), e0254764. https:\/\/doi.org\/10.1371\/journal.pone.0254764","journal-title":"PLoS ONE"},{"key":"414_CR71","unstructured":"Schreiner, M. 
(2023). GPT-4 architecture, datasets, costs and more leaked. Retrieved April 5, 2024, from https:\/\/the-decoder.com\/gpt-4-architecture-datasets-costs-and-more-leaked\/"},{"key":"414_CR72","unstructured":"Schulz, J., Sud, G. & Crowe, B. (2014). Lessons from the field: The role of student surveys in teacher evaluation and development. Bellwether Education Partners. Retrieved April 5, 2024, from https:\/\/eric.ed.gov\/?id=ED553986"},{"key":"414_CR73","doi-asserted-by":"publisher","unstructured":"Shah, M. & Ali, H. (2023). Imbalanced data in machine learning: A comprehensive review. https:\/\/doi.org\/10.13140\/RG.2.2.18456.98564","DOI":"10.13140\/RG.2.2.18456.98564"},{"issue":"2","key":"414_CR74","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1108\/JARHE-02-2019-0030","volume":"12","author":"M Shah","year":"2019","unstructured":"Shah, M., & Pabel, A. (2019). Making the student voice count: Using qualitative student feedback to enhance the student experience. Journal of Applied Research in Higher Education, 12(2), 194\u2013209. https:\/\/doi.org\/10.1108\/JARHE-02-2019-0030","journal-title":"Journal of Applied Research in Higher Education"},{"key":"414_CR75","doi-asserted-by":"publisher","first-page":"56720","DOI":"10.1109\/ACCESS.2022.3177752","volume":"10","author":"T Shaik","year":"2022","unstructured":"Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., & Galligan, L. (2022). A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access, 10, 56720\u201356739. https:\/\/doi.org\/10.1109\/ACCESS.2022.3177752","journal-title":"IEEE Access"},{"key":"414_CR76","doi-asserted-by":"crossref","unstructured":"Shaik, T., Tao, X., Dann, C., Xie, H., Li, Y. & Galligan, L. (2023). Sentiment analysis and opinion mining on educational data: A survey. In arXiv [cs.CL]. arXiv. 
Retrieved April 4, 2024, from http:\/\/arxiv.org\/abs\/2302.04359","DOI":"10.1016\/j.nlp.2022.100003"},{"key":"414_CR77","doi-asserted-by":"publisher","unstructured":"Shen, Y., Song, K., Tan, X., et al. (2023). HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. arXiv [cs.CL]. https:\/\/doi.org\/10.48550\/arXiv.2303.17580","DOI":"10.48550\/arXiv.2303.17580"},{"key":"414_CR78","doi-asserted-by":"publisher","first-page":"108729","DOI":"10.1109\/ACCESS.2019.2928872","volume":"7","author":"I Sindhu","year":"2019","unstructured":"Sindhu, I., Muhammad, S., Badar, K., Bakhtyar, M., Baber, J., & Nurunnabi, M. (2019). Aspect-based opinion mining on student\u2019s feedback for faculty teaching performance evaluation. IEEE Access, 7, 108729\u2013108741. https:\/\/doi.org\/10.1109\/ACCESS.2019.2928872","journal-title":"IEEE Access"},{"issue":"2","key":"414_CR79","doi-asserted-by":"publisher","first-page":"262","DOI":"10.3758\/bf03192778","volume":"38","author":"AE Smith","year":"2006","unstructured":"Smith, A. E., & Humphreys, M. S. (2006). Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behavior Research Methods, 38(2), 262\u2013279. https:\/\/doi.org\/10.3758\/bf03192778","journal-title":"Behavior Research Methods"},{"issue":"4","key":"414_CR80","doi-asserted-by":"publisher","first-page":"598","DOI":"10.3102\/0034654313496870","volume":"83","author":"P Spooren","year":"2013","unstructured":"Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598\u2013642. https:\/\/doi.org\/10.3102\/0034654313496870","journal-title":"Review of Educational Research"},{"issue":"4","key":"414_CR81","doi-asserted-by":"publisher","first-page":"465","DOI":"10.1080\/02602938.2010.545869","volume":"37","author":"JR Stowell","year":"2012","unstructured":"Stowell, J. R., Addison, W. E., & Smith, J. L. (2012). 
Comparison of online and classroom-based student evaluations of instruction. Assessment & Evaluation in Higher Education, 37(4), 465\u2013473. https:\/\/doi.org\/10.1080\/02602938.2010.545869","journal-title":"Assessment & Evaluation in Higher Education"},{"key":"414_CR82","doi-asserted-by":"publisher","first-page":"741","DOI":"10.1109\/TLT.2023.3330531","volume":"17","author":"AS Sunar","year":"2024","unstructured":"Sunar, A. S., & Khalid, M. S. (2024). Natural language processing of student\u2019s feedback to instructors: A systematic review. IEEE Transactions on Learning Technologies, 17, 741\u2013753. https:\/\/doi.org\/10.1109\/TLT.2023.3330531","journal-title":"IEEE Transactions on Learning Technologies"},{"key":"414_CR83","doi-asserted-by":"publisher","unstructured":"Sutoyo, E., Almaarif, A., & Yanto, I. T. R. (2021). Sentiment analysis of student evaluations of teaching using deep learning approach. In International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI\u20192020) (pp. 272\u2013281). Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-030-80216-5_20","DOI":"10.1007\/978-3-030-80216-5_20"},{"key":"414_CR84","unstructured":"T\u00f6rnberg, P. (2023). ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with zero-shot learning. In arXiv [cs.CL]. arXiv. Retrieved August 21, 2023, from http:\/\/arxiv.org\/abs\/2304.06588"},{"key":"414_CR85","doi-asserted-by":"publisher","unstructured":"Tunstall, L., Reimers, N., Jo, U. E. S., Bates, L., Korat, D., Wasserblat, M., & Pereg, O. (2022). Efficient few-shot learning without prompts. In arXiv [cs.CL]. arXiv. https:\/\/doi.org\/10.48550\/arXiv.2209.11055","DOI":"10.48550\/arXiv.2209.11055"},{"key":"414_CR86","unstructured":"UC Berkeley Center for Teaching & Learning. (n.d.). Course evaluations question bank. 
Retrieved August 21, 2023, from https:\/\/teaching.berkeley.edu\/course-evaluations-question-bank"},{"key":"414_CR87","doi-asserted-by":"publisher","unstructured":"Unankard, S., & Nadee, W. (2020). Topic detection for online course feedback using LDA. In Emerging Technologies for Education (pp. 133\u2013142). Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-030-38778-5_16","DOI":"10.1007\/978-3-030-38778-5_16"},{"key":"414_CR88","unstructured":"University of Wisconsin\u2014Madison. (n.d.). Best practices and sample questions for course evaluation surveys. In Student learning assessment. Retrieved August 21, 2023, from https:\/\/assessment.wisc.edu\/best-practices-and-sample-questions-for-course-evaluation-surveys\/"},{"key":"414_CR89","doi-asserted-by":"publisher","unstructured":"Veselovsky, V., Ribeiro, M. H., & West, R. (2023). Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks. arXiv [cs.CL]. https:\/\/doi.org\/10.48550\/arXiv.2306.07899","DOI":"10.48550\/arXiv.2306.07899"},{"issue":"1","key":"414_CR90","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/87567555.2018.1483317","volume":"67","author":"SL Wallace","year":"2019","unstructured":"Wallace, S. L., Lewis, A. K., & Allen, M. D. (2019). The state of the literature on student evaluations of teaching and an exploratory analysis of written comments: Who benefits most? College Teaching, 67(1), 1\u201314. https:\/\/doi.org\/10.1080\/87567555.2018.1483317","journal-title":"College Teaching"},{"key":"414_CR91","doi-asserted-by":"publisher","unstructured":"Wang, X., Wei, J., Schuurmans, D., et al. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv [cs.CL]. https:\/\/doi.org\/10.48550\/arXiv.2203.11171","DOI":"10.48550\/arXiv.2203.11171"},{"key":"414_CR92","doi-asserted-by":"publisher","unstructured":"Wei, J., Wang, X., Schuurmans, D., et al. (2022). 
Chain-of-thought prompting elicits reasoning in large language models. arXiv [cs.CL], 24824\u201324837. https:\/\/doi.org\/10.48550\/arXiv.2201.11903","DOI":"10.48550\/arXiv.2201.11903"},{"key":"414_CR93","unstructured":"Weng, L. (2023). LLM-powered autonomous agents. Lil\u2019Log. Retrieved August 21, 2023, from https:\/\/lilianweng.github.io\/posts\/2023-06-23-agent\/"},{"key":"414_CR94","doi-asserted-by":"publisher","unstructured":"White, J., Fu, Q., Hays, S., et al. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv [cs.SE]. https:\/\/doi.org\/10.48550\/arXiv.2302.11382","DOI":"10.48550\/arXiv.2302.11382"},{"issue":"1","key":"414_CR95","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1108\/09684881111107762","volume":"19","author":"W Wongsurawat","year":"2011","unstructured":"Wongsurawat, W. (2011). What\u2019s a comment worth? How to better understand student evaluations of teaching. Quality Assurance in Education, 19(1), 67\u201383. https:\/\/doi.org\/10.1108\/09684881111107762","journal-title":"Quality Assurance in Education"},{"key":"414_CR96","doi-asserted-by":"publisher","unstructured":"Yao, S., Zhao, J., Yu, D., et al. (2022). ReAct: Synergizing reasoning and acting in language models. arXiv [cs.CL]. https:\/\/doi.org\/10.48550\/arXiv.2210.03629","DOI":"10.48550\/arXiv.2210.03629"},{"key":"414_CR97","doi-asserted-by":"publisher","unstructured":"Zhang, H., Dong, J., Min, L., & Bi, P. (2020). A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms, 29(07n08), 2040018. 
https:\/\/doi.org\/10.1142\/S0218213020400187","DOI":"10.1142\/S0218213020400187"}],"container-title":["International Journal of Artificial Intelligence in Education"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-024-00414-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40593-024-00414-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-024-00414-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T18:12:51Z","timestamp":1772647971000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40593-024-00414-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,25]]},"references-count":97,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["414"],"URL":"https:\/\/doi.org\/10.1007\/s40593-024-00414-0","relation":{},"ISSN":["1560-4292","1560-4306"],"issn-type":[{"value":"1560-4292","type":"print"},{"value":"1560-4306","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,25]]},"assertion":[{"value":"8 June 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 June 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This study was determined not to be human subjects research by the Harvard Medical School Office of Human Research Administration.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics Approval and 
Consent to Participate"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}]}}