{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T00:59:14Z","timestamp":1775869154302,"version":"3.50.1"},"reference-count":454,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T00:00:00Z","timestamp":1765497600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Advanced Research and Engineering Centre"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Background: Retrieval-augmented generation (RAG) aims to reduce hallucinations and outdated knowledge by grounding LLM outputs in retrieved evidence, but empirical results are scattered across tasks, systems, and metrics, limiting cumulative insight. Objective: We aimed to synthesise empirical evidence on RAG effectiveness versus parametric-only baselines, map datasets\/architectures\/evaluation practices, and surface limitations and research gaps. Methods: This systematic review was conducted and reported in accordance with PRISMA 2020. We searched the ACM Digital Library, IEEE Xplore, Scopus, ScienceDirect, and DBLP; all sources were last searched on 13 May 2025. This included studies from January 2020\u2013May 2025 that addressed RAG or similar retrieval-supported systems producing text output, met citation thresholds (\u226515 for 2025; \u226530 for 2024 or earlier), and offered original contributions; excluded non-English items, irrelevant works, duplicates, and records without accessible full text. Bias was appraised with a brief checklist; screening used one reviewer with an independent check and discussion. LLM suggestions were advisory only; 2025 citation thresholds were adjusted to limit citation-lag. 
We used a descriptive approach to synthesise the results, organising studies by themes aligned to RQ1\u2013RQ4 and reporting summary counts\/frequencies; no meta-analysis was undertaken due to heterogeneity of designs and metrics. Results: We included 128 studies spanning knowledge-intensive tasks (35\/128; 27.3%), open-domain QA (20\/128; 15.6%), software engineering (13\/128; 10.2%), and medical domains (11\/128; 8.6%). Methods have shifted from DPR + seq2seq baselines to modular, policy-driven RAG with hybrid\/structure-aware retrieval, uncertainty-triggered loops, memory, and emerging multimodality. Evaluation remains overlap-heavy (EM\/F1), with increasing use of retrieval diagnostics (e.g., Recall@k, MRR@k), human judgements, and LLM-as-judge protocols. Efficiency and security (poisoning, leakage, jailbreaks) are growing concerns. Discussion: Evidence supports a shift to modular, policy-driven RAG, combining hybrid\/structure-aware retrieval, uncertainty-aware control, memory, and multimodality, to improve grounding and efficiency. To advance from prototypes to dependable systems, we recommend: (i) holistic benchmarks pairing quality with cost\/latency and safety, (ii) budget-aware retrieval\/tool-use policies, and (iii) provenance-aware pipelines that expose uncertainty and deliver traceable evidence. We note the evidence base may be affected by citation-lag from the inclusion thresholds and by English-only, five-library coverage. Funding: Advanced Research and Engineering Centre. 
Registration: Not registered.<\/jats:p>","DOI":"10.3390\/bdcc9120320","type":"journal-article","created":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T15:27:20Z","timestamp":1765553240000},"page":"320","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-9250-9038","authenticated-orcid":false,"given":"Andrew","family":"Brown","sequence":"first","affiliation":[{"name":"Advanced Research and Engineering Centre (ARC), Queen\u2019s University Belfast, Belfast BT7 1NN, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9035-2426","authenticated-orcid":false,"given":"Muhammad","family":"Roman","sequence":"additional","affiliation":[{"name":"Advanced Research and Engineering Centre (ARC), Queen\u2019s University Belfast, Belfast BT7 1NN, UK"},{"name":"Institute of Computing (IoC), Kohat University of Science & Technology (KUST), Kohat 26000, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2128-8632","authenticated-orcid":false,"given":"Barry","family":"Devereux","sequence":"additional","affiliation":[{"name":"Advanced Research and Engineering Centre (ARC), Queen\u2019s University Belfast, Belfast BT7 1NN, UK"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,12]]},"reference":[{"key":"ref_1","unstructured":"Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Kuttler, H., Lewis, M., Yih, W.t., and Rockt\u00e4schel, T. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv."},{"key":"ref_2","unstructured":"Li, H., Su, Y., Cai, D., Wang, Y., and Liu, L. (2022). A Survey on Retrieval-Augmented Text Generation. arXiv."},{"key":"ref_3","unstructured":"Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., and Wang, H. (2023). 
Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv."},{"key":"ref_4","unstructured":"Gupta, S., Ranjan, R., and Narayan Singh, S. (2024). A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wu, S., Xiong, Y., Cui, Y., Wu, H., Chen, C., Yuan, Y., Huang, L., Liu, X., Kuo, T.W., and Guan, N. (2024). Retrieval-Augmented Generation for Natural Language Processing: A Survey. arXiv.","DOI":"10.2139\/ssrn.5163979"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3781","DOI":"10.1016\/j.procs.2024.09.178","article-title":"A Survey on RAG with LLMs","volume":"246","author":"Arslan","year":"2024","journal-title":"Procedia Comput. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Fan, W., Ding, Y., Ning, L., Wang, S., Li, H., Yin, D., Chua, T.S., and Li, Q. (2024, January 25\u201329). A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain.","DOI":"10.1145\/3637528.3671470"},{"key":"ref_8","unstructured":"Cheng, M., Luo, Y., Ouyang, J., Liu, Q., Liu, H., Li, L., Yu, S., Zhang, B., Cao, J., and Ma, J. (2025). A Survey on Knowledge-Oriented Retrieval-Augmented Generation. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Arslan, M., Munawar, S., and Cruz, C. (2024). Business insights using RAG\u2013LLMs: A review and case study. J. Decis. 
Syst., 1\u201330.","DOI":"10.1080\/12460125.2024.2410040"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"46171","DOI":"10.1109\/ACCESS.2025.3550145","article-title":"Enhancing the Precision and Interpretability of Retrieval-Augmented Generation (RAG) in Legal Technology: A Survey","volume":"13","author":"Hindi","year":"2025","journal-title":"IEEE Access"},{"key":"ref_11","unstructured":"Huang, Y., and Huang, J. (2024). A Survey on Retrieval-Augmented Text Generation for Large Language Models. arXiv."},{"key":"ref_12","unstructured":"Zhao, S., Yang, Y., Wang, Z., He, Z., Qiu, L.K., and Qiu, L. (2024). Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely. arXiv."},{"key":"ref_13","unstructured":"Verma, S. (2024). Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv."},{"key":"ref_14","unstructured":"Zhao, P., Zhang, H., Yu, Q., Wang, Z., Geng, Y., Fu, F., Yang, L., Zhang, W., Jiang, J., and Cui, B. (2024). Retrieval-Augmented Generation for AI-Generated Content: A Survey. arXiv."},{"key":"ref_15","unstructured":"Singh, A., Ehtesham, A., Kumar, S., and Talaei Khoei, T. (2025). Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Peng, B., Zhu, Y., Liu, Y., Bo, X., Shi, H., Hong, C., Zhang, Y., and Tang, S. (2024). Graph Retrieval-Augmented Generation: A Survey. arXiv.","DOI":"10.1145\/3777378"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Procko, T.T., and Ochoa, O. (October,, January 30). Graph Retrieval-Augmented Generation for Large Language Models: A Survey. 
Proceedings of the 2024 Conference on AI, Science, Engineering, and Technology (AIxSET), Laguna Hills, CA, USA.","DOI":"10.1109\/AIxSET62544.2024.00030"},{"key":"ref_18","unstructured":"Zhang, Q., Chen, S., Bei, Y., Yuan, Z., Zhou, H., Hong, Z., Dong, J., Chen, H., Chang, Y., and Huang, X. (2025). A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Mahdi Abootorabi, M., Zobeiri, A., Dehghani, M., Mohammadkhani, M., Mohammadi, B., Ghahroodi, O., Soleymani Baghshah, M., and Asgari, E. (2025). Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation. arXiv.","DOI":"10.18653\/v1\/2025.findings-acl.861"},{"key":"ref_20","unstructured":"Zheng, X., Weng, Z., Lyu, Y., Jiang, L., Xue, H., Ren, B., Paudel, D., Sebe, N., Van Gool, L., and Hu, X. (2025). Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook. arXiv."},{"key":"ref_21","unstructured":"Simon, K., O\u011fuz, C., Leonid, K., Muhammad, A., Saara, A., Selvine, M., and Daniel, G. (2024, January 17\u201319). Benchmarking of Retrieval Augmented Generation: A Comprehensive Systematic Literature Review on Evaluation Dimensions, Evaluation Metrics and Datasets. Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Porto, Portugal."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yu, H., Gan, A., Zhang, K., Tong, S., Liu, Q., and Liu, Z. (2024). Evaluation of Retrieval-Augmented Generation: A Survey. arXiv.","DOI":"10.1007\/978-981-96-1024-2_8"},{"key":"ref_23","unstructured":"Zhou, Y., Liu, Y., Li, X., Jin, J., Qian, H., Liu, Z., Li, C., Dou, Z., Ho, T.Y., and Yu, P.S. (2024). Trustworthiness in Retrieval-Augmented Generation Systems: A Survey. 
arXiv."},{"key":"ref_24","unstructured":"Ni, B., Liu, Z., Wang, L., Lei, Y., Zhao, Y., Cheng, X., Zeng, Q., Dong, L., Xia, Y., and Kenthapadi, K. (2025). Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1186\/s13643-021-01626-4","article-title":"The PRISMA 2020 statement: An updated guideline for reporting systematic reviews","volume":"10","author":"Page","year":"2021","journal-title":"Syst. Rev."},{"key":"ref_26","unstructured":"Kitchenham, B., and Charters, S. (2007). Guidelines for Performing Systematic Literature Reviews in Software Engineering, Keele University."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sidiropoulos, G., and Kanoulas, E. (2022, January 11\u201315). Analysing the Robustness of Dual Encoders for Dense Retrieval Against Misspellings. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain.","DOI":"10.1145\/3477495.3531818"},{"key":"ref_28","unstructured":"Kuratov, Y., Bulatov, A., Anokhin, P., Rodkin, I., Sorokin, D., Sorokin, A., and Burtsev, M. (2024). BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Alaofi, M., Arabzadeh, N., Clarke, C.L.A., and Sanderson, M. (2024). Generative Information Retrieval Evaluation. arXiv.","DOI":"10.1007\/978-3-031-73147-1_6"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., and Varol, G. (2024). Improving Medical Multi-modal Contrastive Learning with Expert Annotations. Computer Vision\u2014ECCV 2024, Springer Nature.","DOI":"10.1007\/978-3-031-72980-5"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, M., Chen, L., Cheng, F., Liao, S., Zhang, X., Wu, B., Yu, H., Xu, N., Zhang, L., and Luo, R. 
(2024, January 12\u201316). Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.emnlp-main.322"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wu, J., Zhu, J., Qi, Y., Chen, J., Xu, M., Menolascina, F., and Grau, V. (2024). Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation. arXiv.","DOI":"10.18653\/v1\/2025.acl-long.1381"},{"key":"ref_33","unstructured":"Zheng, L., Yin, L., Xie, Z., Sun, C., Huang, J., Hao Yu, C., Cao, S., Kozyrakis, C., Stoica, I., and Gonzalez, J.E. (2023). SGLang: Efficient Execution of Structured Language Model Programs. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1177\/00222429241276529","article-title":"AI\u2013Human Hybrids for Marketing Research: Leveraging Large Language Models (LLMs) as Collaborators","volume":"89","author":"Arora","year":"2025","journal-title":"J. Mark."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2306724","DOI":"10.1002\/advs.202306724","article-title":"BioinspiredLLM: Conversational Large Language Model for the Mechanics of Biological and Bio-Inspired Materials","volume":"11","author":"Luu","year":"2024","journal-title":"Adv. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, B., and Soh, H. (2024, January 12\u201316). Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.emnlp-main.548"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Liu, S., Cheng, H., Liu, H., Zhang, H., Li, F., Ren, T., Zou, X., Yang, J., Su, H., and Zhu, J. (2023). LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents. 
arXiv.","DOI":"10.1007\/978-3-031-72970-6_8"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gebreab, S.A., Salah, K., Jayaraman, R., Rehman, M.H.u., and Ellaham, S. (2024, January 29\u201330). LLM-Based Framework for Administrative Task Automation in Healthcare. Proceedings of the 2024 12th International Symposium on Digital Forensics and Security (ISDFS), San Antonio, TX, USA.","DOI":"10.1109\/ISDFS60797.2024.10527275"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Loukas, L., Stogiannidis, I., Diamantopoulos, O., Malakasiotis, P., and Vassos, S. (2023, January 27\u201329). Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking. Proceedings of the Fourth ACM International Conference on AI in Finance, Brooklyn, NY, USA.","DOI":"10.1145\/3604237.3626891"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"021001","DOI":"10.1115\/1.4063843","article-title":"MechGPT, a Language-Based Strategy for Mechanics and Materials Modeling That Connects Knowledge Across Scales, Disciplines, and Modalities","volume":"76","author":"Buehler","year":"2024","journal-title":"Appl. Mech. Rev."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chen, J., Zhang, R., Guo, J., de Rijke, M., Chen, W., Fan, Y., and Cheng, X. (2023, January 21\u201325). Continual Learning for Generative Retrieval over Dynamic Corpora. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK.","DOI":"10.1145\/3583780.3614821"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Mao, Y., He, P., Liu, X., Shen, Y., Gao, J., Han, J., and Chen, W. (2021, January 1\u20136). Generation-augmented retrieval for open-domain question answering. 
Proceedings of the ACL-IJCNLP 2021\u201459th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Bangkok, Thailand.","DOI":"10.18653\/v1\/2021.acl-long.316"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1316","DOI":"10.1162\/tacl_a_00605","article-title":"In-Context Retrieval-Augmented Language Models","volume":"11","author":"Ram","year":"2023","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_44","unstructured":"Xu, P., Ping, W., Wu, X., McAfee, L., Zhu, C., Liu, Z., Subramanian, S., Bakhturina, E., Shoeybi, M., and Catanzaro, B. (2023). Retrieval meets Long Context Large Language Models. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Komeili, M., Shuster, K., and Weston, J. (2022, January 22\u201327). Internet-Augmented Dialogue Generation. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.acl-long.579"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Yan, S.Q., Gu, J.C., Zhu, Y., and Ling, Z.H. (2024). Corrective Retrieval Augmented Generation. arXiv.","DOI":"10.2139\/ssrn.5267341"},{"key":"ref_47","unstructured":"Wooldridge, M., Dy, J., and Natarajan, S. (2024, January 20\u201327). Knowledge Graph Prompting for Multi-Document Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_48","unstructured":"He, X., Tian, Y., Sun, Y., Chawla, N.V., Laurent, T., LeCun, Y., Bresson, X., and Hooi, B. (2024). G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Shao, Z., Gong, Y., Shen, Y., Huang, M., Duan, N., and Chen, W. (2023). Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy. 
arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.620"},{"key":"ref_50","unstructured":"Bouamor, H., Pino, J., and Bali, K. (2023, January 6\u201310). RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. Proceedings of the EMNLP 2023\u20142023 Conference on Empirical Methods in Natural Language Processing, Singapore."},{"key":"ref_51","unstructured":"Liu, S., Chen, Y., Xie, X., Siow, J., and Liu, Y. (2021, January 3\u20137). Retrieval-augmented generation for code summarization via hybrid GNN. Proceedings of the ICLR 2021\u20149th International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_52","unstructured":"Yasunaga, M., Aghajanyan, A., Shi, W., James, R., Leskovec, J., Liang, P., Lewis, M., Zettlemoyer, L., and Yih, W.T. (2023, January 23\u201329). Retrieval-Augmented Multimodal Language Modeling. Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Gui, L., Wang, B., Huang, Q., Hauptmann, A., Bisk, Y., and Gao, J. (2022, January 10\u201315). KAT: A Knowledge Augmented Transformer for Vision-and-Language. Proceedings of the NAACL 2022\u20142022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA.","DOI":"10.18653\/v1\/2022.naacl-main.70"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Glass, M., Rossiello, G., Chowdhury, M.F.M., and Gliozzo, A. (2021, January 7\u201311). Robust Retrieval Augmented Generation for Zero-shot Slot Filling. Proceedings of the EMNLP 2021\u20142021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.148"},{"key":"ref_55","unstructured":"Sachan, D.S., Reddy, S., Hamilton, W., Dyer, C., and Yogatama, D. (2021, January 6\u201314). 
End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering. Proceedings of the Advances in Neural Information Processing Systems, Virtual."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Ke, Y., Jin, L., Elangovan, K., Rizal Abdullah, H., Liu, N., Sia, A.T.H., Soh, C.R., Tung, J.Y.M., Ong, J.C.L., and Ting, D.S.W. (2024). Development and Testing of Retrieval Augmented Generation in Large Language Models\u2014A Case Study Report. arXiv.","DOI":"10.2139\/ssrn.4719185"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"btae560","DOI":"10.1093\/bioinformatics\/btae560","article-title":"Biomedical knowledge graph-optimized prompt generation for large language models","volume":"40","author":"Soman","year":"2024","journal-title":"Bioinformatics"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Chen, W., Hu, H., Chen, X., Verga, P., and Cohen, W.W. (2022). MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text. arXiv.","DOI":"10.18653\/v1\/2022.emnlp-main.375"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/TE.2024.3467912","article-title":"An LLM-Driven Chatbot in Higher Education for Databases and Information Systems","volume":"68","author":"Neumann","year":"2025","journal-title":"IEEE Trans. Educ."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Xu, Z., Jerome Cruz, M., Guevara, M., Wang, T., Deshpande, M., Wang, X., and Li, Z. (2024). Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering. arXiv.","DOI":"10.1145\/3626772.3661370"},{"key":"ref_61","unstructured":"Feng, Y., and Lefever, E. (2023, January 6\u201310). RALLE: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models. 
Proceedings of the EMNLP 2023\u20142023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Singapore."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Jiang, W., Zhang, S., Han, B., Wang, J., Wang, B., and Kraska, T. (2024). PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design. arXiv.","DOI":"10.1145\/3690624.3709194"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Caffagni, D., Cocchi, F., Moratelli, N., Sarto, S., Cornia, M., Baraldi, L., and Cucchiara, R. (2024, January 17\u201318). Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs. Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00188"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Guo, Y., Li, Z., Jin, X., Liu, Y., Zeng, Y., Liu, W., Li, X., Yang, P., Bai, L., and Guo, J. (2023). Retrieval-Augmented Code Generation for Universal Information Extraction. arXiv.","DOI":"10.1007\/978-981-97-9434-8_3"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Xiong, G., Jin, Q., Lu, Z., and Zhang, A. (2024, January 11\u201316). Benchmarking Retrieval-Augmented Generation for Medicine. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.findings-acl.372"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"100101","DOI":"10.1016\/j.nlp.2024.100101","article-title":"Towards effective teaching assistants: From intent-based chatbots to LLM-powered teaching assistants","volume":"8","author":"Alsafari","year":"2024","journal-title":"Nat. Lang. Process. J."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Yu, C., Yang, G., Chen, X., Liu, K., and Zhou, Y. (2022, January 3\u20137). Bashexplainer: Retrieval-augmented bash code comment generation based on fine-tuned codebert. 
Proceedings of the 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), Limassol, Cyprus.","DOI":"10.1109\/ICSME55016.2022.00016"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"7063","DOI":"10.1007\/s40747-024-01527-8","article-title":"KnowledgeNavigator: Leveraging large language models for enhanced reasoning over knowledge graph","volume":"10","author":"Guo","year":"2024","journal-title":"Complex Intell. Syst."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Wiratunga, N., Abeyratne, R., Jayawardena, L., Martin, K., Massie, S., Nkisi-Orji, I., Weerasinghe, R., Liret, A., and Fleisch, B. (2024). CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering. Case-Based Reasoning Research and Development, Springer Nature.","DOI":"10.1007\/978-3-031-63646-2_29"},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"104769","DOI":"10.1016\/j.jbi.2024.104769","article-title":"BiomedRAG: A retrieval augmented large language model for biomedicine","volume":"162","author":"Li","year":"2025","journal-title":"J. Biomed. Inform."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1109\/MNET.2024.3401159","article-title":"Interactive AI with Retrieval-Augmented Generation for Next Generation Networking","volume":"38","author":"Zhang","year":"2024","journal-title":"IEEE Netw."},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Guo, Z., Xia, L., Yu, Y., Ao, T., and Huang, C. (2024). LightRAG: Simple and Fast Retrieval-Augmented Generation. arXiv.","DOI":"10.18653\/v1\/2025.findings-emnlp.568"},{"key":"ref_73","unstructured":"Wu, D., Ahmad, W.U., Zhang, D., Krishna Ramanathan, M., and Ma, X. (2024). Repoformer: Selective Retrieval for Repository-Level Code Completion. arXiv."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Chen, Z., Xiang, Z., Xiao, C., Song, D., and Li, B. (2024). 
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases. arXiv.","DOI":"10.52202\/079017-4136"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Ren, Y., Cao, Y., Guo, P., Fang, F., Ma, W., and Lin, Z. (2023, January 9\u201314). Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.acl-long.17"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Chowdhury, J.R., Zhuang, Y., and Wang, S. (March, January 22). Novelty Controlled Paraphrase Generation with Retrieval Augmented Conditional Prompt Tuning. Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022, Virtual.","DOI":"10.1609\/aaai.v36i10.21297"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Fang, M., and Chen, L. (2024). RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering. arXiv.","DOI":"10.18653\/v1\/2024.findings-acl.415"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Soong, D., Sridhar, S., Si, H., Wagner, J.S., S\u00e1, A.C.C., Yu, C.Y., Karagoz, K., Guan, M., Kumar, S., and Hamadeh, H. (2024). Improving accuracy of GPT-3\/4 results on biomedical data using a retrieval-augmented language model. PLoS Digit Health, 3.","DOI":"10.1371\/journal.pdig.0000568"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Jin, C., Zhang, Z., Jiang, X., Liu, F., Liu, X., Liu, X., and Jin, X. (2024). RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation. arXiv.","DOI":"10.1145\/3768628"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Wang, W., Wang, Y., Joty, S., and Hoi, S.C. (2023, January 3\u20139). RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair. 
Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA.","DOI":"10.1145\/3611643.3616256"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Sawarkar, K., Mangal, A., and Solanki, S.R. (2024, January 7\u20139). Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers. Proceedings of the 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA.","DOI":"10.1109\/MIPR62202.2024.00031"},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Ramos, R., Elliott, D., and Martins, B. (2023, January 2\u20136). Retrieval-augmented Image Captioning. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics; Association for Computational Linguistics, Dubrovnik, Croatia.","DOI":"10.18653\/v1\/2023.eacl-main.266"},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Yang, Z., Ping, W., Liu, Z., Korthikanti, V., Nie, W., Huang, D.A., Fan, L., Yu, Z., Lan, S., and Li, B. (2023, January 6\u201310). Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore.","DOI":"10.18653\/v1\/2023.findings-emnlp.793"},{"key":"ref_84","first-page":"1","article-title":"Retrieval Augmented Convolutional Encoder-Decoder Networks for Video Captioning","volume":"19","author":"Chen","year":"2023","journal-title":"ACM Trans. Multimed. Comput. Commun. 
Appl."},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"19080","DOI":"10.1609\/aaai.v38i17.29875","article-title":"Graph Neural Prompting with Large Language Models","volume":"Volume 38","author":"Wooldridge","year":"2024","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Lin, W., and Byrne, B. (2022, January 7\u201311). Retrieval Augmented Visual Question Answering with Outside Knowledge. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.772"},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Hofst\u00e4tter, S., Chen, J., Raman, K., and Zamani, H. (2023, January 23\u201327). FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan.","DOI":"10.1145\/3539618.3591687"},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Feng, Z., Feng, X., Zhao, D., Yang, M., and Qin, B. (2023). Retrieval-generation synergy augmented large language models. arXiv.","DOI":"10.1109\/ICASSP48485.2024.10448015"},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Jeong, C. (2023). A Study on the Implementation of Generative AI Services Using an Enterprise Data-Based LLM Application Architecture. arXiv.","DOI":"10.54364\/AAIML.2023.1191"},{"key":"ref_90","unstructured":"Li, X., Li, Z., Shi, C., Xu, Y., Du, Q., Tan, M., and Huang, J. (2024, January 20\u201325). AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework. 
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy."},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Xia, P., Zhu, K., Li, H., Zhu, H., Li, Y., Li, G., Zhang, L., and Yao, H. (2024, January 12\u201316). RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.emnlp-main.62"},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Sarto, S., Cornia, M., Baraldi, L., and Cucchiara, R. (2022, January 14\u201316). Retrieval-Augmented Transformer for Image Captioning. Proceedings of the 19th International Conference on Content-Based Multimedia Indexing, Graz, Austria.","DOI":"10.1145\/3549555.3549585"},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Siriwardhana, S., Weerasekera, R., Wen, E., Kaluarachchi, T., Rana, R., and Nanayakkara, S. (2022). Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering. arXiv.","DOI":"10.1162\/tacl_a_00530"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Izacard, G., and Grave, E. (2021, January 19\u201323). Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume; Association for Computational Linguistics, Virtual.","DOI":"10.18653\/v1\/2021.eacl-main.74"},{"key":"ref_95","unstructured":"Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., van den Driessche, G., Lespiau, J.B., Damoc, B., and Clark, A. (2021). Improving language models by retrieving from trillions of tokens. arXiv."},{"key":"ref_96","unstructured":"Asai, A., Wu, Z., Wang, Y., Sil, A., and Hajishirzi, H. (2023). 
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv."},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"bbac409","DOI":"10.1093\/bib\/bbac409","article-title":"BioGPT: Generative pre-trained transformer for biomedical text generation and mining","volume":"23","author":"Luo","year":"2022","journal-title":"Brief. Bioinform."},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Magesh, V., Surani, F., Dahl, M., Suzgun, M., Manning, C.D., and Ho, D.E. (2024). Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools. arXiv.","DOI":"10.1111\/jels.12413"},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wang, W., Joty, S., and Hoi, S.C. (2021, January 7\u201311). CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., and Karri, R. (2021). Asleep at the Keyboard? Assessing the Security of GitHub Copilot\u2019s Code Contributions. arXiv.","DOI":"10.1109\/SP46214.2022.9833571"},{"key":"ref_101","unstructured":"Zhu, D., Chen, J., Shen, X., Li, X., and Elhoseiny, M. (2023). MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arXiv."},{"key":"ref_102","unstructured":"Liu, H., Li, C., Wu, Q., and Lee, Y.J. (2023). Visual Instruction Tuning. arXiv."},{"key":"ref_103","unstructured":"Wang, P., Bai, S., Tan, S., Wang, S., Fan, Z., Bai, J., Chen, K., Liu, X., Wang, J., and Ge, W. (2024). Qwen2-VL: Enhancing Vision-Language Model\u2019s Perception of the World at Any Resolution. arXiv."},{"key":"ref_104","unstructured":"(2025, May 14). Chat with Claude. 
Available online: https:\/\/claude.ai\/chats."},{"key":"ref_105","unstructured":"Workshop, B., Le Scao, T., Fan, A., Akiki, C., Pavlick, E., Ili\u0107, S., Hesslow, D., Castagn\u00e9, R., Sasha Luccioni, A., and Yvon, F. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv."},{"key":"ref_106","unstructured":"DeepSeek-AI, Liu, A., Feng, B., Wang, B., Wang, B., Liu, B., Zhao, C., Dengr, C., Ruan, C., and Dai, D. (2024). DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. arXiv."},{"key":"ref_107","unstructured":"Wang, B., and Komatsuzaki, A. (2025, May 14). GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. Available online: https:\/\/github.com\/kingoflolz\/mesh-transformer-jax."},{"key":"ref_108","unstructured":"Chung, H.W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., and Brahma, S. (2022). Scaling Instruction-Finetuned Language Models. arXiv."},{"key":"ref_109","unstructured":"Anil, R., Dai, A.M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., and Chen, Z. (2023). PaLM 2 Technical Report. arXiv."},{"key":"ref_110","doi-asserted-by":"crossref","unstructured":"Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2020, January 5\u201310). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; Association for Computational Linguistics, Virtual.","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"ref_111","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., and Bhosale, S. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. 
arXiv."},{"key":"ref_112","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozi\u00e8re, B., Goyal, N., Hambro, E., and Azhar, F. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv."},{"key":"ref_113","unstructured":"(2025, May 14). TheBloke. Llama 2 70B Chat\u2014AWQ. Available online: https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-Chat-AWQ."},{"key":"ref_114","unstructured":"(2025, May 14). Ai@Meta. Llama 3 Model Card. Available online: https:\/\/github.com\/meta-llama\/llama3\/blob\/main\/MODEL_CARD.md."},{"key":"ref_115","unstructured":"(2025, May 14). Meta AI. Introducing Llama 3.1: Our Most Capable Models to Date. Available online: https:\/\/ai.meta.com\/blog\/meta-llama-3-1\/."},{"key":"ref_116","unstructured":"Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Singh Chaplot, D., de las Casas, D., Bressand, F., Lengyel, G., Lample, G., and Saulnier, L. (2023). Mistral 7B. arXiv."},{"key":"ref_117","unstructured":"Jiang, A.Q., Sablayrolles, A., Roux, A., Mensch, A., Savary, B., Bamford, C., Singh Chaplot, D., de las Casas, D., Hanna, E.B., and Bressand, F. (2024). Mixtral of Experts. arXiv."},{"key":"ref_118","unstructured":"Nomic, A.I. (2025, May 14). GPT4All: Private, Local AI Chatbot Platform by Nomic. Available online: https:\/\/www.nomic.ai\/gpt4all."},{"key":"ref_119","unstructured":"Liu, Z., Ping, W., Roy, R., Xu, P., Lee, C., Shoeybi, M., and Catanzaro, B. (2024). ChatQA: Surpassing GPT-4 on Conversational QA and RAG. arXiv."},{"key":"ref_120","first-page":"9","article-title":"Language Models are Unsupervised Multitask Learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"ref_121","unstructured":"(2025, May 14). OpenAI Product. Available online: https:\/\/openai.com\/product."},{"key":"ref_122","unstructured":"Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Leoni Aleman, F., Almeida, D., Altenschmidt, J., and Altman, S. (2023). 
GPT-4 Technical Report. arXiv."},{"key":"ref_123","unstructured":"Hurst, A., Lerer, A., Goucher, A.P., Perelman, A., Ramesh, A., Clark, A., Ostrow, A., Welihinda, A., and Hayes, A. (2024). GPT-4o System Card. arXiv."},{"key":"ref_124","unstructured":"Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., and Huang, F. (2023). Qwen Technical Report. arXiv."},{"key":"ref_125","unstructured":"Jimeno Yepes, A., You, Y., Milczek, J., Laverde, S., and Li, R. (2024). Financial Report Chunking for Effective Retrieval Augmented Generation. arXiv."},{"key":"ref_126","doi-asserted-by":"crossref","unstructured":"Ge, J., Sun, S., Owens, J., Galvez, V., Gologorskaya, O., Lai, J.C., Pletcher, M.J., and Lai, K. (2023). Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation. medRxiv.","DOI":"10.1101\/2023.11.10.23298364"},{"key":"ref_127","doi-asserted-by":"crossref","unstructured":"Miao, J., Thongprayoon, C., Suppadungsuk, S., Garcia Valencia, O.A., and Cheungpasitporn, W. (2024). Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications. Medicina, 60.","DOI":"10.3390\/medicina60030445"},{"key":"ref_128","unstructured":"Jiang, Z., Ma, X., and Chen, W. (2024). LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs. arXiv."},{"key":"ref_129","doi-asserted-by":"crossref","unstructured":"Jin, M., Shahriar, S., Tufano, M., Shi, X., Lu, S., Sundaresan, N., and Svyatkovskiy, A. (2023, December 3\u20139). InferFix: End-to-End Program Repair with LLMs. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC\/FSE 2023), San Francisco, CA, USA.","DOI":"10.1145\/3611643.3613892"},{"key":"ref_130","doi-asserted-by":"crossref","unstructured":"Cheng, P., Ding, Y., Ju, T., Wu, Z., Du, W., Yi, P., Zhang, Z., and Liu, G. (2024).
TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models. arXiv.","DOI":"10.2139\/ssrn.5327517"},{"key":"ref_131","doi-asserted-by":"crossref","unstructured":"Rackauckas, Z. (2024). RAG-Fusion: A New Take on Retrieval-Augmented Generation. arXiv.","DOI":"10.5121\/ijnlc.2024.13103"},{"key":"ref_132","doi-asserted-by":"crossref","unstructured":"Dong, G., Zhu, Y., Zhang, C., Wang, Z., Dou, Z., and Wen, J.R. (2024). Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation. arXiv.","DOI":"10.1145\/3696410.3714717"},{"key":"ref_133","unstructured":"Wang, Z., Araki, J., Jiang, Z., Parvez, M.R., and Neubig, G. (2023). Learning to Filter Context for Retrieval-Augmented Generation. arXiv."},{"key":"ref_134","doi-asserted-by":"crossref","unstructured":"Soudani, H., Kanoulas, E., and Hasibi, F. (2024, January 9\u201312). Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, Tokyo, Japan.","DOI":"10.1145\/3673791.3698415"},{"key":"ref_135","doi-asserted-by":"crossref","unstructured":"Xu, S., Pang, L., Shen, H., Cheng, X., and Chua, T.S. (2024, January 13\u201317). Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks. Proceedings of the WWW 2024\u2014Proceedings of the ACM Web Conference, Singapore.","DOI":"10.1145\/3589334.3645363"},{"key":"ref_136","doi-asserted-by":"crossref","unstructured":"Ke, Z., Kong, W., Li, C., Zhang, M., Mei, Q., and Bendersky, M. (2024, January 11\u201316). Bridging the Preference Gap between Retrievers and LLMs. 
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.acl-long.562"},{"key":"ref_137","doi-asserted-by":"crossref","unstructured":"Cuconasu, F., Trappolini, G., Siciliano, F., Filice, S., Campagnano, C., Maarek, Y., Tonellotto, N., and Silvestri, F. (2024, January 14\u201318). The Power of Noise: Redefining Retrieval for RAG Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA.","DOI":"10.1145\/3626772.3657834"},{"key":"ref_138","unstructured":"Baek, J., Jeong, S., Kang, M., Park, J.C., and Hwang, S.J. (2023, December 6\u201310). Knowledge-Augmented Language Model Verification. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore."},{"key":"ref_139","doi-asserted-by":"crossref","unstructured":"Barnett, S., Kurniawan, S., Thudumu, S., Brannelly, Z., and Abdelrazek, M. (2024). Seven Failure Points When Engineering a Retrieval Augmented Generation System. arXiv.","DOI":"10.1145\/3644815.3644945"},{"key":"ref_140","doi-asserted-by":"crossref","unstructured":"Adolphs, L., Shuster, K., Urbanek, J., Szlam, A., and Weston, J. (2022, January 7\u201311). Reason first, then respond: Modular Generation for Knowledge-infused Dialogue. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.findings-emnlp.527"},{"key":"ref_141","doi-asserted-by":"crossref","first-page":"104580","DOI":"10.1016\/j.jbi.2023.104580","article-title":"Retrieval augmentation of large language models for lay language generation","volume":"149","author":"Guo","year":"2024","journal-title":"J. Biomed. Inform."},{"key":"ref_142","doi-asserted-by":"crossref","unstructured":"Shi, Z., Zhang, S., Sun, W., Gao, S., Ren, P., Chen, Z., and Ren, Z. (2024, January 11\u201316).
Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.acl-long.397"},{"key":"ref_143","doi-asserted-by":"crossref","unstructured":"Jiang, Z., Xu, F.F., Gao, L., Sun, Z., Liu, Q., Dwivedi-Yu, J., Yang, Y., Callan, J., and Neubig, G. (2023). Active Retrieval Augmented Generation. arXiv.","DOI":"10.18653\/v1\/2023.emnlp-main.495"},{"key":"ref_144","doi-asserted-by":"crossref","unstructured":"Su, W., Tang, Y., Ai, Q., Wu, Z., and Liu, Y. (2024). DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models. arXiv.","DOI":"10.18653\/v1\/2024.acl-long.702"},{"key":"ref_145","unstructured":"B\u00e9chard, P., and Marquez Ayala, O. (2024). Reducing hallucination in structured outputs via Retrieval-Augmented Generation. arXiv."},{"key":"ref_146","doi-asserted-by":"crossref","unstructured":"Li, J., Liu, Y., Fan, W., Wei, X.Y., Liu, H., Tang, J., and Li, Q. (2023). Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective. arXiv.","DOI":"10.1109\/TKDE.2024.3393356"},{"key":"ref_147","doi-asserted-by":"crossref","unstructured":"Tsai, Y., Liu, M., and Ren, H. (2024, January 23\u201327). RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Model. 
Proceedings of the 61st ACM\/IEEE Design Automation Conference, San Francisco, CA, USA.","DOI":"10.1145\/3649329.3657353"},{"key":"ref_148","doi-asserted-by":"crossref","first-page":"btae353","DOI":"10.1093\/bioinformatics\/btae353","article-title":"KRAGEN: A knowledge graph-enhanced RAG framework for biomedical problem solving using large language models","volume":"40","author":"Matsumoto","year":"2024","journal-title":"Bioinformatics"},{"key":"ref_149","doi-asserted-by":"crossref","unstructured":"Zeng, S., Zhang, J., He, P., Liu, Y., Xing, Y., Xu, H., Ren, J., Chang, Y., Wang, S., and Yin, D. (2024, January 11\u201316). The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG). Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.findings-acl.267"},{"key":"ref_150","unstructured":"Yu, H., Guo, P., and Sano, A. (2023, December 10). Zero-Shot ECG Diagnosis with Large Language Models and Retrieval-Augmented Generation. Proceedings of the Machine Learning for Health (ML4H), PMLR, New Orleans, LA, USA."},{"key":"ref_151","doi-asserted-by":"crossref","unstructured":"Jin, J., Zhu, Y., Dong, G., Zhang, Y., Yang, X., Zhang, C., Zhao, T., Yang, Z., Dou, Z., and Wen, J.R. (2024). FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research. arXiv.","DOI":"10.1145\/3701716.3715313"},{"key":"ref_152","unstructured":"Wang, B., Ping, W., Xu, P., McAfee, L., Liu, Z., Shoeybi, M., Dong, Y., Kuchaiev, O., Li, B., and Xiao, C. (2023, December 6\u201310). Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore."},{"key":"ref_153","doi-asserted-by":"crossref","unstructured":"Hu, Y., Lei, Z., Zhang, Z., Pan, B., Ling, C., and Zhao, L. (2024). GRAG: Graph Retrieval-Augmented Generation.
arXiv.","DOI":"10.18653\/v1\/2025.findings-naacl.232"},{"key":"ref_154","unstructured":"Levonian, Z., Li, C., Zhu, W., Gade, A., Henkel, O., Postle, M.E., and Xing, W. (2023). Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference. arXiv."},{"key":"ref_155","doi-asserted-by":"crossref","unstructured":"Yu, W. (2022, January 10\u201315). Retrieval-augmented generation across heterogeneous knowledge. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, Seattle, WA, USA and Online.","DOI":"10.18653\/v1\/2022.naacl-srw.7"},{"key":"ref_156","doi-asserted-by":"crossref","unstructured":"Du, X., and Ji, H. (2022, January 7\u201311). Retrieval-Augmented Generative Question Answering for Event Argument Extraction. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.307"},{"key":"ref_157","doi-asserted-by":"crossref","unstructured":"Di Palma, D. (2023, January 18\u201322). Retrieval-Augmented Recommender System: Enhancing Recommender Systems with Large Language Models. Proceedings of the 17th ACM Conference on Recommender Systems, Singapore.","DOI":"10.1145\/3604915.3608889"},{"key":"ref_158","doi-asserted-by":"crossref","first-page":"i119","DOI":"10.1093\/bioinformatics\/btae238","article-title":"Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models","volume":"40","author":"Jeong","year":"2024","journal-title":"Bioinformatics"},{"key":"ref_159","doi-asserted-by":"crossref","unstructured":"Yu, W., Zhang, H., Pan, X., Cao, P., Ma, K., Li, J., Wang, H., and Yu, D. (2024, January 12\u201316). Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models. 
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.emnlp-main.813"},{"key":"ref_160","unstructured":"Wang, Z., Liu, A., Lin, H., Li, J., Ma, X., and Liang, Y. (2024). RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation. arXiv."},{"key":"ref_161","unstructured":"Wu, Y., Zhu, J., Xu, S., Shum, K., Niu, C., Zhong, R., Song, J., and Zhang, T. (2023). RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models. arXiv."},{"key":"ref_162","doi-asserted-by":"crossref","unstructured":"Wang, X., Wang, Z., Gao, X., Zhang, F., Wu, Y., Xu, Z., Shi, T., Wang, Z., Li, S., and Qian, Q. (2024). Searching for Best Practices in Retrieval-Augmented Generation. arXiv.","DOI":"10.18653\/v1\/2024.emnlp-main.981"},{"key":"ref_163","unstructured":"Cheng, X., Luo, D., Chen, X., Liu, L., Zhao, D., and Yan, R. (2023). Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory. arXiv."},{"key":"ref_164","doi-asserted-by":"crossref","unstructured":"Wang, J., Jiang, R., Yang, C., Wu, Z., Onizuka, M., Shibasaki, R., Koshizuka, N., and Xiao, C. (2024). Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation. arXiv.","DOI":"10.52202\/079017-3957"},{"key":"ref_165","doi-asserted-by":"crossref","unstructured":"Yang, Y., Xu, C., Guo, J., Feng, T., and Ruan, C. (2024). Improving the RAG-based Personalized Discharge Care System by Introducing the Memory Mechanism. Preprints.","DOI":"10.20944\/preprints202410.1696.v1"},{"key":"ref_166","doi-asserted-by":"crossref","unstructured":"Baek, J., Chandrasekaran, N., Cucerzan, S., Herring, A., and Jauhar, S.K. (2024, January 13\u201317). Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion. 
Proceedings of the ACM Web Conference, Singapore.","DOI":"10.1145\/3589334.3645404"},{"key":"ref_167","doi-asserted-by":"crossref","unstructured":"Parvez, M.R., Ahmad, W.U., Chakraborty, S., Ray, B., and Chang, K.W. (2021, January 16\u201320). Retrieval Augmented Code Generation and Summarization. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.findings-emnlp.232"},{"key":"ref_168","doi-asserted-by":"crossref","unstructured":"Tian, Z., Bi, W., Li, X., and Zhang, N.L. (2019, July 28\u2013August 2). Learning to abstract for memory-augmented conversational response generation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy.","DOI":"10.18653\/v1\/P19-1371"},{"key":"ref_169","doi-asserted-by":"crossref","unstructured":"Cheng, X., Wang, X., Zhang, X., Ge, T., Chen, S.Q., Wei, F., Zhang, H., and Zhao, D. (2024). xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token. arXiv.","DOI":"10.52202\/079017-3476"},{"key":"ref_170","doi-asserted-by":"crossref","unstructured":"Shi, E., Wang, Y., Tao, W., Du, L., Zhang, H., Han, S., Zhang, D., and Sun, H. (2022, January 7\u201311). RACE: Retrieval-augmented Commit Message Generation. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.372"},{"key":"ref_171","unstructured":"Thulke, D., Daheim, N., Dugast, C., and Ney, H. (2021). Efficient Retrieval Augmented Generation from Unstructured Knowledge for Task-Oriented Dialog.
arXiv."},{"key":"ref_172","doi-asserted-by":"crossref","first-page":"104662","DOI":"10.1016\/j.jbi.2024.104662","article-title":"Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records","volume":"156","author":"Alkhalaf","year":"2024","journal-title":"J. Biomed. Inform."},{"key":"ref_173","unstructured":"Ranjit, M., Ganapathy, G., Manuel, R., and Ganu, T. (2023). Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models. arXiv."},{"key":"ref_174","doi-asserted-by":"crossref","unstructured":"Dixit, T., Paranjape, B., Hajishirzi, H., and Zettlemoyer, L. (2022, January 7\u201311). CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.findings-emnlp.216"},{"key":"ref_175","doi-asserted-by":"crossref","unstructured":"Salemi, A., and Zamani, H. (2024, January 14\u201318). Evaluating Retrieval Quality in Retrieval-Augmented Generation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA.","DOI":"10.1145\/3626772.3657957"},{"key":"ref_176","unstructured":"Tang, Y., and Yang, Y. (2024). MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries. arXiv."},{"key":"ref_177","unstructured":"Xue, J., Zheng, M., Hu, Y., Liu, F., Chen, X., and Lou, Q. (2024). BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models. arXiv."},{"key":"ref_178","unstructured":"Chen, J., Lin, H., Han, X., and Sun, L. (2024, February 20\u201327). Benchmarking Large Language Models in Retrieval-Augmented Generation.
Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_179","doi-asserted-by":"crossref","unstructured":"Deng, G., Liu, Y., Wang, K., Li, Y., Zhang, T., and Liu, Y. (2024). Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning. arXiv.","DOI":"10.14722\/aiscc.2024.23018"},{"key":"ref_180","doi-asserted-by":"crossref","unstructured":"Wu, K., Wu, E., and Zou, J. (2024). ClashEval: Quantifying the tug-of-war between an LLM\u2019s internal prior and external evidence. arXiv.","DOI":"10.52202\/079017-1053"},{"key":"ref_181","doi-asserted-by":"crossref","unstructured":"Chen, J., Hu, X., Li, Z., Gao, C., Xia, X., and Lo, D. (2024, January 14\u201320). Code Search is All You Need? Improving Code Suggestions with Code Search. Proceedings of the IEEE\/ACM 46th International Conference on Software Engineering, Lisbon, Portugal.","DOI":"10.1145\/3597503.3639085"},{"key":"ref_182","doi-asserted-by":"crossref","unstructured":"Liu, Y., Peng, X., Zhang, X., Liu, W., Yin, J., Cao, J., and Du, T. (2024, January 11\u201316). RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.findings-acl.281"},{"key":"ref_183","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1038\/s41746-024-01091-y","article-title":"Optimization of hepatological clinical guidelines interpretation by large language models: A retrieval augmented generation-based framework","volume":"7","author":"Kresevic","year":"2024","journal-title":"NPJ Digit. Med."},{"key":"ref_184","doi-asserted-by":"crossref","first-page":"2152","DOI":"10.1109\/TASLP.2021.3087948","article-title":"PROTOTYPE-TO-STYLE: Dialogue Generation with Style-Aware Editing on Retrieval Memory","volume":"29","author":"Su","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang.
Process."},{"key":"ref_185","doi-asserted-by":"crossref","unstructured":"Shi, W., Zhuang, Y., Zhu, Y., Iwinski, H., Wattenbarger, M., and Wang, M.D. (2023, January 3\u20136). Retrieval-Augmented Large Language Models for Adolescent Idiopathic Scoliosis Patients in Shared Decision-Making. Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Houston, TX, USA.","DOI":"10.1145\/3584371.3612956"},{"key":"ref_186","unstructured":"Colverd, G., Darm, P., Silverberg, L., and Kasmanoff, N. (2023). FloodBrain: Flood Disaster Reporting by Web-based Retrieval Augmented Generation with an LLM. arXiv."},{"key":"ref_187","doi-asserted-by":"crossref","unstructured":"Saad-Falcon, J., Khattab, O., Potts, C., and Zaharia, M. (2023). ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems. arXiv.","DOI":"10.18653\/v1\/2024.naacl-long.20"},{"key":"ref_188","doi-asserted-by":"crossref","unstructured":"Es, S., James, J., Espinosa-Anke, L., and Schockaert, S. (2023). RAGAS: Automated Evaluation of Retrieval Augmented Generation. arXiv.","DOI":"10.18653\/v1\/2024.eacl-demo.16"},{"key":"ref_189","doi-asserted-by":"crossref","unstructured":"Lyu, Y., Li, Z., Niu, S., Xiong, F., Tang, B., Wang, W., Wu, H., Liu, H., Xu, T., and Chen, E. (2024). CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models. arXiv.","DOI":"10.1145\/3701228"},{"key":"ref_190","first-page":"452","article-title":"Natural Questions: A Benchmark for Question Answering Research","volume":"7","author":"Kwiatkowski","year":"2019","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_191","unstructured":"Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., and Deng, L. (2016). MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. arXiv."},{"key":"ref_192","unstructured":"Butler, U. (2025, May 14). Open Australian Legal Corpus. 
Available online: https:\/\/huggingface.co\/datasets\/isaacus\/open-australian-legal-corpus."},{"key":"ref_193","unstructured":"Tuggener, D., von D\u00e4niken, P., Peetz, T., and Cieliebak, M. (2020, January 11\u201316). LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts. Proceedings of the Twelfth Language Resources and Evaluation Conference; European Language Resources Association, Marseille, France."},{"key":"ref_194","unstructured":"Wang, L.L., Lo, K., Chandrasekhar, Y., Reas, R., Yang, J., Burdick, D., Eide, D., Funk, K., Katsis, Y., and Kinney, R.M. (2020, January 5\u201310). CORD-19: The COVID-19 Open Research Dataset. Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, Online."},{"key":"ref_195","doi-asserted-by":"crossref","unstructured":"Jin, Q., Dhingra, B., Liu, Z., Cohen, W., and Lu, X. (2019, January 3\u20137). PubMedQA: A Dataset for Biomedical Research Question Answering. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1259"},{"key":"ref_196","doi-asserted-by":"crossref","unstructured":"Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W., Salakhutdinov, R., and Manning, C.D. (2018, January 1\u20134). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1259"},{"key":"ref_197","doi-asserted-by":"crossref","unstructured":"Ho, X., Duong Nguyen, A.K., Sugawara, S., and Aizawa, A. (2020, January 8\u201313). Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps. 
Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain.","DOI":"10.18653\/v1\/2020.coling-main.580"},{"key":"ref_198","unstructured":"Chen, X., Fang, H., Lin, T.Y., Vedantam, R., Gupta, S., Dollar, P., and Zitnick, C.L. (2015). Microsoft COCO Captions: Data Collection and Evaluation Server. arXiv."},{"key":"ref_199","unstructured":"Husain, H., Wu, H.H., Gazit, T., Allamanis, M., and Brockschmidt, M. (2019). CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv."},{"key":"ref_200","doi-asserted-by":"crossref","unstructured":"Wilmot, D., and Keller, F. (2021, January 7\u201311). Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.65"},{"key":"ref_201","unstructured":"Chan, C.M., Xu, C., Yuan, R., Luo, H., Xue, W., Guo, Y., and Fu, J. (2024). RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation. arXiv."},{"key":"ref_202","doi-asserted-by":"crossref","unstructured":"Asai, A., Gardner, M., and Hajishirzi, H. (2022, January 10\u201315). Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022), Seattle, WA, USA.","DOI":"10.18653\/v1\/2022.naacl-main.162"},{"key":"ref_203","doi-asserted-by":"crossref","unstructured":"Abdulrahman Alawwad, H., Alhothali, A., Naseem, U., Alkhathlan, A., and Jamal, A. (2024). Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented Generation. arXiv.","DOI":"10.2139\/ssrn.4761601"},{"key":"ref_204","unstructured":"Chaudhari, H., Severi, G., Abascal, J., Jagielski, M., Choquette-Choo, C.A., Nasr, M., Nita-Rotaru, C., and Oprea, A. (2024). 
Phantom: General Trigger Attacks on Retrieval Augmented Language Generation. arXiv."},{"key":"ref_205","unstructured":"Qi, Z., Zhang, H., Xing, E., Kakade, S., and Lakkaraju, H. (2024). Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems. arXiv."},{"key":"ref_206","doi-asserted-by":"crossref","unstructured":"Ovadia, O., Brief, M., Mishaeli, M., and Elisha, O. (2023). Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs. arXiv.","DOI":"10.18653\/v1\/2024.emnlp-main.15"},{"key":"ref_207","doi-asserted-by":"crossref","unstructured":"Salemi, A., Kallumadi, S., and Zamani, H. (2024, January 14\u201318). Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA.","DOI":"10.1145\/3626772.3657783"},{"key":"ref_208","doi-asserted-by":"crossref","unstructured":"Li, Z., Li, C., Zhang, M., Mei, Q., and Bendersky, M. (2024). Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach. arXiv.","DOI":"10.18653\/v1\/2024.emnlp-industry.66"},{"key":"ref_209","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1038\/s41587-024-02534-3","article-title":"A platform for the biomedical application of large language models","volume":"43","author":"Lobentanzer","year":"2025","journal-title":"Nat. Biotechnol."},{"key":"ref_210","unstructured":"Joshi, M., Choi, E., Weld, D., and Zettlemoyer, L. (2017, July 30\u2013August 4). TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada."},{"key":"ref_211","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1162\/tacl_a_00475","article-title":"MuSiQue: Multihop Questions via Single-hop Question Composition","volume":"10","author":"Trivedi","year":"2022","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_212","doi-asserted-by":"crossref","unstructured":"Thorne, J., Vlachos, A., Christodoulopoulos, C., and Mittal, A. (2018, January 1\u20136). FEVER: A Large-scale Dataset for Fact Extraction and VERification. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-1074"},{"key":"ref_213","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1162\/tacl_a_00370","article-title":"Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies","volume":"9","author":"Geva","year":"2021","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_214","unstructured":"Dinan, E., Roller, S., Shuster, K., Fan, A., Auli, M., and Weston, J. (2018). Wizard of Wikipedia: Knowledge-Powered Conversational agents. arXiv."},{"key":"ref_215","doi-asserted-by":"crossref","unstructured":"Berant, J., Chou, A., Frostig, R., and Liang, P. (2013, January 18\u201321). Semantic Parsing on Freebase from Question-Answer Pairs. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA.","DOI":"10.18653\/v1\/D13-1160"},{"key":"ref_216","unstructured":"Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O. (2018). Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv."},{"key":"ref_217","unstructured":"Fan, A., Jernite, Y., Perez, E., Grangier, D., Weston, J., and Auli, M. 
(August, January 28). ELI5: Long Form Question Answering. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_218","unstructured":"Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., and Steinhardt, J. (2020). Measuring Massive Multitask Language Understanding. arXiv."},{"key":"ref_219","doi-asserted-by":"crossref","unstructured":"Ko\u010disk\u00fd, T., Schwarz, J., Blunsom, P., Dyer, C., Hermann, K.M., Melis, G., and Grefenstette, E. (2017). The NarrativeQA Reading Comprehension Challenge. arXiv.","DOI":"10.1162\/tacl_a_00023"},{"key":"ref_220","doi-asserted-by":"crossref","unstructured":"Mallen, A., Asai, A., Zhong, V., Das, R., Khashabi, D., and Hajishirzi, H. (2022). When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. arXiv.","DOI":"10.18653\/v1\/2023.acl-long.546"},{"key":"ref_221","doi-asserted-by":"crossref","unstructured":"Yih, S.W.t., Richardson, M., Meek, C., Chang, M.W., and Suh, J. (2016, January 7\u201312). The Value of Semantic Parse Labeling for Knowledge Base Question Answering. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany.","DOI":"10.18653\/v1\/P16-2033"},{"key":"ref_222","doi-asserted-by":"crossref","unstructured":"Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., and Yih, W.t. (2020, January 8\u201312). Dense Passage Retrieval for Open-Domain Question Answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online.","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"ref_223","doi-asserted-by":"crossref","unstructured":"Stelmakh, I., Luan, Y., Dhingra, B., and Chang, M.W. (2022, January 7\u201311). ASQA: Factoid Questions Meet Long-Form Answers. 
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.566"},{"key":"ref_224","doi-asserted-by":"crossref","unstructured":"Mihaylov, T., Clark, P., Khot, T., and Sabharwal, A. (2018). Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering. arXiv.","DOI":"10.18653\/v1\/D18-1260"},{"key":"ref_225","doi-asserted-by":"crossref","unstructured":"Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016, January 1\u20135). SQuAD: 100,000+ Questions for Machine Comprehension of Text. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA.","DOI":"10.18653\/v1\/D16-1264"},{"key":"ref_226","unstructured":"Elsahar, H., Vougiouklis, P., Remaci, A., Gravier, C., Hare, J., Laforest, F., and Simperl, E. (2018, January 7\u201312). T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan."},{"key":"ref_227","doi-asserted-by":"crossref","unstructured":"Lin, S., Hilton, J., and Evans, O. (2022, January 22\u201327). TruthfulQA: Measuring How Models Mimic Human Falsehoods. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland.","DOI":"10.18653\/v1\/2022.acl-long.229"},{"key":"ref_228","doi-asserted-by":"crossref","unstructured":"Levy, O., Seo, M., Choi, E., and Zettlemoyer, L. (2017, January 3\u20134). Zero-Shot Relation Extraction via Reading Comprehension. Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, BC, Canada.","DOI":"10.18653\/v1\/K17-1034"},{"key":"ref_229","doi-asserted-by":"crossref","unstructured":"Reddy, S., Chen, D., and Manning, C.D. (2018). CoQA: A Conversational Question Answering Challenge. 
arXiv.","DOI":"10.1162\/tacl_a_00266"},{"key":"ref_230","doi-asserted-by":"crossref","unstructured":"Bai, Y., Lv, X., Zhang, J., Lyu, H., Tang, J., Huang, Z., Du, Z., Liu, X., Zeng, A., and Hou, L. (2023). LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding. arXiv.","DOI":"10.18653\/v1\/2024.acl-long.172"},{"key":"ref_231","first-page":"7432","article-title":"PIQA: Reasoning about Physical Commonsense in Natural Language","volume":"34","author":"Bisk","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_232","doi-asserted-by":"crossref","unstructured":"Dasigi, P., Lo, K., Beltagy, I., Cohan, A., Smith, N.A., and Gardner, M. (2021). A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers. arXiv.","DOI":"10.18653\/v1\/2021.naacl-main.365"},{"key":"ref_233","doi-asserted-by":"crossref","first-page":"8548","DOI":"10.1002\/int.22955","article-title":"A medical question answering system using large language models and knowledge graphs","volume":"37","author":"Guo","year":"2022","journal-title":"Int. J. Intell. Syst."},{"key":"ref_234","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1162\/tacl_a_00362","article-title":"WikiAsp: A Dataset for Multi-domain Aspect-based Summarization","volume":"9","author":"Hayashi","year":"2021","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_235","doi-asserted-by":"crossref","unstructured":"Yang, Y., Yih, W.T., and Meek, C. (2015, January 17\u201321). WikiQA: A challenge dataset for open-domain question answering. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1237"},{"key":"ref_236","doi-asserted-by":"crossref","unstructured":"Press, O., Zhang, M., Min, S., Schmidt, L., Smith, N.A., and Lewis, M. (2022). Measuring and Narrowing the Compositionality Gap in Language Models. 
arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.378"},{"key":"ref_237","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1038\/s41597-023-02068-4","article-title":"BioASQ-QA: A manually curated corpus for Biomedical Question Answering","volume":"10","author":"Krithara","year":"2023","journal-title":"Sci. Data"},{"key":"ref_238","unstructured":"Clark, C., Lee, K., Chang, M.W., Kwiatkowski, T., Collins, M., and Toutanova, K. (2019, January 2\u20137). BoolQ: Exploring the Surprising Difficulty of Natural Yes\/No Questions. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_239","unstructured":"See, A., Liu, P.J., and Manning, C.D. (August, January 30). Get To The Point: Summarization with Pointer-Generator Networks. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada."},{"key":"ref_240","unstructured":"Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., Clement, C., Drain, D., Jiang, D., and Tang, D. (2021). CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. arXiv."},{"key":"ref_241","unstructured":"Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P.J. (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv."},{"key":"ref_242","unstructured":"Wenzek, G., Lachaux, M.A., Conneau, A., Chaudhary, V., Guzm\u00e1n, F., Joulin, A., and Grave, E. (2020, January 11\u201316). CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data. Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_243","unstructured":"Talmor, A., Herzig, J., Lourie, N., and Berant, J. (2019, January 2\u20137). 
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_244","doi-asserted-by":"crossref","unstructured":"Sharma, P., Ding, N., Goodman, S., and Soricut, R. (2018, January 15\u201320). Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1238"},{"key":"ref_245","unstructured":"Conover, M., Hayes, M., Mathur, A., Xie, J., Wan, J., Shah, S., Ghodsi, A., Wendell, P., Zaharia, M., and Xin, R. (2025, May 14). Databricks-Dolly-15K. Available online: https:\/\/www.databricks.com\/blog\/2023\/04\/12\/dolly-first-open-commercially-viable-instruction-tuned-llm."},{"key":"ref_246","doi-asserted-by":"crossref","unstructured":"Saha, S., Yadav, P., Bauer, L., and Bansal, M. (2021, January 7\u201311). ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.609"},{"key":"ref_247","doi-asserted-by":"crossref","unstructured":"Jia, X., Gavves, E., Fernando, B., and Tuytelaars, T. (2015). Guiding Long-Short Term Memory for Image Caption Generation. arXiv.","DOI":"10.1109\/ICCV.2015.277"},{"key":"ref_248","doi-asserted-by":"crossref","unstructured":"Luo, M., Zeng, Y., Banerjee, P., and Baral, C. (2021, January 7\u201311). Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering. 
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.517"},{"key":"ref_249","unstructured":"Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. (August, January 28). HellaSwag: Can a Machine Really Finish Your Sentence?. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_250","doi-asserted-by":"crossref","unstructured":"Ferguson, J., Gardner, M., Hajishirzi, H., Khot, T., and Dasigi, P. (2020, January 16\u201320). IIRC: A Dataset of Incomplete Information Reading Comprehension Questions. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online.","DOI":"10.18653\/v1\/2020.emnlp-main.86"},{"key":"ref_251","unstructured":"Schuhmann, C., Vencu, R., Beaumont, R., Kaczmarczyk, R., Mullis, C., Katta, A., Coombes, T., Jitsev, J., and Komatsuzaki, A. (2021). LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs. arXiv."},{"key":"ref_252","unstructured":"Talmor, A., Yoran, O., Catav, A., Lahav, D., Wang, Y., Asai, A., Ilharco, G., Hajishirzi, H., and Berant, J. (2021). MultiModalQA: Complex Question Answering over Text, Tables and Images. arXiv."},{"key":"ref_253","doi-asserted-by":"crossref","unstructured":"Marino, K., Rastegari, M., Farhadi, A., and Mottaghi, R. (2019). OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge. arXiv.","DOI":"10.1109\/CVPR.2019.00331"},{"key":"ref_254","unstructured":"Zhang, T., Luo, H., Chuang, Y.S., Fang, W., Gaitskell, L., Hartvigsen, T., Wu, X., Fox, D., Meng, H., and Glass, J. (2023). Interpretable Unified Language Checking. arXiv."},{"key":"ref_255","unstructured":"(2025, May 14). 
PubMed Database. Available online: https:\/\/pubmed.ncbi.nlm.nih.gov\/."},{"key":"ref_256","doi-asserted-by":"crossref","unstructured":"Zhong, M., Yin, D., Yu, T., Zaidi, A., Mutuma, M., Jha, R., Awadallah, A.H., Celikyilmaz, A., Liu, Y., and Qiu, X. (2021, January 6\u201311). QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online.","DOI":"10.18653\/v1\/2021.naacl-main.472"},{"key":"ref_257","unstructured":"Zellers, R., Holtzman, A., Rashkin, H., Bisk, Y., Farhadi, A., Roesner, F., and Choi, Y. (2019, January 8\u201314). Defending against neural fake news. Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_258","unstructured":"(2025, May 14). WikiData. Available online: https:\/\/www.wikidata.org\/."},{"key":"ref_259","unstructured":"Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., Dwivedi-Yu, J., Joulin, A., Riedel, S., and Grave, E. (2022). Atlas: Few-shot Learning with Retrieval Augmented Language Models. arXiv."},{"key":"ref_260","doi-asserted-by":"crossref","unstructured":"Li, S., Ji, H., and Han, J. (2021, January 6\u201311). Document-Level Event Argument Extraction by Conditional Generation. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online.","DOI":"10.18653\/v1\/2021.naacl-main.69"},{"key":"ref_261","unstructured":"Merity, S., Xiong, C., Bradbury, J., and Socher, R. (2016). Pointer Sentinel Mixture Models. arXiv."},{"key":"ref_262","doi-asserted-by":"crossref","unstructured":"Craswell, N., Mitra, B., Yilmaz, E., Campos, D., and Voorhees, E.M. (2020). Overview of the TREC 2019 deep learning track.
arXiv.","DOI":"10.6028\/NIST.SP.1266.deep-overview"},{"key":"ref_263","doi-asserted-by":"crossref","unstructured":"Craswell, N., Mitra, B., Yilmaz, E., Campos, D.F., and Voorhees, E.M. (2021). Overview of the TREC 2020 Deep Learning Track. arXiv.","DOI":"10.6028\/NIST.SP.1266.deep-overview"},{"key":"ref_264","unstructured":"Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., and Weischedel, R. (2004, January 26\u201328). The Automatic Content Extraction (ACE) Program\u2014Tasks, Data, and Evaluation. Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC\u201904), Lisbon, Portugal."},{"key":"ref_265","doi-asserted-by":"crossref","unstructured":"Krishna, R., Hata, K., Ren, F., Fei-Fei, L., and Niebles, J.C. (2017). Dense-Captioning Events in Videos. arXiv.","DOI":"10.1109\/ICCV.2017.83"},{"key":"ref_266","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1016\/j.jbi.2012.04.008","article-title":"Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports","volume":"45","author":"Gurulingappa","year":"2012","journal-title":"J. Biomed. Inform."},{"key":"ref_267","unstructured":"Lu, W., Zeng, Z., Wang, J., Lu, Z., Chen, Z., Zhuang, H., and Chen, C. (2024). Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge. arXiv."},{"key":"ref_268","doi-asserted-by":"crossref","unstructured":"Nie, Y., Williams, A., Dinan, E., Bansal, M., Weston, J., and Kiela, D. (2020, January 5\u201310). Adversarial NLI: A New Benchmark for Natural Language Understanding. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.441"},{"key":"ref_269","unstructured":"Mao, J., Ye, J., Qian, Y., Pavone, M., and Wang, Y. (2023). A Language Agent for Autonomous Driving. arXiv."},{"key":"ref_270","unstructured":"Zhang, X., Zhao, J., and LeCun, Y. 
(2015, January 7\u201312). Character-level Convolutional Networks for Text Classification. Proceedings of the 29th International Conference on Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_271","unstructured":"Hoffart, J., Yosef, M.A., Bordino, I., F\u00fcrstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., and Weikum, G. (2011, January 27\u201331). Robust Disambiguation of Named Entities in Text. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK."},{"key":"ref_272","doi-asserted-by":"crossref","unstructured":"Xiao, Y., Hou, Y., Zhou, H., Diallo, G., Fiszman, M., Wolfson, J., Kilicoglu, H., Chen, Y., Su, C., and Xu, H. (2023). Repurposing Non-pharmacological Interventions for Alzheimer\u2019s Diseases through Link Prediction on Biomedical Literature. medRxiv.","DOI":"10.1101\/2023.05.15.23290002"},{"key":"ref_273","doi-asserted-by":"crossref","first-page":"e46777","DOI":"10.2196\/46777","article-title":"The Alzheimer\u2019s Knowledge Base: A Knowledge Graph for Alzheimer Disease Research","volume":"26","author":"Romano","year":"2024","journal-title":"J. Med. Internet Res."},{"key":"ref_274","doi-asserted-by":"crossref","unstructured":"Dong, L., Huang, S., Wei, F., Lapata, M., Zhou, M., and Xu, K. (2017, January 3\u20137). Learning to Generate Product Reviews from Attributes. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Valencia, Spain.","DOI":"10.18653\/v1\/E17-1059"},{"key":"ref_275","doi-asserted-by":"crossref","unstructured":"McAuley, J., and Leskovec, J. (2013, January 12\u201316). Hidden factors and hidden topics: Understanding rating dimensions with review text. 
Proceedings of the 7th ACM Conference on Recommender Systems, Hong Kong, China.","DOI":"10.1145\/2507157.2507163"},{"key":"ref_276","doi-asserted-by":"crossref","unstructured":"Min, S., Michael, J., Hajishirzi, H., and Zettlemoyer, L. (2020, January 16\u201320). AmbigQA: Answering Ambiguous Open-domain Questions. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online.","DOI":"10.18653\/v1\/2020.emnlp-main.466"},{"key":"ref_277","unstructured":"Penzel, T., Moody, G.B., Mark, R.G., Goldberger, A.L., and Peter, J.H. (2000, January 24\u201327). The apnea-ECG database. Proceedings of the Computers in Cardiology 2000, Vol.27 (Cat. 00CH37163), Cambridge, MA, USA."},{"key":"ref_278","unstructured":"Oard, D., Webber, W., Kirsch, D., and Golitsynskiy, S. (2015). Avocado Research Email Collection, Linguistic Data Consortium."},{"key":"ref_279","doi-asserted-by":"crossref","unstructured":"Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P.M., and Bowman, S. (2022, January 22\u201327). BBQ: A hand-built bias benchmark for question answering. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.findings-acl.165"},{"key":"ref_280","unstructured":"Sharma, E., Li, C., and Wang, L. (August, January 28). BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_281","unstructured":"(2025, May 14). Microsoft. Bing. Available online: https:\/\/www.bing.com\/."},{"key":"ref_282","doi-asserted-by":"crossref","unstructured":"Min, S., Krishna, K., Lyu, X., Lewis, M., Yih, W.t., Koh, P., Iyyer, M., Zettlemoyer, L., and Hajishirzi, H. (2023, January 6\u201310). FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. 
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.741"},{"key":"ref_283","doi-asserted-by":"crossref","first-page":"D712","DOI":"10.1093\/nar\/gkw1128","article-title":"The Monarch Initiative: An integrative data and analytic platform connecting phenotypes to genotypes across species","volume":"45","author":"Mungall","year":"2017","journal-title":"Nucleic Acids Res."},{"key":"ref_284","doi-asserted-by":"crossref","unstructured":"Chalkidis, I., Jana, A., Hartung, D., Bommarito, M., Androutsopoulos, I., Katz, D., and Aletras, N. (2022, January 22\u201327). LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland.","DOI":"10.18653\/v1\/2022.acl-long.297"},{"key":"ref_285","unstructured":"Bondarenko, M., Kerr, D., Sorichetta, A., and Tatem, A. (2025, May 14). Census\/projection-disaggregated gridded population datasets for 189 countries in 2020 using Built-Settlement Growth Model (BSGM) outputs [Dataset]. University of Southampton, Southampton, UK, 2020. Available online: https:\/\/www.worldpop.org\/doi\/10.5258\/SOTON\/WP00684."},{"key":"ref_286","unstructured":"Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv."},{"key":"ref_287","doi-asserted-by":"crossref","unstructured":"Edwards, C., Zhai, C., and Ji, H. (2021, January 7\u201311). Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries. 
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.47"},{"key":"ref_288","doi-asserted-by":"crossref","first-page":"D367","DOI":"10.1093\/nar\/gkq906","article-title":"ChemProt: A disease chemical biology database","volume":"39","author":"Taboureau","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"ref_289","unstructured":"Chen, Z., Hern\u00e1ndez Cano, A., Romanou, A., Bonnet, A., Matoba, K., Salvi, F., Pagliardini, M., Fan, S., K\u00f6pf, A., and Mohtashami, A. (2023). MEDITRON-70B: Scaling Medical Pretraining for Large Language Models. arXiv."},{"key":"ref_290","doi-asserted-by":"crossref","unstructured":"Tufano, M., Watson, C., Bavota, G., Di Penta, M., White, M., and Poshyvanyk, D. (2018). An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation. arXiv.","DOI":"10.1145\/3238147.3240732"},{"key":"ref_291","first-page":"12","article-title":"CodeMatcher: Searching Code Based on Sequential Semantics of Important Query Words","volume":"31","author":"Liu","year":"2021","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"ref_292","unstructured":"(2025, May 14). CodeParrot. github-jupyter. Available online: https:\/\/huggingface.co\/datasets\/codeparrot\/github-jupyter."},{"key":"ref_293","first-page":"4444","article-title":"ConceptNet 5.5: An Open Multilingual Graph of General Knowledge","volume":"31","author":"Speer","year":"2017","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_294","doi-asserted-by":"crossref","unstructured":"Changpinyo, S., Sharma, P., Ding, N., and Soricut, R. (2021). Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training to Recognize Long-Tail Visual Concepts. arXiv.","DOI":"10.1109\/CVPR46437.2021.00356"},{"key":"ref_295","doi-asserted-by":"crossref","unstructured":"Iyer, S., Konstas, I., Cheung, A., and Zettlemoyer, L. (November, January 31). 
Mapping Language to Code in Programmatic Context. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1192"},{"key":"ref_296","doi-asserted-by":"crossref","unstructured":"Tjong Kim Sang, E.F., and De Meulder, F. (June, January 31). Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Edmonton, AB, Canada.","DOI":"10.3115\/1119176.1119195"},{"key":"ref_297","unstructured":"Roth, D., and Yih, W.t. (2004, January 6\u20137). A Linear Programming Formulation for Global Inference in Natural Language Tasks. Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004, Boston, MA, USA."},{"key":"ref_298","doi-asserted-by":"crossref","unstructured":"Wu, C.S., Madotto, A., Liu, W., Fung, P., and Xiong, C. (2021). QAConv: Question Answering on Informative Conversations. arXiv.","DOI":"10.18653\/v1\/2022.acl-long.370"},{"key":"ref_299","doi-asserted-by":"crossref","unstructured":"Chen, Z., Li, S., Smiley, C., Ma, Z., Shah, S., and Wang, W.Y. (2022, January 7\u201311). ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.421"},{"key":"ref_300","unstructured":"Byeon, M., Park, B., Kim, H., Lee, S., Baek, W., and Kim, S. (2022). COYO-700M: Image-Text Pair Dataset. arXiv."},{"key":"ref_301","unstructured":"Onoe, Y., Zhang, M.J.Q., Choi, E., and Durrett, G. (2021). CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge. arXiv."},{"key":"ref_302","unstructured":"Ding, Y., Wang, Z., Ahmad, W.U., Ding, H., Tan, M., Jain, N., Krishna Ramanathan, M., Nallapati, R., Bhatia, P., and Roth, D. (2023). 
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion. arXiv."},{"key":"ref_303","unstructured":"Talmor, A., Yoran, O., Le Bras, R., Bhagavatula, C., Goldberg, Y., Choi, Y., and Berant, J. (2022). CommonsenseQA 2.0: Exposing the Limits of AI through Gamification. arXiv."},{"key":"ref_304","doi-asserted-by":"crossref","unstructured":"Baudi\u0161, P., and \u0160ediv\u00fd, J. (2015). Modeling of the Question Answering Task in the YodaQA System. Experimental IR Meets Multilinguality, Multimodality, and Interaction, Springer International Publishing.","DOI":"10.1007\/978-3-319-24027-5"},{"key":"ref_305","unstructured":"Ramesh, V., Chi, N.A., and Rajpurkar, P. (2022). CXR-PRO: MIMIC-CXR with Prior References Omitted (version 1.0.0). PhysioNet."},{"key":"ref_306","first-page":"8749","article-title":"CASIE: Extracting Cybersecurity Event Information from Text","volume":"34","author":"Satyapanich","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_307","unstructured":"Li, Y., Su, H., Shen, X., Li, W., Cao, Z., and Niu, S. (December, January 27). DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Taipei, Taiwan."},{"key":"ref_308","doi-asserted-by":"crossref","unstructured":"Just, R., Jalali, D., and Ernst, M.D. (2014, January 21\u201325). Defects4J: A database of existing faults to enable controlled testing studies for Java programs. Proceedings of the 2014 International Symposium on Software Testing and Analysis, San Jose, CA, USA.","DOI":"10.1145\/2610384.2628055"},{"key":"ref_309","unstructured":"(2025, May 14). DIG Minecraft. Available online: https:\/\/www.digminecraft.com\/."},{"key":"ref_310","unstructured":"Dua, D., Wang, Y., Dasigi, P., Stanovsky, G., Singh, S., and Gardner, M. (2019, January 2\u20137).
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_311","doi-asserted-by":"crossref","unstructured":"Oda, Y., Fudaba, H., Neubig, G., Hata, H., Sakti, S., Toda, T., and Nakamura, S. (2015, January 9\u201313). Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation. Proceedings of the 2015 30th IEEE\/ACM International Conference on Automated Software Engineering (ASE), Lincoln, NE, USA.","DOI":"10.1109\/ASE.2015.36"},{"key":"ref_312","doi-asserted-by":"crossref","unstructured":"Feng, S., Wan, H., Gunasekara, C., Patel, S., Joshi, S., and Lastras, L. (2020, January 16\u201320). doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online.","DOI":"10.18653\/v1\/2020.emnlp-main.652"},{"key":"ref_313","unstructured":"Wang, S., Liu, J., Song, S., Cheng, J., Fu, Y., Guo, P., Fang, K., Zhu, Y., and Dou, Z. (2024). DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation. arXiv."},{"key":"ref_314","doi-asserted-by":"crossref","unstructured":"Campos, J.A., Otegi, A., Soroa, A., Deriu, J., Cieliebak, M., and Agirre, E. (2020, January 5\u201310). DoQA\u2014Accessing Domain-Specific FAQs via Conversational QA. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual.","DOI":"10.18653\/v1\/2020.acl-main.652"},{"key":"ref_315","unstructured":"Segura-Bedmar, I., Mart\u00ednez, P., and Herrero-Zazo, M. (2013, January 14\u201315). SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013).
Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, GA, USA."},{"key":"ref_316","unstructured":"(2025, May 14). DynaMed. Available online: https:\/\/www.dynamed.com\/."},{"key":"ref_317","doi-asserted-by":"crossref","unstructured":"Shi, W., Xu, R., Zhuang, Y., Yu, Y., Zhang, J., Wu, H., Zhu, Y., Ho, J., Yang, C., and Wang, M.D. (2024, January 12\u201316). EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.emnlp-main.1245"},{"key":"ref_318","first-page":"730","article-title":"Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory","volume":"32","author":"Zhou","year":"2018","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_319","doi-asserted-by":"crossref","unstructured":"Zhang, X., Chen, Y., Hu, S., Xu, Z., Chen, J., Khai Hao, M., Han, X., Leng Thai, Z., Wang, S., and Liu, Z. (2024). \u221eBench: Extending Long Context Evaluation Beyond 100K Tokens. arXiv.","DOI":"10.18653\/v1\/2024.acl-long.814"},{"key":"ref_320","doi-asserted-by":"crossref","unstructured":"Mensink, T., Uijlings, J., Castrejon, L., Goel, A., Cadar, F., Zhou, H., Sha, F., Araujo, A., and Ferrari, V. (2023). Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories. arXiv.","DOI":"10.1109\/ICCV51070.2023.00289"},{"key":"ref_321","doi-asserted-by":"crossref","unstructured":"Sciavolino, C., Zhong, Z., Lee, J., and Chen, D. (2021, January 7\u201311). Simple Entity-Centric Questions Challenge Dense Retrievers. 
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.496"},{"key":"ref_322","doi-asserted-by":"crossref","unstructured":"European Association for the Study of the Liver (2020). EASL recommendations on treatment of hepatitis C: Final update of the series. J. Hepatol., 73, 1170\u20131218.","DOI":"10.1016\/j.jhep.2020.08.018"},{"key":"ref_323","doi-asserted-by":"crossref","unstructured":"Narayan, S., Cohen, S.B., and Lapata, M. (November, January 31). Don\u2019t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1206"},{"key":"ref_324","unstructured":"(2025, May 14). Facebook Books Dataset. Available online: https:\/\/github.com\/sisinflab\/LinkedDatasets\/tree\/master\/facebook_book."},{"key":"ref_325","doi-asserted-by":"crossref","unstructured":"Aly, R., Guo, Z., Schlichtkrull, M., Thorne, J., Vlachos, A., Christodoulopoulos, C., Cocarascu, O., and Mittal, A. (2021). FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information. arXiv.","DOI":"10.18653\/v1\/2021.fever-1.1"},{"key":"ref_326","doi-asserted-by":"crossref","unstructured":"Park, J., Min, S., Kang, J., Zettlemoyer, L., and Hajishirzi, H. (2021). FaVIQ: FAct Verification from Information-seeking Questions. arXiv.","DOI":"10.18653\/v1\/2022.acl-long.354"},{"key":"ref_327","doi-asserted-by":"crossref","unstructured":"Kim, J., Park, S., Kwon, Y., Jo, Y., Thorne, J., and Choi, E. (2023). FactKG: Fact Verification via Reasoning on Knowledge Graphs. arXiv.","DOI":"10.18653\/v1\/2023.acl-long.895"},{"key":"ref_328","unstructured":"Lee, N., Ping, W., Xu, P., Patwary, M., Fung, P.N., Shoeybi, M., and Catanzaro, B. (December, January 28). 
Factuality Enhanced Language Models for Open-Ended Text Generation. Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA."},{"key":"ref_329","doi-asserted-by":"crossref","unstructured":"Kalyan, A., Kumar, A., Chandrasekaran, A., Sabharwal, A., and Clark, P. (2021, January 7\u201311). How much coffee was consumed during EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.582"},{"key":"ref_330","unstructured":"Islam, P., Kannappan, A., Kiela, D., Qian, R., Scherrer, N., and Vidgen, B. (2023). FinanceBench: A New Benchmark for Financial Question Answering. arXiv."},{"key":"ref_331","unstructured":"Jiang, K., Wu, D., and Jiang, H. (2019, January 2\u20137). FreebaseQA: A New Factoid QA Data Set Matching Trivia-Style Question-Answer Pairs with Freebase. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_332","doi-asserted-by":"crossref","unstructured":"Vu, T., Iyyer, M., Wang, X., Constant, N., Wei, J., Wei, J., Tar, C., Sung, Y.H., Zhou, D., and Le, Q. (2024, January 11\u201316). FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.findings-acl.813"},{"key":"ref_333","doi-asserted-by":"crossref","unstructured":"Zong, Y., and Qiu, X. (2024, January 11\u201316). GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation. 
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.findings-acl.521"},{"key":"ref_334","unstructured":"Su, Y., Cai, D., Wang, Y., Baker, S., Korhonen, A., Collier, N., and Liu, X. (2020). Stylistic Dialogue Generation via Information-Guided Reinforcement Learning Strategy. arXiv."},{"key":"ref_335","unstructured":"Li, M., Zhou, H., and Zhang, R. (2023). Benchmarking Large Language Models in Biomedical Triple Extraction. arXiv."},{"key":"ref_336","unstructured":"Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., and Neubig, G. (2022). PAL: Program-aided Language Models. arXiv."},{"key":"ref_337","unstructured":"Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., and Nakano, R. (2021). Training Verifiers to Solve Math Word Problems. arXiv."},{"key":"ref_338","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tan, C. (2021, January 10). Investigating the Effect of Natural Language Explanations on Out-of-Distribution Generalization in Few-shot NLI. Proceedings of the Second Workshop on Insights from Negative Results in NLP, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.insights-1.17"},{"key":"ref_339","unstructured":"Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., and Nabeshima, N. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv."},{"key":"ref_340","unstructured":"(2025, May 14). Harvard Law Case Corpus. Available online: https:\/\/case.law\/."},{"key":"ref_341","doi-asserted-by":"crossref","unstructured":"Luo, Y., Shi, M., Osama Khan, M., Muneeb Afzal, M., Huang, H., Yuan, S., Tian, Y., Song, L., Kouhana, A., and Elze, T. (2024). FairCLIP: Harnessing Fairness in Vision-Language Learning. 
arXiv.","DOI":"10.1109\/CVPR52733.2024.01168"},{"key":"ref_342","doi-asserted-by":"crossref","unstructured":"Li, Y., Li, Z., Zhang, K., Dan, R., Jiang, S., and Zhang, Y. (2023). ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge. arXiv.","DOI":"10.7759\/cureus.40895"},{"key":"ref_343","doi-asserted-by":"crossref","unstructured":"Ling, W., Blunsom, P., Grefenstette, E., Hermann, K.M., Ko\u010disk\u00fd, T., Wang, F., and Senior, A. (2016, January 7\u201312). Latent Predictor Networks for Code Generation. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany.","DOI":"10.18653\/v1\/P16-1057"},{"key":"ref_344","unstructured":"Chen, M., Tworek, J., Jun, H., Yuan, Q., Ponde de Oliveira Pinto, H., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., and Brockman, G. (2021). Evaluating Large Language Models Trained on Code. arXiv."},{"key":"ref_345","unstructured":"Liu, J., Xia, C.S., Wang, Y., and Zhang, L. (2023, January 10\u201316). Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA."},{"key":"ref_346","doi-asserted-by":"crossref","unstructured":"Nakamura, K., Levy, S., Tuan, Y.L., Chen, W., and Wang, W.Y. (2022, January 22\u201327). HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.findings-acl.41"},{"key":"ref_347","unstructured":"(2025, May 14). IMDb. IMDb Non-Commercial Datasets. Available online: https:\/\/developer.imdb.com\/non-commercial-datasets\/."},{"key":"ref_348","unstructured":"(2025, May 14). Community, Infineon Developer Community. Developer Community Forum Questions. 
Available online: https:\/\/community.infineon.com\/."},{"key":"ref_349","unstructured":"(2025, May 14). Documents, Infineon Technologies. XENSIV\u2122\u2014Sensing the World Sensor Solutions for Automotive, Industrial, Consumer and IoT Applications. Available online: https:\/\/www.infineon.com\/cms\/en\/product\/sensor\/mems-microphones\/."},{"key":"ref_350","doi-asserted-by":"crossref","unstructured":"Chen, Y., Hu, H., Luan, Y., Sun, H., Changpinyo, S., Ritter, A., and Chang, M.W. (2023, January 6\u201310). Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.925"},{"key":"ref_351","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1162\/tacl_a_00559","article-title":"InSCIt: Information-Seeking Conversations with Mixed-Initiative Interactions","volume":"11","author":"Wu","year":"2023","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_352","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1093\/jamia\/ocv080","article-title":"Preparing a collection of radiology examinations for distribution and retrieval","volume":"23","author":"Kohli","year":"2016","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_353","unstructured":"Steinberger, R., Pouliquen, B., Widiger, A., Ignat, C., Erjavec, T., Tufi\u015f, D., and Varga, D. (2006, January 22\u201328). The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC\u201906), Genoa, Italy."},{"key":"ref_354","doi-asserted-by":"crossref","unstructured":"Petroni, F., Piktus, A., Fan, A., Lewis, P., Yazdani, M., De Cao, N., Thorne, J., Jernite, Y., Karpukhin, V., and Maillard, J. (2021, January 6\u201311). KILT: A Benchmark for Knowledge Intensive Language Tasks. 
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online.","DOI":"10.18653\/v1\/2021.naacl-main.200"},{"key":"ref_355","doi-asserted-by":"crossref","unstructured":"Paperno, D., Kruszewski, G., Lazaridou, A., Pham, N.Q., Bernardi, R., Pezzelle, S., Baroni, M., Boleda, G., and Fern\u00e1ndez, R. (2016, January 7\u201312). The LAMBADA dataset: Word prediction requiring a broad discourse context. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany.","DOI":"10.18653\/v1\/P16-1144"},{"key":"ref_356","doi-asserted-by":"crossref","unstructured":"Salemi, A., Mysore, S., Bendersky, M., and Zamani, H. (2023). LaMP: When Large Language Models Meet Personalization. arXiv.","DOI":"10.18653\/v1\/2024.acl-long.399"},{"key":"ref_357","doi-asserted-by":"crossref","unstructured":"Guha, N., Nyarko, J., Ho, D.E., R\u00e9, C., Chilton, A., Narayana, A., Chohlas-Wood, A., Peters, A., Waldon, B., and Rockmore, D.N. (2023). LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. arXiv.","DOI":"10.2139\/ssrn.4583531"},{"key":"ref_358","unstructured":"Shuster, K., Urbanek, J., Dinan, E., Szlam, A., and Weston, J. (2020). Deploying Lifelong Open-Domain Dialogue Learning. arXiv."},{"key":"ref_359","doi-asserted-by":"crossref","unstructured":"Ben Abacha, A., Agichtein, E., Pinter, Y., and Demner-Fushman, D. (2017, January 15\u201317). Overview of the Medical Question Answering Task at TREC 2017 LiveQA. Proceedings of the Text REtrieval Conference, Gaithersburg, MD, USA.","DOI":"10.6028\/NIST.SP.500-324.qa-overview"},{"key":"ref_360","unstructured":"(2025, May 14). Lyft_2021. 
Available online: https:\/\/raw.githubusercontent.com\/run-llama\/llama_index\/main\/docs\/docs\/examples\/data\/10k\/lyft_2021.pdf."},{"key":"ref_361","doi-asserted-by":"crossref","unstructured":"Yue, X., Ni, Y., Zhang, K., Zheng, T., Liu, R., Zhang, G., Stevens, S., Jiang, D., Ren, W., and Sun, Y. (2023). MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI. arXiv.","DOI":"10.1109\/CVPR52733.2024.00913"},{"key":"ref_362","unstructured":"Lu, P., Bansal, H., Xia, T., Liu, J., Li, C., Hajishirzi, H., Cheng, H., Chang, K.W., Galley, M., and Gao, J. (2023). MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts. arXiv."},{"key":"ref_363","unstructured":"(2025, May 14). MTsample. Available online: https:\/\/mtsamples.com\/."},{"key":"ref_364","first-page":"25","article-title":"Bridging the Gap Between Consumers\u2019 Medication Questions and Trusted Answers","volume":"264","author":"Abacha","year":"2019","journal-title":"Stud. Health Technol. Inform."},{"key":"ref_365","unstructured":"Zhang, X., Tian, C., Yang, X., Chen, L., Li, Z., and Petzold, L.R. (2023). AlpaCare: Instruction-tuned Large Language Models for Medical Application. arXiv."},{"key":"ref_366","unstructured":"Pal, A., Umapathi, L.K., and Sankarasubbu, M. (2022, January 7\u20138). MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. Proceedings of the Conference on Health, Inference, and Learning, Virtual."},{"key":"ref_367","doi-asserted-by":"crossref","unstructured":"Jin, D., Pan, E., Oufattole, N., Weng, W.H., Fang, H., and Szolovits, P. (2021). What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams. Appl. 
Sci., 11.","DOI":"10.20944\/preprints202105.0498.v1"},{"key":"ref_368","first-page":"6069","article-title":"Variational Reasoning for Question Answering with Knowledge Graph","volume":"32","author":"Zhang","year":"2018","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_369","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., and Doll\u00e1r, P. (2014). Microsoft COCO: Common Objects in Context. arXiv.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_370","doi-asserted-by":"crossref","unstructured":"Dolan, B., Quirk, C., and Brockett, C. (2004, January 23\u201327). Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. Proceedings of COLING 2004, the 20th International Conference on Computational Linguistics, Geneva, Switzerland.","DOI":"10.3115\/1220355.1220406"},{"key":"ref_371","unstructured":"Chen, D., and Dolan, W. (2011, January 19\u201324). Collecting Highly Parallel Data for Paraphrase Evaluation. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_372","doi-asserted-by":"crossref","unstructured":"Xu, J., Mei, T., Yao, T., and Rui, Y. (2016, January 27\u201330). MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.571"},{"key":"ref_373","doi-asserted-by":"crossref","unstructured":"Johnson, A.E.W., Pollard, T.J., Greenbaum, N.R., Lungren, M.P., Deng, C.y., Peng, Y., Lu, Z., Mark, R.G., Berkowitz, S.J., and Horng, S. (2019). MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. 
arXiv.","DOI":"10.1038\/s41597-019-0322-0"},{"key":"ref_374","unstructured":"(2025, May 14). Minecraft Wiki. Available online: https:\/\/minecraft.wiki\/."},{"key":"ref_375","unstructured":"Sen, P., Aji, A.F., and Saffari, A. (2022, January 12\u201317). Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering. Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea."},{"key":"ref_376","doi-asserted-by":"crossref","unstructured":"Liu, Y., Duan, H., Zhang, Y., Li, B., Zhang, S., Zhao, W., Yuan, Y., Wang, J., He, C., and Liu, Z. (2023). MMBench: Is Your Multi-modal Model an All-around Player?. arXiv.","DOI":"10.1007\/978-3-031-72658-3_13"},{"key":"ref_377","unstructured":"Fang, Y., Liang, X., Zhang, N., Liu, K., Huang, R., Chen, Z., Fan, X., and Chen, H. (2023). Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models. arXiv."},{"key":"ref_378","unstructured":"Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., and Le, Q. (2021). Program Synthesis with Large Language Models. arXiv."},{"key":"ref_379","unstructured":"(2025, May 14). MovieLens. Available online: https:\/\/grouplens.org\/datasets\/movielens\/."},{"key":"ref_380","doi-asserted-by":"crossref","unstructured":"Boecking, B., Usuyama, N., Bannur, S., Castro, D.C., Schwaighofer, A., Hyland, S., Wetscherek, M., Naumann, T., Nori, A., and Alvarez-Valle, J. (2022). Making the Most of Text Semantics to Improve Biomedical Vision\u2013Language Processing. arXiv.","DOI":"10.1007\/978-3-031-20059-5_1"},{"key":"ref_381","unstructured":"Eric, M., Goel, R., Paul, S., Sethi, A., Agarwal, S., Gao, S., Kumar, A., Goyal, A., Ku, P., and Hakkani-Tur, D. (2020, January 11\u201316). MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines. 
Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_382","doi-asserted-by":"crossref","unstructured":"Williams, A., Nangia, N., and Bowman, S. (2018, January 1\u20136). A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-1101"},{"key":"ref_383","doi-asserted-by":"crossref","unstructured":"Tao, W., Wang, Y., Shi, E., Du, L., Han, S., Zhang, H., Zhang, D., and Zhang, W. (2021). On the Evaluation of Commit Message Generation Models: An Experimental Study. arXiv.","DOI":"10.1109\/ICSME52107.2021.00018"},{"key":"ref_384","doi-asserted-by":"crossref","unstructured":"Khashabi, D., Chaturvedi, S., Roth, M., Upadhyay, S., and Roth, D. (2018, January 1\u20136). Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-1023"},{"key":"ref_385","unstructured":"Fu, C., Chen, P., Shen, Y., Qin, Y., Zhang, M., Lin, X., Yang, J., Zheng, X., Li, K., and Sun, X. (2023). MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models. arXiv."},{"key":"ref_386","unstructured":"Lin, X.V., Wang, C., Zettlemoyer, L., and Ernst, M.D. (2018, January 7\u201312). NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan."},{"key":"ref_387","unstructured":"Agarwal, M., Chakraborti, T., Fu, Q., Gros, D., Lin, X.V., Maene, J., Talamadupula, K., Teng, Z., and White, J. (2021). 
NeurIPS 2020 NLC2CMD Competition: Translating Natural Language to Bash Commands. arXiv."},{"key":"ref_388","doi-asserted-by":"crossref","unstructured":"Riedel, S., Yao, L., and McCallum, A. (2010). Modeling relations and their mentions without labeled text. Machine Learning and Knowledge Discovery in Databases, Springer.","DOI":"10.1007\/978-3-642-15939-8_10"},{"key":"ref_389","doi-asserted-by":"crossref","unstructured":"Trischler, A., Wang, T., Yuan, X., Harris, J., Sordoni, A., Bachman, P., and Suleman, K. (2017, January 3). NewsQA: A Machine Comprehension Dataset. Proceedings of the 2nd Workshop on Representation Learning for NLP, Vancouver, BC, Canada.","DOI":"10.18653\/v1\/W17-2623"},{"key":"ref_390","doi-asserted-by":"crossref","unstructured":"Agrawal, H., Desai, K., Wang, Y., Chen, X., Jain, R., Johnson, M., Batra, D., Parikh, D., Lee, S., and Anderson, P. (2018). nocaps: Novel object captioning at scale. arXiv.","DOI":"10.1109\/ICCV.2019.00904"},{"key":"ref_391","unstructured":"Bhattacharya, D., Aronsohn, A., Price, J., and Lo Re, V. (2023). Hepatitis C Guidance 2023 Update: AASLD-IDSA Recommendations for Testing, Managing, and Treating Hepatitis C Virus Infection. Clin. Infect. Dis., ciad319."},{"key":"ref_392","doi-asserted-by":"crossref","unstructured":"Lee, K., Chang, M.W., and Toutanova, K. (2019, July 28\u2013August 2). Latent Retrieval for Weakly Supervised Open Domain Question Answering. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy.","DOI":"10.18653\/v1\/P19-1612"},{"key":"ref_393","unstructured":"Marecek, L., Anthony-Smith, M., and Mathis, A.H. (2020). Prealgebra 2e, OpenStax."},{"key":"ref_394","unstructured":"OpenStreetMap Contributors (2025, May 14). Planet Dump. Available online: https:\/\/planet.osm.org."},{"key":"ref_395","doi-asserted-by":"crossref","unstructured":"Dong, Q., Wan, X., and Cao, Y. (2021, January 19\u201323). 
ParaSCI: A Large Scientific Paraphrase Dataset for Longer Paraphrase Generation. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online.","DOI":"10.18653\/v1\/2021.eacl-main.33"},{"key":"ref_396","unstructured":"(2025, May 14). PubMed Central (PMC) Full-Text Articles, Available online: https:\/\/www.ncbi.nlm.nih.gov\/pmc\/."},{"key":"ref_397","doi-asserted-by":"crossref","unstructured":"Li, Y., Du, Y., Zhou, K., Wang, J., Zhao, X., and Wen, J.R. (2023, January 6\u201310). Evaluating Object Hallucination in Large Vision-Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.20"},{"key":"ref_398","unstructured":"Smith, S., Patwary, M., Norick, B., LeGresley, P., Rajbhandari, S., Casper, J., Liu, Z., Prabhumoye, S., Zerveas, G., and Korthikanti, V. (2022). Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. arXiv."},{"key":"ref_399","doi-asserted-by":"crossref","first-page":"1098","DOI":"10.1162\/tacl_a_00415","article-title":"PAQ: 65 Million Probably-Asked Questions and What You Can Do with Them","volume":"9","author":"Lewis","year":"2021","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_400","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1038\/s41597-020-0495-6","article-title":"PTB-XL, a large publicly available electrocardiography dataset","volume":"7","author":"Wagner","year":"2020","journal-title":"Sci. Data"},{"key":"ref_401","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1038\/s41597-023-02153-8","article-title":"PTB-XL+, a comprehensive electrocardiographic feature dataset","volume":"10","author":"Strodthoff","year":"2023","journal-title":"Sci. Data"},{"key":"ref_402","unstructured":"Ge, T., Hu, J., Wang, L., Wang, X., Chen, S.Q., and Wei, F. (2023). 
In-context Autoencoder for Context Compression in a Large Language Model. arXiv."},{"key":"ref_403","unstructured":"Valerio Miceli Barone, A., and Sennrich, R. (2017). A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. arXiv."},{"key":"ref_404","unstructured":"Bahrami, M., Shrikanth, N.C., Ruangwan, S., Liu, L., Mizobuchi, Y., Fukuyori, M., Chen, W.P., Munakata, K., and Menzies, T. (2021). PyTorrent: A Python Library Corpus for Large-scale Language Models. arXiv."},{"key":"ref_405","doi-asserted-by":"crossref","unstructured":"Anantha, R., Vakulenko, S., Tu, Z., Longpre, S., Pulman, S., and Chappidi, S. (2021, January 6\u201311). Open-Domain Question Answering Goes Conversational via Question Rewriting. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online.","DOI":"10.18653\/v1\/2021.naacl-main.44"},{"key":"ref_406","first-page":"8722","article-title":"Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks","volume":"34","author":"Rogers","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_407","doi-asserted-by":"crossref","unstructured":"Pang, R.Y., Parrish, A., Joshi, N., Nangia, N., Phang, J., Chen, A., Padmakumar, V., Ma, J., Thompson, J., and He, H. (2021). QuALITY: Question Answering with Long Input Texts, Yes!. arXiv.","DOI":"10.18653\/v1\/2022.naacl-main.391"},{"key":"ref_408","doi-asserted-by":"crossref","unstructured":"Tafjord, O., Gardner, M., Lin, K., and Clark, P. (2019, January 3\u20137). QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions. 
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1608"},{"key":"ref_409","doi-asserted-by":"crossref","unstructured":"Choi, E., He, H., Iyyer, M., Yatskar, M., Yih, W.t., Choi, Y., Liang, P., and Zettlemoyer, L. (2018, October 31\u2013November 4). QuAC: Question Answering in Context. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1241"},{"key":"ref_410","doi-asserted-by":"crossref","unstructured":"Hosking, T., and Lapata, M. (2021, January 1\u20136). Factorising Meaning and Form for Intent-Preserving Paraphrasing. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online.","DOI":"10.18653\/v1\/2021.acl-long.112"},{"key":"ref_411","first-page":"5149","article-title":"A Deep Generative Framework for Paraphrase Generation","volume":"32","author":"Gupta","year":"2018","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_412","doi-asserted-by":"crossref","unstructured":"Lai, G., Xie, Q., Liu, H., Yang, Y., and Hovy, E. (2017, January 7\u201311). RACE: Large-scale ReAding Comprehension Dataset From Examinations. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark.","DOI":"10.18653\/v1\/D17-1082"},{"key":"ref_413","unstructured":"(2025, May 14). ParticleMedia. RAGTruth. Available online: https:\/\/github.com\/ParticleMedia\/RAGTruth."},{"key":"ref_414","unstructured":"Zhang, S., Liu, X., Liu, J., Gao, J., Duh, K., and Van Durme, B. (2018). ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension. 
arXiv."},{"key":"ref_415","doi-asserted-by":"crossref","unstructured":"Gehman, S., Gururangan, S., Sap, M., Choi, Y., and Smith, N.A. (2020, January 16\u201320). RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online.","DOI":"10.18653\/v1\/2020.findings-emnlp.301"},{"key":"ref_416","doi-asserted-by":"crossref","unstructured":"V\u00f6lske, M., Potthast, M., Syed, S., and Stein, B. (2017, January 7). TL;DR: Mining Reddit to Learn Automatic Summarization. Proceedings of the Workshop on New Frontiers in Summarization, Copenhagen, Denmark.","DOI":"10.18653\/v1\/W17-4508"},{"key":"ref_417","doi-asserted-by":"crossref","unstructured":"Lin, B.Y., Wu, Z., Yang, Y., Lee, D.H., and Ren, X. (2021, January 1\u20136). RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge. Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online.","DOI":"10.18653\/v1\/2021.findings-acl.131"},{"key":"ref_418","doi-asserted-by":"crossref","unstructured":"Ebner, S., Xia, P., Culkin, R., Rawlins, K., and Van Durme, B. (2020, January 5\u201310). Multi-Sentence Argument Linking. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.718"},{"key":"ref_419","doi-asserted-by":"crossref","unstructured":"Lu, Y., Liu, S., Zhang, Q., and Xie, Z. (2023). RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model. arXiv.","DOI":"10.1109\/ASP-DAC58780.2024.10473904"},{"key":"ref_420","doi-asserted-by":"crossref","unstructured":"Gliwa, B., Mochol, I., Biesek, M., and Wawer, A. (2019, January 4). SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization. 
Proceedings of the 2nd Workshop on New Frontiers in Summarization, Hong Kong, China.","DOI":"10.18653\/v1\/D19-5409"},{"key":"ref_421","unstructured":"Ordonez, V., Kulkarni, G., and Berg, T. (2011, January 12\u201317). Im2Text: Describing Images Using 1 Million Captioned Photographs. Proceedings of the 25th International Conference on Neural Information Processing Systems, Granada Congress and Exhibition Centre, Granada, Spain."},{"key":"ref_422","doi-asserted-by":"crossref","unstructured":"Hudson, D.A., and Manning, C.D. (2019). GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering. arXiv.","DOI":"10.1109\/CVPR.2019.00686"},{"key":"ref_423","unstructured":"(2025, May 14). Scoliosis Research Society. Available online: https:\/\/www.srs.org\/."},{"key":"ref_424","unstructured":"Dunn, M., Sagun, L., Higgins, M., Ugur Guney, V., Cirik, V., and Cho, K. (2017). SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. arXiv."},{"key":"ref_425","doi-asserted-by":"crossref","unstructured":"Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N.A., Khashabi, D., and Hajishirzi, H. (2022). Self-Instruct: Aligning Language Models with Self-Generated Instructions. arXiv.","DOI":"10.18653\/v1\/2023.acl-long.754"},{"key":"ref_426","doi-asserted-by":"crossref","unstructured":"Sap, M., Rashkin, H., Chen, D., Le Bras, R., and Choi, Y. (2019, January 3\u20137). Social IQa: Commonsense Reasoning about Social Interactions. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1454"},{"key":"ref_427","doi-asserted-by":"crossref","unstructured":"Kim, H., Hessel, J., Jiang, L., West, P., Lu, X., Yu, Y., Zhou, P., Bras, R., Alikhani, M., and Kim, G. (2023, January 6\u201310). SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization. 
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.799"},
{"key":"ref_428","doi-asserted-by":"crossref","unstructured":"Pasupat, P., and Liang, P. (2015). Compositional Semantic Parsing on Semi-Structured Tables. arXiv.","DOI":"10.3115\/v1\/P15-1142"},
{"key":"ref_429","doi-asserted-by":"crossref","unstructured":"Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., and Potts, C. (2013, January 18\u201321). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA.","DOI":"10.18653\/v1\/D13-1170"},
{"key":"ref_430","doi-asserted-by":"crossref","unstructured":"Alt, C., Gabryszak, A., and Hennig, L. (2020, January 5\u201310). TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.142"},
{"key":"ref_431","unstructured":"Berabi, B., He, J., Raychev, V., and Vechev, M.T. (2021, January 18\u201324). TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer. Proceedings of the International Conference on Machine Learning, Virtual."},
{"key":"ref_432","unstructured":"Centre for Research on the Epidemiology of Disasters (CRED), and United Nations Office for Disaster Risk Reduction (UNDRR) (2020). The Human Cost of Disasters (2000\u20132019), UNDRR."},
{"key":"ref_433","unstructured":"Kocetkov, D., Li, R., Ben Allal, L., Li, J., Mou, C., Mu\u00f1oz Ferrandis, C., Jernite, Y., Mitchell, M., Hughes, S., and Wolf, T. (2022). The Stack: 3 TB of permissively licensed source code. arXiv."},
{"key":"ref_434","unstructured":"Zhuang, Y., Yu, Y., Wang, K., Sun, H., and Zhang, C. (2023). ToolQA: A Dataset for LLM Question Answering with External Tools. arXiv."},
{"key":"ref_435","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1162\/tacl_a_00471","article-title":"TopiOCQA: Open-domain Conversational Question Answering with Topic Switching","volume":"10","author":"Adlakha","year":"2022","journal-title":"Trans. Assoc. Comput. Linguist."},
{"key":"ref_436","doi-asserted-by":"crossref","unstructured":"Voorhees, E., Alam, T., Bedrick, S., Demner-Fushman, D., Hersh, W.R., Lo, K., Roberts, K., Soboroff, I., and Wang, L.L. (2020). TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection. arXiv.","DOI":"10.1145\/3451964.3451965"},
{"key":"ref_437","doi-asserted-by":"crossref","unstructured":"Qian, H., Liu, Z., Zhang, P., Mao, K., Lian, D., Dou, Z., and Huang, T. (2024). MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation. arXiv.","DOI":"10.1145\/3696410.3714805"},
{"key":"ref_438","doi-asserted-by":"crossref","unstructured":"Honovich, O., Scialom, T., Levy, O., and Schick, T. (2023, January 9\u201314). Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.acl-long.806"},
{"key":"ref_439","doi-asserted-by":"crossref","unstructured":"Wang, X., Wu, J., Chen, J., Li, L., Wang, Y.F., and Wang, W.Y. (2019). VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. arXiv.","DOI":"10.1109\/ICCV.2019.00468"},
{"key":"ref_440","doi-asserted-by":"crossref","unstructured":"Liu, M., Pinckney, N., Khailany, B., and Ren, H. (2023). VerilogEval: Evaluating Large Language Models for Verilog Code Generation. arXiv.","DOI":"10.1109\/ICCAD57390.2023.10323812"},
{"key":"ref_441","doi-asserted-by":"crossref","unstructured":"Agrawal, A., Lu, J., Antol, S., Mitchell, M., Zitnick, C.L., Batra, D., and Parikh, D. (2015). VQA: Visual Question Answering. arXiv.","DOI":"10.1007\/s11263-016-0966-6"},
{"key":"ref_442","doi-asserted-by":"crossref","unstructured":"Chang, Y., Narang, M., Suzuki, H., Cao, G., Gao, J., and Bisk, Y. (2022, January 18\u201324). WebQA: Multihop and Multimodal QA. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Ernest N. Morial Convention Center, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01600"},
{"key":"ref_443","doi-asserted-by":"crossref","unstructured":"Shang, L., Lu, Z., and Li, H. (2015, January 26\u201331). Neural Responding Machine for Short-Text Conversation. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China.","DOI":"10.3115\/v1\/P15-1152"},
{"key":"ref_444","doi-asserted-by":"crossref","unstructured":"Cohen, D., Yang, L., and Croft, W.B. (2018). WikiPassageQA: A Benchmark Collection for Research on Non-factoid Answer Passage Retrieval. arXiv.","DOI":"10.1145\/3209978.3210118"},
{"key":"ref_445","unstructured":"(2025, May 14). WikiEval. Available online: https:\/\/huggingface.co\/datasets\/explodinggradients\/WikiEval."},
{"key":"ref_446","unstructured":"Asai, A., Yu, X., Kasai, J., and Hajishirzi, H. (2021, January 6\u201314). One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval. Proceedings of the 35th International Conference on Neural Information Processing Systems, Online."},
{"key":"ref_447","first-page":"8732","article-title":"WinoGrande: An Adversarial Winograd Schema Challenge at Scale","volume":"34","author":"Sakaguchi","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},
{"key":"ref_448","doi-asserted-by":"crossref","unstructured":"Maekawa, S., Iso, H., Gurajada, S., and Bhutani, N. (2024, January 16\u201321). Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico.","DOI":"10.18653\/v1\/2024.naacl-long.308"},
{"key":"ref_449","doi-asserted-by":"crossref","unstructured":"Tedeschi, S., Conia, S., Cecconi, F., and Navigli, R. (2021, January 16\u201320). Named Entity Recognition for Entity Linking: What Works and What\u2019s Next. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.findings-emnlp.220"},
{"key":"ref_450","unstructured":"Pilehvar, M.T., and Camacho-Collados, J. (2019, January 2\u20137). WiC: The Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},
{"key":"ref_451","doi-asserted-by":"crossref","unstructured":"Liu, A., Swayamdipta, S., Smith, N.A., and Choi, Y. (2022, January 7\u201311). WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.findings-emnlp.508"},
{"key":"ref_452","unstructured":"Asghar, N. (2016). Yelp Dataset Challenge: Review Rating Prediction. arXiv."},
{"key":"ref_453","unstructured":"(2025, May 14). Yelp. Yelp Open Dataset. Available online: https:\/\/business.yelp.com\/data\/resources\/open-dataset\/."},
{"key":"ref_454","doi-asserted-by":"crossref","first-page":"1757","DOI":"10.1021\/ci3001277","article-title":"ZINC: A Free Tool to Discover Chemistry for Biology","volume":"52","author":"Irwin","year":"2012","journal-title":"J. Chem. Inf. Model."}],
"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/12\/320\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T11:20:06Z","timestamp":1765884006000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/12\/320"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,12]]},"references-count":454,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["bdcc9120320"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9120320","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,12]]}}}