{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T20:02:51Z","timestamp":1778702571293,"version":"3.51.4"},"reference-count":30,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T00:00:00Z","timestamp":1726790400000},"content-version":"unspecified","delay-in-days":81,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n\t  <jats:p>Retrieval-augmented generation (RAG) adds a simple but powerful feature to chatbots, the ability to upload files just-in-time. Chatbots are trained on large quantities of public data. The ability to upload files just-in-time makes it possible to reduce hallucinations by filling in gaps in the knowledge base that go beyond the public training data such as private data and recent events. For example, in a customer service scenario, with RAG, we can upload your private bill and then the bot can discuss questions about your bill as opposed to generic FAQ questions about bills in general. This tutorial will show how to upload files and generate responses to prompts; see <jats:uri xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/kwchurch\/RAG\">https:\/\/github.com\/kwchurch\/RAG<\/jats:uri> for multiple solutions based on tools from OpenAI, LangChain, HuggingFace transformers and VecML.<\/jats:p>","DOI":"10.1017\/s1351324924000044","type":"journal-article","created":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T04:32:33Z","timestamp":1726806753000},"page":"870-881","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":23,"title":["Emerging trends: a gentle introduction to RAG"],"prefix":"10.1017","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8378-6069","authenticated-orcid":false,"given":"Kenneth Ward","family":"Church","sequence":"first","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiameng","family":"Sun","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4135-5629","authenticated-orcid":false,"given":"Richard","family":"Yue","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0479-7363","authenticated-orcid":false,"given":"Peter","family":"Vickers","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Walid","family":"Saba","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2755-0745","authenticated-orcid":false,"given":"Raman","family":"Chandrasekar","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2024,9,20]]},"reference":[{"key":"S1351324924000044_ref18","unstructured":"Nasr, M. , Carlini, N. , Hayase, J. , Jagielski, M. , Cooper, A.F. , Ippolito, D. , Choquette-Choo, C.A. , Wallace, E. , Tram\u00e8r, F. and Lee, K. (2023). Scalable extraction of training data from (production) language models. arXiv preprint arXiv: 2311.17035."},{"key":"S1351324924000044_ref3","first-page":"22","article-title":"Word association norms, mutual information, and lexicography","volume":"16","author":"Church","year":"1990","journal-title":"Computational Linguistics"},{"key":"S1351324924000044_ref9","first-page":"41","volume-title":"Syntax and Semantics","volume":"3","author":"Grice","year":"1975"},{"key":"S1351324924000044_ref13","unstructured":"Lewis, P. , Perez, E. , Piktus, A. , Petroni, F. , Karpukhin, V. , Goyal, N. , K\u00fcttler, H. , Lewis, M. , Yih, W.-t. , Rockt\u00e4schel, T. , Riedel, S. and Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. In Advances in Neural Information Processing Systems, 33, pp. 9459\u20139474."},{"key":"S1351324924000044_ref14","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.397"},{"key":"S1351324924000044_ref28","unstructured":"Zhao, P. , Zhang, H. , Yu, Q. , Wang, Z. , Geng, Y. , Fu, F. , Yang, L. , Zhang, W. and Cui, B. (2024). Retrieval-augmented generation for ai-generated content: a survey. ArXiv preprint abs\/2402.19473."},{"key":"S1351324924000044_ref24","unstructured":"Wei, J. , Wang, X. , Schuurmans, D. , Bosma, M. , Xia, F. , Chi, E. , Le, Q.V. and Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, 35, pp. 24824\u201324837."},{"key":"S1351324924000044_ref27","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.256"},{"key":"S1351324924000044_ref17","first-page":"403","volume-title":"Semantic Information Processing","author":"McCarthy","year":"1969"},{"key":"S1351324924000044_ref19","doi-asserted-by":"crossref","unstructured":"Petroni, F. , Piktus, A. , Fan, A. , Lewis, P. , Yazdani, M. , De Cao, N. , Thorne, J. , Jernite, Y. , Karpukhin, V. , Maillard, J. , Plachouras, V. , Rockt\u00e4schel, T. and Riedel, S. (2021). KILT: a benchmark for knowledge intensive language tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pp. 2523\u20132544, Online.","DOI":"10.18653\/v1\/2021.naacl-main.200"},{"key":"S1351324924000044_ref4","doi-asserted-by":"publisher","DOI":"10.1145\/3626772.3657834"},{"key":"S1351324924000044_ref21","unstructured":"Rosenthal, S. , Sil, A. , Florian, R. and Roukos, S. (2024). CLAPNQ: cohesive long-form answers from passages in natural questions for rag systems. ArXiv, abs\/2404.02103."},{"key":"S1351324924000044_ref8","unstructured":"Gao, Y. , Xiong, Y. , Gao, X. , Jia, K. , Pan, J. , Bi, Y. , Dai, Y. , Sun, J. and Wang, H. (2023). Retrieval-augmented generation for large language models: a survey. arXiv preprint arXiv:2312.10997."},{"key":"S1351324924000044_ref29","volume-title":"Attention and Performance V","author":"Meyer"},{"key":"S1351324924000044_ref7","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.eacl-demo.16"},{"key":"S1351324924000044_ref16","unstructured":"Lyu, Y. , Li, Z. , Niu, S. , Xiong, F. , Tang, B. , Wang, W. , Wu, H. , Liu, H. , Xu, T. and Chen, E. (2024). CRUD-RAG: a comprehensive chinese benchmark for retrieval-augmented generation of large language models. ArXiv preprint abs\/2401.17043."},{"key":"S1351324924000044_ref20","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"S1351324924000044_ref2","doi-asserted-by":"crossref","unstructured":"Chen, L. , Zaharia, M. and Zou, J. (2023). How is ChatGPT\u2019s behavior changing over time? arXiv preprint arXiv:2307.09009.","DOI":"10.1162\/99608f92.5317da47"},{"key":"S1351324924000044_ref26","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.372"},{"key":"S1351324924000044_ref15","doi-asserted-by":"crossref","unstructured":"Lin, D. (1998). Automatic retrieval and clustering of similar words. In COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics.","DOI":"10.3115\/980432.980696"},{"key":"S1351324924000044_ref22","doi-asserted-by":"crossref","unstructured":"Saad-Falcon, J. , Khattab, O. , Potts, C. and Zaharia, M. (2023). ARES: an automated evaluation framework for retrieval-augmented generation systems. ArXiv preprint abs\/2311.09476.","DOI":"10.18653\/v1\/2024.naacl-long.20"},{"key":"S1351324924000044_ref1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i16.29728"},{"key":"S1351324924000044_ref10","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"S1351324924000044_ref23","unstructured":"Sander, D.P. and Dietz, L. (2021). EXAM: how to evaluate retrieve-and-generate systems for users who do not (yet) know what they want. In Biennial Conference on Design of Experimental Search & Information Retrieval Systems."},{"key":"S1351324924000044_ref6","unstructured":"Douze, M. , Guzhva, A. , Deng, C. , Johnson, J. , Szilvasy, G. , Mazar\u00e9, P.-E. , Lomeli, M. , Hosseini, L. and J\u00e9gou, H. (2024). The FAISS library. arXiv:2401.08281."},{"key":"S1351324924000044_ref11","unstructured":"Kautz, H.A. and Allen, J.F. (1986). Generalized plan recognition. In AAAI Conference on Artificial Intelligence, Philadelphia, PA."},{"key":"S1351324924000044_ref30","unstructured":"Palermo, D. and Jenkins, J. (1964). Word Association Norms. Minneapolis, MN: University of Minnesota Press."},{"key":"S1351324924000044_ref5","doi-asserted-by":"publisher","DOI":"10.1093\/jla\/laae003"},{"key":"S1351324924000044_ref25","unstructured":"Wu, Y. , Zhu, J. , Xu, S. , Shum, K. , Niu, C. , Zhong, R. , Song, J. and Zhang, T. (2023). Ragtruth: a hallucination corpus for developing trustworthy retrieval-augmented language models. ArXiv preprint abs\/2401.00396."},{"key":"S1351324924000044_ref12","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00276"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324924000044","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T14:28:23Z","timestamp":1759847303000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324924000044\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,11]]}},"alternative-id":["S1351324924000044"],"URL":"https:\/\/doi.org\/10.1017\/s1351324924000044","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"\u00a9 The Author(s), 2024. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}