{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,26]],"date-time":"2026-06-26T03:00:41Z","timestamp":1782442841528,"version":"3.54.5"},"reference-count":42,"publisher":"Cambridge University Press (CUP)","issue":"2","license":[{"start":{"date-parts":[[2024,1,16]],"date-time":"2024-01-16T00:00:00Z","timestamp":1705363200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Usage of large language models and chat bots will almost surely continue to grow, since they are so easy to use, and so (incredibly) credible. I would be more comfortable with this reality if we encouraged more evaluations with humans-in-the-loop to come up with a better characterization of when the machine can be trusted and when humans should intervene. This article will describe a homework assignment, where I asked my students to use tools such as chat bots and web search to write a number of essays. Even after considerable discussion in class on hallucinations, many of the essays were full of misinformation that should have been fact-checked. Apparently, it is easier to believe ChatGPT than to be skeptical. Fact-checking and web search are too much trouble.<\/jats:p>","DOI":"10.1017\/s1351324923000578","type":"journal-article","created":{"date-parts":[[2024,1,16]],"date-time":"2024-01-16T10:05:41Z","timestamp":1705399541000},"page":"417-427","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":19,"title":["Emerging trends: When can users trust GPT, and when should they intervene?"],"prefix":"10.1017","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8378-6069","authenticated-orcid":false,"given":"Kenneth","family":"Church","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"56","published-online":{"date-parts":[[2024,1,16]]},"reference":[{"key":"S1351324923000578_ref39","unstructured":"Wang, J. , Hu, X. , Hou, W. , Chen, H. , Zheng, R. , Wang, Y. , Yang, L. , Huang, H. , Ye, W. , Geng, X. , Jiao, B. , Zhang, Y. and Xie, X. (2023). On the robustness of chatgpt: an adversarial and out-of-distribution perspective. ArXiv, abs\/2302.12095."},{"key":"S1351324923000578_ref9","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1017\/S1351324922000481","article-title":"Emerging trends: unfair, biased, addictive, dangerous, deadly, and insanely profitable","volume":"29","author":"Church","year":"2023","journal-title":"Natural Language Engineering"},{"key":"S1351324923000578_ref8","volume-title":"Aspects of the Theory of Syntax","author":"Chomsky","year":"1965"},{"key":"S1351324923000578_ref30","doi-asserted-by":"crossref","unstructured":"Morris, J. , Lifland, E. , Yoo, J.Y. , Grigsby, J. , Jin, D. and Qi, Y. (2020). TextAttack: a framework for adversarial attacks, data augmentation, and adversarial training in NLP. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, pp. 119\u2013126. Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.16"},{"key":"S1351324923000578_ref40","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"S1351324923000578_ref6","unstructured":"Chia, Y.K. , Hong, P. , Bing, L. and Poria, S. (2023). Instructeval: towards holistic evaluation of instruction-tuned large language models. arXiv preprint arXiv:2306.04757."},{"key":"S1351324923000578_ref16","unstructured":"Frohberg, J. and Binder, F. (2022). CRASS: a novel data set and benchmark to test counterfactual reasoning of large language models. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France. European Language Resources Association, pp. 2126\u20132140."},{"key":"S1351324923000578_ref36","article-title":"Beyond the imitation game: quantifying and extrapolating the capabilities of language models","author":"Srivastava","year":"2023","journal-title":"Transactions on Machine Learning Research"},{"key":"S1351324923000578_ref41","unstructured":"Ziegler, D.M. , Nix, S. , Chan, L. , Bauman, T. , Schmidt-Nielsen, P. , Lin, T. , Scherlis, A. , Nabeshima, N. , Weinstein-Raun, B. , Haas, D. , Shlegeris, B. and Thomas, N. (2022). Adversarial training for high-stakes reliability. ArXiv, abs\/2205.01663."},{"key":"S1351324923000578_ref14","first-page":"178","article-title":"Preference semantics, ill-formedness, and metaphor","volume":"9","author":"Fass","year":"1983","journal-title":"American Journal of Computational Linguistics"},{"key":"S1351324923000578_ref37","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1073\/pnas.1517441113","article-title":"The spreading of misinformation online","volume":"113","author":"Vicario","year":"2016","journal-title":"Proceedings of The National Academy of Sciences of The United States of America"},{"key":"S1351324923000578_ref21","doi-asserted-by":"crossref","unstructured":"Jia, R. and Liang, P. (2017). Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark. Association for Computational Linguistics, pp. 2021\u20132031.","DOI":"10.18653\/v1\/D17-1215"},{"key":"S1351324923000578_ref10","doi-asserted-by":"crossref","first-page":"1402","DOI":"10.1017\/S1351324923000463","article-title":"Emerging trends: smooth-talking machines","volume":"29","author":"Church","year":"2023","journal-title":"Natural Language Engineering"},{"key":"S1351324923000578_ref2","doi-asserted-by":"crossref","first-page":"112884","DOI":"10.1016\/j.cma.2020.112884","article-title":"Error indicators for incompressible darcy flow problems using enhanced velocity mixed finite element method","volume":"363","author":"Amanbek","year":"2020","journal-title":"Computer Methods in Applied Mechanics and Engineering"},{"key":"S1351324923000578_ref3","doi-asserted-by":"crossref","unstructured":"Camburu, O.-M. , Shillingford, B. , Minervini, P. , Lukasiewicz, T. and Blunsom, P. (2020). Make up your mind! adversarial generation of inconsistent natural language explanations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 4157\u20134165, Online.","DOI":"10.18653\/v1\/2020.acl-main.382"},{"key":"S1351324923000578_ref29","doi-asserted-by":"crossref","unstructured":"Mohammad, S. , Shutova, E. and Turney, P. (2016). Metaphor as a medium for emotion: An empirical study. In Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, Berlin, Germany. Association for Computational Linguistics, pp. 23\u201333.","DOI":"10.18653\/v1\/S16-2003"},{"key":"S1351324923000578_ref28","first-page":"2","volume-title":"Perceptron: An Introduction to Computational Geometry","volume":"19","author":"Minsky","year":"1969"},{"key":"S1351324923000578_ref15","doi-asserted-by":"crossref","first-page":"102524","DOI":"10.1016\/j.ipm.2021.102524","article-title":"How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets?","volume":"58","author":"Fortuna","year":"2021","journal-title":"Information Processing and Management"},{"key":"S1351324923000578_ref31","volume-title":"Imperial Twilight: The Opium War and the End of China\u2019s Last Golden Age","author":"Platt","year":"2019"},{"key":"S1351324923000578_ref34","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1145\/3137597.3137600","article-title":"Fake news detection on social media: a data mining perspective","volume":"19","author":"Shu","year":"2017","journal-title":"ACM SIGKDD Explorations Newsletter"},{"key":"S1351324923000578_ref18","doi-asserted-by":"crossref","unstructured":"Gedigian, M. , Bryant, J. , Narayanan, S. and Ciric, B. (2006). Catching metaphors. In Proceedings of the Third Workshop on Scalable Natural Language Understanding, New York City, New York. Association for Computational Linguistics, pp. 41\u201348.","DOI":"10.3115\/1621459.1621467"},{"key":"S1351324923000578_ref42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3161603","article-title":"Detection and resolution of rumours in social media","volume":"51","author":"Zubiaga","year":"2017","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"S1351324923000578_ref22","doi-asserted-by":"crossref","unstructured":"Krishnakumaran, S. and Zhu, X. (2007). Hunting elusive metaphors using lexical resources. In Proceedings of the Workshop on Computational Approaches to Figurative Language, Rochester, New York. Association for Computational Linguistics, pp. 13\u201320.","DOI":"10.3115\/1611528.1611531"},{"key":"S1351324923000578_ref7","doi-asserted-by":"crossref","DOI":"10.1515\/9783112316009","volume-title":"Syntactic Structures","author":"Chomsky","year":"1957"},{"key":"S1351324923000578_ref23","volume-title":"Women, Fire, and Dangerous Things: What Categories Reveal About the Mind","author":"Lakoff","year":"2008"},{"key":"S1351324923000578_ref1","doi-asserted-by":"crossref","first-page":"39","DOI":"10.5120\/16754-7073","article-title":"A survey on nearest neighbor search methods","volume":"95","author":"Abbasifard","year":"2014","journal-title":"International Journal of Computer Applications"},{"key":"S1351324923000578_ref12","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1017\/S1351324920000601","article-title":"Gpt-3: what\u2019s it good for?","volume":"27","author":"Dale","year":"2021","journal-title":"Natural Language Engineering"},{"key":"S1351324923000578_ref32","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1007\/s10579-020-09502-8","article-title":"Resources and benchmark corpora for hate speech detection: a systematic review","volume":"55","author":"Poletto","year":"2020","journal-title":"Language Resources and Evaluation"},{"key":"S1351324923000578_ref5","unstructured":"Chen, M. , Tworek, J. , Jun, H. , Yuan, Q. , de Oliveira Pinto, H.P. , Kaplan, J. , Edwards, H. , Burda, Y. , Joseph, N. , Brockman, G. , Ray, A. , Puri, R. , Krueger, G. , Petrov, M. , Khlaaf, H. , Sastry, G. , Mishkin, P. , Chan, B. , Gray, S. , Ryder, N. , Pavlov, M. , Power, A. , Kaiser, L. , Bavarian, M. , Winter, C. , Tillet, P. , Such, F. P. , Cummings, D. , Plappert, M. , Chantzis, F. , Barnes, E. , Herbert-Voss, A. , Guss, W.H. , Nichol, A. , Paino, A. , Tezak, N. , Tang, J. , Babuschkin, I. , Balaji, S. , Jain, S. , Saunders, W. , Hesse, C. , Carr, A.N. , Leike, J. , Achiam, J. , Misra, V. , Morikawa, E. , Radford, A. , Knight, M. , Brundage, M. , Murati, M. , Mayer, K. , Welinder, P. , McGrew, B. , Amodei, D. , McCandlish, S. , Sutskever, I. and Zaremba, W. (2021). Evaluating large language models trained on code."},{"key":"S1351324923000578_ref13","unstructured":"Dua, D. , Wang, Y. , Dasigi, P. , Stanovsky, G. , Singh, S. and Gardner, M. (2019). DROP: a reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota. Association for Computational Linguistics, pp. 2368\u20132378."},{"key":"S1351324923000578_ref11","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2023301118","article-title":"The echo chamber effect on social media","volume":"118","author":"Cinelli","year":"2021","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"S1351324923000578_ref25","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive NLP tasks","volume":"33","author":"Lewis","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"S1351324923000578_ref24","volume-title":"Metaphors We Live by","author":"Lakoff","year":"2008"},{"key":"S1351324923000578_ref19","unstructured":"Hendrycks, D. , Burns, C. , Basart, S. , Zou, A. , Mazeika, M. , Song, D. and Steinhardt, J. (2020). Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300."},{"key":"S1351324923000578_ref17","doi-asserted-by":"crossref","unstructured":"Gale, W.A. and Church, K.W. (1991). Identifying word correspondences in parallel texts. In Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991.","DOI":"10.3115\/112405.112428"},{"key":"S1351324923000578_ref27","volume-title":"A Computational Model of Metaphor Interpretation","author":"Martin","year":"1990"},{"key":"S1351324923000578_ref20","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1007\/978-3-642-58146-5_3","volume-title":"Communication From an Artificial Intelligence Perspective: Theoretical and Applied Issues","author":"Hobbs","year":"1992"},{"key":"S1351324923000578_ref26","doi-asserted-by":"crossref","unstructured":"Li, D. , Zhang, Y. , Peng, H. , Chen, L. , Brockett, C. , Sun, M.-T. and Dolan, B. (2021). Contextualized perturbation for textual adversarial attack. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, pp. 5053\u20135069. Online.","DOI":"10.18653\/v1\/2021.naacl-main.400"},{"key":"S1351324923000578_ref4","doi-asserted-by":"crossref","unstructured":"Carbonell, J.G. (1980). Metaphor - a key to extensible semantic analysis. In 18th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics, pp. 17\u201321.","DOI":"10.3115\/981436.981441"},{"key":"S1351324923000578_ref33","unstructured":"Qazvinian, V. , Rosengren, E. , Radev, D.R. and Mei, Q. (2011). Rumor has it: identifying misinformation in microblogs. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK. Association for Computational Linguistics, pp. 1589\u20131599."},{"key":"S1351324923000578_ref38","unstructured":"Wang, B. , Xu, C. , Wang, S. , Gan, Z. , Cheng, Y. , Gao, J. , Awadallah, A.H. and Li, B. (2021). Adversarial glue: a multi-task benchmark for robustness evaluation of language models. ArXiv, abs\/2111.02840."},{"key":"S1351324923000578_ref35","unstructured":"Shutova, E. (2010). Models of metaphor in NLP. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden. Association for Computational Linguistics, pp. 688\u2013697."}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324923000578","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,1]],"date-time":"2024-04-01T13:29:10Z","timestamp":1711978150000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324923000578\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,16]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["S1351324923000578"],"URL":"https:\/\/doi.org\/10.1017\/s1351324923000578","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,16]]},"assertion":[{"value":"\u00a9 The Author(s), 2024. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}