{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T19:35:40Z","timestamp":1776368140864,"version":"3.51.2"},"reference-count":82,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T00:00:00Z","timestamp":1739145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Interact. Intell. Syst."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>\n            Large language models (LLMs) match and sometimes exceed human performance in many domains. This study explores the potential of LLMs to augment human judgment in a forecasting task. We evaluate the effect on human forecasters of two LLM assistants: one designed to provide high-quality (\u201csuperforecasting\u201d) advice, and the other designed to be overconfident and base-rate neglecting, thus providing noisy forecasting advice. We compare participants using these assistants to a control group that received a less advanced model that did not provide numerical predictions or engage in explicit discussion of predictions. Participants (\n            <jats:italic>N<\/jats:italic>\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(=\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            991) answered a set of six forecasting questions and had the option to consult their assigned LLM assistant throughout. Our preregistered analyses show that interacting with each of our frontier LLM assistants significantly enhances prediction accuracy by between 24% and 28% compared to the control group. Exploratory analyses showed a pronounced outlier effect in one forecasting item, without which we find that the superforecasting assistant increased accuracy by 41%, compared with 29% for the noisy assistant. We further examine whether LLM forecasting augmentation disproportionately benefits less skilled forecasters, degrades the wisdom-of-the-crowd by reducing prediction diversity, or varies in effectiveness with question difficulty. Our data do not consistently support these hypotheses. Our results suggest that access to a frontier LLM assistant, even a noisy one, can be a helpful decision aid in cognitively demanding tasks compared to a less powerful model that does not provide specific forecasting advice. However, the effects of outliers suggest that further research into the robustness of this pattern is needed.\n          <\/jats:p>","DOI":"10.1145\/3707649","type":"journal-article","created":{"date-parts":[[2024,12,13]],"date-time":"2024-12-13T12:53:08Z","timestamp":1734094388000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9930-487X","authenticated-orcid":false,"given":"Philipp","family":"Schoenegger","sequence":"first","affiliation":[{"name":"LSE, London, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6532-0529","authenticated-orcid":false,"given":"Peter S.","family":"Park","sequence":"additional","affiliation":[{"name":"MIT, Cambridge, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8035-8239","authenticated-orcid":false,"given":"Ezra","family":"Karger","sequence":"additional","affiliation":[{"name":"Federal Reserve Bank of Chicago, Chicago, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6003-3731","authenticated-orcid":false,"given":"Sean","family":"Trott","sequence":"additional","affiliation":[{"name":"University of California San Diego, San Diego, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6535-530X","authenticated-orcid":false,"given":"Philip E.","family":"Tetlock","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, Philadelphia, PA, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,2,10]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1093\/oxfordhb\/9780197579329.013.65"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.3386\/w31422"},{"issue":"3","key":"e_1_3_2_4_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3672277","article-title":"ID.8: Co-creating visual stories with generative AI","volume":"14","author":"Antony Victor Nikhil","year":"2023","unstructured":"Victor Nikhil Antony and Chien-Ming Huang. 2023. ID.8: Co-creating visual stories with generative AI. ACM Transactions on Interactive Intelligent Systems 14, 3 (2023), 1\u201329.","journal-title":"ACM Transactions on Interactive Intelligent Systems"},{"key":"e_1_3_2_5_2","unstructured":"Sanjeev Arora and Anirudh Goyal. 2023. A theory for emergence of complex skills in language models. arXiv:2307.15936. Retrieved from https:\/\/arxiv.org\/abs\/2307.15936"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2015.2374"},{"key":"e_1_3_2_7_2","author":"Atari Mohammad","year":"2023","unstructured":"Mohammad Atari, Mona J. Xue, Peter S. Park, Dami\u00e1n Blasi, and Joseph Henrich. 2023. Which Humans? Working Paper.","journal-title":"Which Humans?"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1002\/aaai.12085"},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"David Rhys Bernard and Philipp Schoenegger. 2024. Forecasting Long-Run Causal Effects. Retrieved from https:\/\/ssrn.com\/abstract=4702393","DOI":"10.2139\/ssrn.4702393"},{"key":"e_1_3_2_11_2","unstructured":"Stella Biderman U. S. V. S. N. Sai Prashanth Lintang Sutawika Hailey Schoelkopf Quentin Anthony Shivanshu Purohit and Edward Raff. 2023. Emergent and predictable memorization in large language models. arXiv:2304.11158. Retrieved from https:\/\/arxiv.org\/abs\/2304.11158"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.3386\/w31161"},{"key":"e_1_3_2_13_2","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott Lundberg et al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv:2303.12712. Retrieved from https:\/\/arxiv.org\/abs\/2303.12712"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2014.1909"},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations (ICLR \u201923)","author":"Carlini Nicholas","year":"2023","unstructured":"Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tram\u00e8r, and Chiyuan Zhang. 2023. Quantifying memorization across neural language models. In Proceedings of the 11th International Conference on Learning Representations (ICLR \u201923). OpenReview.net. Retrieved from https:\/\/openreview.net\/pdf?id=TatRHT_1cK"},{"issue":"5","key":"e_1_3_2_16_2","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1017\/S1930297500004599","article-title":"Developing expert political judgment: The impact of training and practice on judgmental accuracy in geopolitical forecasting tournaments","volume":"11","author":"Chang Welton","year":"2016","unstructured":"Welton Chang, Eva Chen, Barbara Mellers, and Philip Tetlock. 2016. Developing expert political judgment: The impact of training and practice on judgmental accuracy in geopolitical forecasting tournaments. Judgment and Decision Making 11, 5 (2016), 509\u2013526.","journal-title":"Judgment and Decision Making"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3651990"},{"key":"e_1_3_2_18_2","unstructured":"Wei-Lin Chiang Lianmin Zheng Ying Sheng Anastasios Nikolas Angelopoulos Tianle Li Dacheng Li Hao Zhang Banghua Zhu Michael Jordan Joseph E. Gonzalez et al. 2024. Chatbot arena: An open platform for evaluating LLMs by Human preference. arXiv:2403.04132. Retrieved from https:\/\/arxiv.org\/abs\/2403.04132"},{"key":"e_1_3_2_19_2","unstructured":"Alexander S. Choi Syeda Sabrina Akter J. P. Singh and Antonios Anastasopoulos. 2024. The LLM effect: Are humans truly using LLMs or are they being influenced by them instead? arXiv:2410.04699. Retrieved from https:\/\/arxiv.org\/abs\/2410.04699"},{"key":"e_1_3_2_20_2","article-title":"AI assistance in Legal analysis: An empirical study","volume":"73","author":"Choi Jonathan H.","year":"2024","unstructured":"Jonathan H. Choi and Daniel Schwarcz. 2024. AI assistance in Legal analysis: An empirical study. Journal of Legal Education 73 (2024). Forthcoming.","journal-title":"Journal of Legal Education"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"915","DOI":"10.1007\/s40593-023-00372-z","article-title":"Can ChatGPT pass high school exams on English language comprehension?","volume":"34","author":"Winter Joost C. F. de","year":"2023","unstructured":"Joost C. F. de Winter. 2023. Can ChatGPT pass high school exams on English language comprehension? International Journal of Artificial Intelligence in Education 34 (2023), 915\u2013930.","journal-title":"International Journal of Artificial Intelligence in Education"},{"key":"e_1_3_2_22_2","first-page":"24","volume-title":"Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality","author":"Dell\u2019Acqua Fabrizio","year":"2023","unstructured":"Fabrizio Dell\u2019Acqua, Edward McFowland, Ethan R. Mollick, Hila Lifshitz-Assaf, Katherine Kellogg, Saran Rajendran, Lisa Krayer, Fran\u00e7ois Candelon, and Karim R. Lakhani. 2023. Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Harvard Business School Technology & Operations Mgt. Unit Working Paper 24\u2013013."},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"Anil R. Doshi and Oliver Hauser. 2023. Generative artificial intelligence enhances creativity. Retrieved from https:\/\/ssrn.com\/abstract=4535536","DOI":"10.2139\/ssrn.4535536"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0279720"},{"key":"e_1_3_2_25_2","unstructured":"Deborah Etsenake and Meiyappan Nagappan. 2024. Understanding the human-LLM dynamic: A literature survey of LLM use in programming tasks. arXiv:2410.01026. Retrieved from https:\/\/arxiv.org\/abs\/2410.01026"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.57020\/ject.1297961"},{"key":"e_1_3_2_27_2","unstructured":"Mohammad Fraiwan and Natheer Khasawneh. 2023. A review of ChatGPT applications in education marketing software engineering and healthcare: Benefits drawbacks and research directions. arXiv:2305.00237. Retrieved from https:\/\/arxiv.org\/abs\/2305.00237"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613905.3650786"},{"issue":"2","key":"e_1_3_2_29_2","first-page":"117","article-title":"The impact of AI language models on the future of White-Collar jobs: A comparative study of Job projections in developed and developing countries","volume":"2","author":"George A. Shaji","year":"2023","unstructured":"A. Shaji George and T. Baskar. 2023. The impact of AI language models on the future of White-Collar jobs: A comparative study of Job projections in developed and developing countries. Partners Universal International Research Journal 2, 2 (2023), 117\u2013135.","journal-title":"Partners Universal International Research Journal"},{"key":"e_1_3_2_30_2","unstructured":"Tanya Goyal Junyi Jessy Li and Greg Durrett. 2023. News summarization and evaluation in the era of GPT-3. arXiv:2209.12356. Retrieved from https:\/\/arxiv.org\/abs\/2209.12356"},{"issue":"3","key":"e_1_3_2_31_2","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1039\/C1RP90069B","article-title":"A continuum of learning: From rote memorization to meaningful learning in organic chemistry","volume":"13","author":"Grove Nathaniel P.","year":"2012","unstructured":"Nathaniel P. Grove and Stacey Lowery Bretz. 2012. A continuum of learning: From rote memorization to meaningful learning in organic chemistry. Chemistry Education Research and Practice 13, 3 (2012), 201\u2013208.","journal-title":"Chemistry Education Research and Practice"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643894"},{"key":"e_1_3_2_33_2","unstructured":"Danny Halawi Fred Zhang Chen Yueh-Han and Jacob Steinhardt. 2024. Approaching Human-level forecasting with language models. arXiv:2402.18563. Retrieved from https:\/\/arxiv.org\/abs\/2402.18563"},{"key":"e_1_3_2_34_2","unstructured":"Julian Hazell. 2023. Spear phishing with large language models. arXiv:2305.06972. Retrieved from https:\/\/arxiv.org\/abs\/2305.06972"},{"key":"e_1_3_2_35_2","unstructured":"Fredrik Heiding Bruce Schneier Arun Vishwanath and Jeremy Bernstein. 2023. Devising and detecting phishing: Large language models vs. smaller human models. arXiv:2308.12287. Retrieved from https:\/\/arxiv.org\/abs\/2308.12287"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581641.3584052"},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1007\/978-3-031-30085-1_8","volume-title":"Judgment in Predictive Analytics","author":"Himmelstein Mark","year":"2023","unstructured":"Mark Himmelstein, David V. Budescu, and Ying Han. 2023. The wisdom of timely crowds. In Judgment in Predictive Analytics. Springer, 215\u2013242."},{"key":"e_1_3_2_38_2","unstructured":"Mark Himmelstein Sophie Ma Zhu Nikolay Petrov Ezra Karger Jessica Helmer Sivan Livnat Page Headley Amory Bennett and Philip E. Tetlock. 2024. The forecasting proficiency test: A practical forecaster evaluation tool."},{"key":"e_1_3_2_39_2","unstructured":"Wenxiang Jiao Wenxuan Wang Jen tse Huang Xing Wang and Zhaopeng Tu. 2023. Is ChatGPT a good translator? Yes with GPT-4 as the engine. arXiv:2301.08745. Retrieved from https:\/\/arxiv.org\/abs\/2301.08745"},{"key":"e_1_3_2_40_2","unstructured":"Ming Jin Shiyu Wang Lintao Ma Zhixuan Chu James Y. Zhang Xiaoming Shi Pin-Yu Chen Yuxuan Liang Yuan-Fang Li Shirui Pan et al. 2023. Time-llm: Time series forecasting by reprogramming large language models. arXiv:2310.01728. Retrieved from https:\/\/arxiv.org\/abs\/2310.01728"},{"key":"e_1_3_2_41_2","unstructured":"Cameron R. Jones and Benjamin K. Bergen. 2024. People cannot distinguish GPT-4 from a Human in a Turing Test. arXiv:2405.08007. Retrieved from https:\/\/arxiv.org\/abs\/2405.08007"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Ezra Karger Pavel D. Atanasov and Philip Tetlock. 2022. Improving Judgments of Existential Risk: Better Forecasts Questions Explanations Policies. Retrieved from https:\/\/ssrn.com\/abstract=4001628","DOI":"10.2139\/ssrn.4001628"},{"issue":"2","key":"e_1_3_2_43_2","first-page":"16","article-title":"The chess master and the computer","volume":"57","author":"Kasparov Garry","year":"2010","unstructured":"Garry Kasparov. 2010. The chess master and the computer. The New York Review of Books 57, 2 (2010), 16\u201319.","journal-title":"The New York Review of Books"},{"key":"e_1_3_2_44_2","unstructured":"Daniel Martin Katz Michael James Bommarito Shang Gao and Pablo Arredondo. 2023. GPT-4 Passes the Bar Exam. Retrieved from https:\/\/ssrn.com\/abstract=4389233"},{"key":"e_1_3_2_45_2","unstructured":"Megan Kinniment Lucas Jun Koba Sato Haoxing Du Brian Goodrich Max Hasin Lawrence Chan Luke Harold Miles Tao R. Lin Hjalmar Wijk Joel Burget et al. 2023. Evaluating language-model agents on realistic autonomous tasks. arXiv:2312.11671. Retrieved from https:\/\/arxiv.org\/abs\/2312.11671"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-short.18"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1037\/a0036677"},{"key":"e_1_3_2_48_2","unstructured":"Thomas McAndrew Maimuna S. Majumder Andrew A. Lover Srini Venkatramanan Paolo Bocchini Tamay Besiroglu Allison Codi Gaia Dempsey Sam Abbott Sylvain Chevalier et al. 2024. Assessing human judgment forecasts in the rapid spread of the mpox outbreak: Insights and challenges for pandemic preparedness. arXiv:2404.14686. Retrieved from https:\/\/arxiv.org\/abs\/2404.14686"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3670691"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1037\/xap0000040"},{"key":"e_1_3_2_51_2","unstructured":"Jordy Meow. 2024. AI Engine. Retrieved from https:\/\/wordpress.org\/plugins\/ai-engine\/"},{"key":"e_1_3_2_52_2","unstructured":"Humza Naveed Asad Ullah Khan Shi Qiu Muhammad Saqib Saeed Anwar Muhammad Usman Nick Barnes and Ajmal Mian. 2023. A comprehensive Overview of Large Language Models. Retrieved from https:\/\/github.com\/humza909\/LLM_Survey.git"},{"key":"e_1_3_2_53_2","unstructured":"Harsha Nori Nicholas King Scott Mayer McKinney Dean Carignan and Eric Horvitz. 2023. Capabilities of GPT-4 on medical challenge problems. arXiv:2303.13375. Retrieved from https:\/\/arxiv.org\/abs\/2303.13375"},{"key":"e_1_3_2_54_2","unstructured":"Shakked Noy and Whitney Zhang. 2023. Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. Retrieved from https:\/\/ssrn.com\/abstract=4375283"},{"key":"e_1_3_2_55_2","unstructured":"OpenAI. 2023. New models and developer products announced at DevDay. Retrieved from https:\/\/help.openai.com\/en\/articles\/8555510-gpt-4-turbo."},{"key":"e_1_3_2_56_2","unstructured":"OpenAI. 2024. Models - OpenAI API. Retrieved July 25 2024 from https:\/\/platform.openai.com\/docs\/models."},{"key":"e_1_3_2_57_2","unstructured":"Peter S. Park Simon Goldstein Aidan O\u2019Gara Michael Chen and Dan Hendrycks. 2023. AI deception: A survey of examples risks and potential solutions. arXiv:2308.14752. Retrieved from https:\/\/arxiv.org\/abs\/2308.14752"},{"key":"e_1_3_2_58_2","unstructured":"Peter S. Park and Max Tegmark. 2023. Divide-and-conquer dynamics in AI-driven disempowerment. arXiv:2310.06009. Retrieved from https:\/\/arxiv.org\/abs\/2310.06009"},{"key":"e_1_3_2_59_2","unstructured":"Max Peeperkorn Tom Kouwenhoven Dan Brown and Anna Jordanous. 2024. Is temperature the Creativity parameter of large language models? arXiv:2405.00492. Retrieved from https:\/\/arxiv.org\/abs\/2405.00492"},{"key":"e_1_3_2_60_2","unstructured":"Sida Peng Eirini Kalliamvakou Peter Cihon and Mert Demirer. 2023. The impact of AI on developer productivity: Evidence from github copilot. arXiv:2302.06590. Retrieved from https:\/\/arxiv.org\/abs\/2302.06590"},{"issue":"1","key":"e_1_3_2_61_2","doi-asserted-by":"crossref","first-page":"e103","DOI":"10.52225\/narra.v3i1.103","article-title":"ChatGPT applications in medical, dental, pharmacy, and public health education: A descriptive study highlighting the advantages and limitations","volume":"3","author":"Sallam Malik","year":"2023","unstructured":"Malik Sallam, Nesreen Salim, Muna Barakat, and Alaa Al-Tammemi. 2023. ChatGPT applications in medical, dental, pharmacy, and public health education: A descriptive study highlighting the advantages and limitations. Narra J 3, 1 (2023), e103\u2013e103.","journal-title":"Narra J"},{"issue":"5","key":"e_1_3_2_62_2","first-page":"73","article-title":"Superforecasting: How to upgrade your company\u2019s judgment","volume":"94","author":"Schoemaker Paul J. H.","year":"2016","unstructured":"Paul J. H. Schoemaker and Philip E. Tetlock. 2016. Superforecasting: How to upgrade your company\u2019s judgment. Harvard Business Review 94, 5 (2016), 73\u201378.","journal-title":"Harvard Business Review"},{"key":"e_1_3_2_63_2","unstructured":"Philipp Schoenegger Spencer Greenberg Alexander Grishin Joshua Lewis and Lucius Caviola. 2024. Can AI understand human personality? \u2013 Comparing human experts and AI systems at predicting personality correlations. arXiv:2406.08170. Retrieved from https:\/\/arxiv.org\/abs\/2406.08170"},{"key":"e_1_3_2_64_2","unstructured":"Philipp Schoenegger and Peter S. Park. 2023. Large language model prediction capabilities: Evidence from a real-world forecasting tournament. arXiv:2310.13014. Retrieved from https:\/\/arxiv.org\/abs\/2310.13014"},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","unstructured":"Philipp Schoenegger Indre Tuminauskaite Peter S. Park and Philip E. Tetlock. 2024. Wisdom of the silicon crowd: Llm ensemble prediction capabilities match Human crowd accuracy. arXiv:2402.19379. Retrieved from https:\/\/arxiv.org\/abs\/2402.19379","DOI":"10.1126\/sciadv.adp1528"},{"key":"e_1_3_2_66_2","unstructured":"Zhiqiang Shen Tianhua Tao Liqun Ma Willie Neiswanger Joel Hestness Natalia Vassilieva Daria Soboleva and Eric Xing. 2023. SlimPajama-DC: Understanding data combinations for LLM training. arXiv:2309.10818. Retrieved from https:\/\/arxiv.org\/abs\/2309.10818"},{"issue":"5","key":"e_1_3_2_67_2","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1177\/17456916231181102","article-title":"Three challenges for AI-assisted decision-making","volume":"19","author":"Steyvers Mark","year":"2023","unstructured":"Mark Steyvers and Aakriti Kumar. 2023. Three challenges for AI-assisted decision-making. Perspectives on Psychological Science 19, 5 (2023), 722\u2013734.","journal-title":"Perspectives on Psychological Science"},{"key":"e_1_3_2_68_2","unstructured":"Lawrence H. Summers and Steve Rattner. 2023. Larry Summers on Who Could Be Replaced by AI [Interviewed by Bloomberg TV\u2019S David Westin]. Retrieved from https:\/\/www.youtube.com\/watch?v=8Epl9yAu0gk"},{"key":"e_1_3_2_69_2","volume-title":"World Artificial Intelligence Conference in Shanghai","author":"Sutton Rich","year":"2023","unstructured":"Rich Sutton. 2023. AI succession [Youtube video of talk]. World Artificial Intelligence Conference in Shanghai. Retrieved from https:\/\/www.youtube.com\/watch?v=NgHFMolXs3U"},{"key":"e_1_3_2_70_2","volume-title":"Superforecasting: The Art and Science of Prediction","author":"Tetlock Philip E.","year":"2016","unstructured":"Philip E. Tetlock and Dan Gardner. 2016. Superforecasting: The Art and Science of Prediction. Random House."},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aal3147"},{"key":"e_1_3_2_72_2","volume-title":"Proceedings of the International Conference on Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the International Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_73_2","unstructured":"Vectara. 2024. Leaderboard comparing LLM performance at producing hallucinations when summarizing short documents. Retrieved July 24 2024 from https:\/\/github.com\/vectara\/hallucination-leaderboard"},{"key":"e_1_3_2_74_2","first-page":"20","article-title":"Chatgpt for robotics: Design principles and model abilities","volume":"2","author":"Vemprala Sai","year":"2023","unstructured":"Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor. 2023. Chatgpt for robotics: Design principles and model abilities. Microsoft Autonomous Systems & Robotics Research 2 (2023), 20.","journal-title":"Microsoft Autonomous Systems & Robotics Research"},{"key":"e_1_3_2_75_2","first-page":"370","volume-title":"Proceedings of the Conference on Human Information Interaction and Retrieval","author":"Wang Ben","year":"2024","unstructured":"Ben Wang, Jiqun Liu, Jamshed Karimnazarov, and Nicolas Thompson. 2024. Task supportive and personalized Human-large language model interaction: A user study. In Proceedings of the Conference on Human Information Interaction and Retrieval, 370\u2013375."},{"key":"e_1_3_2_76_2","unstructured":"Yubo Wang Xueguang Ma Ge Zhang Yuansheng Ni Abhranil Chandra Shiguang Guo Weiming Ren Aaran Arulraj Xuan He Ziyan Jiang et al. 2024. Mmlu-Pro: A more robust and challenging multi-task language understanding benchmark. arXiv:2406.01574. Retrieved from https:\/\/arxiv.org\/abs\/2406.01574"},{"key":"e_1_3_2_77_2","unstructured":"Jason Wei Yi Tay Rishi Bommasani Colin Raffel Barret Zoph Sebastian Borgeaud Dani Yogatama Maarten Bosma Denny Zhou Donald Metzler et al. 2022. Emergent abilities of large language models. arXiv:2206.07682. Retrieved from https:\/\/arxiv.org\/abs\/2206.07682"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2022.4410"},{"key":"e_1_3_2_79_2","unstructured":"Zhiheng Xi Wenxiang Chen Xin Guo Wei He Yiwen Ding Boyang Hong Ming Zhang Junzhe Wang Senjie Jin Enyu Zhou et al. 2023. The rise and potential of large language model based agents: A survey. arXiv:2309.07864. Retrieved from https:\/\/arxiv.org\/abs\/2309.07864"},{"key":"e_1_3_2_80_2","unstructured":"Changrong Xiao Wenxing Ma Sean Xin Xu Kunpeng Zhang Yufang Wang and Qi Fu. 2024. From automation to augmentation: Large language models elevating essay scoring landscape. arXiv:2401.06431. Retrieved from https:\/\/arxiv.org\/abs\/2401.06431"},{"key":"e_1_3_2_81_2","unstructured":"Benfeng Xu An Yang Junyang Lin Quan Wang Chang Zhou Yongdong Zhang and Zhendong Mao. 2023. ExpertPrompting: Instructing large language models to be distinguished experts. arXiv:2305.14688. Retrieved from https:\/\/arxiv.org\/abs\/2305.14688"},{"key":"e_1_3_2_82_2","first-page":"66","volume-title":"Proceedings of the AAAI Symposium Series","volume":"3","author":"Yang Diyi","year":"2024","unstructured":"Diyi Yang. 2024. Human-AI interaction in the age of large language models. In Proceedings of the AAAI Symposium Series, Vol. 3. 66\u201367."},{"key":"e_1_3_2_83_2","doi-asserted-by":"crossref","unstructured":"Qihui Zhang Chujie Gao Dongping Chen Yue Huang Yixin Huang Zhenyang Sun Shilin Zhang Weiye Li Zhengyan Fu Yao Wan and Lichao Sun. 2024. LLM-as-a-coauthor: Can mixed human-written and machine-generated text be detected? arXiv:2401.05952. Retrieved from https:\/\/arxiv.org\/abs\/2401.05952","DOI":"10.18653\/v1\/2024.findings-naacl.29"}],"container-title":["ACM Transactions on Interactive Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3707649","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3707649","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:38Z","timestamp":1750295858000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3707649"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,10]]},"references-count":82,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3707649"],"URL":"https:\/\/doi.org\/10.1145\/3707649","relation":{},"ISSN":["2160-6455","2160-6463"],"issn-type":[{"value":"2160-6455","type":"print"},{"value":"2160-6463","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,10]]},"assertion":[{"value":"2024-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}