{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T19:47:01Z","timestamp":1776109621963,"version":"3.50.1"},"reference-count":106,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"name":"National Science Foundation","award":["DGE1745016 and DGE2140739"],"award-info":[{"award-number":["DGE1745016 and DGE2140739"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput.-Hum. Interact."],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>Spreadsheet programming is challenging. Programmers use spreadsheet programming knowledge (e.g., formulas) and problem-solving skills to combine actions into complex tasks. Advancements in large language models have introduced language agents that observe, plan, and perform tasks, showing promise for spreadsheet creation. We present TableTalk, a spreadsheet programming agent embodying three design principles\u2014scaffolding, flexibility, and incrementality\u2014derived from studies with seven spreadsheet programmers and 85 Excel templates. TableTalk guides programmers through structured plans based on professional workflows, generating three potential next steps to adapt plans to programmer needs. It uses pre-defined tools to generate spreadsheet components and incrementally build spreadsheets. In a study with 20 programmers, TableTalk produced higher-quality spreadsheets 2.3 times more likely to be preferred than the baseline. It reduced cognitive load and thinking time by 12.6%. 
From this, we derive design guidelines for agentic spreadsheet programming tools and discuss implications on spreadsheet programming, end-user programming, AI-assisted programming, and human-agent collaboration.<\/jats:p>","DOI":"10.1145\/3765286","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T15:33:37Z","timestamp":1757000017000},"page":"1-49","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["TableTalk: Scaffolding Spreadsheet Development with a Language Agent"],"prefix":"10.1145","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6722-9959","authenticated-orcid":false,"given":"Jenny T.","family":"Liang","sequence":"first","affiliation":[{"name":"Software & Societal Systems, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1048-2352","authenticated-orcid":false,"given":"Aayush","family":"Kumar","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Bangalore, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9042-1946","authenticated-orcid":false,"given":"Yasharth","family":"Bajpai","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Bangalore, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9226-9634","authenticated-orcid":false,"given":"Sumit","family":"Gulwani","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Redmond, Washington, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3727-3291","authenticated-orcid":false,"given":"Vu","family":"Le","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Redmond, Washington, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6182-815X","authenticated-orcid":false,"given":"Chris","family":"Parnin","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Redmond, Washington, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5559-5932","authenticated-orcid":false,"given":"Arjun","family":"Radhakrishna","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Redmond, Washington, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5153-2686","authenticated-orcid":false,"given":"Ashish","family":"Tiwari","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Redmond, Washington, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3921-9416","authenticated-orcid":false,"given":"Emerson","family":"Murphy-Hill","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Redmond, Washington, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8061-9000","authenticated-orcid":false,"given":"Gustavo","family":"Soares","sequence":"additional","affiliation":[{"name":"PROSE Team, Microsoft Corporation, Redmond, Washington, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,12,9]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"The Burning Glass. 2024. 2023 Skills Compass Report\u2014The Burning Glass Institute. Retrieved August 27 2024 from https:\/\/static1.squarespace.com\/static\/6197797102be715f55c0e0a1\/t\/63ea41b5a9bd001d8061abe3\/1676296630197\/Skills+Compass+Report+2023_final.pdf"},{"key":"e_1_3_2_3_2","unstructured":"OpenAI Platform. 2024. Assistants Overview\u2014OpenAI API. Retrieved August 27 2024 from https:\/\/platform.openai.com\/docs\/assistants\/overview"},{"key":"e_1_3_2_4_2","unstructured":"Microsoft Learn. 2024. ExcelScript Package. Retrieved August 27 2024 from https:\/\/learn.microsoft.com\/en-us\/javascript\/api\/office-scripts\/excelscript?view=office-scripts"},{"key":"e_1_3_2_5_2","unstructured":"Cursor. 2025. The AI Code Editor. Retrieved July 28 2025 from https:\/\/cursor.com\/"},{"key":"e_1_3_2_6_2","unstructured":"Devin. 2025. Devin the AI Software Engineer. 
Retrieved January 9 2025 from https:\/\/devin.ai\/"},{"key":"e_1_3_2_7_2","unstructured":"GitHub. 2025. GPT Pilot. Retrieved July 28 2025 from https:\/\/github.com\/Pythagora-io\/gpt-pilot"},{"key":"e_1_3_2_8_2","unstructured":"Visual Studio Code. 2025. VSCode Agent Mode. Retrieved 30 May 2025 from https:\/\/code.visualstudio.com\/docs\/copilot\/chat\/chat-agent-mode"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1002\/9780470050118.ecse415"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLHCC.2004.29"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/1134285.1134312"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2007.39"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLHCC.2005.70"},{"key":"e_1_3_2_14_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et al. 2023. Gpt-4 technical report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_2_15_2","first-page":"436","volume-title":"Categorical Data Analysis","author":"Agresti Alan","year":"2012","unstructured":"Alan Agresti. 2012. Categorical Data Analysis. Vol. 792. 
John Wiley & Sons, 436\u2013439."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300233"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-emnlp.423"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502070"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/VL\/HCC51201.2021.9576337"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/VL\/HCC60511.2024.00011"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236061"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.5555\/2835587.2835589"},{"key":"e_1_3_2_23_2","unstructured":"Gagan Bansal Jennifer Wortman Vaughan Saleema Amershi Eric Horvitz Adam Fourney Hussein Mozannar Victor Dibia and Daniel S. Weld. 2024. Challenges in human-agent communication. arXiv:2412.10380. Retrieved from https:\/\/arxiv.org\/abs\/2412.10380"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3586030"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3545945.3569759"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/14872.001.0001","volume-title":"Moral Codes: Designing Alternatives to AI","author":"Blackwell Alan F.","year":"2024","unstructured":"Alan F. Blackwell. 2024. Moral Codes: Designing Alternatives to AI. MIT Press."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1201\/9781498710411-35\/sus-quick-dirty-usability-scale-john-brooke"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2003.1201191"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2002.1010060"},{"key":"e_1_3_2_30_2","unstructured":"Stephen Casper Luke Bailey Rosco Hunter Carson Ezell Emma Cabal\u00e9 Michael Gerovitch Stewart Slocum Kevin Wei Nikola Jurkovic Ariba Khan et al. 2025. The AI agent index. arXiv:2502.01635. 
Retrieved from https:\/\/arxiv.org\/abs\/2502.01635"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3501833"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3714002"},{"key":"e_1_3_2_33_2","first-page":"1661","volume-title":"International Conference on Machine Learning (ICML \u201921)","author":"Chen Xinyun","year":"2021","unstructured":"Xinyun Chen, Petros Maniatis, Rishabh Singh, Charles Sutton, Hanjun Dai, Max Lin, and Denny Zhou. 2021. Spreadsheetcoder: Formula prediction from semi-structured context. In International Conference on Machine Learning (ICML \u201921). PMLR, 1661\u20131672. Retrieved from https:\/\/proceedings.mlr.press\/v139\/chen21m.html"},{"key":"e_1_3_2_34_2","unstructured":"Haoyu Dong Jianbo Zhao Yuzhang Tian Junyu Xiong Shiyu Xia Mengyu Zhou Yun Lin Jos\u00e9 Cambronero Yeye He Shi Han and Dongmei Zhang. 2025. SpreadsheetLLM: Encoding spreadsheets for large language models. arXiv:2407.09025. Retrieved from https:\/\/arxiv.org\/abs\/2407.09025"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467228"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0956796805005794"},{"key":"e_1_3_2_37_2","unstructured":"KJ Feng Kevin Pu Matt Latzke Tal August Pao Siangliulue Jonathan Bragg Daniel S. Weld Amy X. Zhang and Joseph Chee Chang. 2024. Cocoa: Co-planning and co-execution with AI agents. arXiv:2412.10999. Retrieved from https:\/\/arxiv.org\/abs\/2412.10999"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/VL-HCC57772.2023.00017"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.sigdial-1.29"},{"key":"e_1_3_2_40_2","doi-asserted-by":"crossref","DOI":"10.1017\/9781009581738","volume-title":"Generative AI in Computer Science Education: Challenges and Opportunities","author":"Franklin Diana","year":"2025","unstructured":"Diana Franklin, Paul Denny, David A. Gonzalez-Maldonado, and Minh Tran. 2025. 
Generative AI in Computer Science Education: Challenges and Opportunities. Cambridge University Press."},{"key":"e_1_3_2_41_2","first-page":"10764","volume-title":"International Conference on Machine Learning (ICML \u201923)","author":"Gao Luyu","year":"2023","unstructured":"Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. Pal: Program-aided language models. In International Conference on Machine Learning (ICML \u201923). PMLR, 10764\u201310799. Retrieved from https:\/\/proceedings.mlr.press\/v202\/gao23f"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Priyanshu Gupta Shashank Kirtania Ananya Singha Sumit Gulwani Arjun Radhakrishna Sherry Shi and Gustavo Soares. 2024. Metareflection: Learning instructions for language agents using past reflections. arXiv:2405.13009. Retrieved from https:\/\/arxiv.org\/abs\/2405.13009","DOI":"10.18653\/v1\/2024.emnlp-main.477"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.29333\/ajqr\/14887"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0166-4115(08)62386-9"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.34293\/education.v8i4.3232"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/302979.303030"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642415"},{"key":"e_1_3_2_48_2","unstructured":"Carlos E. Jimenez John Yang Alexander Wettig Shunyu Yao Kexin Pei Ofir Press and Karthik Narasimhan. 2024. SWE-bench: Can language models resolve real-world GitHub issues? arXiv:2310.06770. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.06770"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i12.29197"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3631802.3631806"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/1922649.1922658"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-013-9279-3"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLHCC.2004.47"},{"key":"e_1_3_2_54_2","unstructured":"Aayush Kumar Yasharth Bajpai Sumit Gulwani Gustavo Soares and Emerson Murphy-Hill. 2025. Sharp Tools: How developers wield agentic AI in real software engineering tasks. arXiv:2506.12347. Retrieved from https:\/\/arxiv.org\/abs\/2506.12347"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.2307\/2529310"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-020-09810-1"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3713778"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3587102.3588785"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1080\/10447318.2018.1455307"},{"key":"e_1_3_2_60_2","article-title":"SheetCopilot: Bringing software productivity to the next level through large language models","author":"Li Hongxin","year":"2024","unstructured":"Hongxin Li, Jingran Su, Yuntao Chen, Qing Li, and Zhao-Xiang Zhang. 2024. SheetCopilot: Bringing software productivity to the next level through large language models. In Advances in Neural Information Processing Systems (NeurIPs \u201924), Vol. 36. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/file\/0ff30c4bf31db0119a6219e0d250e037-Paper-Conference.pdf","journal-title":"Advances in Neural Information Processing Systems (NeurIPs \u201924)"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00047"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","unstructured":"Jenny T. Liang Aayush Kumar Yasharth Bajpai Sumit Gulwani Vu Le Chris Parnin Arjun Radhakrishna Ashish Tiwari Emerson Murphy-Hill* and Gustavo Soares*. 2025. Supplemental Materials to \u201cTableTalk: Scaffolding Spreadsheet Development with a Language Agent.\u201d DOI: 10.6084\/m9.figshare.28293059.v1","DOI":"10.6084\/m9.figshare.28293059.v1"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3608128"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549082"},{"key":"e_1_3_2_65_2","unstructured":"Junwei Liu Kaixin Wang Yixuan Chen Xin Peng Zhenpeng Chen Lingming Zhang and Yiling Lou. 2024. Large language model-based agents for software engineering: A survey. arXiv:2409.02977. Retrieved from https:\/\/arxiv.org\/abs\/2409.02977"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3332165.3347908"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613905.3650756"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642149"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.817"},{"key":"e_1_3_2_70_2","volume-title":"Advances in Neural Information Processing Systems (NeurIPS \u201924),","volume":"36","author":"Madaan Aman","year":"2024","unstructured":"Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. 2024. Self-refine: Iterative refinement with self-feedback. In Advances in Neural Information Processing Systems (NeurIPS \u201924), Vol. 36. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/file\/91edff07232fb1b55a505a9e9f6c0ff3-Paper-Conference.pdf"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3641936"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/108844.108903"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2016.200"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1016\/0020-7373(91)90040-E"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLHCC.2018.8506540"},{"key":"e_1_3_2_76_2","doi-asserted-by":"crossref","unstructured":"Debjit Paul Mete Ismayilzada Maxime Peyrard Beatriz Borges Antoine Bosselut Robert West and Boi Faltings. 2024. REFINER: Reasoning feedback on intermediate representations (March 2024) 1100\u20131126. Retrieved from https:\/\/aclanthology.org\/2024.eacl-long.67\/","DOI":"10.18653\/v1\/2024.eacl-long.67"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.265"},{"key":"e_1_3_2_78_2","first-page":"2","volume-title":"International Conference on Intelligence Analysis","volume":"5","author":"Pirolli Peter","year":"2005","unstructured":"Peter Pirolli and Stuart Card. 2005. The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In International Conference on Intelligence Analysis, Vol. 5, 2\u20134. 
Retrieved from https:\/\/www.researchgate.net\/profile\/Peter-Pirolli\/publication\/215439203_The_sensemaking_process_and_leverage_points_for_analyst_technology_as_identified_through_cognitive_task_analysis\/links\/02bfe50f09ca94efc0000000\/The-sensemaking-process-and-leverage-points-for-analyst-technology-as-identified-through-cognitive-task-analysis.pdf"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/3617367"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3713357"},{"key":"e_1_3_2_81_2","first-page":"273","volume-title":"Scaffolding","author":"Reiser Brian J.","year":"2018","unstructured":"Brian J. Reiser. 2018. Scaffolding complex learning: The mechanisms of structuring and problematizing student work. In Scaffolding. Psychology Press, 273\u2013304. Retrieved from https:\/\/www.taylorfrancis.com\/chapters\/edit\/10.4324\/9780203764411-2\/scaffolding-complex-learning-mechanisms-structuring-problematizing-student-work-brian-reiser"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-19027-3_5"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1145\/3708524"},{"key":"e_1_3_2_84_2","doi-asserted-by":"crossref","DOI":"10.4324\/9781003377986","volume-title":"Qualitative Research: The Essential Guide to Theory and Practice","author":"Savin-Baden Maggi","year":"2023","unstructured":"Maggi Savin-Baden and Claire Major. 2023. Qualitative Research: The Essential Guide to Theory and Practice. 
Routledge."},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02505026"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLHCC.2017.8103472"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLHCC.2005.34"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.5555\/3666122.3669119"},{"issue":"8","key":"e_1_3_2_89_2","first-page":"n8","article-title":"Your chi-square test is statistically significant: now what?","volume":"20","author":"Sharpe Donald","year":"2015","unstructured":"Donald Sharpe. 2015. Your chi-square test is statistically significant: now what? Practical Assessment, Research & Evaluation 20, 8 (2015), n8. Retrieved from https:\/\/eric.ed.gov\/?id=EJ1059772","journal-title":"Practical Assessment, Research & Evaluation"},{"key":"e_1_3_2_90_2","article-title":"Reflexion: Language agents with verbal reinforcement learning","author":"Shinn Noah","year":"2024","unstructured":"Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2024. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems (NeurIPS \u201924), Vol. 36. Retrieved from https:\/\/papers.nips.cc\/paper_files\/paper\/2023\/file\/1b44b878bb782e6954cd888628510e90-Paper-Conference.pdf","journal-title":"Advances in Neural Information Processing Systems (NeurIPS \u201924)"},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490099.3511161"},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1145\/3586183.3606756"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/3640543.3645159"},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642902"},{"key":"e_1_3_2_95_2","unstructured":"Michele Tufano Anisha Agarwal Jinu Jang Roshanak Zilouchian Moghaddam and Neel Sundaresan. 2024. AutoDev: Automated AI-driven development. arXiv:2403.08299. 
Retrieved from https:\/\/arxiv.org\/abs\/2403.08299"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491101.3519665"},{"key":"e_1_3_2_97_2","unstructured":"Sanidhya Vijayvargiya Xuhui Zhou Akhila Yerukola Maarten Sap and Graham Neubig. 2025. Interactive agents to overcome ambiguity in software engineering. arXiv:2502.13069. Retrieved from https:\/\/arxiv.org\/abs\/2502.13069"},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-024-40231-1"},{"key":"e_1_3_2_99_2","doi-asserted-by":"publisher","DOI":"10.1145\/3630106.3658984"},{"key":"e_1_3_2_100_2","unstructured":"Xingyao Wang Boxuan Li Yufan Song Frank F. Xu Xiangru Tang Mingchen Zhuge Jiayi Pan Yueqi Song Bowen Li Jaskirat Singh et al. 2025. OpenHands: An open platform for AI software developers as generalist agents. arXiv:2407.16741. Retrieved from https:\/\/arxiv.org\/abs\/2407.16741"},{"key":"e_1_3_2_101_2","unstructured":"Zhiheng Xi Wenxiang Chen Xin Guo Wei He Yiwen Ding Boyang Hong Ming Zhang Junzhe Wang Senjie Jin Enyu Zhou et al. 2023. The rise and potential of large language model based agents: A survey. arXiv:2309.07864. Retrieved from https:\/\/arxiv.org\/abs\/2309.07864"},{"key":"e_1_3_2_102_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40593-025-00457-x"},{"key":"e_1_3_2_103_2","article-title":"SWE-agent: Agent-computer interfaces enable automated software engineering","author":"Yang John","year":"2024","unstructured":"John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik R. Narasimhan, and Ofir Press. 2024. SWE-agent: Agent-computer interfaces enable automated software engineering. In Advances in Neural Information Processing Systems (NeurIPs \u201924). 
Retrieved from https:\/\/openreview.net\/pdf?id=mXpq6ut8J3","journal-title":"Advances in Neural Information Processing Systems (NeurIPs \u201924)"},{"key":"e_1_3_2_104_2","article-title":"Tree of thoughts: Deliberate problem solving with large language models","author":"Yao Shunyu","year":"2024","unstructured":"Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2024. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems (NeurIPS \u201924), Vol. 36. Retrieved from https:\/\/paper_files\/paper\/2023\/file\/271db9922b8d1f4dd7aaef84ed5ac703-Paper-Conference.pdf","journal-title":"Advances in Neural Information Processing Systems (NeurIPS \u201924)"},{"key":"e_1_3_2_105_2","volume-title":"International Conference on Learning Representations (ICLR\u201923). Retrieved from","author":"Yao Shunyu","year":"2023","unstructured":"Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR\u201923). Retrieved from https:\/\/openreview.net\/forum?id=WE_vluYUL-X"},{"key":"e_1_3_2_106_2","unstructured":"Linghao Zhang Shilin He Chaoyun Zhang Yu Kang Bowen Li Chengxing Xie Junhao Wang Maoquan Wang Yufan Huang Shengyu Fu et al. 2025. SWE-bench goes live! arXiv:2505.23419. 
Retrieved from https:\/\/arxiv.org\/abs\/2505.23419"},{"key":"e_1_3_2_107_2","doi-asserted-by":"publisher","DOI":"10.1145\/3650212.3680384"}],"container-title":["ACM Transactions on Computer-Human Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3765286","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T13:57:09Z","timestamp":1765288629000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3765286"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,9]]},"references-count":106,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3765286"],"URL":"https:\/\/doi.org\/10.1145\/3765286","relation":{},"ISSN":["1073-0516","1557-7325"],"issn-type":[{"value":"1073-0516","type":"print"},{"value":"1557-7325","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,9]]},"assertion":[{"value":"2025-02-18","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}