{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,15]],"date-time":"2026-07-15T20:06:23Z","timestamp":1784145983289,"version":"3.55.0"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","funder":[{"name":"Royal Society University Research Fellowship","award":["URF\\R\\221031"],"award-info":[{"award-number":["URF\\R\\221031"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2025,6,10]]},"abstract":"<jats:p>\n                    Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, by checking I\/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. 
To address these issues, we introduce two key concepts: (1)\n                    <jats:italic toggle=\"yes\">feature mapping<\/jats:italic>\n                    , which integrates predefined translation rules with LLM-based translation to guide the LLM in navigating subtle language differences and producing semantically accurate code; and (2)\n                    <jats:italic toggle=\"yes\">type-compatibility<\/jats:italic>\n                    , which facilitates localized checks at the function signature level to detect errors early, thereby narrowing the scope of potential repairs. We apply our approach to translating real-world Go codebases to Rust, demonstrating that we can consistently generate reliable Rust translations for projects up to 9,700 lines of code and 780 functions, with an average of 73% of functions successfully validated for I\/O equivalence, considerably higher than any existing work.\n                  <\/jats:p>","DOI":"10.1145\/3729315","type":"journal-article","created":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T16:02:27Z","timestamp":1749830547000},"page":"1616-1641","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Scalable, Validated Code Translation of Entire Projects using Large Language Models"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3309-0439","authenticated-orcid":false,"given":"Hanliang","family":"Zhang","sequence":"first","affiliation":[{"name":"University of Bristol, Bristol, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9106-934X","authenticated-orcid":false,"given":"Cristina","family":"David","sequence":"additional","affiliation":[{"name":"University of Bristol, Bristol, United 
Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7780-630X","authenticated-orcid":false,"given":"Meng","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Bristol, Bristol, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7790-6570","authenticated-orcid":false,"given":"Brandon","family":"Paulsen","sequence":"additional","affiliation":[{"name":"Amazon, Arlington, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6681-5283","authenticated-orcid":false,"given":"Daniel","family":"Kroening","sequence":"additional","affiliation":[{"name":"Amazon, Seattle, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,6,13]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"[n.d.]. C to Go Translator. https:\/\/github.com\/gotranspile\/cgo."},{"key":"e_1_3_2_3_2","unstructured":"[n.d.]. C2Rust Transpiler. https:\/\/c2rust.com\/."},{"key":"e_1_3_2_4_2","unstructured":"[n.d.]. Claude. https:\/\/www.anthropic.com\/index\/introducing-claude."},{"key":"e_1_3_2_5_2","unstructured":"[n.d.]. Cloud SDK: Libraries and Command Line Interface. https:\/\/cloud.google.com\/sdk\/. Accessed: 2024-11-05."},{"key":"e_1_3_2_6_2","unstructured":"[n.d.]. Download Azure SDKs and Tools. https:\/\/azure.microsoft.com\/en-us\/downloads\/. Accessed: 2024-11-05."},{"key":"e_1_3_2_7_2","unstructured":"[n.d.]. Eliminating Memory Safety Vulnerabilities Once and For All (DARPA). https:\/\/www.darpa.mil\/news-events\/2024-07-31a."},{"key":"e_1_3_2_8_2","unstructured":"[n.d.]. Gemini. https:\/\/blog.google\/technology\/ai\/google-gemini-ai\/."},{"key":"e_1_3_2_9_2","unstructured":"[n.d.]. Go standard regular expression library. https:\/\/pkg.go.dev\/regexp."},{"key":"e_1_3_2_10_2","unstructured":"[n.d.]. Moov ACH. 
https:\/\/github.com\/moov-io\/ach."},{"key":"e_1_3_2_11_2","unstructured":"[n.d.]. Provide check digit algorithms and calculators written in Go. https:\/\/github.com\/osamingo\/checkdigit."},{"key":"e_1_3_2_12_2","unstructured":"[n.d.]. S2 geometry library in Go. https:\/\/github.com\/golang\/geo."},{"key":"e_1_3_2_13_2","unstructured":"[n.d.]. Sharpen - Automated Java->C# conversion. https:\/\/github.com\/mono\/sharpen."},{"key":"e_1_3_2_14_2","unstructured":"[n.d.]. Stats - Golang Statistics Package. https:\/\/github.com\/montanaflynn\/stats."},{"key":"e_1_3_2_15_2","unstructured":"[n.d.]. Streaming approximate histograms in Go. https:\/\/github.com\/VividCortex\/gohistogram."},{"key":"e_1_3_2_16_2","unstructured":"[n.d.]. String comparison and edit distance algorithms library. https:\/\/github.com\/hollson\/go-edlib."},{"key":"e_1_3_2_17_2","unstructured":"[n.d.]. Takes a full name and splits it into individual name parts. https:\/\/github.com\/polera\/gonameparts."},{"key":"e_1_3_2_18_2","unstructured":"[n.d.]. TextRank implementation in Golang with extendable features (summarization phrase extraction) and multithreading (goroutine). https:\/\/github.com\/DavidBelic\/TextRank\/tree\/master."},{"key":"e_1_3_2_19_2","unstructured":"[n.d.]. Tools to Build on AWS. https:\/\/aws.amazon.com\/developer\/tools\/. 
Accessed: 2024-11-05."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.14"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/2491411.2491430"},{"key":"e_1_3_2_22_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman Alex Ray Raul Puri Gretchen Krueger Michael Petrov Heidy Khlaaf Girish Sastry Pamela Mishkin Brooke Chan Scott Gray Nick Ryder Mikhail Pavlov Alethea Power Lukasz Kaiser Mohammad Bavarian Clemens Winter Philippe Tillet Felipe Petroski Such Dave Cummings Matthias Plappert Fotios Chantzis Elizabeth Barnes Ariel Herbert-Voss William Hebgen Guss Alex Nichol Alex Paino Nikolas Tezak Jie Tang Igor Babuschkin Suchir Balaji Shantanu Jain William Saunders Christopher Hesse Andrew N. Carr Jan Leike Josh Achiam Vedant Misra Evan Morikawa Alec Radford Matthew Knight Miles Brundage Mira Murati Katie Mayer Peter Welinder Bob McGrew Dario Amodei Sam McCandlish Ilya Sutskever and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. arXiv:2107.03374 [cs.LG]."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE55347.2025.00022"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3639477.3639719"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485498"},{"key":"e_1_3_2_26_2","unstructured":"Hasan Ferit Eniser Hanliang Zhang Cristina David Meng Wang Maria Christakis Brandon Paulsen Joey Dodds and Daniel Kroening. 2024. Towards Translating Real-World Code with LLMs: A Study of Translating to Rust. arXiv:2405.11514 [cs.SE]."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-39322-9-5"},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","unstructured":"Robert Griesemer Raymond Hu Wen Kokke Julien Lange Ian Lance Taylor Bernardo Toninho Philip Wadler and Nobuko Yoshida. 2020. Featherweight Go. 
arXiv:2005.11710 [cs.PL].","DOI":"10.1145\/3428217"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3264835"},{"key":"e_1_3_2_30_2","unstructured":"Ali Reza Ibrahimzada Kaiyao Ke Mrigank Pawagi Muhammad Salman Abid Raneet Pan Saurabh Sinha and Reyhaneh Jabbarvand. 2024. Repository-Level Compositional Code Translation and Validation. arXiv:2410.24117 [cs.SE]."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2818567.2818579"},{"key":"e_1_3_2_32_2","unstructured":"Prithwish Jana Piyush Jha Haoyang Ju Gautham Kishore Aryan Mahajan and Vijay Ganesh. 2023. Attention Compilation and Solver-based Symbolic Analysis are All You Need. arXiv preprint arXiv:2306.06755 (2023)."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00114"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICST.2010.64"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568318"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3719345"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.52202\/075280-0943"},{"key":"e_1_3_2_38_2","volume-title":"Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks","author":"Lu Shuai","year":"2021","unstructured":"Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, Vol. 1. 
https:\/\/datasets-benchmarks-proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/c16a5320fa475530d9583c34fd356ef5-Paper-round1.pdf."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/1190216.1190220"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349314"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00054"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380363"},{"key":"e_1_3_2_43_2","unstructured":"OpenAI Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat Red Avila Igor Babuschkin Suchir Balaji Valerie Balcom Paul Baltescu Haiming Bao Mohammad Bavarian Jeff Belgun Irwan Bello Jake Berdine Gabriel Bernadett-Shapiro Christopher Berner Lenny Bogdonoff Oleg Boiko Madelaine Boyd Anna-Luisa Brakman Greg Brockman Tim Brooks Miles Brundage Kevin Button Trevor Cai Rosie Campbell Andrew Cann Brittany Carey Chelsea Carlson Rory Carmichael Brooke Chan Che Chang Fotis Chantzis Derek Chen Sully Chen Ruby Chen Jason Chen Mark Chen Ben Chess Chester Cho Casey Chu Hyung Won Chung Dave Cummings Jeremiah Currier Yunxing Dai Cory Decareaux Thomas Degry Noah Deutsch Damien Deville Arka Dhar David Dohan Steve Dowling Sheila Dunning Adrien Ecoffet Atty Eleti Tyna Eloundou David Farhi Liam Fedus Niko Felix Sim\u00f3n Posada Fishman Juston Forte Isabella Fulford Leo Gao Elie Georges Christian Gibson Vik Goel Tarun Gogineni Gabriel Goh Rapha Gontijo-Lopes Jonathan Gordon Morgan Grafstein Scott Gray Ryan Greene Joshua Gross Shixiang Shane Gu Yufei Guo Chris Hallacy Jesse Han Jeff Harris Yuchen He Mike Heaton Johannes Heidecke Chris Hesse Alan Hickey Wade Hickey Peter Hoeschele Brandon Houghton Kenny Hsu Shengli Hu Xin Hu Joost Huizinga Shantanu Jain Shawn Jain Joanne Jang Angela Jiang Roger Jiang Haozhun Jin Denny Jin Shino Jomoto Billie Jonn Heewoo Jun Tomer Kaftan \u0141ukasz 
Kaiser Ali Kamali Ingmar Kanitscheider Nitish Shirish Keskar Tabarak Khan Logan Kilpatrick Jong Wook Kim Christina Kim Yongjik Kim Jan Hendrik Kirchner Jamie Kiros Matt Knight Daniel Kokotajlo \u0141ukasz Kondraciuk Andrew Kondrich Aris Konstantinidis Kyle Kosic Gretchen Krueger Vishal Kuo Michael Lampe Ikai Lan Teddy Lee Jan Leike Jade Leung Daniel Levy Chak Ming Li Rachel Lim Molly Lin Stephanie Lin Mateusz Litwin Theresa Lopez Ryan Lowe Patricia Lue Anna Makanju Kim Malfacini Sam Manning Todor Markov Yaniv Markovski Bianca Martin Katie Mayer Andrew Mayne Bob McGrew Scott Mayer McKinney Christine McLeavey Paul McMillan Jake McNeil David Medina Aalok Mehta Jacob Menick Luke Metz Andrey Mishchenko Pamela Mishkin Vinnie Monaco Evan Morikawa Daniel Mossing Tong Mu Mira Murati Oleg Murk David M\u00e9ly Ashvin Nair Reiichiro Nakano Rajeev Nayak Arvind Neelakantan Richard Ngo Hyeonwoo Noh Long Ouyang Cullen O\u2019 Keefe Jakub Pachocki Alex Paino Joe Palermo Ashley Pantuliano Giambattista Parascandolo Joel Parish Emy Parparita Alex Passos Mikhail Pavlov Andrew Peng Adam Perelman Filipe de Avila Belbute Peres Michael Petrov Henrique Ponde de Oliveira Pinto Michael Pokorny Michelle Pokrass Vitchyr H. Pong Tolly Powell Alethea Power Boris Power Elizabeth Proehl Raul Puri Alec Radford Jack Rae Aditya Ramesh Cameron Raymond Francis Real Kendra Rimbach Carl Ross Bob Rotsted Henri Roussez Nick Ryder Mario Saltarelli Ted Sanders Shibani Santurkar Girish Sastry Heather Schmidt David Schnurr John Schulman Daniel Selsam Kyla Sheppard Toki Sherbakov Jessica Shieh Sarah Shoker Pranav Shyam Szymon Sidor Eric Sigler Maddie Simens Jordan Sitkin Katarina Slama Ian Sohl Benjamin Sokolowsky Yang Song Natalie Staudacher Felipe Petroski Such Natalie Summers Ilya Sutskever Jie Tang Nikolas Tezak Madeleine B. 
Thompson Phil Tillet Amin Tootoonchian Elizabeth Tseng Preston Tuggle Nick Turley Jerry Tworek Juan Felipe Cer\u00f3n Uribe Andrea Vallone Arun Vijayvergiya Chelsea Voss Carroll Wainwright Justin Jay Wang Alvin Wang Ben Wang Jonathan Ward Jason Wei CJ Weinmann Akila Welihinda Peter Welinder Jiayi Weng Lilian Weng Matt Wiethoff Dave Willner Clemens Winter Samuel Wolrich Hannah Wong Lauren Workman Sherwin Wu Jeff Wu Michael Wu Kai Xiao Tao Xu Sarah Yoo Kevin Yu Qiming Yuan Wojciech Zaremba Rowan Zellers Chong Zhang Marvin Zhang Shengjia Zhao Tianhao Zheng Juntang Zhuang William Zhuk and Barret Zoph. 2024. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]. https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884845"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639226"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523703"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","unstructured":"Brandon Paulsen. 2025. Artifact for Scalable Validated Code Translation of Entire Projects using Large Language Models. doi:10.5281\/zenodo.15242640","DOI":"10.5281\/zenodo.15242640"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/1993316.1993558"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2017.27"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0054170"},{"key":"e_1_3_2_51_2","unstructured":"Ruchir Puri David S. Kung Geert Janssen Wei Zhang Giacomo Domeniconi Vladimir Zolotov Julian Dolby Jie Chen Mihir Choudhury Lindsey Decker Veronika Thost Luca Buratti Saurabh Pujar Shyam Ramji Ulrich Finkler Susan Malaika and Frederick Reiss. 2021. CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks. 
arXiv:2105.12655 [cs.SE]."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3497454"},{"key":"e_1_3_2_53_2","unstructured":"Baptiste Roziere Jie M. Zhang Fran\u00e7ois Charton Mark Harman Gabriel Synnaeve and Guillaume Lample. 2022. Leveraging Automated Unit Tests for Unsupervised Code Translation. arXiv:2110.06773 [cs.SE]."},{"key":"e_1_3_2_54_2","unstructured":"Momoko Shiraishi and Takahiro Shinagawa. 2024. Content-aware Code Segmentation for C-to-Rust Translation using Large Language Models. arXiv:2409.10506 [cs.SE]."},{"key":"e_1_3_2_55_2","unstructured":"Marc Szafraniec Baptiste Rozi\u00e8re Hugh Leather Fran\u00e7ois Charton Patrick Labatut and Gabriel Synnaeve. 2023. Code Translation with Compiler Representations. arXiv:2207.03578 [cs.PL]."},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.119"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3636430"},{"key":"e_1_3_2_58_2","unstructured":"David Tolnay. 2024. Anyhow. https:\/\/github.com\/dtolnay\/anyhow."},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00129"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.337"},{"key":"e_1_3_2_61_2","unstructured":"Aidan Z. H. Yang Yoshiki Takashima Brandon Paulsen Josiah Dodds and Daniel Kroening. 2024. VERT: Verified Equivalent Rust Translation with Large Language Models as Few-Shot Learners. 
arXiv:2404.18852 [cs.PL]."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2407.07472"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-37709-9-22"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616350"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729315","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,7,15]],"date-time":"2026-07-15T19:46:47Z","timestamp":1784144807000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729315"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,10]]},"references-count":63,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2025,6,10]]}},"alternative-id":["10.1145\/3729315"],"URL":"https:\/\/doi.org\/10.1145\/3729315","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,10]]},"assertion":[{"value":"2024-11-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}