{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T11:18:09Z","timestamp":1775128689622,"version":"3.50.1"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA2","license":[{"start":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T00:00:00Z","timestamp":1728345600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["SES-2326173, SES-2326174, SES-2326175"],"award-info":[{"award-number":["SES-2326173, SES-2326174, SES-2326175"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,10,8]]},"abstract":"<jats:p>\n                    Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on\n                    <jats:italic toggle=\"yes\">high-resource programming languages<\/jats:italic>\n                    that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with\n                    <jats:italic toggle=\"yes\">low-resource languages<\/jats:italic>\n                    that have limited training data available (e.g., OCaml, Racket, and several others).\n                  <\/jats:p>\n                  <jats:p>\n                    This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called M\n                    <jats:sc>ulti<\/jats:sc>\n                    PL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. M\n                    <jats:sc>ulti<\/jats:sc>\n                    PL-T translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. 3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter our obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language. Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done.\n                  <\/jats:p>\n                  <jats:p>\n                    Using datasets generated with M\n                    <jats:sc>ulti<\/jats:sc>\n                    PL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural language to code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that M\n                    <jats:sc>ulti<\/jats:sc>\n                    PL-T continues to outperform other fine-tuning approaches for low-resource languages. The M\n                    <jats:sc>ulti<\/jats:sc>\n                    PL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.\n                  <\/jats:p>","DOI":"10.1145\/3689735","type":"journal-article","created":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T03:23:04Z","timestamp":1728357784000},"page":"677-708","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":30,"title":["Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9318-7454","authenticated-orcid":false,"given":"Federico","family":"Cassano","sequence":"first","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0494-7245","authenticated-orcid":false,"given":"John","family":"Gouwar","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5837-6097","authenticated-orcid":false,"given":"Francesca","family":"Lucchetti","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2533-1242","authenticated-orcid":false,"given":"Claire","family":"Schlesinger","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1904-6193","authenticated-orcid":false,"given":"Anders","family":"Freeman","sequence":"additional","affiliation":[{"name":"Wellesley College, Wellesley, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5717-4210","authenticated-orcid":false,"given":"Carolyn Jane","family":"Anderson","sequence":"additional","affiliation":[{"name":"Wellesley College, Wellesley, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5222-7720","authenticated-orcid":false,"given":"Molly Q","family":"Feldman","sequence":"additional","affiliation":[{"name":"Oberlin College, Oberlin, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0014-7670","authenticated-orcid":false,"given":"Michael","family":"Greenberg","sequence":"additional","affiliation":[{"name":"Stevens Institute of Technology, Hoboken, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4849-6776","authenticated-orcid":false,"given":"Abhinav","family":"Jangda","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7493-3271","authenticated-orcid":false,"given":"Arjun","family":"Guha","sequence":"additional","affiliation":[{"name":"Northeastern University, Northeastern, USA"},{"name":"Roblox, San Mateo, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,10,8]]},"reference":[{"key":"e_1_3_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510049"},{"key":"e_1_3_1_3_1","unstructured":"Loubna Ben Allal. 2024. Big Code Models Leaderboard. https:\/\/huggingface.co\/spaces\/bigcode\/bigcode-models-leaderboard."},{"key":"e_1_3_1_4_1","unstructured":"Loubna Ben Allal Raymond Li Denis Kocetkov Chenghao Mou Christopher Akiki Carlos Munoz Ferrandis Niklas Muennighoff Mayank Mishra Alex Gu Manan Dey Logesh Kumar Umapathi Carolyn Jane Anderson Yangtian Zi Joel Lamy Poirier Hailey Schoelkopf Sergey Troshin Dmitry Abulkhanov Manuel Romero Michael Lappert Francesco De Toni Bernardo Garc\u00eda del R\u00edo Qian Liu Shamik Bose Urvashi Bhattacharyya Terry Yue Zhuo Ian Yu Paulo Villegas Marco Zocca Sourab Mangrulkar David Lansky Huu Nguyen Danish Contractor Luis Villa Jia Li Dzmitry Bahdanau Yacine Jernite Sean Hughes Daniel Fried Arjun Guha Harm de Vries and Leandro von Werra. 2023. SantaCoder: Don\u2019t Reach for the Stars!. In Deep Learning for Code Workshop (DL4C)."},{"key":"e_1_3_1_5_1","unstructured":"Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin Alexandre Passos Siamak Shakeri Emanuel Taropa Paige Bailey Zhifeng Chen Eric Chu Jonathan H. Clark Laurent El Shafey Yanping Huang Kathy Meier-Hellstern Gaurav Mishra Erica Moreira Mark Omernick Kevin Robinson Sebastian Ruder Yi Tay Kefan Xiao Yuanzhong Xu Yujing Zhang Gustavo Hernandez Abrego Junwhan Ahn Jacob Austin Paul Barham Jan Botha James Bradbury Siddhartha Brahma Kevin Brooks Michele Catasta Yong Cheng Colin Cherry Christopher A. Choquette-Choo Aakanksha Chowdhery Cl\u00e9ment Crepy Shachi Dave Mostafa Dehghani Sunipa Dev Jacob Devlin Mark D\u00edaz Nan Du Ethan Dyer Vlad Feinberg Fangxiaoyu Feng Vlad Fienber Markus Freitag Xavier Garcia Sebastian Gehrmann Lucas Gonzalez Guy Gur-Ari Steven Hand Hadi Hashemi Le Hou Joshua Howland Andrea Hu Jeffrey Hui Jeremy Hurwitz Michael Isard Abe Ittycheriah Matthew Jagielski Wenhao Jia Kathleen Kenealy Maxim Krikun Sneha Kudugunta Chang Lan Katherine Lee Benjamin Lee Eric Li Music Li Wei Li YaGuang Li Jian Li Hyeontaek Lim Hanzhao Lin Zhongtao Liu Frederick Liu Marcello Maggioni Aroma Mahendru Joshua Maynez Vedant Misra Maysam Moussalem Zachary Nado John Nham Eric Ni Andrew Nystrom Alicia Parrish Marie Pellat Martin Polacek Alex Polozov Reiner Pope Siyuan Qiao Emily Reif Bryan Richter Parker Riley Alex Castro Ros Aurko Roy Brennan Saeta Rajkumar Samuel Renee Shelby Ambrose Slone Daniel Smilkov David R. So Daniel Sohn Simon Tokumine Dasha Valter Vijay Vasudevan Kiran Vodrahalli Xuezhi Wang Pidong Wang Zirui Wang Tao Wang John Wieting Yuhuai Wu Kelvin Xu Yunhan Xu Linting Xue Pengcheng Yin Jiahui Yu Qiao Zhang Steven Zheng Ce Zheng Weikang Zhou Denny Zhou Slav Petrov and Yonghui Wu. 2023. PaLM 2 Technical Report. arXiv:2305.10403 [cs.CL]"},{"key":"e_1_3_1_6_1","unstructured":"Anthropic. 2023a. Model Card and Evaluations for Claude Models. https:\/\/www-files.anthropic.com\/production\/images\/Model-Card-Claude-2.pdf Accessed: August 17 2023."},{"key":"e_1_3_1_7_1","unstructured":"Anthropic. 2023b. Terms of Service. https:\/\/console.anthropic.com\/legal\/terms Accessed: August 17 2023."},{"key":"e_1_3_1_8_1","unstructured":"Ben Athiwaratkun Sanjay Krishna Gouda Zijian Wang Xiaopeng Li Yuchen Tian Ming Tan Wasi Uddin Ahmad Shiqi Wang Qing Sun Mingyue Shang Sujan Kumar Gonugondla Hantian Ding Varun Kumar Nathan Fulton Arash Farahani Siddhartha Jain Robert Giaquinto Haifeng Qian Murali Krishna Ramanathan Ramesh Nallapati Baishakhi Ray Parminder Bhatia Sudipta Sengupta Dan Roth and Bing Xiang. 2022. Multi-Lingual Evaluation of Code Generation Models. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_1_9_1","unstructured":"Jacob Austin Augustus Odena Maxwell Nye Maarten Bosma Henryk Michalewski David Dohan Ellen Jiang Carrie Cai Michael Terry Quoc Le and Charles Sutton. 2021. Program Synthesis with Large Language Models. arXiv preprint arXiv:2108.07732 (2021)."},{"key":"e_1_3_1_10_1","unstructured":"Hannah McLean Babe Sydney Nguyen Yangtian Zi Arjun Guha Molly Q. Feldman and Carolyn Jane Anderson. 2024. StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code. In Findings of the Association for Computational Linguistics."},{"key":"e_1_3_1_11_1","unstructured":"Razan Baltaji Saurabh Pujar Louis Mandel Martin Hirzel Luca Buratti and Lav Varshney. 2024. Learning Transfers over Several Programming Languages. arXiv:2310.16937 [cs.CL]"},{"key":"e_1_3_1_12_1","unstructured":"Patrick Barei\u00df Beatriz Souza Marcelo d\u2019Amorim and Michael Pradel. 2022. Code Generation Tools (Almost) for Free? A Study of Few-Shot Pre-Trained Language Models on Code. arXiv:2206.01335 [cs]"},{"key":"e_1_3_1_13_1","doi-asserted-by":"publisher","unstructured":"Hudson Borges Andre Hora and Marco Tulio Valente. 2016. Understanding the Factors That Impact the Popularity of GitHub Repositories. In 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME). 334\u2013344. https:\/\/doi.org\/10.1109\/ICSME.2016.31 10.1109\/ICSME.2016.31","DOI":"10.1109\/ICSME.2016.31"},{"key":"e_1_3_1_14_1","unstructured":"Collin Burns Pavel Izmailov Jan Hendrik Kirchner Bowen Baker Leo Gao Leopold Aschenbrenner Yining Chen Adrien Ecoffet Manas Joglekar Jan Leike Ilya Sutskever and Jeffrey Wu. 2024. Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. In International Conference on Machine Learning (ICML)."},{"key":"e_1_3_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2023.3267446"},{"key":"e_1_3_1_16_1","unstructured":"Federico Cassano Luisa Li Akul Sethi Noah Shinn Abby Brennan-Jones Anton Lozhkov Carolyn Jane Anderson and Arjun Guha. 2024. Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions. In International Workshop on Large Language Models for Code (LLM4Code)."},{"key":"e_1_3_1_17_1","unstructured":"Sahil Chaudhary. 2023. Code Alpaca: An Instruction-following LLaMA model for code generation. https:\/\/github.com\/sahil280114\/codealpaca."},{"key":"e_1_3_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3524610.3527917"},{"key":"e_1_3_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3624062.3624088"},{"key":"e_1_3_1_20_1","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)."},{"key":"e_1_3_1_21_1","unstructured":"CodeWhisperer. 2023. ML-powered Coding Companion \u2013 Amazon CodeWhisperer \u2013 Amazon Web Services. https:\/\/aws.amazon.com\/codewhisperer\/."},{"key":"e_1_3_1_22_1","unstructured":"Github Copilot. 2023. Github Copilot Your AI pair programmer. https:\/\/github.com\/features\/copilot"},{"key":"e_1_3_1_23_1","unstructured":"Harm de Vries. 2023. Go smol or go home. https:\/\/www.harmdevries.com\/post\/model-size-vs-compute-overhead\/."},{"key":"e_1_3_1_24_1","unstructured":"Felipe Hoffa. 2016. GitHub on BigQuery: Analyze All the Open Source Code. https:\/\/cloud.google.com\/blog\/topics\/public-datasets\/github-on-bigquery-analyze-all-the-open-source-code."},{"key":"e_1_3_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616243"},{"key":"e_1_3_1_26_1","unstructured":"Google. 2023. Generative AI Terms of Service. https:\/\/policies.google.com\/terms\/generative-ai Accessed: August 17 2023."},{"key":"e_1_3_1_27_1","unstructured":"Suriya Gunasekar Yi Zhang Jyoti Aneja Caio C\u00e9sar Teodoro Mendes Allie Del Giorno Sivakanth Gopi Mojan Java-heripi Piero Kauffmann Gustavo de Rosa Olli Saarikivi Adil Salim Shital Shah Harkirat Singh Behl Xin Wang S\u00e9bastien Bubeck Ronen Eldan Adam Tauman Kalai Yin Tat Lee and Yuanzhi Li. 2023. Textbooks Are All You Need. arXiv:2306.11644 [cs.CL]"},{"key":"e_1_3_1_28_1","doi-asserted-by":"publisher","unstructured":"Daya Guo Qihao Zhu Dejian Yang Zhenda Xie Kai Dong Wentao Zhang Guanting Chen Xiao Bi Y. Wu Y. K. Li Fuli Luo Yingfei Xiong and Wenfeng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming \u2013 The Rise of Code Intelligence. https:\/\/doi.org\/10.48550\/arXiv.2401.14196 10.48550\/arXiv.2401.14196 arXiv:2401.14196 [cs]","DOI":"10.48550\/arXiv.2401.14196"},{"key":"e_1_3_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/800233.807050"},{"key":"e_1_3_1_30_1","unstructured":"Jordan Hoffmann Sebastian Borgeaud Arthur Mensch Elena Buchatskaya Trevor Cai Eliza Rutherford Diego de las Casas Lisa Anne Hendricks Johannes Welbl Aidan Clark Tom Hennigan Eric Noland Katherine Millican George van den Driessche Bogdan Damoc Aurelia Guy Simon Osindero Karen Simonyan Erich Elsen Oriol Vinyals Jack William Rae and Laurent Sifre. 2022. An empirical analysis of compute-optimal large language model training. In Advances in Neural Information Processing Systems Alice H. Oh Alekh Agarwal Danielle Belgrave and Kyunghyun Cho (Eds.)."},{"key":"e_1_3_1_31_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i4.25642"},{"key":"e_1_3_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCAM59687.2023.00038"},{"key":"e_1_3_1_33_1","unstructured":"Denis Kocetkov Raymond Li Loubna Ben Allal Jia Li Chenghao Mou Carlos Mu\u00f1oz Ferrandis Yacine Jernite Margaret Mitchell Sean Hughes Thomas Wolf Dzmitry Bahdanau Leandro von Werra and Harm de Vries. 2023. The Stack: 3 TB of Permissively Licensed Source Code. In Deep Learning for Code Workshop (DL4C)."},{"key":"e_1_3_1_34_1","doi-asserted-by":"crossref","unstructured":"Woosuk Kwon Zhuohan Li Siyuan Zhuang Ying Sheng Lianmin Zheng Cody Hao Yu Joseph E. Gonzalez Hao Zhang and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. In ACM SIGOPS Symposium on Operating Systems Principles (SOSP).","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_3_1_35_1","unstructured":"Yuhang Lai Chengxi Li Yiming Wang Tianyi Zhang Ruiqi Zhong Luke Zettlemoyer Scott Wen-tau Yih Daniel Fried Sida Wang and Tao Yu. 2023. DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation. In International Conference on Machine Learning (ICML)."},{"key":"e_1_3_1_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.577"},{"key":"e_1_3_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00085"},{"key":"e_1_3_1_38_1","unstructured":"Raymond Li Loubna Ben Allal Yangtian Zi Niklas Muennighoff Denis Kocetkov Chenghao Mou Marc Marone Christopher Akiki Jia Li Jenny Chim Qian Liu Evgenii Zheltonozhskii Terry Yue Zhuo Thomas Wang Olivier Dehaene Mishig Davaadorj Joel Lamy-Poirier Joao Monteiro Oleh Shliazhko Nicolas Gontier Nicholas Meade Armel Zebaze Ming-Ho Yee Logesh Kumar Umapathi Jian Zhu Benjamin Lipkin Muhtasham Oblokulov Zhiruo Wang Rudra Murthy Jason Stillerman Siva Sankalp Patel Dmitry Abulkhanov Marco Zocca Manan Dey Zhihan Zhang Nour Fahmy Urvashi Bhattacharyya Wenhao Yu Swayam Singh Sasha Luccioni Paulo Villegas Maxim Kunakov Fedor Zhdanov Manuel Romero Tony Lee Nadav Timor Jennifer Ding Claire Schlesinger Hailey Schoelkopf Jan Ebert Tri Dao Mayank Mishra Alex Gu Jennifer Robinson Carolyn Jane Anderson Brendan Dolan-Gavitt Danish Contractor Siva Reddy Daniel Fried Dzmitry Bahdanau Yacine Jernite Carlos Mu\u00f1oz Ferrandis Sean Hughes Thomas Wolf Arjun Guha Leandro von Werra and Harm de Vries. 2023. StarCoder: May the Source Be with You! Transactions of Machine Learning Research (TMLR) (Dec. 2023)."},{"key":"e_1_3_1_39_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1219032"},{"key":"e_1_3_1_40_1","unstructured":"Anton Lozhkov Raymond Li Loubna Ben Allal Federico Cassano Joel Lamy-Poirier Nouamane Tazi Ao Tang Dmytro Pykhtar Jiawei Liu Yuxiang Wei Tianyang Liu Max Tian Denis Kocetkov Arthur Zucker Younes Belkada Zijian Wang Qian Liu Dmitry Abulkhanov Indraneil Paul Zhuang Li Wen-Ding Li Megan Risdal Jia Li Jian Zhu Terry Yue Zhuo Evgenii Zheltonozhskii Nii Osae Osae Dade Wenhao Yu Lucas Krau\u00df Naman Jain Yixuan Su Xuanli He Manan Dey Edoardo Abati Yekun Chai Niklas Muennighoff Xiangru Tang Muhtasham Oblokulov Christopher Akiki Marc Marone Chenghao Mou Mayank Mishra Alex Gu Binyuan Hui Tri Dao Armel Zebaze Olivier Dehaene Nicolas Patry Canwen Xu Julian McAuley Han Hu Torsten Scholak Sebastien Paquet Jennifer Robinson Carolyn Jane Anderson Nicolas Chapados Mostofa Patwary Nima Tajbakhsh Yacine Jernite Carlos Mu\u00f1oz Ferrandis Lingming Zhang Sean Hughes Thomas Wolf Arjun Guha Leandro von Werra and Harm de Vries. 2024. StarCoder 2 and The Stack v2: The Next Generation. arXiv:2402.19173 [cs.SE]"},{"key":"e_1_3_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-59762-7_2"},{"key":"e_1_3_1_42_1","unstructured":"Ziyang Luo Can Xu Pu Zhao Qingfeng Sun Xiubo Geng Wenxiang Hu Chongyang Tao Jing Ma Qingwei Lin and Daxin Jiang. 2024. WizardCoder: Empowering Code Large Language Models with Evol-Instruct. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_1_43_1","unstructured":"Niklas Muennighoff Qian Liu Armel Zebaze Qinkai Zheng Binyuan Hui Terry Yue Zhuo Swayam Singh Xiangru Tang Leandro von Werra and Shayne Longpre. 2024. OctoPack: Instruction Tuning Code Large Language Models. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_1_44_1","unstructured":"Vijayaraghavan Murali Chandra Maddila Imad Ahmad Michael Bolin Daniel Cheng Negar Ghorbani Renuka Fernandez and Nachiappan Nagappan. 2023. CodeCompose: A Large-Scale Industrial Deployment of AI-assisted Code Authoring. arXiv:2305.12050 [cs.SE]"},{"key":"e_1_3_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639187"},{"key":"e_1_3_1_46_1","unstructured":"Erik Nijkamp Hiroaki Hayashi Caiming Xiong Silvio Savarese and Yingbo Zhou. 2023. CodeGen2: Lessons for Training LLMs on Programming and Natural Languages. arXiv:2305.02309 [cs.LG]"},{"key":"e_1_3_1_47_1","unstructured":"OpenAI. 2023a. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]"},{"key":"e_1_3_1_48_1","unstructured":"OpenAI. 2023b. Terms of Service. https:\/\/openai.com\/policies\/terms-of-useAccessed: August 17 2023."},{"key":"e_1_3_1_49_1","doi-asserted-by":"publisher","DOI":"10.5555\/3618408.3619516"},{"key":"e_1_3_1_50_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training Language Models to Follow Instructions with Human Feedback. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 35. Curran Associates, Inc."},{"key":"e_1_3_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639226"},{"key":"e_1_3_1_52_1","unstructured":"Tung Phung Jos\u00e9 Pablo Cambronero Sumit Gulwani Tobias Kohn Rupak Majumdar Adish Kumar Singla and Gustavo Soares. 2023. Generating High-Precision Feedback for Programming Syntax Errors using Large Language Models. ArXiv abs\/2302.04662 (2023)."},{"key":"e_1_3_1_53_1","unstructured":"Pyright. 2023. Static Type Checker for Python. https:\/\/github.com\/Microsoft\/pyright"},{"key":"e_1_3_1_54_1","doi-asserted-by":"crossref","unstructured":"Samyam Rajbhandari Jeff Rasley Olatunji Ruwase and Yuxiong He. 2020. ZeRO: Memory Optimizations toward Training Trillion Parameter Models. In International Conference for High Performance Computing Networking Storage and Analysis (SC).","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_3_1_55_1","unstructured":"Replit. 2023. Replit Code v1.3. https:\/\/huggingface.co\/replit\/replit-code-v1-3b."},{"key":"e_1_3_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581641.3584037"},{"key":"e_1_3_1_57_1","unstructured":"Baptiste Roziere Jie Zhang Francois Charton Mark Harman Gabriel Synnaeve and Guillaume Lample. 2021. Leveraging Automated Unit Tests for Unsupervised Code Translation. In International Conference on Learning Representations."},{"key":"e_1_3_1_58_1","unstructured":"Baptiste Rozi\u00e8re Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Ellen Tan Yossi Adi Jingyu Liu Tal Remez J\u00e9r\u00e9my Rapin Artyom Kozhevnikov Ivan Evtimov Joanna Bitton Manish Bhatt Cristian Canton Ferrer Aaron Grattafiori Wenhan Xiong Alexandre D\u00e9fossez Jade Copet Faisal Azhar Hugo Touvron Louis Martin Nicolas Usunier Thomas Scialom and Gabriel Synnaeve. 2023. Code Llama: Open Foundation Models for Code. arXiv:2308.12950 [cs.CL] https:\/\/arxiv.org\/abs\/2308.12950"},{"key":"e_1_3_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2023.3334955"},{"key":"e_1_3_1_60_1","unstructured":"TabNine. 2023. AI Assistant for Software Developers | Tabnine. https:\/\/www.tabnine.com\/."},{"key":"e_1_3_1_61_1","doi-asserted-by":"crossref","unstructured":"Yizhong Wang Yeganeh Kordi Swaroop Mishra Alisa Liu Noah A. Smith Daniel Khashabi and Hannaneh Hajishirzi. 2023. Self-Instruct: Aligning Language Model with Self Generated Instructions. In Annual Meeting of the Association of Computation Linguistics (ACL).","DOI":"10.18653\/v1\/2023.acl-long.754"},{"key":"e_1_3_1_62_1","unstructured":"Yuxiang Wei Zhe Wang Jiawei Liu Yifeng Ding and Lingming Zhang. 2024. Magicoder: Empowering Code Generation with OSS-Instruct. In International Conference on Machine Learning (ICML)."},{"key":"e_1_3_1_63_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639121"},{"key":"e_1_3_1_65_1","unstructured":"Yiqing Xie Atharva Naik Daniel Fried and Carolyn Rose. 2024. Data Augmentation for Code Translation with Comparable Corpora and Multiple References. In Findings of EMNLP."},{"key":"e_1_3_1_66_1","unstructured":"Frank F. Xu Uri Alon Graham Neubig and Vincent J. Hellendoorn. 2022. A Systematic Evaluation of Large Language Models of Code. In Deep Learning for Code Workshop (DL4C)."},{"key":"e_1_3_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599790"},{"key":"e_1_3_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3520312.3534864"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689735","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3689735","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T09:06:11Z","timestamp":1770195971000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689735"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,8]]},"references-count":67,"journal-issue":{"issue":"OOPSLA2","published-print":{"date-parts":[[2024,10,8]]}},"alternative-id":["10.1145\/3689735"],"URL":"https:\/\/doi.org\/10.1145\/3689735","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,8]]},"assertion":[{"value":"2024-04-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}