{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T10:02:05Z","timestamp":1775815325596,"version":"3.50.1"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>Code generation is a longstanding challenge that aims to generate a code snippet from a natural language description. Usually, expensive text-code paired data is essential for training a code generation model. Recently, thanks to the success of pre-training techniques, large language models have been trained on large unlabelled code corpora and perform well at generating code. In this paper, we investigate how to leverage an unlabelled code corpus to train a model for library-oriented code generation. It is common practice for programmers to reuse third-party libraries, in which case text-code paired data is even harder to obtain due to the huge number of libraries. We observe that library-oriented code snippets are more likely to share similar code sketches. Hence, we present CERT, which works in two steps: a sketcher generates the sketch, then a generator fills in the details. Both the sketcher and the generator are continually pre-trained upon a base model using unlabelled data. We also carefully craft two benchmarks, named PandasEval and NumpyEval, to evaluate library-oriented code generation. Experimental results show the impressive performance of CERT; for example, it surpasses the base model by an absolute 15.67% in terms of pass@1 on PandasEval. Our work is available at https:\/\/github.com\/microsoft\/PyCodeGPT.<\/jats:p>","DOI":"10.24963\/ijcai.2022\/329","type":"proceedings-article","created":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T02:55:56Z","timestamp":1657940156000},"page":"2369-2375","source":"Crossref","is-referenced-by-count":47,"title":["CERT: Continual Pre-training on Sketches for Library-oriented Code Generation"],"prefix":"10.24963","author":[{"given":"Daoguang","family":"Zan","sequence":"first","affiliation":[{"name":"Cooperative Innovation Center, Institute of Software, Chinese Academy of Sciences"},{"name":"University of Chinese Academy of Sciences"}]},{"given":"Bei","family":"Chen","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia"}]},{"given":"Dejian","family":"Yang","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia"}]},{"given":"Zeqi","family":"Lin","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia"}]},{"given":"Minsu","family":"Kim","sequence":"additional","affiliation":[{"name":"Korea University"}]},{"given":"Bei","family":"Guan","sequence":"additional","affiliation":[{"name":"Integrative Innovation Center, Institute of Software, Chinese Academy of Sciences"},{"name":"University of Chinese Academy of Sciences"}]},{"given":"Yongji","family":"Wang","sequence":"additional","affiliation":[{"name":"Integrative Innovation Center, Institute of Software, Chinese Academy of Sciences"},{"name":"University of Chinese Academy of Sciences"},{"name":"State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences"}]},{"given":"Weizhu","family":"Chen","sequence":"additional","affiliation":[{"name":"Microsoft Azure AI"}]},{"given":"Jian-Guang","family":"Lou","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia"}]}],"member":"10584","event":{"name":"Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}","theme":"Artificial Intelligence","location":"Vienna, Austria","acronym":"IJCAI-2022","number":"31","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2022,7,23]]},"end":{"date-parts":[[2022,7,29]]}},"container-title":["Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T11:09:06Z","timestamp":1658142546000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2022\/329"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2022\/329","relation":{},"subject":[],"published":{"date-parts":[[2022,7]]}}}