{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T02:13:49Z","timestamp":1775873629981,"version":"3.50.1"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA2","license":[{"start":{"date-parts":[[2023,10,16]],"date-time":"2023-10-16T00:00:00Z","timestamp":1697414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["1918889,1762299"],"award-info":[{"award-number":["1918889,1762299"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2023,10,16]]},"abstract":"<jats:p>Many data extraction tasks of practical relevance require not only syntactic pattern matching but also semantic reasoning about the content of the underlying text. While regular expressions are very well suited for tasks that require only syntactic pattern matching, they fall short for data extraction tasks that involve both a syntactic and semantic component. To address this issue, we introduce semantic regexes, a generalization of regular expressions that facilitates combined syntactic and semantic reasoning about textual data. We also propose a novel learning algorithm that can synthesize semantic regexes from a small number of positive and negative examples. Our proposed learning algorithm uses a combination of neural sketch generation and compositional type-directed synthesis for fast and effective generalization from a small number of examples.  We have implemented these ideas in a new tool called Smore and evaluated it on representative data extraction tasks involving several textual datasets. 
Our evaluation shows that semantic regexes can better support complex data extraction tasks than standard regular expressions and that our learning algorithm significantly outperforms existing tools, including state-of-the-art neural networks and program synthesis tools.<\/jats:p>","DOI":"10.1145\/3622863","type":"journal-article","created":{"date-parts":[[2023,10,16]],"date-time":"2023-10-16T15:41:29Z","timestamp":1697470889000},"page":"1848-1877","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["Data Extraction via Semantic Regular Expression Synthesis"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4680-5157","authenticated-orcid":false,"given":"Qiaochu","family":"Chen","sequence":"first","affiliation":[{"name":"University of Texas at Austin, Austin, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-2690-6059","authenticated-orcid":false,"given":"Arko","family":"Banerjee","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2080-0443","authenticated-orcid":false,"given":"\u00c7a\u011fatay","family":"Demiralp","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7061-7298","authenticated-orcid":false,"given":"Greg","family":"Durrett","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8006-1230","authenticated-orcid":false,"given":"I\u015f\u0131l","family":"Dillig","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,10,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Incremental Grammatical Inference From Positive And Negative Data Using Unbiased Finite State Automata. In Proceedings of the ACL\u201902 Workshop on Unsupervised Lexical Acquisition. 291\u2013300","author":"Alquezar R.","unstructured":"R. Alquezar and A. Sanfeliu. 1994. Incremental Grammatical Inference From Positive And Negative Data Using Unbiased Finite State Automata. In Proceedings of the ACL\u201902 Workshop on Unsupervised Lexical Acquisition. 291\u2013300."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1181"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.12"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/0890-5401(87)90052-6"},{"key":"e_1_2_1_5_1","volume-title":"Program Synthesis with Large Language Models. CoRR, abs\/2108.07732","author":"Austin Jacob","year":"2021","unstructured":"Jacob Austin, Augustus Odena, Maxwell I. Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. 2021. Program Synthesis with Large Language Models. CoRR, abs\/2108.07732 (2021), arXiv:2108.07732. 
arxiv:2108.07732"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-25803-9_1"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-04083-2_11"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2009.07.002"},{"key":"e_1_2_1_9_1","volume-title":"Advances in Neural Information Processing Systems","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.). 
33, Curran Associates, Inc., 1877\u20131901."},{"key":"e_1_2_1_10_1","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pond\u00e9 de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. 
CoRR abs\/2107.03374 (2021) arXiv:2107.03374. arxiv:2107.03374"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.8144182"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3453483.3454047"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385988"},{"key":"e_1_2_1_14_1","volume-title":"Binding Language Models in Symbolic Languages. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=lH1PV42cbF","author":"Cheng Zhoujun","year":"2023","unstructured":"Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, and Tao Yu. 2023. Binding Language Models in Symbolic Languages. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=lH1PV42cbF"},
{"key":"e_1_2_1_15_1","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam M. Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Benton C. Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garc\u00eda, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark D\u00edaz, Orhan Firat, Michele Catasta, Jason Wei, Kathleen S. Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. 2022. PaLM: Scaling Language Modeling with Pathways. ArXiv abs\/2204.02311 (2022)."},{"key":"e_1_2_1_16_1","volume-title":"Structured information extraction from complex scientific text with fine-tuned large language models. arXiv, 2212.05238","author":"Dunn Alexander","year":"2022","unstructured":"
Alexander Dunn, John Dagdelen, Nicholas Walker, Sanghoon Lee, Andrew S. Rosen, Gerbrand Ceder, Kristin Persson, and Anubhav Jain. 2022. Structured information extraction from complex scientific text with fine-tuned large language models. arXiv, 2212.05238 (2022)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192382"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3062341.3062351"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2737977"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the Twentieth Annual Conference of the Cognitive Science Society. 350\u2013355","author":"Firoiu Laura","unstructured":"Laura Firoiu, Tim Oates, and Paul R. Cohen. 1998. Learning Regular Languages from Positive Evidence. In Proceedings of the Twentieth Annual Conference of the Cognitive Science Society. 350\u2013355."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2837614.2837629"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML\u201917)","volume":"70","author":"Gaunt Alexander L.","year":"2017","unstructured":"Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, and Daniel Tarlow. 2017. Differentiable Programs with Neural Libraries. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML\u201917). 
JMLR.org, 1213\u20131222."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0019-9958(78)90562-4"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523722"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1926385.1926423"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119)","volume":"119","author":"Huang Jiani","year":"2020","unstructured":"Jiani Huang, Calvin Smith, Osbert Bastani, Rishabh Singh, Aws Albarghouthi, and Mayur Naik. 2020. Generating Programmatic Referring Expressions via Program Synthesis. In Proceedings of the 37th International Conference on Machine Learning, Hal Daum\u00e9 III and Aarti Singh (Eds.) (Proceedings of Machine Learning Research, Vol. 119). PMLR, 4495\u20134506. 
https:\/\/proceedings.mlr.press\/v119\/huang20h.html"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3386027"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510203"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.747"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/256167.256195"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2594291.2594333"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2993236.2993244"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403153"},{"key":"e_1_2_1_34_1","volume-title":"Lambda-calculus models of programming languages. Ph.D. Dissertation","author":"Morris James Hiram","unstructured":"James Hiram Morris. 1968. Lambda-calculus models of programming languages. Ph.D. Dissertation. Massachusetts Institute of Technology. Cambridge."},{"key":"e_1_2_1_35_1","volume-title":"CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=iaYcJKpY2B_","author":"Nijkamp Erik","year":"2023","unstructured":"Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=iaYcJKpY2B_"},{"key":"e_1_2_1_36_1","volume-title":"Introducing ChatGPT","author":"OpenAI","year":"2022","unstructured":"OpenAI. 2022. Introducing ChatGPT. https:\/\/openai.com\/blog\/chatgpt Accessed on March 16, 2023"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2738007"},{"key":"e_1_2_1_38_1","volume-title":"Grammatical Inference: Learning Syntax from Sentences, Laurent Miclet and Colin de la Higuera (Eds.)","author":"Parekh Rajesh","unstructured":"Rajesh Parekh and Vasant Honavar. 1996. An incremental interactive algorithm for regular grammar inference. In Grammatical Inference: Learning Syntax from Sentences, Laurent Miclet and Colin de la Higuera (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg. 238\u2013249. isbn:978-3-540-70678-6"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010822518073"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1037"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1142"},
{"key":"e_1_2_1_42_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=KmtVD97J43e","author":"Poesia Gabriel","year":"2022","unstructured":"Gabriel Poesia, Alex Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. 2022. Synchromesh: Reliable Code Generation from Pre-trained Language Models. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=KmtVD97J43e"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2908080.2908093"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2814270.2814310"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485535"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380608"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/2832249.2832359"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the Twenty-first Annual ACM Symposium on Theory of Computing (STOC \u201989)","author":"Rivest R. L.","unstructured":"R. L. Rivest and R. E. Schapire. 1989. Inference of Finite Automata Using Homing Sequences. In Proceedings of the Twenty-first Annual ACM Symposium on Theory of Computing (STOC \u201989). ACM, 411\u2013420."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3496139"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.396"},
{"key":"e_1_2_1_51_1","volume-title":"HOUDINI: Lifelong Learning as Program Synthesis. In Advances in Neural Information Processing Systems","author":"Valkov Lazar","year":"2018","unstructured":"Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton, and Swarat Chaudhuri. 2018. HOUDINI: Lifelong Learning as Program Synthesis. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). 31, Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/edc27f139c3b4e4bb29d1cdbc45663f9-Paper.pdf"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485477"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183729"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.146"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485489"},{"key":"e_1_2_1_56_1","volume-title":"The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=ZTCxT2t2Ru","author":"Zhou Shuyan","year":"2023","unstructured":"Shuyan Zhou, Uri Alon, Frank F. Xu, Zhengbao Jiang, and Graham Neubig. 2023. DocPrompting: Generating Code by Retrieving the Docs. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=ZTCxT2t2Ru"},
{"key":"e_1_2_1_57_1","volume-title":"On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex. arXiv, 2301.12868","author":"Zhuo Terry Yue","year":"2023","unstructured":"Terry Yue Zhuo, Zhuang Li, Yujin Huang, Fatemeh Shiri, Weiqing Wang, Gholamreza Haffari, and Yuan-Fang Li. 2023. On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex. arXiv, 2301.12868 (2023)."}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3622863","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3622863","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:27Z","timestamp":1750298247000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3622863"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,16]]},"references-count":57,"journal-issue":{"issue":"OOPSLA2","published-print":{"date-parts":[[2023,10,16]]}},"alternative-id":["10.1145\/3622863"],"URL":"https:\/\/doi.org\/10.1145\/3622863","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,16]]},"assertion":[{"value":"2023-10-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}