{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:06:47Z","timestamp":1750309607564,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA2","license":[{"start":{"date-parts":[[2023,10,16]],"date-time":"2023-10-16T00:00:00Z","timestamp":1697414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1918839,CCF-2217064,2030859"],"award-info":[{"award-number":["CCF-1918839,CCF-2217064,2030859"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["HR00112190046"],"award-info":[{"award-number":["HR00112190046"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2023,10,16]]},"abstract":"<jats:p>Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program.<\/jats:p>\n          <jats:p>We present a methodology for sampling datasets to train neural-network-based surrogates of programs. 
We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.<\/jats:p>","DOI":"10.1145\/3622856","type":"journal-article","created":{"date-parts":[[2023,10,16]],"date-time":"2023-10-16T15:41:29Z","timestamp":1697470889000},"page":"1648-1676","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8318-0363","authenticated-orcid":false,"given":"Alex","family":"Renda","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2757-9182","authenticated-orcid":false,"given":"Yi","family":"Ding","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6928-0456","authenticated-orcid":false,"given":"Michael","family":"Carbin","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,10,16]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1833349.1778766"},{"key":"e_1_2_2_2_1","volume-title":"International Conference on Learning Representations.","author":"Agarwala Atish","year":"2021","unstructured":"Atish Agarwala , Abhimanyu Das , Brendan Juba , Rina Panigrahy , Vatsal Sharan , Xin Wang , and Qiuyi Zhang . 2021 . 
One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks. In International Conference on Learning Representations."},{"key":"e_1_2_2_3_1","volume-title":"Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. In International Conference on Machine Learning.","author":"Arora Sanjeev","year":"2019","unstructured":"Sanjeev Arora, Simon Du, Wei Hu, Zhiyuan Li, and Ruosong Wang. 2019. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. In International Conference on Machine Learning."},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2384616.2384681"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2814270.2814301"},{"key":"e_1_2_2_6_1","volume-title":"KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In USENIX Conference on Operating Systems Design and Implementation.","author":"Cadar Cristian","year":"2008","unstructured":"Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. 
In USENIX Conference on Operating Systems Design and Implementation."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2544173.2509546"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3182162"},{"key":"e_1_2_2_9_1","volume-title":"Region-Based Active Learning. In International Conference on Artificial Intelligence and Statistics.","author":"Cortes Corinna","year":"2019","unstructured":"Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, and Ningshan Zhang. 2019. Region-Based Active Learning. In International Conference on Artificial Intelligence and Statistics."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/512950.512973"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2535838.2535874"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2589750"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898717761"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.869367"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11957-6_16"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1168857.1168882"},{"key":"e_1_2_2_17_1","article-title":"Design of Advanced Color: Temperature Control System for HDTV Applications","volume":"41","author":"Kang Bongsoon","year":"2002","unstructured":"Bongsoon Kang, Ohak Moon, Changhee Hong, Honam Lee, Bonghwan Cho, and Youngsun Kim. 2002. Design of Advanced Color: Temperature Control System for HDTV Applications. 
Journal of the Korean Physical Society, 41, 6 (2002).","journal-title":"Journal of the Korean Physical Society"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/360248.360252"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPS.2019.2948339"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3380446.3430636"},{"key":"e_1_2_2_21_1","unstructured":"David Lettier. 2019. 3D Game Shaders For Beginners. https:\/\/github.com\/lettier\/3d-game-shaders-for-beginners"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3015465"},{"volume-title":"Towards Automated Construction of Compiler Optimizations","author":"Mendis Charith","key":"e_1_2_2_23_1","unstructured":"Charith Mendis. 2020. Towards Automated Construction of Compiler Optimizations. Massachusetts Institute of Technology. Cambridge, MA."},{"key":"e_1_2_2_24_1","unstructured":"Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe, and Michael Carbin. 2019. Compiler Auto-Vectorization with Imitation Learning. In Advances in Neural Information Processing Systems."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2660193.2660231"},{"key":"e_1_2_2_26_1","volume-title":"Andrew Stewart, Goran Fernlund, Anoush Poursartip, and Frank Wood.","author":"Munk Andreas","year":"2019","unstructured":"Andreas Munk, Adam \u015acibior, At\u0131l\u0131m G\u00fcne\u015f Baydin, Andrew Stewart, Goran Fernlund, Anoush Poursartip, and Frank Wood. 2019. 
Deep Probabilistic Surrogate Networks for Universal Simulator Approximation. arXiv:1910.11950."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-03811-6"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41524-020-00431-2"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00045"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3486607.3486748"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993518"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3799-8"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095430.1081750"},{"volume-title":"Active Learning Literature Survey","author":"Settles Burr","key":"e_1_2_2_34_1","unstructured":"Burr Settles. 2009. Active Learning Literature Survey. 
University of Wisconsin-Madison Department of Computer Sciences."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2019.00052"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-19249-9_33"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1667239.1667243"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procir.2018.03.087"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1002\/9781118162934.ch11"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322996"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1968.1972"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-21852-6_3"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/355586.364791"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/3054.001.0001"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2016.2630270"}],"container-title":["Proceedings of the ACM on Programming 
Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3622856","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3622856","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:27Z","timestamp":1750298247000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3622856"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,16]]},"references-count":45,"journal-issue":{"issue":"OOPSLA2","published-print":{"date-parts":[[2023,10,16]]}},"alternative-id":["10.1145\/3622856"],"URL":"https:\/\/doi.org\/10.1145\/3622856","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2023,10,16]]},"assertion":[{"value":"2023-10-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}