{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T07:11:18Z","timestamp":1777360278427,"version":"3.51.4"},"reference-count":59,"publisher":"MIT Press","issue":"4","content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,12,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing for more individuals to be explored with the same number of program executions. However, sampling randomly can exclude important cases from the down-sample for a number of generations, while cases that measure the same behavior (synonymous cases) may be overused. In this work, we introduce Informed Down-Sampled Lexicase Selection. This method leverages population statistics to build down-samples that contain more distinct and therefore informative training cases. Through an empirical investigation across two different GP systems (PushGP and Grammar-Guided GP), we find that informed down-sampling significantly outperforms random down-sampling on a set of contemporary program synthesis benchmark problems. Through an analysis of the created down-samples, we find that important training cases are included in the down-sample consistently across independent evolutionary runs and systems. We hypothesize that this improvement can be attributed to the ability of Informed Down-Sampled Lexicase Selection to maintain more specialist individuals over the course of evolution, while still benefiting from reduced per-evaluation costs.<\/jats:p>","DOI":"10.1162\/evco_a_00346","type":"journal-article","created":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T20:25:00Z","timestamp":1706214300000},"page":"307-337","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":13,"title":["Informed Down-Sampled Lexicase Selection: Identifying Productive Training Cases for Efficient Problem Solving"],"prefix":"10.1162","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5789-0492","authenticated-orcid":true,"given":"Ryan","family":"Boldi","sequence":"first","affiliation":[{"name":"University of Massachusetts, Amherst, MA 01003, USA rbahlousbold@umass.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8209-1465","authenticated-orcid":true,"given":"Martin","family":"Briesch","sequence":"additional","affiliation":[{"name":"Johannes Gutenberg University, Mainz, 55128, Germany briesch@uni-mainz.de"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8873-7143","authenticated-orcid":true,"given":"Dominik","family":"Sobania","sequence":"additional","affiliation":[{"name":"Johannes Gutenberg University, Mainz, 55128, Germany dsobania@uni-mainz.de"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0994-2718","authenticated-orcid":true,"given":"Alexander","family":"Lalejini","sequence":"additional","affiliation":[{"name":"Grand Valley State University, Allendale, MI 49401, USA lalejina@gvsu.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2330-6809","authenticated-orcid":true,"given":"Thomas","family":"Helmuth","sequence":"additional","affiliation":[{"name":"Hamilton College, Clinton, NY, 13323, USA thelmuth@hamilton.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3376-427X","authenticated-orcid":true,"given":"Franz","family":"Rothlauf","sequence":"additional","affiliation":[{"name":"Johannes Gutenberg University, Mainz, 55128, Germany rothlauf@uni-mainz.de"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2924-1732","authenticated-orcid":true,"given":"Charles","family":"Ofria","sequence":"additional","affiliation":[{"name":"Michigan State University, East Lansing, MI 48824, USA ofria@msu.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5299-4797","authenticated-orcid":true,"given":"Lee","family":"Spector","sequence":"additional","affiliation":[{"name":"Amherst College, Amherst, MA 01002, USA lspector@amherst.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2024,12,2]]},"reference":[{"key":"2024120202545252200_B1","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1145\/3321707.3321828","article-title":"Lexicase selection in learning classifier systems","author":"Aenugu","year":"2019","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)"},{"key":"2024120202545252200_B2","article-title":"Practical coreset constructions for machine learning","author":"Bachem","year":"2017"},{"key":"2024120202545252200_B3","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1145\/3583133.3590713","article-title":"The problem solving benefits of down-sampling vary by selection scheme","author":"Boldi","year":"2023","journal-title":"Proceedings of the Companion Conference on Genetic and Evolutionary Computation"},{"key":"2024120202545252200_B4","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.8185133","article-title":"ryanboldi\/Informed-Down-Sampled-Lexicase: Informed Down-Sampling Experimentation Code (GitHub repository)","author":"Boldi","year":"2023"},{"key":"2024120202545252200_B5","article-title":"The environmental discontinuity hypothesis for down-sampled lexicase selection","author":"Boldi","year":"2022"},{"key":"2024120202545252200_B6","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1145\/3583133.3590751","article-title":"A static analysis of informed down-samples","author":"Boldi","year":"2023","journal-title":"Proceedings of the Companion Conference on Genetic and Evolutionary Computation"},{"key":"2024120202545252200_B7","article-title":"Genetic algorithms for function optimization","author":"Brindle","year":"1980"},{"key":"2024120202545252200_B8","first-page":"1952","article-title":"Online continual learning from imbalanced data","volume":"119","author":"Chrysakis","year":"2020","journal-title":"Proceedings of the 37th International Conference on Machine Learning"},{"issue":"2","key":"2024120202545252200_B9","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1109\/4235.996017","article-title":"A fast and elitist multiobjective genetic algorithm: NSGA-II","volume":"6","author":"Deb","year":"2002","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"2024120202545252200_B10","doi-asserted-by":"crossref","DOI":"10.1145\/3520304.3534026","article-title":"Lexicase selection at scale","author":"Ding","year":"2022","journal-title":"Genetic and Evolutionary Computation Conference Companion"},{"key":"2024120202545252200_B11","article-title":"Optimizing neural networks with gradient lexicase selection","author":"Ding","year":"2021","journal-title":"International Conference on Learning Representations"},{"key":"2024120202545252200_B12","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1145\/3205651.3205780","article-title":"Ecological theory provides insights about evolutionary computation","author":"Dolson","year":"2018","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO)"},{"key":"2024120202545252200_B13","first-page":"5060","article-title":"Exploring position independent initialisation in grammatical evolution","author":"Fagan","year":"2016","journal-title":"IEEE Congress on Evolutionary Computation"},{"key":"2024120202545252200_B14","doi-asserted-by":"publisher","first-page":"1194","DOI":"10.1145\/3067695.3082469","article-title":"PonyGE2: Grammatical evolution in Python","author":"Fenton","year":"2017","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference Companion"},{"key":"2024120202545252200_B15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-030-39958-0_1","article-title":"Characterizing the effects of random subsampling on lexicase selection","volume-title":"Genetic programming theory and practice XVII","author":"Ferguson","year":"2020"},{"key":"2024120202545252200_B16","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1007\/978-3-319-55696-3_17","article-title":"A grammar design pattern for arbitrary program synthesis problems in genetic programming","author":"Forstenlechner","year":"2017","journal-title":"European Conference on Genetic Programming"},{"key":"2024120202545252200_B17","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1007\/978-3-319-30668-1_13","article-title":"Grammar design for derivation tree based genetic programming systems","author":"Forstenlechner","year":"2016","journal-title":"European Conference on Genetic Programming"},{"key":"2024120202545252200_B18","first-page":"171","article-title":"An ecology-based evolutionary algorithm to evolve solutions to complex problems","volume-title":"Artificial Life 13","author":"Goings","year":"2012"},{"key":"2024120202545252200_B19","first-page":"237","article-title":"Benchmarking parent selection for program synthesis by genetic programming","author":"Helmuth","year":"2020","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference Companion"},{"key":"2024120202545252200_B20","doi-asserted-by":"publisher","DOI":"10.1145\/3449639.3459285","article-title":"PSB2: The second program synthesis benchmark suite","author":"Helmuth","year":"2021","journal-title":"Genetic and Evolutionary Computation Conference (GECCO)"},{"issue":"3","key":"2024120202545252200_B21","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1007\/s10710-022-09434-y","article-title":"Applying genetic programming to PSB2: The next generation program synthesis benchmark suite","volume":"23","author":"Helmuth","year":"2022","journal-title":"Genetic Programming and Evolvable Machines"},{"key":"2024120202545252200_B22","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1007\/978-3-031-14721-0_34","article-title":"Population diversity leads to short running times of lexicase selection","author":"Helmuth","year":"2022","journal-title":"Parallel Problem Solving from Nature"},{"key":"2024120202545252200_B23","first-page":"983","article-title":"Effects of lexicase and tournament selection on diversity recovery and maintenance","author":"Helmuth","year":"2016","journal-title":"Proceedings of the Conference on Genetic and Evolutionary Computation Companion (GECCO)"},{"key":"2024120202545252200_B24","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.1145\/3205455.3205603","article-title":"Program synthesis using uniform mutation by addition and deletion","author":"Helmuth","year":"2018","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)"},{"issue":"3","key":"2024120202545252200_B25","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1007\/s10710-020-09377-2","article-title":"On the importance of specialists for lexicase selection","volume":"21","author":"Helmuth","year":"2020","journal-title":"Genetic Programming and Evolvable Machines"},{"key":"2024120202545252200_B26","first-page":"1039","article-title":"General program synthesis benchmark suite","author":"Helmuth","year":"2015","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference"},{"key":"2024120202545252200_B27","first-page":"341","article-title":"Explaining and exploiting the advantages of down-sampled lexicase selection","author":"Helmuth","year":"2020","journal-title":"Artificial Life Conference Proceedings"},{"key":"2024120202545252200_B28","first-page":"1","article-title":"Problem-solving benefits of down-sampled lexicase selection","author":"Helmuth","year":"2021","journal-title":"Artificial Life"},{"issue":"5","key":"2024120202545252200_B29","doi-asserted-by":"publisher","first-page":"630","DOI":"10.1109\/TEVC.2014.2362729","article-title":"Solving uncompromising problems with lexicase selection","volume":"19","author":"Helmuth","year":"2015","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"2024120202545252200_B30","doi-asserted-by":"crossref","first-page":"2028","DOI":"10.1145\/3319619.3326900","article-title":"Random subsampling improves performance in lexicase selection","author":"Hernandez","year":"2019","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference Companion"},{"key":"2024120202545252200_B31","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1007\/978-981-16-8113-4_5","article-title":"An exploration of exploration: Measuring the ability of lexicase selection to find obscure pathways to optimality","volume-title":"Genetic programming theory and practice XVIII","author":"Hernandez","year":"2022"},{"key":"2024120202545252200_B32","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1287\/moor.10.2.180","article-title":"A best possible heuristic for the k-center problem","volume":"10","author":"Hochbaum","year":"1985","journal-title":"Mathematics of Operations Research"},{"key":"2024120202545252200_B33","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/1090.001.0001","volume-title":"Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control and artificial intelligence","author":"Holland","year":"1992"},{"key":"2024120202545252200_B34","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/ICEC.1994.350037","article-title":"A niched Pareto genetic algorithm for multiobjective optimization","author":"Horn","year":"1994","journal-title":"Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence"},{"key":"2024120202545252200_B35","article-title":"Introduction to coresets: Accurate Coresets","author":"Jubran","year":"2019"},{"key":"2024120202545252200_B36","first-page":"169","volume-title":"Behavioral program synthesis: Insights and prospects","author":"Krawiec","year":"2016"},{"key":"2024120202545252200_B37","first-page":"741","article-title":"Epsilon-lexicase selection for regression","author":"La Cava","year":"2016","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)"},{"key":"2024120202545252200_B38","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.79665","article-title":"Artificial selection methods from evolutionary computing show promise for directed evolution of microbes","volume":"11","author":"Lalejini","year":"2022","journal-title":"eLife"},{"key":"2024120202545252200_B39","article-title":"Online batch selection for faster training of neural networks","author":"Loshchilov","year":"2015","journal-title":"arXiv"},{"key":"2024120202545252200_B40","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/978-3-030-04735-1_7","article-title":"Lexicase selection beyond genetic programming","volume-title":"Genetic programming theory and practice XVI","author":"Metevier","year":"2019"},{"key":"2024120202545252200_B41","first-page":"290","article-title":"Lexicase selection outperforms previous strategies for incremental evolution of virtual creature controllers","author":"Moore","year":"2017","journal-title":"Proceedings of the Fourteenth European Conference on Artificial Life"},{"key":"2024120202545252200_B42","first-page":"20596","article-title":"Deep learning on a data diet: Finding important examples early in training","volume":"34","author":"Paul","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024120202545252200_B43","article-title":"An overview of gradient descent optimization algorithms","author":"Ruder","year":"2017"},{"key":"2024120202545252200_B44","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1007\/BFb0055930","article-title":"Grammatical evolution: Evolving programs for an arbitrary language","author":"Ryan","year":"1998","journal-title":"European Conference on Genetic Programming"},{"key":"2024120202545252200_B45","article-title":"Co-evolution of fitness maximizers and fitness predictors","author":"Schmidt","year":"2005","journal-title":"Late breaking paper at Genetic and Evolutionary Computation Conference (GECCO)"},{"key":"2024120202545252200_B46","doi-asserted-by":"publisher","first-page":"736","DOI":"10.1109\/TEVC.2008.919006","article-title":"Coevolution of fitness predictors","volume":"12","author":"Schmidt","year":"2008","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"2024120202545252200_B47","first-page":"182","article-title":"Coevolution in Cartesian genetic programming","author":"\u0160ikulov\u00e1","year":"2012","journal-title":"Genetic Programming"},{"key":"2024120202545252200_B48","first-page":"153","article-title":"Population diversity in an immune system model: Implications for genetic search","volume-title":"Foundations of genetic algorithms","author":"Smith","year":"1993"},{"key":"2024120202545252200_B49","doi-asserted-by":"crossref","first-page":"1019","DOI":"10.1145\/3512290.3528700","article-title":"Choose your programming copilot: A comparison of the program synthesis performance of GitHub Copilot and genetic programming","author":"Sobania","year":"2022","journal-title":"Proceedings of the Genetic and Evolutionary Computation Conference"},{"key":"2024120202545252200_B50","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/978-3-030-44094-7_14","article-title":"Challenges of program synthesis with grammatical evolution","author":"Sobania","year":"2020","journal-title":"European Conference on Genetic Programming (Part of EvoStar)"},{"key":"2024120202545252200_B51","first-page":"118","article-title":"Program synthesis with genetic programming: The influence of batch sizes","author":"Sobania","year":"2022","journal-title":"Proceedings of the 25th European Conference on Genetic Programming, Held as Part of EvoStar 2022"},{"issue":"1","key":"2024120202545252200_B52","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1109\/TEVC.2022.3162324","article-title":"A comprehensive survey on program synthesis with evolutionary algorithms","volume":"27","author":"Sobania","year":"2022","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"2024120202545252200_B53","first-page":"401","article-title":"Assessment of problem modality by differential performance of lexicase selection in genetic programming: A preliminary report","author":"Spector","year":"2012","journal-title":"Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computations (GECCO)"},{"key":"2024120202545252200_B54","author":"Spector","year":"2004","journal-title":"Push 3.0 programming language description"},{"issue":"1","key":"2024120202545252200_B55","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1023\/A:1014538503543","article-title":"Genetic programming and autoconstructive evolution with the Push programming language","volume":"3","author":"Spector","year":"2002","journal-title":"Genetic Programming and Evolvable Machines"},{"key":"2024120202545252200_B56","first-page":"89","article-title":"Lexicase selection with weighted shuffle","volume-title":"Genetic programming theory and practice XV","author":"Troise","year":"2017"},{"issue":"2","key":"2024120202545252200_B57","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1007\/s10710-013-9210-0","article-title":"A survey of semantic methods in genetic programming","volume":"15","author":"Vanneschi","year":"2014","journal-title":"Genetic Programming and Evolvable Machines"},{"key":"2024120202545252200_B58","first-page":"33","article-title":"Grammatically-based genetic programming","volume-title":"Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications","author":"Whigham","year":"1995"},{"issue":"11","key":"2024120202545252200_B59","doi-asserted-by":"publisher","first-page":"2059","DOI":"10.14778\/3476249.3476262","article-title":"Doing more with less: Characterizing dataset downsampling for AutoML","volume":"14","author":"Zogaj","year":"2021","journal-title":"Proceedings of the VLDB Endowment"}],"container-title":["Evolutionary Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/evco\/article-pdf\/32\/4\/307\/2477839\/evco_a_00346.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/evco\/article-pdf\/32\/4\/307\/2477839\/evco_a_00346.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,2]],"date-time":"2024-12-02T02:55:14Z","timestamp":1733108114000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/evco\/article\/32\/4\/307\/119216\/Informed-Down-Sampled-Lexicase-Selection"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":59,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2024,12,2]]},"published-print":{"date-parts":[[2024,12,2]]}},"URL":"https:\/\/doi.org\/10.1162\/evco_a_00346","relation":{},"ISSN":["1530-9304"],"issn-type":[{"value":"1530-9304","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}