{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T11:09:26Z","timestamp":1740136166530,"version":"3.37.3"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,1,31]],"date-time":"2021-01-31T00:00:00Z","timestamp":1612051200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,1,31]],"date-time":"2021-01-31T00:00:00Z","timestamp":1612051200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Projekt DEAL"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Comput"],"published-print":{"date-parts":[[2022,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The initial population in genetic programming (GP) should form a representative sample of all possible solutions (the search space). While large populations accurately approximate the distribution of possible solutions, small populations tend to incorporate a sampling error. This paper analyzes how the size of a GP population affects the sampling error and contributes to answering the question of how to size initial GP populations. First, we present a probabilistic model of the expected number of subtrees for GP populations initialized with full, grow, or ramped half-and-half. Second, based on our frequency model, we present a model that estimates the sampling error for a given GP population size. We validate our models empirically and show that, compared to smaller population sizes, our recommended population sizes largely reduce the sampling error of measured fitness values. Increasing the population sizes even more, however, does not considerably reduce the sampling error of fitness values. Last, we recommend population sizes for some widely used benchmark problem instances that result in a low sampling error. A low sampling error at initialization is necessary (but not sufficient) for a reliable search since lowering the sampling error means that the overall random variations in a random sample are reduced. Our results indicate that sampling error is a severe problem for GP, making large initial population sizes necessary to obtain a low sampling error. Our model allows practitioners of GP to determine a minimum initial population size so that the sampling error is lower than a threshold, given a confidence level.<\/jats:p>","DOI":"10.1007\/s11047-020-09828-w","type":"journal-article","created":{"date-parts":[[2021,1,31]],"date-time":"2021-01-31T04:29:26Z","timestamp":1612067366000},"page":"173-186","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["On sampling error in genetic programming"],"prefix":"10.1007","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8629-0285","authenticated-orcid":false,"given":"Dirk","family":"Schweim","sequence":"first","affiliation":[]},{"given":"David","family":"Wittenberg","sequence":"additional","affiliation":[]},{"given":"Franz","family":"Rothlauf","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,1,31]]},"reference":[{"key":"9828_CR1","doi-asserted-by":"crossref","unstructured":"Burlacu B, Kommenda M, Affenzeller M (2015) Building blocks identification based on subtree sample counts for genetic programming. In: Proceedings of the 2015 Asia-Pacific conference on computer aided system engineering, IEEE Computer Society, APCASE \u201915, pp 152\u2013157","DOI":"10.1109\/APCASE.2015.34"},{"key":"9828_CR2","doi-asserted-by":"publisher","first-page":"432","DOI":"10.1007\/978-3-319-74718-7_52","volume-title":"Computer aided systems theory\u2014EUROCAST 2017","author":"B Burlacu","year":"2018","unstructured":"Burlacu B, Affenzeller M, Kommenda M, Kronberger G, Winkler S (2018a) Analysis of schema frequencies in genetic programming. In: Moreno-D\u00edaz R, Pichler F, Quesada-Arencibia A (eds) Computer aided systems theory\u2014EUROCAST 2017. Springer, Cham, pp 432\u2013438"},{"key":"9828_CR3","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1007\/978-3-319-90512-9_2","volume-title":"Genetic programming theory and practice XV","author":"B Burlacu","year":"2018","unstructured":"Burlacu B, Affenzeller M, Kommenda M, Kronberger G, Winkler S (2018b) Schema analysis in tree-based genetic programming. In: Banzhaf W, Olson RS, Tozier W, Riolo R (eds) Genetic programming theory and practice XV. Springer, Cham, pp 17\u201337"},{"key":"9828_CR4","volume-title":"Sampling techniques","author":"WG Cochran","year":"1977","unstructured":"Cochran WG (1977) Sampling techniques, 3rd edn. Wiley, New York","edition":"3"},{"key":"9828_CR5","unstructured":"De Jong KA (1975) An analysis of the behavior of a class of genetic adaptive systems. Doctoral dissertation, University of Michigan, Ann Arbor, MI"},{"key":"9828_CR6","first-page":"2171","volume":"13","author":"FA Fortin","year":"2012","unstructured":"Fortin FA, De Rainville FM, Gardner MA, Parizeau M, Gagn\u00e9 C (2012) DEAP: evolutionary algorithms made easy. J Mach Learn Res 13:2171\u20132175","journal-title":"J Mach Learn Res"},{"key":"9828_CR7","volume-title":"Genetic algorithms in search, optimization, and machine learning","author":"DE Goldberg","year":"1989","unstructured":"Goldberg DE (1989a) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Publishing Company Inc, Boston"},{"key":"9828_CR8","first-page":"70","volume-title":"Proceedings of the 3rd international conference on genetic algorithms","author":"DE Goldberg","year":"1989","unstructured":"Goldberg DE (1989b) Sizing populations for serial and parallel genetic algorithms. In: Schaffer J (ed) Proceedings of the 3rd international conference on genetic algorithms. Morgan Kaufmann Publishers Inc., San Francisco, pp 70\u201379"},{"key":"9828_CR9","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3643-4","volume-title":"The design of innovation: lessons from and for competent genetic algorithms, genetic algorithms and evolutionary computation","author":"DE Goldberg","year":"2002","unstructured":"Goldberg DE (2002) The design of innovation: lessons from and for competent genetic algorithms, genetic algorithms and evolutionary computation, vol 7. Springer, Boston. https:\/\/doi.org\/10.1007\/978-1-4757-3643-4"},{"issue":"3","key":"9828_CR10","first-page":"265","volume":"5","author":"DE Goldberg","year":"1991","unstructured":"Goldberg DE, Rudnick M (1991) Genetic algorithms and the variance of fitness. Complex Syst 5(3):265\u2013278","journal-title":"Complex Syst"},{"key":"9828_CR11","unstructured":"Goldberg DE, Segrest P (1987) Finite Markov chain analysis of genetic algorithms. In: Proceedings of the second international conference on genetic algorithms and their application. L. Erlbaum Associates Inc., Hillsdale, pp 1\u20138. http:\/\/dl.acm.org\/citation.cfm?id=42512.42513"},{"issue":"4","key":"9828_CR12","first-page":"333","volume":"6","author":"DE Goldberg","year":"1992","unstructured":"Goldberg DE, Deb K, Clark JH (1992) Genetic algorithms, noise, and the sizing of populations. Complex Syst 6(4):333\u2013362","journal-title":"Complex Syst"},{"key":"9828_CR13","first-page":"336","volume-title":"Proceedings of the genetic and evolutionary computation conference 2001","author":"DE Goldberg","year":"2001","unstructured":"Goldberg DE, Sastry K, Latoza T (2001) On the supply of building blocks. In: Spector L, Goodman ED, Wu A, Langdon WB, Voigt HM, Gen M, Sen S, Dorigo M, Pezeshk S, Garzon MH, Burke E (eds) Proceedings of the genetic and evolutionary computation conference 2001. Morgan Kaufmann Publishers, San Francisco, pp 336\u2013342"},{"issue":"3","key":"9828_CR14","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1162\/evco.1999.7.3.231","volume":"7","author":"G Harik","year":"1999","unstructured":"Harik G, Cant\u00fa-Paz E, Goldberg DE, Miller BL (1999) The Gambler\u2019s ruin problem, genetic algorithms, and the sizing of populations. Evol Comput 7(3):231\u2013253","journal-title":"Evol Comput"},{"key":"9828_CR15","doi-asserted-by":"crossref","unstructured":"Hemberg E, Veeramachaneni K, McDermott J, Berzan C, O\u2019Reilly UM (2012) An investigation of local patterns for estimation of distribution genetic programming. In: Proceedings of the 14th annual conference on genetic and evolutionary computation (GECCO \u201912). ACM, New York, pp 767\u2013774","DOI":"10.1145\/2330163.2330270"},{"issue":"2","key":"9828_CR16","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1137\/0202009","volume":"2","author":"JH Holland","year":"1973","unstructured":"Holland JH (1973) Genetic algorithms and the optimal allocation of trials. SIAM J Comput 2(2):88\u2013105","journal-title":"SIAM J Comput"},{"key":"9828_CR17","volume-title":"Adaptation in natural and artificial systems","author":"JH Holland","year":"1975","unstructured":"Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor"},{"key":"9828_CR18","doi-asserted-by":"crossref","unstructured":"Hu T, Banzhaf W (2009) The role of population size in rate of evolution in genetic programming. In: Vanneschi L, Gustafson S, Moraglio A, De Falco I, Ebner M (eds) Proceedings of the 12th European conference on genetic programming (EuroGP 2009), LNCS, vol 5481. Springer, Berlin, pp 85\u201396","DOI":"10.1007\/978-3-642-01181-8_8"},{"key":"9828_CR19","doi-asserted-by":"crossref","unstructured":"Keijzer M (2003) Improving symbolic regression with interval arithmetic and linear scaling. In: European conference on genetic programming. Springer, Berlin, pp 70\u201382","DOI":"10.1007\/3-540-36599-0_7"},{"issue":"2","key":"9828_CR20","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1007\/s10710-013-9205-x","volume":"15","author":"K Kim","year":"2014","unstructured":"Kim K, Shan Y, Nguyen XH, McKay RIB (2014) Probabilistic model building in genetic programming: a critical review. Genet Program Evol Mach 15(2):115\u2013167","journal-title":"Genet Program Evol Mach"},{"key":"9828_CR21","volume-title":"Genetic programming: on the programming of computers by means of natural selection","author":"JR Koza","year":"1992","unstructured":"Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge"},{"key":"9828_CR22","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4614-5897-5","volume-title":"Statistics for business and financial economics","author":"CF Lee","year":"2013","unstructured":"Lee CF, Lee JC, Lee AC (2013) Statistics for business and financial economics, 3rd edn. Springer, New York","edition":"3"},{"issue":"3","key":"9828_CR23","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1109\/4235.873237","volume":"4","author":"S Luke","year":"2000","unstructured":"Luke S (2000) Two fast tree-creation algorithms for genetic programming. IEEE Trans Evol Comput 4(3):274\u2013283","journal-title":"IEEE Trans Evol Comput"},{"key":"9828_CR24","doi-asserted-by":"crossref","unstructured":"McDermott J, White D, Luke S, Manzoni L, Castelli M, Vanneschi L, Ja\u015bkowski W, Krawiec K, Harper R, De Jong K, O\u2019Reilly UM (2012) Genetic programming needs better benchmarks. In: GECCO\u201912\u2014proceedings of the 14th international conference on genetic and evolutionary computation, pp 791\u2013798","DOI":"10.1145\/2330163.2330273"},{"key":"9828_CR25","first-page":"73","volume-title":"Foundations of genetic algorithms","author":"UM O\u2019Reilly","year":"1994","unstructured":"O\u2019Reilly UM, Oppacher F (1994) The troubling aspects of a building block hypothesis for genetic programming. In: Whitley LD (ed) Foundations of genetic algorithms, vol 3. Morgan Kaufmann, Estes Park, pp 73\u201388"},{"issue":"4","key":"9828_CR26","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1162\/evco.1997.5.4.401","volume":"5","author":"L Pagie","year":"1997","unstructured":"Pagie L, Hogeweg P (1997) Evolutionary consequences of coevolving targets. Evol Comput 5(4):401\u2013418","journal-title":"Evol Comput"},{"issue":"2","key":"9828_CR27","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1023\/A:1011552313821","volume":"2","author":"R Poli","year":"2001","unstructured":"Poli R (2001) Exact schema theory for genetic programming and variable-length genetic algorithms with one-point crossover. Genet Program Evol Mach 2(2):123\u2013163","journal-title":"Genet Program Evol Mach"},{"issue":"3","key":"9828_CR28","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1162\/evco.1998.6.3.231","volume":"6","author":"R Poli","year":"1998","unstructured":"Poli R, Langdon WB (1998) Schema theory for genetic programming with one-point crossover and point mutation. Evol Comput 6(3):231\u2013252","journal-title":"Evol Comput"},{"issue":"2","key":"9828_CR29","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1162\/106365603766646825","volume":"11","author":"R Poli","year":"2003","unstructured":"Poli R, McPhee NF (2003) General schema theory for genetic programming with subtree-swapping crossover: part II. Evol Comput 11(2):169\u2013206","journal-title":"Evol Comput"},{"key":"9828_CR30","unstructured":"Poli R, Langdon WB, McPhee NF (2008) A field guide to genetic programming. Lulu Enterprises, http:\/\/www.gp-field-guide.org.uk"},{"key":"9828_CR31","unstructured":"Reeves CR (1993) Using genetic algorithms with small populations. In: Proceedings of the 5th international conference on genetic algorithms. Morgan Kaufmann Publishers Inc., San Francisco, pp 92\u201399"},{"key":"9828_CR32","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-72962-4","volume-title":"Design of modern heuristics: principles and application. Natural computing series","author":"F Rothlauf","year":"2011","unstructured":"Rothlauf F (2011) Design of modern heuristics: principles and application. Natural computing series. Springer, Heidelberg"},{"key":"9828_CR33","volume-title":"Model assisted survey sampling. Springer series in statistics","author":"CE S\u00e4rndal","year":"1992","unstructured":"S\u00e4rndal CE, Swensson B, Wretman J (1992) Model assisted survey sampling. Springer series in statistics. Springer, New York"},{"key":"9828_CR34","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1007\/978-1-4419-8983-3_9","volume-title":"Genetic programming theory and practice","author":"K Sastry","year":"2003","unstructured":"Sastry K, O\u2019Reilly UM, Goldberg DE, Hill D (2003) Building-block supply in genetic programming. Genetic programming theory and practice. Springer, Boston, pp 137\u2013154"},{"key":"9828_CR35","doi-asserted-by":"publisher","unstructured":"Sastry K, O\u2019Reilly UM, Goldberg DE (2005) Population sizing for genetic programming based on decision-making. In: Genetic programming theory and practice II. Springer, New York, pp 49\u201365. https:\/\/doi.org\/10.1007\/0-387-23254-0_4","DOI":"10.1007\/0-387-23254-0_4"},{"key":"9828_CR36","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1007\/978-3-540-34954-9_6","volume":"160","author":"Y Shan","year":"2006","unstructured":"Shan Y, McKay RIB, Essam D, Abbass H (2006) A survey of probabilistic model building genetic programming. Scal Optim Probab Model 160:121\u2013160. https:\/\/doi.org\/10.1007\/978-3-540-34954-9_6","journal-title":"Scal Optim Probab Model"},{"issue":"2","key":"9828_CR37","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1007\/s10710-010-9121-2","volume":"12","author":"NQ Uy","year":"2011","unstructured":"Uy NQ, Hoai NX, O\u2019Neill M, McKay RI, Galv\u00e1n-L\u00f3pez E (2011) Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet Program Evol Mach 12(2):91\u2013119. https:\/\/doi.org\/10.1007\/s10710-010-9121-2","journal-title":"Genet Program Evol Mach"},{"key":"9828_CR38","unstructured":"Walsh P, Ryan C (1996) Paragen: a novel technique for the autoparallelisation of sequential programs using GP. In: Proceedings of the 1st annual conference on genetic programming. MIT Press, Cambridge, pp 406\u2013409"},{"key":"9828_CR39","doi-asserted-by":"crossref","unstructured":"Whigham PA (1995) A schema theorem for context-free grammars. In: IEEE conference on evolutionary computation, vol 1. IEEE Press, Perth, pp 178\u2013181","DOI":"10.1109\/ICEC.1995.489140"}],"container-title":["Natural Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11047-020-09828-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11047-020-09828-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11047-020-09828-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,7]],"date-time":"2022-06-07T13:44:32Z","timestamp":1654609472000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11047-020-09828-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,31]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6]]}},"alternative-id":["9828"],"URL":"https:\/\/doi.org\/10.1007\/s11047-020-09828-w","relation":{},"ISSN":["1567-7818","1572-9796"],"issn-type":[{"type":"print","value":"1567-7818"},{"type":"electronic","value":"1572-9796"}],"subject":[],"published":{"date-parts":[[2021,1,31]]},"assertion":[{"value":"25 November 2020","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 January 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}