{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,4]],"date-time":"2026-01-04T06:04:41Z","timestamp":1767506681480,"version":"build-2065373602"},"reference-count":65,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2022,8,16]],"date-time":"2022-08-16T00:00:00Z","timestamp":1660608000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004424","name":"Water Research Commission","doi-asserted-by":"publisher","award":["K5\/2966\/\/4"],"award-info":[{"award-number":["K5\/2966\/\/4"]}],"id":[{"id":"10.13039\/501100004424","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Forest plantations in South Africa impose genus-specific demands on limited soil moisture. Hence, plantation composition and distribution mapping is critical for water conservation planning. Genus maps are used to quantify the impact of post-harvest genus-exchange activities in the forestry sector. Collecting genus data using in situ methods is costly and time-consuming, especially when performed at regional or national scales. Although remotely sensed data and machine learning show potential for mapping genera at regional scales, the efficacy of such methods is highly dependent on the size and quality of the training data used to build the models. However, it is not known what sampling scheme (e.g., sample size, proportion per genus, and spatial distribution) is most effective to map forest genera over large and complex areas. Using Sentinel-2 imagery as inputs, this study evaluated the effects of different sampling strategies (e.g., even, uneven, and area-proportionate) for training the random forests machine learning classifier to differentiate between Acacia, Eucalyptus, and Pinus trees in South Africa. 
Sample size (s) was related to the number of input features (n) to better understand the potential impact of sample sparseness. The results show that an even sample with maximum size (100%, s~91n) produced the highest overall accuracy (76.3%). Although larger training set sizes (s &gt; n) resulted in higher OAs, a saturation point was reached at s~64n.<\/jats:p>","DOI":"10.3390\/rs14163992","type":"journal-article","created":{"date-parts":[[2022,8,17]],"date-time":"2022-08-17T03:15:27Z","timestamp":1660706127000},"page":"3992","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Impact of Training Set Configurations for Differentiating Plantation Forest Genera with Sentinel-2 Imagery and Machine Learning"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6937-1887","authenticated-orcid":false,"given":"Caley","family":"Higgs","sequence":"first","affiliation":[{"name":"Department of Geography and Environmental Studies, Stellenbosch University, 82 Ryneveld St, Stellenbosch 7600, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5631-0206","authenticated-orcid":false,"given":"Adriaan","family":"van Niekerk","sequence":"additional","affiliation":[{"name":"Department of Geography and Environmental Studies, Stellenbosch University, 82 Ryneveld St, Stellenbosch 7600, South Africa"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,16]]},"reference":[{"key":"ref_1","first-page":"12","article-title":"Remote sensing of forest health and vitality: A South African perspective","volume":"1","author":"Xulu","year":"2018","journal-title":"South. For."},{"key":"ref_2","first-page":"58","article-title":"A Silvicultural map of Southern Africa","volume":"67","author":"Poynton","year":"1971","journal-title":"S. Afr. J. Sci."},{"key":"ref_3","unstructured":"FP&M SETA (2014). Paper and Pulp Sector, FP&M SETA."},{"key":"ref_4","unstructured":"Steyl, I. (1997). 
Strategic Environmental Assessment for Stream Flow Reduction Activities in South Africa, Department of Water Affairs & Forestry, South Africa."},{"key":"ref_5","first-page":"161","article-title":"Polygon-based aggregation of remotely sensed data for regional ecological analyses","volume":"4","author":"Wicks","year":"2002","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_6","unstructured":"Scott, D.F., Prinsloo, F.W., Moses, G., Mehlomakulu, M., and Simmers, A.D.A. (2000). A Re-Analysis of the South African Catchment Afforestation Experimental Data: Report to the Water Research Commission, WRC."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4314\/wsa.v36i1.50901","article-title":"Measurement of grassland evaporation using a surface-layer scintillometer","volume":"36","author":"Savage","year":"2010","journal-title":"Water SA"},{"key":"ref_8","first-page":"31","article-title":"Some Effects of Afforestation on Streamflow in the Western Cape Province, South Africa","volume":"12","year":"1987","journal-title":"Water SA"},{"key":"ref_9","first-page":"49","article-title":"Accomplishments and Dynamics of the South African Afforestation Permit System","volume":"172","year":"1995","journal-title":"South Afr. For. J."},{"key":"ref_10","first-page":"27","article-title":"A new approach to modelling streamflow reductions resulting from commercial afforestation in south africa","volume":"196","author":"Gush","year":"2002","journal-title":"S. Afr. For. J."},{"key":"ref_11","unstructured":"Clulow, A.D., Everson, C.S., and Gush, M.B. (2011). The Long-Term Impact of Acacia Mearnsii Trees on Evaporation, Streamflow and Groundwater Resources, Water Research Commission Report No. TT505\/11; WRC."},{"key":"ref_12","unstructured":"FSA (2019). Environmental Guidelines for Commercial Forestry Plantations in South Africa, Forestry South Africa."},{"key":"ref_13","unstructured":"Forestry South Africa (2019). 
Timber Plantation Ownership, Forestry South Africa."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1687","DOI":"10.5194\/acp-9-1687-2009","article-title":"Operational climate monitoring from space: The EUMETSAT satellite application facility on climate monitoring (CM-SAF)","volume":"9","author":"Schulz","year":"2009","journal-title":"Atmos. Chem. Phys."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.habitatint.2016.02.003","article-title":"GlobeLand30 as an alternative fine-scale global land cover map: Challenges, possibilities, and implications for developing countries","volume":"55","author":"Tayyebi","year":"2016","journal-title":"Habitat Int."},{"key":"ref_16","unstructured":"Department of Environmental Affairs (2019). South African National Land-Cover 2018 Report & Accuracy Assessment, Department of Environmental Affairs, South Africa."},{"key":"ref_17","unstructured":"L\u00fcck, W. (2018). Generating Automated Forestry Geoinformation Products From Remotely Sensed Imagery. [Master\u2019s Thesis, Stellenbosch University]."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1016\/S0034-4257(01)00209-7","article-title":"Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method","volume":"77","author":"Ek","year":"2001","journal-title":"Remote Sens. Environ."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1080\/01431160802311125","article-title":"Discrimination of dominant forest types for Matschie\u2019s tree kangaroo conservation in Papua New Guinea using high-resolution remote sensing data","volume":"30","author":"Stabach","year":"2009","journal-title":"Int. J. 
Remote Sens."},{"key":"ref_20","first-page":"349","article-title":"Assessing the utility WorldView-2 imagery for tree species mapping in South African subtropical humid forest and the conservation implications: Dukuduku forest patch as case study","volume":"38","author":"Cho","year":"2015","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_21","unstructured":"Francois, A., and Leckie, D.G. (2006). The individual tree crown approach to Ikonos images of a Coniferous Plantation Area. Photogrammetric Engineering & Remote Sensing, American Society for Photogrammetry and Remote Sensing."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2661","DOI":"10.3390\/rs4092661","article-title":"Tree species classification with Random forest using very high spatial resolution 8-band worldView-2 satellite data","volume":"4","author":"Immitzer","year":"2012","journal-title":"Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1016\/j.rse.2010.01.002","article-title":"Synergistic use of QuickBird multispectral imagery and LIDAR data for object-based forest species classification","volume":"114","author":"Ke","year":"2010","journal-title":"Remote Sens. Environ."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1016\/j.rse.2012.06.011","article-title":"A comparative analysis of high spatial resolution IKONOS and WorldView-2 imagery for mapping urban tree species","volume":"124","author":"Pu","year":"2012","journal-title":"Remote Sens. Environ."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"501","DOI":"10.14358\/PERS.83.7.501","article-title":"Northern Conifer Forest Species Classification Using Multispectral Data Acquired from an Unmanned Aerial Vehicle","volume":"83","author":"Franklin","year":"2017","journal-title":"Photogramm. Eng. 
Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"5236","DOI":"10.1080\/01431161.2017.1363442","article-title":"Deciduous tree species classification using object-based analysis and machine learning with unmanned aerial vehicle multispectral data","volume":"39","author":"Franklin","year":"2018","journal-title":"Int. J. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5453","DOI":"10.1080\/01431160500285076","article-title":"Classification of coniferous tree species and age classes using hyperspectral data and geostatistical methods","volume":"26","author":"Buddenbaum","year":"2005","journal-title":"Int. J. Remote Sens."},{"key":"ref_28","first-page":"e12267","article-title":"Guidelines of the minimum sample size requirements for Cohen\u2019s Kappa","volume":"17","author":"Bujang","year":"2017","journal-title":"Epidemiol. Biostat. Public Health"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"5660","DOI":"10.3390\/rs70505660","article-title":"Mapping species composition of forests and tree plantations in northeastern Costa Rica with an integration of hyperspectral and multitemporal landsat imagery","volume":"7","author":"Fagan","year":"2015","journal-title":"Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.isprsjprs.2013.01.013","article-title":"Commercial tree species discrimination using airborne AISA Eagle hyperspectral imagery and partial least squares discriminant analysis (PLS-DA) in KwaZulu-Natal, South Africa","volume":"79","author":"Peerbhay","year":"2013","journal-title":"ISPRS J. Photogramm. 
Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"3020","DOI":"10.3390\/s8053020","article-title":"Seasonal effect on tree species classification in an urban environment using hyperspectral data, LiDAR, and an object-oriented approach","volume":"8","author":"Voss","year":"2008","journal-title":"Sensors"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Nomura, K., and Mitchard, E.T.A. (2018). More than meets the eye: Using Sentinel-2 to map small plantations in complex forest landscapes. Remote Sens., 10.","DOI":"10.3390\/rs10111693"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/10106049.2019.1585483","article-title":"Examining the effectiveness of Sentinel-1 and 2 imagery for commercial forest species mapping","volume":"36","author":"Mngadi","year":"2019","journal-title":"Geocarto Int."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/j.rse.2016.01.017","article-title":"Discrimination of tropical forest types, dominant species, and mapping of functional guilds by hyperspectral and simulated multispectral Sentinel-2 data","volume":"176","author":"Puletti","year":"2016","journal-title":"Remote Sens. Environ."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1437","DOI":"10.3390\/w7041437","article-title":"Urban flood mapping based on unmanned aerial vehicle remote sensing and random forest classifier-A case of yuyao, China","volume":"7","author":"Feng","year":"2015","journal-title":"Water"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/S0034-4257(02)00096-2","article-title":"Overview of the radiometric and biophysical performance of the MODIS vegetation indices","volume":"83","author":"Huete","year":"2002","journal-title":"Remote Sens. 
Environ."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"953","DOI":"10.5194\/isprs-archives-XLI-B8-953-2016","article-title":"The combination of UAV survey and Landsat imagery for monitoring of crop vigor in precision agriculture","volume":"41","author":"Lukas","year":"2016","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Loggenberg, K., Strever, A., Greyling, B., and Poona, N. (2018). Modelling water stress in a Shiraz vineyard using hyperspectral imaging and machine learning. Remote Sens., 10.","DOI":"10.3390\/rs10020202"},{"key":"ref_39","first-page":"363","article-title":"The Hughes phenomenon in hyperspectral classification based on the ground spectrum of grasslands in the region around Qinghai Lake","volume":"Volume 8910","author":"Ma","year":"2013","journal-title":"International Symposium on Photoelectronic Detection and Imaging 2013: Imaging Spectrometer Technologies and Applications"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.isprsjprs.2016.01.011","article-title":"Random forest in remote sensing: A review of applications and future directions","volume":"114","author":"Belgiu","year":"2016","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Congalton, R.G., and Green, K. (2019). Assessing the Accuracy of Remotely Sensed Data, Taylor & Francis Group. [3rd ed.].","DOI":"10.1201\/9780429052729"},{"key":"ref_42","unstructured":"Mather, P.M. (2004). Computer Processing of Remotely-Sensed Images, John Wiley & Sons Ltd. [3rd ed.]."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Thanh Noi, P., and Kappas, M. (2017). Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery. 
Sensors, 18.","DOI":"10.3390\/s18010018"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"5273","DOI":"10.1080\/01431160903130937","article-title":"Sample size determination for image classification accuracy assessment and comparison","volume":"30","author":"Foody","year":"2009","journal-title":"Int. J. Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.rse.2006.03.004","article-title":"Training set size requirements for the classification of a specific class","volume":"104","author":"Foody","year":"2006","journal-title":"Remote Sens. Environ."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2632","DOI":"10.1109\/TGRS.2012.2216272","article-title":"Tree species classification in boreal forests with hyperspectral data","volume":"51","author":"Dalponte","year":"2013","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"8489","DOI":"10.3390\/rs70708489","article-title":"On the importance of training data sample selection in Random Forest image classification: A case study in peatland ecosystem mapping","volume":"7","author":"Millard","year":"2015","journal-title":"Remote Sens."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.isprsjprs.2015.03.014","article-title":"Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin","volume":"105","author":"Mellor","year":"2015","journal-title":"ISPRS J. Photogramm. 
Remote Sens."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"9655","DOI":"10.3390\/rs70809655","article-title":"An evaluation of different training sample allocation schemes for discrete and continuous land cover classification using decision tree-based algorithms","volume":"7","author":"Colditz","year":"2015","journal-title":"Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1186\/s42408-018-0001-0","article-title":"An assessment of climate, weather, and fuel factors influencing a large, destructive wildfire in the Knysna region","volume":"14","author":"Kraaij","year":"2018","journal-title":"S. Afr. Fire Ecol."},{"key":"ref_51","unstructured":"ESA (2015). ESA\u2019s Optical High-Resolution Mission for GMES Operational Services, ESA."},{"key":"ref_52","first-page":"1","article-title":"Habitat assessment of small mammals in the Umvoti Vlei conservancy, KwaZulu-Natal, South Africa","volume":"31","author":"Fuller","year":"2001","journal-title":"Afr. J. Wildl. Res."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"632","DOI":"10.1016\/j.rse.2017.09.037","article-title":"Identifying the genus or species of individual trees using a three-wavelength airborne lidar system","volume":"204","author":"Budei","year":"2018","journal-title":"Remote Sens. Environ."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"4407","DOI":"10.1080\/01431161.2011.552923","article-title":"Death to Kappa: Birth of quantity disagreement and allocation disagreement for accuracy assessment","volume":"32","author":"Pontius","year":"2011","journal-title":"Int. J. 
Remote Sens."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/S0034-4257(01)00295-4","article-title":"Status of land cover classification accuracy assessment","volume":"80","author":"Foody","year":"2002","journal-title":"Remote Sens. Environ."},{"key":"ref_57","first-page":"93","article-title":"An assessment of the effectiveness of a random forest classifier for land-cover classification","volume":"67","author":"Ghimire","year":"2012","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Mahdianpari, M., Salehi, B., Mohammadimanesh, F., Homayouni, S., and Gill, E. (2019). The first wetland inventory map of newfoundland at a spatial resolution of 10 m using sentinel-1 and sentinel-2 data on the Google Earth Engine cloud computing platform. Remote Sens., 11.","DOI":"10.3390\/rs11010043"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1080\/10106049.2018.1520923","article-title":"Mapping distribution of Sundarban mangroves using Sentinel-2 data and new spectral metric for detecting their health condition","volume":"35","author":"Manna","year":"2020","journal-title":"Geocarto Int."},{"key":"ref_60","unstructured":"Shetty, S. (2019). Analysis of Machine Learning Classifiers for LULC Classification on Google Earth Engine. [Master\u2019s Thesis, University of Twente]."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"49","DOI":"10.4018\/ijagr.2014070104","article-title":"Impact of training set size on object-based land cover classification: A comparison of three classifiers","volume":"5","author":"Myburgh","year":"2014","journal-title":"Int. J. Appl. Geospatial. 
Res."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"648","DOI":"10.1016\/j.rse.2017.09.035","article-title":"Effect of classifier selection, reference sample size, reference class distribution and scene heterogeneity in per-pixel classification accuracy using 26 Landsat sites","volume":"204","author":"Heydari","year":"2018","journal-title":"Remote Sens. Environ."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1080\/01431160412331269698","article-title":"Random forest classifier for remote sensing classification","volume":"26","author":"Pal","year":"2005","journal-title":"Int. J. Remote Sens."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/j.patrec.2005.08.011","article-title":"Random forests for land cover classification","volume":"27","author":"Gislason","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_65","first-page":"360","article-title":"Understanding interobserver agreement: The kappa statistic","volume":"37","author":"Viera","year":"2005","journal-title":"Fam. Med."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/16\/3992\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:10:48Z","timestamp":1760141448000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/16\/3992"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,16]]},"references-count":65,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["rs14163992"],"URL":"https:\/\/doi.org\/10.3390\/rs14163992","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2022,8,16]]}}}