{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T13:32:25Z","timestamp":1776519145866,"version":"3.51.2"},"reference-count":57,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,1,18]],"date-time":"2019-01-18T00:00:00Z","timestamp":1547769600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>High spatial resolution (1\u20135 m) remotely sensed datasets are increasingly being used to map land covers over large geographic areas using supervised machine learning algorithms. Although many studies have compared machine learning classification methods, sample selection methods for acquiring training and validation data for machine learning, and cross-validation techniques for tuning classifier parameters are rarely investigated, particularly on large, high spatial resolution datasets. This work, therefore, examines four sample selection methods\u2014simple random, proportional stratified random, disproportional stratified random, and deliberative sampling\u2014as well as three cross-validation tuning approaches\u2014k-fold, leave-one-out, and Monte Carlo methods. In addition, the effect on the accuracy of localizing sample selections to a small geographic subset of the entire area, an approach that is sometimes used to reduce costs associated with training data collection, is investigated. These methods are investigated in the context of support vector machines (SVM) classification and geographic object-based image analysis (GEOBIA), using high spatial resolution National Agricultural Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters, covering a 2,609 km2 regional-scale area in northeastern West Virginia, USA. Stratified-statistical-based sampling methods were found to generate the highest classification accuracy. Using a small number of training samples collected from only a subset of the study area provided a similar level of overall accuracy to a sample of equivalent size collected in a dispersed manner across the entire regional-scale dataset. There were minimal differences in accuracy for the different cross-validation tuning methods. The processing time for Monte Carlo and leave-one-out cross-validation were high, especially with large training sets. For this reason, k-fold cross-validation appears to be a good choice. Classifications trained with samples collected deliberately (i.e., not randomly) were less accurate than classifiers trained from statistical-based samples. This may be due to the high positive spatial autocorrelation in the deliberative training set. Thus, if possible, samples for training should be selected randomly; deliberative samples should be avoided.<\/jats:p>","DOI":"10.3390\/rs11020185","type":"journal-article","created":{"date-parts":[[2019,1,18]],"date-time":"2019-01-18T11:26:55Z","timestamp":1547810815000},"page":"185","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":266,"title":["Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9580-9213","authenticated-orcid":false,"given":"Christopher","family":"A. Ramezan","sequence":"first","affiliation":[{"name":"Department of Geology and Geography, West Virginia University, Morgantown, WV 26506, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0414-9748","authenticated-orcid":false,"given":"Timothy","family":"A. Warner","sequence":"additional","affiliation":[{"name":"Department of Geology and Geography, West Virginia University, Morgantown, WV 26506, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4412-5599","authenticated-orcid":false,"given":"Aaron","family":"E. Maxwell","sequence":"additional","affiliation":[{"name":"Department of Geology and Geography, West Virginia University, Morgantown, WV 26506, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1016\/j.rse.2014.07.028","article-title":"Importance of sample size, data type and prediction method for remote sensing-based estimations of aboveground forest biomass","volume":"154","author":"Fassnacht","year":"2014","journal-title":"Remote Sens. Environ."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, K., Li, J., Liu, Y., and Castiglione, A. (2016). Selecting Training Samples from Large-Scale Remote-Sensing Samples Using an Active Learning Algorithm. Computational Intelligence and Intelligent Systems, Springer.","DOI":"10.1007\/978-981-10-0356-1"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1080\/01431160600746456","article-title":"A survey of image classification methods and techniques for improving classification performance","volume":"28","author":"Lu","year":"2007","journal-title":"Int. J. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"5273","DOI":"10.1080\/01431160903130937","article-title":"Sample size determination for image classification accuracy assessment and comparison","volume":"30","author":"Foody","year":"2009","journal-title":"Int. J. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2067","DOI":"10.1080\/01431161.2014.885152","article-title":"Assessing the impact of training sample selection of accuracy of an urban classification: A case study in Denver, Colorado","volume":"35","author":"Jin","year":"2014","journal-title":"Int. J. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"895","DOI":"10.1080\/13658816.2010.498378","article-title":"Thematic accuracy assessment of geographic object-based image classification","volume":"25","author":"Radoux","year":"2011","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1080\/01431161.2010.541950","article-title":"Impact of sample size allocation when using stratified random sampling to estimate accuracy and area of land-cover change","volume":"3","author":"Stehman","year":"2012","journal-title":"Remote Sens. Lett."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1016\/j.isprsjprs.2017.06.001","article-title":"A review of supervised object-based land-cover image classification","volume":"130","author":"Ma","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1016\/j.isprsjprs.2009.06.004","article-title":"Object based image analysis for remote sensing","volume":"65","author":"Blaschke","year":"2010","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Foody, G.M., Pal, M., Rocchini, D., Garzon-Lopez, C.X., and Bastin, L. (2016). The Sensitivity of mapping Methods to Reference Data Quality: Training Supervised Image Classifications with Imperfect Reference Data. ISPRS Int. J. Geo-Inf., 5.","DOI":"10.3390\/ijgi5110199"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/0034-4257(91)90048-B","article-title":"A Review of Assessing the Accuracy of Classifications of Remotely Sensed Data","volume":"37","author":"Congalton","year":"1991","journal-title":"Remote Sens. Environ."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.rse.2006.03.004","article-title":"Training Set Size Requirements for the Classification of a Specific Class","volume":"104","author":"Foody","year":"2006","journal-title":"Remote Sens. Environ."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"16164","DOI":"10.3390\/rs71215817","article-title":"Evaluation of Sampling Methods for Validation of Remotely Sensed Fractional Vegetation Cover","volume":"7","author":"Mu","year":"2015","journal-title":"Remote Sens."},{"key":"ref_14","first-page":"1155","article-title":"The Effect of Training Strategies on Supervised Classification at Different Spatial Resolutions","volume":"68","author":"Chen","year":"2002","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2177","DOI":"10.1080\/01431160310001618464","article-title":"Examining the effect of spatial resolution and texture windows size on classification accuracy: An urban environment case","volume":"25","author":"Chen","year":"2004","journal-title":"Int. J. Remote Sens."},{"key":"ref_16","first-page":"593","article-title":"A comparison of sampling schemes used in generating error matrices for assessing the accuracy of maps generated from remotely sensed data","volume":"54","author":"Congalton","year":"1988","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2784","DOI":"10.1080\/01431161.2018.1433343","article-title":"Implementation of machine-learning classification in remote sensing: An applied review","volume":"39","author":"Maxwell","year":"2018","journal-title":"Int. J. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"4923","DOI":"10.1080\/01431161.2014.930207","article-title":"Estimating area and map accuracy for stratified random sampling when the strata are different from the map classes","volume":"35","author":"Stehman","year":"2014","journal-title":"Int. J. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Warner, T.A., Nellis, M.D., and Foody, G.M. (2009). Accuracy assessment. The SAGE Handbook of Remote Sensing, Sage Publications Ltd.","DOI":"10.4135\/9780857021052"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1344","DOI":"10.1109\/JSTARS.2012.2215310","article-title":"Evaluation of SVM, RVM and SMLR for accurate image classification with limited ground data","volume":"5","author":"Pal","year":"2012","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1109\/LGRS.2013.2246539","article-title":"An Effective Strategy to Reduce the Labeling Cost in the Definition of Training Sets by Active Learning","volume":"11","author":"Demir","year":"2014","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wuttke, S., Middlemann, W., and Stilla, U. (2015, January 25\u201327). Concept for a compound analysis in active learning remote sensing. Proceedings of the International Archives of the Photogrammetry, Remote Sensing, and Spatial Information Sciences, Munich, Germany.","DOI":"10.5194\/isprsarchives-XL-3-W2-273-2015"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.rse.2015.07.028","article-title":"LiDAR based prediction of forest biomass using hierarchial models with spatially varying coefficients","volume":"169","author":"Babcock","year":"2015","journal-title":"Remote Sens. Environ."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Brenning, A. (2012, January 22\u201327). Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest. Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany.","DOI":"10.1109\/IGARSS.2012.6352393"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.cageo.2013.10.008","article-title":"Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information","volume":"63","author":"Cracknell","year":"2014","journal-title":"Comput. Geosci."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"9806479","DOI":"10.1155\/2017\/9806479","article-title":"A Machine Learning and Cross-Validation Approach for the Discrimination of Vegetation Physiognomic Types Using Satellite Based Multispectral and Multitemporal Data","volume":"2017","author":"Sharma","year":"2017","journal-title":"Scientifica"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1016\/j.rse.2011.11.020","article-title":"A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery","volume":"118","author":"Duro","year":"2012","journal-title":"Remote Sens. Environ."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","article-title":"Cross-validatory choice and assessment of statistical predictions","volume":"36","author":"Stone","year":"1974","journal-title":"J. R. Stat. Soc."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, Springer.","DOI":"10.1007\/978-0-387-84858-7"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1080\/01621459.1984.10478083","article-title":"Cross-Validation of Regression Models","volume":"387","author":"Picard","year":"1984","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Braun, E.L. (1950). Deciduous Forests of Eastern North America, Hafner Publishing Company.","DOI":"10.1097\/00010694-195102000-00012"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1080\/15481603.2014.912874","article-title":"Comparison of NAIP orthophotography and RapidEye satellite imagery for mapping of mining and mine reclamation","volume":"51","author":"Maxwell","year":"2014","journal-title":"GISci. Remote Sens."},{"key":"ref_33","unstructured":"WVU NRAC (2018, December 01). Aerial Lidar Acquistion Report: Preston County and North Branch (Potomac) LIDAR *.LAS 1.2 Data Comprehensive and Bare Earth. West Virginia Department of Environmental Protection. Available online: http:\/\/wvgis.wvu.edu\/lidar\/data\/WVDEP_2011_Deliverable4\/WVDEP_deliverable_4_Project_Report.pdf."},{"key":"ref_34","unstructured":"ESRI (2017). ArcGIS Desktop: Release 10.5.1, Environmental Systems Research Institute."},{"key":"ref_35","unstructured":"Charaniya, A.P., Manduchi, R., and Lodha, S.K. (July, January 27). Supervised parametric classification of aerial LIDAR data. Proceedings of the IEEE 2004 Conferences on Computer Vision and Pattern Recognition Workshop, Washington, DC, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"28099","DOI":"10.3390\/s151128099","article-title":"A Review of LIDAR Radiometric Processing: From Ad Hoc Intensity correction to Rigorous Radiometric Calibration","volume":"15","author":"Kashani","year":"2015","journal-title":"Sensors"},{"key":"ref_37","first-page":"259","article-title":"Assessing the possibility of land-cover classification using LIDAR intensity data","volume":"34","author":"Song","year":"2002","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"954","DOI":"10.1080\/01431161.2014.1001086","article-title":"Assessing machine learning algorithms and image- and LiDAR-derived variables for GEOBIA classification of mining and mine reclamation","volume":"36","author":"Maxwell","year":"2015","journal-title":"Int. J. Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Be\u015fol, B., Alganci, U., and Sertel, E. (2017, January 15\u201318). The use of object based classification with nDSM to increase the accuracy of building detection. Proceedings of the 25th Signal Processing and Communications Applications Conference (SIU), Antalya, Turkey.","DOI":"10.1109\/SIU.2017.7960700"},{"key":"ref_40","unstructured":"Lear, R.F. (2018, December 28). NAIP Quality Samples. United States Department of Agriculture Aerial Photography Field Office, Available online: https:\/\/www.fsa.usda.gov\/Internet\/FSA_File\/naip_quality_samples_pdf.pdf."},{"key":"ref_41","unstructured":"Trimble (2018). Trimble eCognition Suite 9.3.2, Trimble Germany GmbH."},{"key":"ref_42","unstructured":"Shan, J., and Toth, C.K. (2008). Airborne and Spaceborne Laser Profilers and Scanners. Topographic Laser Ranging and Scanning: Principles and Processing, CRC Press."},{"key":"ref_43","unstructured":"Baatz, M., and Sch\u00e4pe, A. (2000, January 30). Multiresolution segmentation\u2014An optimization approach for high quality multi-scale image segmentation. Proceedings of the Angewandte Geographische Informations-Verarbeitung XII, Karlsruhe, Germany."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/j.isprsjprs.2014.07.002","article-title":"Comparing supervised and unsupervised multiresolution segmentation approaches for extracting buildings from very high resolution imagery","volume":"96","author":"Belgiu","year":"2014","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.isprsjprs.2013.11.018","article-title":"Automated parameterization for multi-scale image segmentation on multiple layers","volume":"88","author":"Csillik","year":"2014","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2825","DOI":"10.1080\/01431161003745608","article-title":"Multi-scale texture segmentation and classification of salt marsh using digital aerial imagery with very high spatial resolution","volume":"32","author":"Kim","year":"2011","journal-title":"Int. J. Remote Sens."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"118","DOI":"10.4236\/ars.2016.52010","article-title":"Assessing Net Primary Production in Montane Wetlands from Proximal, Airborne, and Satellite Remote Sensing","volume":"5","author":"Maguigan","year":"2016","journal-title":"Adv. Remote Sens."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1080\/00045608.2013.776884","article-title":"Establishing Qualitative Geographic Sample Size in the Presence of Spatial Autocorrelation","volume":"103","author":"Griffith","year":"2013","journal-title":"Ann. Assoc. Am. Geogr."},{"key":"ref_49","unstructured":"Kuhn, M. (2018, February 21). Caret: Classification and Regression Training. R package Version 6.0-71. Available online: https:\/\/CRAN.R-project.org\/package=caret."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Scheuenemeyer, J.H., and Drew, L.J. (2010). Statistics for Earth and Environmental Scientists, John Wiley & Sons.","DOI":"10.1002\/9780470650707"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.isprsjprs.2010.11.001","article-title":"Support vector machines in remote sensing: A review","volume":"66","author":"Mountrakis","year":"2011","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_52","unstructured":"Meyer, D. (2018, February 21). Support Vector Machines: The Interface to Libsvm in Package e1071. R Package Version 6.0-71. Available online: https:\/\/CRAN.R-project.org\/package=e1071."},{"key":"ref_53","unstructured":"Ulrich, J.M. (2018, February 21). Microbenchmark: Accurate Timing Functions. R Package Version 1.4-4. Available online: https:\/\/cran.r-project.org\/web\/packages\/microbenchmark\/microbenchmark.pdf."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/BF02295996","article-title":"Note on the sampling error of the difference between correlated proportions or percentages","volume":"12","author":"McNemar","year":"1947","journal-title":"Psychometrika"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"627","DOI":"10.14358\/PERS.70.5.627","article-title":"Thematic Map Comparison: Evaluating the Statistical Significance of Differences in Classification Accuracy","volume":"70","author":"Foody","year":"2004","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_56","unstructured":"Benediktsson, J.A., Kittler, J., and Roli, F. (2009). Classifying Remote Sensing Data with Support Vector Machines and Imbalanced Training Data, Springer. CMS 2009, LNCS 5519."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"5243","DOI":"10.1080\/01431160903131000","article-title":"Sampling designs for accuracy assessment of land cover","volume":"30","author":"Stehman","year":"2009","journal-title":"Int. J. Remote Sens."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/2\/185\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:27:10Z","timestamp":1760185630000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/2\/185"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,18]]},"references-count":57,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,1]]}},"alternative-id":["rs11020185"],"URL":"https:\/\/doi.org\/10.3390\/rs11020185","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,1,18]]}}}