{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T05:12:45Z","timestamp":1772687565840,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,12,20]],"date-time":"2019-12-20T00:00:00Z","timestamp":1576800000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005632","name":"Narodowe Centrum Bada\u0144 i Rozwoju","doi-asserted-by":"publisher","award":["DZP\/BIOSTRATEG-II\/390\/2015"],"award-info":[{"award-number":["DZP\/BIOSTRATEG-II\/390\/2015"]}],"id":[{"id":"10.13039\/501100005632","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Supervised classification methods, used for many applications, including vegetation mapping require accurate \u201cground truth\u201d to be effective. Nevertheless, it is common for the quality of this data to be poorly verified prior to it being used for the training and validation of classification models. The fact that noisy or erroneous parts of the reference dataset are not removed is usually explained by the relatively high resistance of some algorithms to errors. The objective of this study was to demonstrate the rationale for cleaning the reference dataset used for the classification of heterogeneous non-forest vegetation, and to present a workflow based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm for the better integration of reference data with remote sensing data in order to improve outcomes. The proposed analysis is a new application of the t-SNE algorithm. The effectiveness of this workflow was tested by classifying three heterogeneous non-forest Natura 2000 habitats: Molinia meadows (Molinion caeruleae; code 6410), species-rich Nardus grassland (code 6230) and dry heaths (code 4030), employing two commonly used algorithms: random forest (RF) and AdaBoost (AB), which, according to the literature, differ in their resistance to errors in reference datasets. Polygons collected in the field (on-ground reference data) in 2016 and 2017, containing no intentional errors, were used as the on-ground reference dataset. The remote sensing data used in the classification were obtained in 2017 during the peak growing season by a HySpex sensor consisting of two imaging spectrometers covering spectral ranges of 0.4\u20130.9 \u03bcm (VNIR-1800) and 0.9\u20132.5 \u03bcm (SWIR-384). The on-ground reference dataset was gradually cleaned by verifying candidate polygons selected by visual interpretation of t-SNE plots. Around 40\u201350% of candidate polygons were ultimately found to contain errors. Altogether, 15% of reference polygons were removed. As a result, the quality of the final map, as assessed by the Kappa and F1 accuracy measures as well as by visual evaluation, was significantly improved. The global map accuracy increased by about 6% (in Kappa coefficient), relative to the baseline classification obtained using random removal of the same number of reference polygons.<\/jats:p>","DOI":"10.3390\/rs12010039","type":"journal-article","created":{"date-parts":[[2019,12,23]],"date-time":"2019-12-23T03:15:01Z","timestamp":1577070901000},"page":"39","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["The t-SNE Algorithm as a Tool to Improve the Quality of Reference Data Used in Accurate Mapping of Heterogeneous Non-Forest Vegetation"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8040-7833","authenticated-orcid":false,"given":"Anna","family":"Halladin-D\u0105browska","sequence":"first","affiliation":[{"name":"MGGP Aero sp. z o.o., 33-100 Tarn\u00f3w, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adam","family":"Kania","sequence":"additional","affiliation":[{"name":"Definity Sp. z o.o., 52-116 Wroc\u0142aw, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0831-2992","authenticated-orcid":false,"given":"Dominik","family":"Kope\u0107","sequence":"additional","affiliation":[{"name":"Department of Geobotany and Plant Ecology, Faculty of Biology and Environmental, University of Lodz, 90-237 \u0141\u00f3d\u017a, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,12,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Foody, G.M., Pal, M., Rocchini, D., Garzon-Lopez, C., and Bastin, L. (2016). The sensitivity of mapping methods to reference data quality: Training supervised image classifications with imperfect reference data. ISPRS Int. J. Geo-Inf., 5.","DOI":"10.3390\/ijgi5110199"},{"key":"ref_2","unstructured":"Lillesand, T.M., and Kiefer, R.W. (1994). Remote Sensing and Image Interpretation, John Wiley & Sons. [3rd ed.]."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.isprsjprs.2015.03.014","article-title":"Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin","volume":"105","author":"Mellor","year":"2015","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2496","DOI":"10.3390\/ijgi4042496","article-title":"Impacts of species misidentification on species distribution modeling with presence-only data","volume":"4","author":"Costa","year":"2015","journal-title":"ISPRS Int. J. Geo-Inf."},{"key":"ref_5","unstructured":"Mather, P.M. (2004). Computer Processing of Remotely-Sensed Images: An Introduction, John Wiley and Sons. [3rd ed.]."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Congalton, R.G., and Green, K. (2008). Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, CRC Press.","DOI":"10.1201\/9781420055139"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.rse.2006.03.004","article-title":"Training set size requirements for the classification of a specific class","volume":"104","author":"Foody","year":"2006","journal-title":"Remote Sens. Environ."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1080\/14498596.2012.733616","article-title":"Assessing the quality of training data in the supervised classification of remotely sensed imagery: A correlation analysis","volume":"57","author":"Ge","year":"2012","journal-title":"J. Spat. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Pelletier, C., Valero, S., Inglada, J., Champion, N., Sicre, C.M., and Dedieu, G. (2017). Effect of training class label noise on classification performances for land cover mapping with satellite image time series. Remote Sens., 9.","DOI":"10.3390\/rs9020173"},{"key":"ref_10","unstructured":"Guo, L. (2011). Margin Framework for Ensemble Classifiers. Application to Remote Sensing Data. [Ph.D. Thesis, University of Bordeaux]."},{"key":"ref_11","unstructured":"Kope\u0107, K., Wylaz\u0142owska, J., Niedzielko, J., Jaroci\u0144ska, A., Borzuchowski, J., Pi\u00f3rkowski, H., B\u0142o\u0144ska, A., Niedzielko, M., Halladin-D\u0105browska, A., and Michalska-Hejduk, D. Auxiliary work in WP3 under the programme \u201cNatural Environment, Agriculture and Forestry\u201d BIOSTRATEG II.: The innovative approach supporting monitoring of non-forest Natura 2000 habitats, using remote sensing methods (HabitARS), Unpublished work."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ramaswamy, S., Rastogi, R., and Shim KAIST, K. (2000, January 15\u201318). Efficient algorithms for mining outliers from large data sets. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA.","DOI":"10.1145\/342009.335437"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/2133360.2133363","article-title":"Isolation-based anomaly detection","volume":"6","author":"Liu","year":"2012","journal-title":"ACM Trans. Knowl. Discov. Data"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Breunig, M.M., Kriegel, H.-P., Ng, R.T., and Sander, J. (2000, January 15\u201318). LOF: Identifying density-based local outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA.","DOI":"10.1145\/342009.335388"},{"key":"ref_15","unstructured":"B\u00fcschenfeld, T., and Ostermann, J. (September, January 25). Automatic refinement of training data for classification of satellite imagery. Proceedings of the PISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Melbourne, Australia."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"850","DOI":"10.1016\/j.envsoft.2008.11.012","article-title":"Increasing the accuracy of neural network classification using refined training data","volume":"24","author":"Kavzoglu","year":"2009","journal-title":"Environ. Model. Softw."},{"key":"ref_17","unstructured":"Mather, P.M. (1976). Computational Methods of Multivariate Analysis in Physical Geography, John Wiley and Sons."},{"key":"ref_18","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, J., Chen, L., Zhuo, L., Liang, X., and Li, J. (2018). An efficient hyperspectral image retrieval method: Deep spectral-spatial feature extraction with DCGAN and dimensionality reduction using t-SNE-based NM hashing. Remote Sens., 10.","DOI":"10.3390\/rs10020271"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhong, Z., Li, J., Ma, L., Jiang, H., and Zhao, H. (2017, January 23\u201328). Deep residual networks for hyperspectral image classification. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127330"},{"key":"ref_21","unstructured":"Dai, X., Guo, S., and Li, X. (2018, January 10\u201314). Novel hyperspectral image classification method based on the t-SNE and AdaBoost algorithms. Proceedings of the Association of American Geographers Annual Meeting, New Orleans, LA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"4172","DOI":"10.1109\/TGRS.2007.905311","article-title":"Dimensionality reduction based on clonal selection for hyperspectral imagery","volume":"45","author":"Zhang","year":"2007","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","unstructured":"Halladin-D\u0105browska, A., Kania, A., S\u0142awik, \u0141., Niedzielko, J., Borzuchowski, J., Wylaz\u0142owska, J., Michalska-Hejduka, D., and Kope\u0107, D. (2018, January 26\u201329). The t-SNE Machine Learning Algorithm As A Novel Tool Supporting The Classification Of Non-forest Natura 2000 Habitats. Proceedings of the Sixth International Conference on Remote Sensing and Geoinformation of Environment, Paphos, Cyprus."},{"key":"ref_24","unstructured":"Kania, A. (2018, January 24\u201328). Interactive tool for real-time delivery of remote sensing based vegetation maps and support of botanical data collection. Proceedings of the 10th International Conference on Ecological Informatics. Translating Ecological Data into Knowledge and Decisions in a Rapidly Changing World, Jena, Germany."},{"key":"ref_25","unstructured":"Chan, W., Spanhove, T., Ma, J., Vanden Borre, J., Paelinckx, D., and Canters, F. (July, January 29). Natura 2000 habitat identification and conservation status assessment with superresolution enhanced hyperspectral (CHRIS\/PROBA) imagery. Proceedings of the GEOBIA 2010-Geographic Object-Based Image Analysis, Ghent, Belgium."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Folleco, A., Khoshgoftaar, T.M., Hulse, J., and Van Bullard, L. (2008, January 13\u201315). Identifying learners robust to low quality data. Proceedings of the IEEE International Conference on Information Reuse and Integration, Las Vegas, NV, USA.","DOI":"10.1109\/IRI.2008.4583028"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.isprsjprs.2011.11.002","article-title":"An assessment of the effectiveness of a random forest classifier for land-cover classification","volume":"67","author":"Ghimire","year":"2012","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1016\/j.knosys.2016.03.024","article-title":"A robust multi-class AdaBoost algorithm for mislabeled noisy data","volume":"102","author":"Sun","year":"2016","journal-title":"Knowl.-Based Syst."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1109\/TPAMI.2007.250609","article-title":"A Comparison of decision tree ensemble creation techniques","volume":"29","author":"Banfield","year":"2007","journal-title":"IEEE Trans. Pattern. Anal. Mach. Intell."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1023\/A:1007607513941","article-title":"An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization","volume":"40","author":"Dietterich","year":"2000","journal-title":"Mach. Learn."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"S\u0142awik, \u0141., Niedzielko, J., Kania, A., Pi\u00f3rkowski, H., and Kope\u0107, D. (2019). Multiple flights or single flight instrument fusion of hyperspectral and ALS data? A comparison of their performance for vegetation mapping. Remote Sens., 11.","DOI":"10.3390\/rs11080970"},{"key":"ref_32","unstructured":"ENVI API Programming Guide (2019, February 08). Harris Geospatial Solutions Documentation Center. Available online: http:\/\/www.harrisgeospatial.com\/docs\/ProgrammingGuideIntroduction.html."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"290","DOI":"10.5589\/m13-038","article-title":"Wetland mapping with LiDAR derivatives, SAR polarimetric decompositions, and LiDAR-SAR fusion using a random forest classifier","volume":"39","author":"Millard","year":"2013","journal-title":"Can. J. Remote Sens."},{"key":"ref_34","first-page":"3221","article-title":"Accelerating t-SNE using tree-based algorithms","volume":"15","author":"Courville","year":"2014","journal-title":"J Mach. Learn. Res."},{"key":"ref_35","unstructured":"(2019, September 05). Vegetation Classification Studio Software, Version 2.13\/hb. Available online: http:\/\/www.definity.pl\/vcs."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A Coefficient of Agreement for Nominal Scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educ. Psychol. Meas."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1111\/avsc.12204","article-title":"Evaluating an unmanned aerial vehicle-based approach for assessing habitat extent and condition in fine-scale early successional mountain mosaics","volume":"19","author":"Henriques","year":"2016","journal-title":"Appl. Veg. Sci."},{"key":"ref_38","first-page":"83","article-title":"Using information layers for mapping grassland habitat distribution at local to regional scales","volume":"37","author":"Buck","year":"2015","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_39","first-page":"211","article-title":"Remote sensing of scattered natura 2000 habitats using a one-class classifier","volume":"33","author":"Stenzel","year":"2014","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_40","first-page":"25","article-title":"Grassland habitat mapping by intra-annual time series analysis -Comparison of RapidEye and TerraSAR-X satellite data","volume":"34","author":"Schuster","year":"2015","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"8056","DOI":"10.3390\/rs6098056","article-title":"Categorizing grassland vegetation with full-waveform airborne laser scanning: A feasibility study for detecting natura 2000 habitat types","volume":"6","author":"Zlinszky","year":"2014","journal-title":"Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"8316918","DOI":"10.1155\/2018\/8316918","article-title":"The Impact of Simulated Spectral Noise on Random Forest and Oblique Random Forest Classification Performance","volume":"2018","author":"Agjee","year":"2018","journal-title":"J. Spectrosc."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/1\/39\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:44:04Z","timestamp":1760190244000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/1\/39"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,20]]},"references-count":43,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,1]]}},"alternative-id":["rs12010039"],"URL":"https:\/\/doi.org\/10.3390\/rs12010039","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12,20]]}}}