{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T12:09:58Z","timestamp":1774440598220,"version":"3.50.1"},"reference-count":45,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2017,4,13]],"date-time":"2017-04-13T00:00:00Z","timestamp":1492041600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Like computer vision before, remote sensing has been radically changed by the introduction of deep learning and, more notably, Convolution Neural Networks. Land cover classification, object detection and scene understanding in aerial images rely more and more on deep networks to achieve new state-of-the-art results. Recent architectures such as Fully Convolutional Networks can even produce pixel level annotations for semantic mapping. In this work, we present a deep-learning based segment-before-detect method for segmentation and subsequent detection and classification of several varieties of wheeled vehicles in high resolution remote sensing images. This allows us to investigate object detection and classification on a complex dataset made up of visually similar classes, and to demonstrate the relevance of such a subclass modeling approach. Especially, we want to show that deep learning is also suitable for object-oriented analysis of Earth Observation data as effective object detection can be obtained as a byproduct of accurate semantic segmentation. First, we train a deep fully convolutional network on the ISPRS Potsdam and the NZAM\/ONERA Christchurch datasets and show how the learnt semantic maps can be used to extract precise segmentation of vehicles. Then, we show that those maps are accurate enough to perform vehicle detection by simple connected component extraction. This allows us to study the repartition of vehicles in the city. Finally, we train a Convolutional Neural Network to perform vehicle classification on the VEDAI dataset, and transfer its knowledge to classify the individual vehicle instances that we detected.<\/jats:p>","DOI":"10.3390\/rs9040368","type":"journal-article","created":{"date-parts":[[2017,4,13]],"date-time":"2017-04-13T10:55:44Z","timestamp":1492080944000},"page":"368","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":227,"title":["Segment-before-Detect: Vehicle Detection and Classification through Semantic Segmentation of Aerial Images"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6486-3102","authenticated-orcid":false,"given":"Nicolas","family":"Audebert","sequence":"first","affiliation":[{"name":"ONERA, The French Aerospace Lab, F-91761 Palaiseau, France"},{"name":"Institut de Recherche en Informatique et Syst\u00e8mes Al\u00e9atoires (IRISA), University Bretagne Sud, UMR 6074, F-56000 Vannes, France"}]},{"given":"Bertrand","family":"Le Saux","sequence":"additional","affiliation":[{"name":"ONERA, The French Aerospace Lab, F-91761 Palaiseau, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2384-8202","authenticated-orcid":false,"given":"S\u00e9bastien","family":"Lef\u00e8vre","sequence":"additional","affiliation":[{"name":"Institut de Recherche en Informatique et Syst\u00e8mes Al\u00e9atoires (IRISA), University Bretagne Sud, UMR 6074, F-56000 Vannes, France"}]}],"member":"1968","published-online":{"date-parts":[[2017,4,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","article-title":"The Pascal Visual Object Classes Challenge: A Retrospective","volume":"111","author":"Everingham","year":"2014","journal-title":"Int. J. Comput. Vis."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_4","first-page":"1","article-title":"Processing of Extremely High-Resolution LiDAR and RGB Data: Outcome of the 2015 IEEE GRSS Data Fusion Contest Part A: 2-D Contest","volume":"9","author":"Gatta","year":"2016","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Audebert, N., Le Saux, B., and Lef\u00e8vre, S. (2016, January 10\u201315). How Useful is Region-Based Classification of Remote Sensing Images in a Deep Learning Framework?. Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China.","DOI":"10.1109\/IGARSS.2016.7730327"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Nogueira, K., Penatti, O.A.B., and Dos Santos, J.A. (arXiv, 2016). Towards Better Exploiting Convolutional Neural Networks for Remote Sensing Scene Classification, arXiv.","DOI":"10.1016\/j.patcog.2016.07.001"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"473","DOI":"10.5194\/isprs-annals-III-3-473-2016","article-title":"Semantic Segmentation of Aerial Images with an Ensemble of CNNs","volume":"3","author":"Marmanis","year":"2016","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Paisitkriangkrai, S., Sherrah, J., Janney, P., and Hengel, A.V.D. (2015, January 7\u201312). Effective Semantic Pixel Labelling with Convolutional Networks and Conditional Random Fields. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301381"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"293","DOI":"10.5194\/isprsannals-I-3-293-2012","article-title":"The ISPRS benchmark on urban object classification and 3D building reconstruction","volume":"1","author":"Rottensteiner","year":"2012","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1127\/1432-8364\/2010\/0041","article-title":"The DGPF test on digital aerial camera evaluation\u2014Overview and test design","volume":"2","author":"Cramer","year":"2010","journal-title":"Photogramm. Fernerkund. Geoinf."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Penatti, O.A.B., Nogueira, K., and dos Santos, J.A. (2015, January 7\u201312). Do Deep Features Generalize from Everyday Objects to Remote Sensing and Aerial Scenes Domains?. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301382"},{"key":"ref_12","unstructured":"Badrinarayanan, V., Kendall, A., and Cipolla, R. (arXiv, 2015). SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.jvcir.2015.11.002","article-title":"Vehicle Detection in Aerial Imagery: A small target detection benchmark","volume":"34","author":"Razakarivony","year":"2016","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lagrange, A., Saux, B.L., Beaup\u00e8re, A., Boulch, A., Chan-Hon-Tong, A., Herbin, S., Randrianarivo, H., and Ferecatu, M. (2015, January 26\u201331). Benchmarking Classification of Earth-Observation Data: From Learning Explicit Features to Convolutional Networks. Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy.","DOI":"10.1109\/IGARSS.2015.7326745"},{"key":"ref_15","unstructured":"Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. (2015, January 7\u20139). Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., and Torr, P.H.S. (2015, January 7\u201313). Conditional Random Fields as Recurrent Neural Networks. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.179"},{"key":"ref_17","unstructured":"Sherrah, J. (arXiv, 2016). Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery, arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. (2016, January 10\u201315). Fully Convolutional Neural Networks for Remote Sensing Image Classification. Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China.","DOI":"10.1109\/IGARSS.2016.7730322"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Audebert, N., Le Saux, B., and Lef\u00e8vre, S. (2016, January 20\u201324). Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks. Proceedings of the Computer Vision\u2014ACCV, Taipei, Taiwan.","DOI":"10.1007\/978-3-319-54181-5_12"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Michel, J., Grizonnet, M., Inglada, J., Malik, J., Bricier, A., and Lahlou, O. (2011, January 24\u201329). Local Feature Based Supervised Object Detection: Sampling, Learning and Detection Strategies. Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium, Vancouver, BC, Canada.","DOI":"10.1109\/IGARSS.2011.6049689"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Gleason, J., Nefian, A.V., Bouyssounousse, X., Fong, T., and Bebis, G. (2011, January 9\u201313). Vehicle Detection from Aerial Imagery. Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China.","DOI":"10.1109\/ICRA.2011.5979853"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Randrianarivo, H., Saux, B.L., and Ferecatu, M. (2013, January 21\u201326). Urban Structure Detection with Deformable Part-Based Models. Proceedings of the 2013 IEEE International Geoscience and Remote Sensing Symposium\u2014IGARSS, Melbourne, Australia.","DOI":"10.1109\/IGARSS.2013.6721126"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1007\/s00138-015-0687-9","article-title":"Pose-invariant vehicle identification in aerial electro-optical imagery","volume":"26","author":"Janney","year":"2015","journal-title":"Mach. Vis. Appl."},{"key":"ref_24","unstructured":"Randrianarivo, H., Saux, B.L., Ferecatu, M., and Crucianu, M. (2016, January 15\u201317). Contextual Discriminatively Trained Model Mixture for Object Detection in Aerial Images. Proceedings of the International Conference on Big Data from Space (BiDS\u201916), Santa Cruz de Tenerife, Spain."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kamenetsky, D., and Sherrah, J. (2015, January 23\u201325). Aerial Car Detection and Urban Understanding. Proceedings of the 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Adelaide, Australia.","DOI":"10.1109\/DICTA.2015.7371225"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1797","DOI":"10.1109\/LGRS.2014.2309695","article-title":"Vehicle Detection in Satellite Images by Hybrid Deep Convolutional Neural Networks","volume":"11","author":"Chen","year":"2014","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"871","DOI":"10.14358\/PERS.75.7.871","article-title":"Object-based detection and classification of vehicles from high-resolution aerial photography","volume":"75","author":"Holt","year":"2009","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.isprsjprs.2008.09.005","article-title":"Classification-based vehicle detection in high-resolution satellite images","volume":"64","author":"Eikvil","year":"2009","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Audebert, N., Le Saux, B., and Lef\u00e8vre, S. (2016, January 22). On the Usability of Deep Networks for Object-Based Image Analysis. Proceedings of the International Conference on Geo-Object based Image Analysis (GEOBIA16), Enschede, The Netherlands.","DOI":"10.3990\/2.399"},{"key":"ref_30","unstructured":"Simonyan, K., and Zisserman, A. (arXiv, 2014). Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_32","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Noh, H., Hong, S., and Han, B. (2015, January 7\u201313). Learning Deconvolution Network for Semantic Segmentation. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.178"},{"key":"ref_34","unstructured":"Marmanis, D., Schindler, K., Wegner, J.D., Galliani, S., Datcu, M., and Stilla, U. (arXiv, 2016). Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection, arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhou, W., Shao, Z., and Cheng, Q. (2016, January 4\u20136). Deep Feature Representations for High-Resolution Remote Sensing Scene Classification. Proceedings of the 2016 4th International Workshop on Earth Observation and Remote Sensing Applications (EORSA), Guangzhou, China.","DOI":"10.1109\/EORSA.2016.7552825"},{"key":"ref_37","unstructured":"Pereira, F., Burges, C.J.C., Bottou, L., and Weinberger, K.Q. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25, Curran Associates, Inc."},{"key":"ref_38","unstructured":"Beucher, S., and Meyer, F. (1992). The Morphological Approach to Segmentation: The Watershed Transformation, Optical Engineering New York-Marcel Dekker Inc."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Dai, J., He, K., and Sun, J. (arXiv, 2015). Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv.","DOI":"10.1109\/CVPR.2016.343"},{"key":"ref_40","first-page":"1929","article-title":"Dropout: A Simple Way to Prevent Neural Networks from Overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1109\/MGRS.2016.2548504","article-title":"Domain Adaptation for the Classification of Remote Sensing Data: An Overview of Recent Advances","volume":"4","author":"Tuia","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Courty, N., Flamary, R., Tuia, D., and Rakotomamonjy, A. (IEEE Trans. Pattern Anal. Mach. Intell., 2016). Optimal Transport for Domain Adaptation, IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2016.2615921"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Ren, M., and Zemel, R.S. (arXiv, 2016). End-to-End Instance Segmentation and Counting with Recurrent Attention, arXiv.","DOI":"10.1109\/CVPR.2017.39"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Firat, O., Can, G., and Vural, F.T.Y. (2014, January 24\u201328). Representation Learning for Contextual Object and Region Detection in Remote Sensing. Proceedings of the 2014 22nd International Conference on Pattern Recognition, Stockholm, Sweden.","DOI":"10.1109\/ICPR.2014.637"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/9\/4\/368\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:32:37Z","timestamp":1760207557000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/9\/4\/368"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,4,13]]},"references-count":45,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2017,4]]}},"alternative-id":["rs9040368"],"URL":"https:\/\/doi.org\/10.3390\/rs9040368","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,4,13]]}}}