{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T15:29:17Z","timestamp":1770737357495,"version":"3.49.0"},"reference-count":26,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2017,12,29]],"date-time":"2017-12-29T00:00:00Z","timestamp":1514505600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41501485"],"award-info":[{"award-number":["41501485"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"High Resolution Earth Observation Science Foundation","award":["GFZX04060103"],"award-info":[{"award-number":["GFZX04060103"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>In recent years, Fully Convolutional Networks (FCN) have led to a great improvement in semantic labeling for various applications including multi-modal remote sensing data. Although different fusion strategies have been reported for multi-modal data, there has been no in-depth study of the reasons for their performance limits. For example, it is unclear why an early fusion of multi-modal data in an FCN does not lead to a satisfactory result. In this paper, we investigate the contribution of individual layers inside an FCN and propose an effective fusion strategy for the semantic labeling of color or infrared imagery together with elevation (e.g., Digital Surface Models). The sensitivity and contribution of layers concerning classes and multi-modal data are quantified by recall and descent rate of recall in a multi-resolution model. 
The contribution of different modalities to the pixel-wise prediction is analyzed, explaining the poor performance caused by a plain concatenation of different modalities. Finally, based on this analysis, an optimized scheme for the fusion of layers with image and elevation information into a single FCN model is derived. Experiments are performed on the ISPRS Vaihingen 2D Semantic Labeling dataset (infrared and RGB imagery as well as elevation) and the Potsdam dataset (RGB imagery and elevation). Comprehensive evaluations demonstrate the potential of the proposed approach.<\/jats:p>","DOI":"10.3390\/rs10010052","type":"journal-article","created":{"date-parts":[[2017,12,29]],"date-time":"2017-12-29T10:58:47Z","timestamp":1514545127000},"page":"52","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["Effective Fusion of Multi-Modal Remote Sensing Data in a Fully Convolutional Network for Semantic Labeling"],"prefix":"10.3390","volume":"10","author":[{"given":"Wenkai","family":"Zhang","sequence":"first","affiliation":[{"name":"Key Laboratory of Spatial Information Processing and Application System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Huairou District, Beijing 100049, China"}]},{"given":"Hai","family":"Huang","sequence":"additional","affiliation":[{"name":"Institute for Applied Computer Science, Bundeswehr University Munich, Werner-Heisenberg-Weg 39, D-85577 Neubiberg, Germany"}]},{"given":"Matthias","family":"Schmitz","sequence":"additional","affiliation":[{"name":"Institute for Applied Computer Science, Bundeswehr University Munich, Werner-Heisenberg-Weg 39, D-85577 Neubiberg, Germany"}]},{"given":"Xian","family":"Sun","sequence":"additional","affiliation":[{"name":"Key Laboratory of Spatial Information Processing 
and Application System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Hongqi","family":"Wang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Spatial Information Processing and Application System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Helmut","family":"Mayer","sequence":"additional","affiliation":[{"name":"Institute for Applied Computer Science, Bundeswehr University Munich, Werner-Heisenberg-Weg 39, D-85577 Neubiberg, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2017,12,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_3","unstructured":"Pereira, F., Burges, C.J.C., Bottou, L., and Weinberger, K.Q. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, Curran Associates, Inc."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Fu, G., Liu, C., Zhou, R., Sun, T., and Zhang, Q. (2017). Classification for high resolution remote sensing imagery using a fully convolutional network. Remote Sens., 9.","DOI":"10.3390\/rs9050498"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"L\u00e4ngkvist, M., Kiselev, A., Alirezaie, M., and Loutfi, A. (2016). Classification and segmentation of satellite orthoimagery using convolutional neural networks. 
Remote Sens., 8.","DOI":"10.3390\/rs8040329"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhang, M., Hu, X., Zhao, L., Lv, Y., Luo, M., and Pang, S. (2017). Learning dual multi-scale manifold ranking for semantic segmentation of high-resolution images. Remote Sens., 9.","DOI":"10.20944\/preprints201704.0061.v1"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., and Torr, P.H. (2015, January 7\u201313). Conditional random fields as recurrent neural networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.179"},{"key":"ref_8","unstructured":"Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (arXiv, 2016). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2016, January 20\u201324). FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture. Proceedings of the Asian Conference on Computer Vision ACCV, Taipei, Taiwan.","DOI":"10.1007\/978-3-319-54181-5_14"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"473","DOI":"10.5194\/isprs-annals-III-3-473-2016","article-title":"Semantic segmentation of aerial images with an ensemble of CNSS","volume":"3","author":"Marmanis","year":"2016","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","article-title":"Fully convolutional networks for semantic segmentation","volume":"39","author":"Shelhamer","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","unstructured":"Sherrah, J. (arXiv, 2016). 
Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery, arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1109\/TGRS.2016.2612821","article-title":"Convolutional neural networks for large-scale remote-sensing image classification","volume":"55","author":"Maggiori","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1360","DOI":"10.1109\/TIP.2005.852470","article-title":"Toward automatic phenotyping of developing embryos from videos","volume":"14","author":"Ning","year":"2005","journal-title":"IEEE Trans. Image Process."},{"key":"ref_15","unstructured":"Pinheiro, P.H.O., and Collobert, R. (arXiv, 2013). Recurrent convolutional neural networks for scene parsing, arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., and Fergus, R. (2014, January 6\u201312). Visualizing and understanding convolutional networks. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"ref_17","unstructured":"Zintgraf, L.M., Cohen, T.S., Adel, T., and Welling, M. (arXiv, 2017). Visualizing deep neural network decisions: Prediction difference analysis, arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"513","DOI":"10.5194\/isprs-archives-XLII-1-W1-513-2017","article-title":"A multi-resolution fusion model incorporating color and elevation for semantic segmentation","volume":"XLII-1\/W1","author":"Zhang","year":"2017","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for scene segmentation. IEEE Trans. Pattern Anal. Mach. 
Intell.","DOI":"10.1109\/TPAMI.2016.2644615"},{"key":"ref_20","unstructured":"Badrinarayanan, V., Kendall, A., and Cipolla, R. (arXiv, 2015). Segnet: A deep convolutional encoder-decoder architecture for image segmentation, arXiv."},{"key":"ref_21","unstructured":"Kampffmeyer, M., Salberg, A.-B., and Jenssen, R. (July, January 26). Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Audebert, N., Saux, B.L., and Lef\u00e8vre, S. (arXiv, 2016). Semantic segmentation of earth observation data using multimodal and multi-scale deep networks, arXiv.","DOI":"10.1007\/978-3-319-54181-5_12"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Gupta, S., Girshick, R., Arbel\u00e1ez, P., and Malik, J. (2014). Learning rich features from RGB-D images for object detection and segmentation. Computer Vision\u2014ECCV 2014, Springer.","DOI":"10.1007\/978-3-319-10584-0_23"},{"key":"ref_24","unstructured":"Gerke, M., Rottensteiner, F., Wegner, J.D., and Sohn, G. (2014, January 5\u20137). ISPRS semantic labeling contest. Proceedings of the Photogrammetric Computer Vision\u2014PCV, Zurich, Switzerland."},{"key":"ref_25","unstructured":"Simonyan, K., and Zisserman, A. (arXiv, 2014). Very deep convolutional networks for large-scale image recognition, arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014, January 3\u20137). Caffe: Convolutional architecture for fast feature embedding. 
Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654889"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/10\/1\/52\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:56:01Z","timestamp":1760208961000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/10\/1\/52"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,29]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2018,1]]}},"alternative-id":["rs10010052"],"URL":"https:\/\/doi.org\/10.3390\/rs10010052","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,12,29]]}}}