{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T11:22:06Z","timestamp":1780053726612,"version":"3.54.0"},"reference-count":30,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,1,13]],"date-time":"2019-01-13T00:00:00Z","timestamp":1547337600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001655","name":"Deutscher Akademischer Austauschdienst","doi-asserted-by":"publisher","award":["50019750"],"award-info":[{"award-number":["50019750"]}],"id":[{"id":"10.13039\/501100001655","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>The tremendous advances in deep neural networks have demonstrated the superiority of deep learning techniques for applications such as object recognition or image classification. Nevertheless, deep learning-based methods usually require a large amount of training data, which mainly comes from manual annotation and is quite labor-intensive. In order to reduce the amount of manual work required for generating enough training data, we hereby propose to leverage existing labeled data to generate image annotations automatically. Specifically, the pixel labels are firstly transferred from one image modality to another image modality via geometric transformation to create initial image annotations, and then additional information (e.g., height measurements) is incorporated for Bayesian inference to update the labeling beliefs. Finally, the updated label assignments are optimized with a fully connected conditional random field (CRF), yielding refined labeling for all pixels in the image. 
The proposed approach is tested on two different scenarios, i.e., (1) label propagation from annotated aerial imagery to unmanned aerial vehicle (UAV) imagery and (2) label propagation from a map database to aerial imagery. In each scenario, the refined image labels are used as pseudo-ground truth data for training a convolutional neural network (CNN). Results demonstrate that our model is able to produce accurate label assignments even around complex object boundaries; in addition, the generated image labels can be effectively leveraged for training CNNs and achieve classification accuracy comparable to that of manual image annotations. More specifically, the per-class classification accuracies of the networks trained on the manual image annotations and on the generated image labels are within \u00b15% of each other.<\/jats:p>","DOI":"10.3390\/rs11020145","type":"journal-article","created":{"date-parts":[[2019,1,14]],"date-time":"2019-01-14T12:20:07Z","timestamp":1547468407000},"page":"145","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Automatic Annotation of Airborne Images by Label Propagation Based on a Bayesian-CRF Model"],"prefix":"10.3390","volume":"11","author":[{"given":"Xiangyu","family":"Zhuo","sequence":"first","affiliation":[{"name":"Remote Sensing Technology Institute, German Aerospace Center, 82234 Wessling, Germany"},{"name":"Remote Sensing Technology, Technische Universit\u00e4t M\u00fcnchen, 80333 Munich, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5805-8892","authenticated-orcid":false,"given":"Friedrich","family":"Fraundorfer","sequence":"additional","affiliation":[{"name":"Remote Sensing Technology Institute, German Aerospace Center, 82234 Wessling, Germany"},{"name":"Institute for Computer Graphics and Vision, Graz University of Technology, 8010 Graz, 
Austria"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1718-0004","authenticated-orcid":false,"given":"Franz","family":"Kurz","sequence":"additional","affiliation":[{"name":"Remote Sensing Technology Institute, German Aerospace Center, 82234 Wessling, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8122-1475","authenticated-orcid":false,"given":"Peter","family":"Reinartz","sequence":"additional","affiliation":[{"name":"Remote Sensing Technology Institute, German Aerospace Center, 82234 Wessling, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1007\/s11263-007-0090-8","article-title":"LabelMe: A Database and Web-Based Tool for Image Annotation","volume":"77","author":"Russell","year":"2008","journal-title":"Int. J. Comput. Vis."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Gaidon, A., Wang, Q., Cabon, Y., and Vig, E. (arXiv, 2016). Virtual worlds as proxy for multi-object tracking analysis, arXiv.","DOI":"10.1109\/CVPR.2016.470"},{"key":"ref_4","unstructured":"Ros, G., Sellart, L., Materzynska, J., Vazquez, D., and Lopez, A.M. (July, January 26). The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas Valley, NV, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Vijayanarasimhan, S., and Grauman, K. (2012, January 7\u201313). 
Active frame selection for label propagation in videos. Proceedings of the 12th European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33715-4_36"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sun, C., and Lu, H. (2017). Interactive video segmentation via local appearance model. IEEE Transactions on Circuits and Systems for Video Technology, IEEE.","DOI":"10.1109\/TCSVT.2016.2543038"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Song, J., Gao, L., Puscas, M.M., Nie, F., Shen, F., and Sebe, N. (2016, January 15\u201319). Joint graph learning and video segmentation via multiple cues and topology calibration. Proceedings of the 24th ACM international conference on Multimedia, Amsterdam, The Netherlands.","DOI":"10.1145\/2964284.2964295"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2751","DOI":"10.1109\/TPAMI.2013.54","article-title":"Semi-Supervised Video Segmentation Using Tree Structured Graphical Models","volume":"35","author":"Badrinarayanan","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Badrinarayanan, V., Galasso, F., and Cipolla, R. (2010, January 13\u201318). Label propagation in video sequences. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540054"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1007\/s11263-011-0512-5","article-title":"Motion Coherent Tracking Using Multi-label MRF Optimization","volume":"100","author":"Tsai","year":"2012","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Fidler, S., Yuille, A.L., and Urtasun, R. (2014, January 23\u201328). Beat the mturkers: Automatic image labeling from weak 3d supervision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.409"},{"key":"ref_13","unstructured":"Xiao, J., and Quan, L. (October, January 29). Multiple view semantic segmentation for street view images. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Namin, S.T., Najafi, M., Salzmann, M., and Petersson, L. (2015, January 5\u20139). A Multi-modal Graphical Model for Scene Analysis. Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2015.139"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xie, J., Kiefel, M., Sun, M.T., and Geiger, A. (July, January 26). Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas Valley, NV, USA.","DOI":"10.1109\/CVPR.2016.401"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Mustikovela, S.K., Yang, M.Y., and Rother, C. (2016). Can Ground Truth Label Propagation from Video help Semantic Segmentation?. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-49409-8_66"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Budvytis, I., Sauer, P., Roddick, T., Breen, K., and Cipolla, R. (2017, January 22\u201329). Large Scale Labelled Video Data Augmentation for Semantic Segmentation in Driving Scenarios. 
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCV Workshop), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.36"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2013). Bayesian Data Analysis, CRC Press.","DOI":"10.1201\/b16018"},{"key":"ref_19","unstructured":"Kr\u00e4henb\u00fchl, P., and Koltun, V. (2011). Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. Adv. Neural Inf. Proc. Syst., 109\u2013117."},{"key":"ref_20","unstructured":"Lafferty, J., McCallum, A., and Pereira, F.C. (July, January 28). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning 2001 (ICML 2001), Williamstown, MA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1109\/TPAMI.2016.2547384","article-title":"Top-down visual saliency via joint CRF and dictionary learning","volume":"39","author":"Yang","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1007\/s11263-008-0202-0","article-title":"Robust higher order potentials for enforcing label consistency","volume":"82","author":"Kohli","year":"2009","journal-title":"Int. J. Comput. Vis."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ladick\u00fd, L., Russell, C., Kohli, P., and Torr, P.H.S. (October, January 29). Associative hierarchical CRFs for object class image segmentation. 
Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459248"},{"key":"ref_24","first-page":"189","article-title":"Performance of a real-time sensor and processing system on a helicopter","volume":"1","author":"Kurz","year":"2014","journal-title":"ISPRS"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhuo, X., Koch, T., Kurz, F., Fraundorfer, F., and Reinartz, P. (2017). Automatic UAV Image Geo-Registration by Matching UAV Images to Georeferenced Image Data. Remote Sens., 9.","DOI":"10.3390\/rs9040376"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.isprsjprs.2011.10.002","article-title":"Parameter-free ground filtering of LiDAR data for automatic DTM generation","volume":"67","author":"Mongus","year":"2012","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1127\/1432-8364\/2010\/0041","article-title":"The DGPF-Test on Digital Airborne Camera Evaluation\u2014Over-view and Test Design","volume":"2010","author":"Cramer","year":"2010","journal-title":"Photogramm. Fernerkundung Geoinf."},{"key":"ref_29","unstructured":"Shi, Z., Siva, P., and Xiang, T. (arXiv, 2017). Transfer learning by ranking for weakly supervised object annotation, arXiv."},{"key":"ref_30","unstructured":"Fort, K., Nazarenko, A., and Rosset, S. (2012, January 8\u201315). Modeling the complexity of manual annotation tasks: A grid of analysis. 
Proceedings of the 2012 International Conference on Computational Linguistics, Mumbai, India."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/2\/145\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:25:38Z","timestamp":1760185538000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/2\/145"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,13]]},"references-count":30,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,1]]}},"alternative-id":["rs11020145"],"URL":"https:\/\/doi.org\/10.3390\/rs11020145","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,1,13]]}}}