{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:32:22Z","timestamp":1775745142362,"version":"3.50.1"},"reference-count":65,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2022,5,7]],"date-time":"2022-05-07T00:00:00Z","timestamp":1651881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61991423"],"award-info":[{"award-number":["61991423"]}]},{"name":"National Natural Science Foundation of China","award":["U1805264"],"award-info":[{"award-number":["U1805264"]}]},{"name":"National Natural Science Foundation of China","award":["XDB32050100"],"award-info":[{"award-number":["XDB32050100"]}]},{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences","award":["61991423"],"award-info":[{"award-number":["61991423"]}]},{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences","award":["U1805264"],"award-info":[{"award-number":["U1805264"]}]},{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences","award":["XDB32050100"],"award-info":[{"award-number":["XDB32050100"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Single-view height estimation and semantic segmentation have received increasing attention in recent years and play an important role in the photogrammetry and remote sensing communities. The height information and semantic information of images are correlated, and some recent works have shown that multi-task learning methods can achieve complementation of task-related features and improve the prediction results of the multiple tasks. 
Although much progress has been made in recent works, how to effectively extract and fuse height features and semantic features is still an open issue. In this paper, a self- and cross-enhancement network (SCE-Net) is proposed to jointly perform height estimation and semantic segmentation on single aerial images. A feature separation\u2013fusion module is constructed to effectively separate and fuse height features and semantic features based on an attention mechanism for feature representation enhancement across tasks. In addition, a height-guided feature distance loss and a semantic-guided feature distance loss are designed based on deep metric learning to achieve task-aware feature representation enhancement. Extensive experiments are conducted on the Vaihingen dataset and the Potsdam dataset to verify the effectiveness of the proposed method. The experimental results demonstrate that the proposed SCE-Net could outperform the state-of-the-art methods and achieve better performance in both height estimation and semantic segmentation.<\/jats:p>","DOI":"10.3390\/rs14092252","type":"journal-article","created":{"date-parts":[[2022,5,8]],"date-time":"2022-05-08T23:27:25Z","timestamp":1652052445000},"page":"2252","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["SCE-Net: Self- and Cross-Enhancement Network for Single-View Height Estimation and Semantic Segmentation"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3874-6521","authenticated-orcid":false,"given":"Siyuan","family":"Xing","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4015-1615","authenticated-orcid":false,"given":"Qiulei","family":"Dong","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Zhanyi","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3477","DOI":"10.1080\/01431161.2016.1182666","article-title":"Digital terrain models derived from digital surface model uniform regions in urban areas","volume":"37","author":"Beumier","year":"2016","journal-title":"Int. J. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.isprsjprs.2016.09.013","article-title":"3d change detection\u2013approaches and applications","volume":"122","author":"Qin","year":"2016","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"43","DOI":"10.5194\/isprs-annals-III-8-43-2016","article-title":"Automatic building damage detection method using high-resolution remote sensing images and 3d gis model","volume":"3","author":"Tu","year":"2016","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. 
Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"481","DOI":"10.18280\/ts.380228","article-title":"A region-based efficient network for accurate object detection","volume":"38","author":"Guan","year":"2021","journal-title":"Trait. Signal"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Thiagarajan, K., Anandan, M.M., Stateczny, A., Divakarachari, P.B., and Lingappa, H.K. (2021). Satellite image classification using a hierarchical ensemble learning and correlation coefficient-based gravitational search algorithm. Remote Sens., 13.","DOI":"10.3390\/rs13214351"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wenkel, S., Alhazmi, K., Liiv, T., Alrshoud, S., and Simon, M. (2021). Confidence score: The forgotten dimension of object detection performance evaluation. Sensors, 21.","DOI":"10.3390\/s21134350"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Shivappriya, S.N., Priyadarsini, M.J.P., Stateczny, A., Puttamadappa, C., and Parameshachari, B.D. (2021). Cascade object detection and remote sensing object detection method based on trainable activation function. Remote Sens., 13.","DOI":"10.3390\/rs13020200"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Srivastava, S., Volpi, M., and Tuia, D. (2017, January 23\u201328). Joint height estimation and semantic labeling of monocular aerial images with cnns. Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2017, Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8128167"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Zhong, Y., and Wang, J. (August, January 28). Pop-net: Encoder-dual decoder for semantic segmentation and single-view height estimation. 
Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2019, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8897927"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/0924-2716(89)90027-0","article-title":"Relief mapping using nonphotographic spaceborne imagery","volume":"44","author":"Raggam","year":"1989","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Roncella, R., Bruno, N., Diotri, F., Thoeni, K., and Giacomini, A. (2021). Photogrammetric digital surface model reconstruction in extreme low-light environments. Remote Sens., 13.","DOI":"10.3390\/rs13071261"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"4361","DOI":"10.1109\/TGRS.2018.2817122","article-title":"Generation of highly accurate dems over flat areas by means of dual-frequency and dual-baseline airborne sar interferometry","volume":"56","author":"Pinheiro","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ka, M.H., Shimkin, P.E., Baskakov, A.I., and Babokin, M.I. (2019). A new single-pass sar interferometry technique with a single-antenna for terrain height measurements. Remote Sens., 11.","DOI":"10.3390\/rs11091070"},{"key":"ref_14","unstructured":"Mou, L., and Zhu, X.X. (2018). Im2height: Height estimation from single monocular imagery via fully residual convolutional-deconvolutional network. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, Y., and Chen, X. (2019, January 8\u201312). Multi-path fusion network for high-resolution height estimation from a single orthophoto. 
Proceedings of the 2019 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2019, Shanghai, China.","DOI":"10.1109\/ICMEW.2019.00-89"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/j.isprsjprs.2019.01.013","article-title":"Height estimation from single aerial images using a deep convolutional encoder-decoder network","volume":"149","author":"Amirkolaee","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_17","first-page":"1","article-title":"Height estimation from single aerial images using a deep ordinal regression network","volume":"19","author":"Li","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, C.J., Krylov, V.A., Kane, P., Kavanagh, G., and Dahyot, R. (2020). Im2elevation: Building height estimation from single-view aerial imagery. Remote Sens., 12.","DOI":"10.3390\/rs12172719"},{"key":"ref_19","first-page":"1","article-title":"Gated feature aggregation for height estimation from single aerial images","volume":"19","author":"Xing","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1109\/LGRS.2020.2976485","article-title":"Soft-aligned gradient-chaining network for height estimation from single aerial images","volume":"18","author":"Mo","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Karatsiolis, S., Kamilaris, A., and Cole, I. (2021). Img2ndsm: Height estimation from single airborne rgb images with deep learning. 
Remote Sens., 13.","DOI":"10.3390\/rs13122417"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"794","DOI":"10.1109\/LGRS.2018.2806945","article-title":"Img2dsm: Height simulation from single imagery using conditional generative adversarial net","volume":"15","author":"Ghamisi","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1288","DOI":"10.1109\/LGRS.2020.2997295","article-title":"U-img2dsm: Unpaired simulation of digital surface models with generative adversarial networks","volume":"18","author":"Paoletti","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Panagiotou, E., Chochlakis, G., Grammatikopoulos, L., and Charou, E. (2020). Generating elevation surface from a single rgb remotely sensed image using deep learning. Remote Sens., 12.","DOI":"10.3390\/rs12122002"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 8\u201310). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Noh, H., Hong, S., and Han, B. (2015, January 13\u201316). Learning deconvolution network for semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.178"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1109\/TGRS.2016.2616585","article-title":"Dense semantic labeling of subdecimeter resolution images with convolutional neural networks","volume":"55","author":"Volpi","year":"2016","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.isprsjprs.2017.11.011","article-title":"Beyond rgb: Very high resolution urban remote sensing with multimodal deep networks","volume":"140","author":"Audebert","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2015, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.isprsjprs.2019.07.007","article-title":"Treeunet: Adaptive tree convolutional neural networks for subdecimeter aerial image segmentation","volume":"156","author":"Yue","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_33","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., and Fowlkes, C.C. (2016, January 8\u201316). 
Laplacian pyramid reconstruction and refinement for semantic segmentation. Proceedings of the European Conference on Computer Vision, ECCV 2016, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_32"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Bilinski, P., and Prisacariu, V. (2018, January 18\u201322). Dense decoder shortcut connections for single-pass semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00690"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Yang, M., Yu, K., Zhang, C., Li, Z., and Yang, K. (2018, January 18\u201322). Denseaspp for semantic segmentation in street scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00388"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"7503","DOI":"10.1109\/TGRS.2019.2913861","article-title":"Dynamic multi-context segmentation of remote sensing images based on convolutional networks","volume":"57","author":"Nogueira","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2016, January 8\u201316). Encoder-decoder with atrous separable convolution for semantic image segmentation. 
Proceedings of the European Conference on Computer Vision, ECCV 2018, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"4544","DOI":"10.1109\/TGRS.2016.2543748","article-title":"Spectral\u2013spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach","volume":"54","author":"Zhao","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hung, W.C., Tsai, Y.H., Shen, X., Lin, Z., Sunkavalli, K., Lu, X., and Yang, M.H. (2017, January 22\u201329). Scene parsing with global context embedding. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.287"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., and Agrawal, A. (2018, January 18\u201322). Context encoding for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00747"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"7557","DOI":"10.1109\/TGRS.2020.2979552","article-title":"Relation matters: Relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images","volume":"58","author":"Mou","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Yi, Y., Zhang, Z., Zhang, W., Zhang, C., Li, W., and Zhao, T. (2019). Semantic segmentation of urban buildings from vhr remote sensing imagery using a deep convolutional neural network. Remote Sens., 11.","DOI":"10.3390\/rs11151774"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Mou, L., Hua, Y., and Zhu, X.X. (2019, January 16\u201320). 
A relation-augmented fully convolutional network for semantic segmentation in aerial scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01270"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"5367","DOI":"10.1109\/TGRS.2020.2964675","article-title":"Semantic segmentation of large-size vhr remote sensing images using a two-stage multiscale training architecture","volume":"58","author":"Ding","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1016\/j.isprsjprs.2018.01.021","article-title":"Land cover mapping at very high resolution with rotation equivariant cnns: Towards small yet accurate models","volume":"145","author":"Marcos","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1016\/j.isprsjprs.2017.11.009","article-title":"Classification with an edge: Improving semantic image segmentation with boundary detection","volume":"135","author":"Marmanis","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Eigen, D., and Fergus, R. (2015, January 13\u201316). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.304"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Xu, D., Ouyang, W., Wang, X., and Sebe, N. (2018, January 18\u201322). Pad-net: Multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00077"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.isprsjprs.2018.06.007","article-title":"Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images","volume":"144","author":"Volpi","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Papadomanolaki, M., Karantzalos, K., and Vakalopoulou, M. (August, January 28). A multi-task deep learning framework coupling semantic segmentation and image reconstruction for very high resolution imagery. Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2019, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8898133"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Wang, C., Pei, J., Wang, Z., Huang, Y., Wu, J., Yang, H., and Yang, J. (2020). When deep learning meets multi-task learning in sar atr: Simultaneous target recognition and segmentation. Remote Sens., 12.","DOI":"10.3390\/rs12233863"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.1109\/LGRS.2019.2947783","article-title":"Multitask learning of height and semantics from aerial images","volume":"17","author":"Carvalho","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Mahmud, J., Price, T., Bapat, A., and Frahm, J.M. (2020, January 14\u201319). Boundary-aware 3d building reconstruction from a single overhead image. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2020, Virtual.","DOI":"10.1109\/CVPR42600.2020.00052"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1109\/JSTARS.2020.3043442","article-title":"Boundary-aware multitask learning for remote sensing imagery","volume":"14","author":"Wang","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Fu, H., Gong, M., Wang, C., Batmanghelich, K., and Tao, D. (2018, January 18\u201322). Deep ordinal regression network for monocular depth estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00214"},{"key":"ref_59","unstructured":"Schultz, M., and Joachims, T. (2003, January 8\u201313). Learning a distance metric from relative comparisons. Proceedings of the Advances in Neural Information Processing Systems, NeurIPS 2003, Vancouver and Whistler, Vancouver, BC, Canada; Whistler, BC, Canada."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., and Wu, Y. (2014, January 24\u201327). Learning fine-grained image similarity with deep ranking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.180"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Jung, H., Park, E., and Yoo, S. (2021, January 11\u201317). Fine-grained semantics-aware representation enhancement for self-supervised monocular depth estimation. 
Proceedings of the IEEE International Conference on Computer Vision, ICCV 2021, Virtual.","DOI":"10.1109\/ICCV48922.2021.01241"},{"key":"ref_62","first-page":"1","article-title":"Geometry-aware segmentation of remote sensing images via joint height estimation","volume":"19","author":"Li","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_63","unstructured":"Gerke, M. (2022, March 21). Use of the Stair Vision Library within the ISPRS 2d Semantic Labeling Benchmark (Vaihingen). Available online: http:\/\/www2.isprs.org\/commissions\/comm3\/wg4\/2d-sem-label-vaihingen.html."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Carvalho, M., Saux, B.L., Trouv\u00e9-Peloux, P., Almansa, A., and Champagnat, F. (2018, January 7\u201310). On regression losses for deep depth estimation. Proceedings of the 25th IEEE International Conference on Image Processing, ICIP 2018, Athens, Greece.","DOI":"10.1109\/ICIP.2018.8451312"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Alidoost, F., Arefi, H., and Tombari, F. (2019). 2d image-to-3d model: Knowledge-based 3d building reconstruction (3dbr) using single aerial images and convolutional neural networks (cnns). 
Remote Sens., 11.","DOI":"10.3390\/rs11192219"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/9\/2252\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:07:44Z","timestamp":1760137664000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/9\/2252"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,7]]},"references-count":65,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["rs14092252"],"URL":"https:\/\/doi.org\/10.3390\/rs14092252","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,7]]}}}