{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T16:14:45Z","timestamp":1778948085702,"version":"3.51.4"},"reference-count":50,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2022,9,22]],"date-time":"2022-09-22T00:00:00Z","timestamp":1663804800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Natural Science Foundation for Distinguished Young Scholars of Henan Province","award":["212300410014"],"award-info":[{"award-number":["212300410014"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Detecting buildings, segmenting building footprints, and extracting building edges from high-resolution remote sensing images are vital in applications such as urban planning, change detection, smart cities, and map-making and updating. The tasks of building detection, footprint segmentation, and edge extraction affect each other to a certain extent. However, most previous works have focused on one of these three tasks and have lacked a multitask learning framework that can simultaneously solve the tasks of building detection, footprint segmentation and edge extraction, making it difficult to obtain smooth and complete buildings. This study proposes a novel multiscale and multitask deep learning framework to consider the dependencies among building detection, footprint segmentation, and edge extraction while completing all three tasks. In addition, a multitask feature fusion module is introduced into the deep learning framework to increase the robustness of feature extraction. A multitask loss function is also introduced to balance the training losses among the various tasks to obtain the best training results. Finally, the proposed method is applied to open-source building datasets and large-scale high-resolution remote sensing images and compared with other advanced building extraction methods. To verify the effectiveness of multitask learning, the performance of multitask learning and single-task training is compared in ablation experiments. The experimental results show that the proposed method has certain advantages over other methods and that multitask learning can effectively improve single-task performance.<\/jats:p>","DOI":"10.3390\/rs14194744","type":"journal-article","created":{"date-parts":[[2022,9,22]],"date-time":"2022-09-22T23:07:55Z","timestamp":1663888075000},"page":"4744","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["A Multiscale and Multitask Deep Learning Framework for Automatic Building Extraction"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9690-6373","authenticated-orcid":false,"given":"Jichong","family":"Yin","sequence":"first","affiliation":[{"name":"Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fang","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4984-4920","authenticated-orcid":false,"given":"Yue","family":"Qiu","sequence":"additional","affiliation":[{"name":"Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3822-7804","authenticated-orcid":false,"given":"Anping","family":"Li","sequence":"additional","affiliation":[{"name":"78098 Troops, Chengdu 610000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chengyi","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6184-1300","authenticated-orcid":false,"given":"Xianyong","family":"Gong","sequence":"additional","affiliation":[{"name":"Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"5309","DOI":"10.1080\/01431161.2015.1093195","article-title":"An overview of 21 global and 43 regional land-cover mapping products","volume":"36","author":"Grekousis","year":"2015","journal-title":"Int. J. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1080\/22797254.2017.1416676","article-title":"Automatic building footprint extraction from high-resolution satellite image using mathematical morphology","volume":"51","author":"Gavankar","year":"2018","journal-title":"Eur. J. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1109\/JSTARS.2011.2168195","article-title":"Morphological Building\/Shadow Index for Building Extraction From High-Resolution Imagery Over Urban Areas","volume":"5","author":"Huang","year":"2012","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_4","first-page":"281","article-title":"Multi-scale solution for building extraction from LiDAR and image data","volume":"11","author":"Vu","year":"2009","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Bi, Q., Qin, K., Zhang, H., Zhang, Y., Li, Z., and Xu, K. (2019). A Multi-Scale Filtering Building Index for Building Extraction in Very High-Resolution Satellite Imagery. Remote Sens., 11.","DOI":"10.3390\/rs11050482"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jabari, S., Zhang, Y., and Suliman, A. (2014, January 13\u201318). Stereo-based building detection in very high resolution satellite imagery using IHS color system. Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada.","DOI":"10.1109\/IGARSS.2014.6946930"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1156","DOI":"10.1109\/TGRS.2008.2008440","article-title":"Urban-Area and Building Detection Using SIFT Keypoints and Graph Theory","volume":"47","author":"Sirmacek","year":"2009","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1016\/j.isprsjprs.2010.02.002","article-title":"An efficient stochastic approach for building footprint extraction from digital elevation models","volume":"65","author":"Tournaire","year":"2010","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_9","first-page":"148","article-title":"Building change detection through multi-scale GEOBIA approach by integrating deep belief networks with fuzzy ontologies","volume":"7","author":"Argyridis","year":"2016","journal-title":"Int. J. Image Data Fusion"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Gavrilescu, R., Zet, C., Fo\u0219al\u0103u, C., Skoczylas, M., and Cotovanu, D. (2018, January 18\u201319). Faster R-CNN: An Approach to Real-Time Object Detection. Proceedings of the 2018 International Conference and Exposition on Electrical And Power Engineering (EPE), Iasi, Romania.","DOI":"10.1109\/ICEPE.2018.8559776"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","article-title":"Fully Convolutional Networks for Semantic Segmentation","volume":"39","author":"Shelhamer","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1109\/TPAMI.2018.2844175","article-title":"Mask R-CNN","volume":"42","author":"He","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1109\/TGRS.2018.2858817","article-title":"Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set","volume":"57","author":"Ji","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1016\/j.isprsjprs.2019.11.004","article-title":"Building segmentation through a gated graph convolutional neural network with deep structured feature embedding","volume":"159","author":"Shi","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yuan, W., and Xu, W. (2021). MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer. Remote Sens., 13.","DOI":"10.3390\/rs13234743"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single Shot MultiBox Detector. Proceedings of the Computer Vision\u2013European Conference on Computer Vision 2016, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_21","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"663","DOI":"10.3390\/rs14030663","article-title":"A Precision Efficient Method for Collapsed Building Detection in Post-Earthquake UAV Images Based on the Improved NMS Algorithm and Faster R-CNN","volume":"14","author":"Ding","year":"2022","journal-title":"Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Bai, T., Pang, Y., Wang, J., Han, K., Luo, J., Wang, H., Lin, J., Wu, J., and Zhang, H. (2020). An Optimized Faster R-CNN Method Based on DRNet and RoI Align for Building Detection in Remote Sensing Images. Remote Sens., 12.","DOI":"10.3390\/rs12050762"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1842","DOI":"10.1109\/JSTARS.2020.2991391","article-title":"Refined Extraction Of Building Outlines From High-Resolution Remote Sensing Imagery Based on a Multifeature Convolutional Neural Network and Morphological Filtering","volume":"13","author":"Xie","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"6169","DOI":"10.1109\/TGRS.2020.3026051","article-title":"MAP-Net: Multiple Attending Path Neural Network for Building Footprint Extraction From Remote Sensed Imagery","volume":"59","author":"Zhu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 15\u201320). Deep High-Resolution Representation Learning for Human Pose Estimation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ma, J., Wu, L., Tang, X., Liu, F., Zhang, X., and Jiao, L. (2020). Building Extraction of Aerial Images by a Global and Multi-Scale Encoder-Decoder Network. Remote Sens., 12.","DOI":"10.3390\/rs12152350"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2015, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lu, T., Ming, D., Lin, X., Hong, Z., Bai, X., and Fang, J. (2018). Detecting Building Edges from High Spatial Resolution Remote Sensing Imagery Using Richer Convolution Features Network. Remote Sens., 10.","DOI":"10.3390\/rs10091496"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wu, G., Guo, Z., Shi, X., Chen, Q., Xu, Y., Shibasaki, R., and Shao, X. (2018). A Boundary Regulated Network for Accurate Roof Segmentation and Outline Extraction. Remote Sens., 10.","DOI":"10.3390\/rs10081195"},{"key":"ref_32","unstructured":"Jiwani, A., Ganguly, S., Ding, C., Zhou, N., and Chan, D.M. (2021). A Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Proceedings of the Computer Vision\u2013European Conference on Computer Vision 2018, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"7502","DOI":"10.1109\/TGRS.2020.2973720","article-title":"Building Footprint Generation by Integrating Convolution Neural Network With Feature Pairwise Conditional Random Field (FPCRF)","volume":"58","author":"Li","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"9362","DOI":"10.1109\/TGRS.2019.2926397","article-title":"Multi-Scale and Multi-Task Deep Learning Framework for Automatic Road Extraction","volume":"57","author":"Lu","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Teichmann, M., Weber, M., Z\u00f6llner, M., Cipolla, R., and Urtasun, R. (2018, January 26\u201330). MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving. Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China.","DOI":"10.1109\/IVS.2018.8500504"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wu, D., Liao, M., Zhang, W., Wang, X., Bai, X., Cheng, W., and Liu, W. (2021). YOLOP: You Only Look Once for Panoptic Driving Perception. arXiv.","DOI":"10.1007\/s11633-022-1339-y"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Bischke, B., Helber, P., Folz, J., Borth, D., and Dengel, A. (2019, January 22\u201325). Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803050"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Bolya, D., Zhou, C., Xiao, F., and Lee, Y.J. (November, January 27). YOLACT: Real-Time Instance Segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00925"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201321). Path Aggregation Network for Instance Segmentation. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017). A Review on Deep Learning Techniques Applied to Semantic Segmentation. arXiv.","DOI":"10.1016\/j.asoc.2018.05.018"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Chen, J., Zhang, D., Wu, Y., Chen, Y., and Yan, X. (2022). A Context Feature Enhancement Network for Building Extraction from High-Resolution Remote Sensing Imagery. Remote Sens., 14.","DOI":"10.3390\/rs14092276"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Qiu, Y., Wu, F., Yin, J., Liu, C., Gong, X., and Wang, A. (2022). MSL-Net: An Efficient Network for Building Extraction from Aerial Imagery. Remote Sens., 14.","DOI":"10.3390\/rs14163914"},{"key":"ref_46","unstructured":"Mnih, V. (2013). Machine Learning for Aerial Image Labeling. [Ph.D. Thesis, University of Toronto]."},{"key":"ref_47","unstructured":"YICHO-YUE (2022, July 11). GitHub Repository. Available online: https:\/\/github.com\/Yicho-Yue\/RSIBE."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zhao, K., Kang, J., Jung, J., and Sohn, G. (2018, January 18\u201322). Building Extraction from Satellite Images Using Mask R-CNN with Building Boundary Regularization. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00045"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Zhai, R., Li, A., Yin, J., Du, J., and Qiu, Y. (2022). A Progressive Simplification Method for Buildings Based on Structural Subdivision. ISPRS Int. J. Geo-Inf., 11.","DOI":"10.3390\/ijgi11070393"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/19\/4744\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:37:43Z","timestamp":1760143063000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/19\/4744"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,22]]},"references-count":50,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["rs14194744"],"URL":"https:\/\/doi.org\/10.3390\/rs14194744","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,22]]}}}