{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:55:24Z","timestamp":1774630524342,"version":"3.50.1"},"reference-count":45,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2022,10,1]],"date-time":"2022-10-01T00:00:00Z","timestamp":1664582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Sichuan Province Key Laboratory of Higher Education Institutions","award":["SC-QWLY-2021-Y-02"],"award-info":[{"award-number":["SC-QWLY-2021-Y-02"]}]},{"name":"Sichuan Province Key Laboratory of Higher Education Institutions","award":["035200242"],"award-info":[{"award-number":["035200242"]}]},{"name":"the Doctoral Science Foundation","award":["SC-QWLY-2021-Y-02"],"award-info":[{"award-number":["SC-QWLY-2021-Y-02"]}]},{"name":"the Doctoral Science Foundation","award":["035200242"],"award-info":[{"award-number":["035200242"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Building information extraction utilizing remote sensing technology has vital applications in many domains, such as urban planning, cadastral mapping, geographic information censuses, and land-cover change analysis. In recent years, deep learning algorithms with strong feature construction ability have been widely used in automatic building extraction. However, most methods using semantic segmentation networks cannot obtain object-level building information. Some instance segmentation networks rely on predefined detectors and have weak detection ability for buildings with complex shapes and multiple scales. In addition, the advantages of multi-modal remote sensing data have not been effectively exploited to improve model performance with limited training samples. To address the above problems, we proposed a CNN framework with an adaptive center point detector for the object-level extraction of buildings. The proposed framework combines object detection and semantic segmentation with multi-modal data, including high-resolution aerial images and LiDAR data, as inputs. Meanwhile, we developed novel modules to optimize and fuse multi-modal features. Specifically, the local spatial\u2013spectral perceptron can mutually compensate for semantic information and spatial features. The cross-level global context module can enhance long-range feature dependence. The adaptive center point detector explicitly models deformable convolution to improve detection accuracy, especially for buildings with complex shapes. Furthermore, we constructed a building instance segmentation dataset using multi-modal data for model training and evaluation. Quantitative analysis and visualized results verified that the proposed network can improve the accuracy and efficiency of building instance segmentation.<\/jats:p>","DOI":"10.3390\/rs14194920","type":"journal-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T03:07:28Z","timestamp":1665371248000},"page":"4920","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Multi-Modal Feature Fusion Network with Adaptive Center Point Detector for Building Instance Extraction"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7917-2910","authenticated-orcid":false,"given":"Qinglie","family":"Yuan","sequence":"first","affiliation":[{"name":"Department of Civil Engineering and Geospatial Information Science Research Centre (GISRC), Faculty of Engineering, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia"},{"name":"School of Civil and Architecture Engineering, Panzhihua University, Panzhihua 617000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Helmi Zulhaidi","family":"Mohd Shafri","sequence":"additional","affiliation":[{"name":"Department of Civil Engineering and Geospatial Information Science Research Centre (GISRC), Faculty of Engineering, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,10,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"108717","DOI":"10.1016\/j.patcog.2022.108717","article-title":"HFA-Net: High frequency attention siamese network for building change detection in VHR remote sensing images","volume":"129","author":"Zheng","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3909","DOI":"10.1109\/TSP.2016.2552511","article-title":"ISAR cross-range scaling using iterative processing via principal component analysis and bisection algorithm","volume":"64","author":"Kang","year":"2016","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2100491","DOI":"10.1002\/andp.202100491","article-title":"Simulating the Scattering Echo and Inverse Synthetic Aperture Lidar Imaging of Rough Targets","volume":"534","author":"Xue","year":"2022","journal-title":"Ann. Phys."},{"key":"ref_4","unstructured":"Tian, H., Mao, H., Liu, Z., and Zeng, Z. (2020). Sparse imaging of airborne inverse synthetic aperture lidar micro-moving targets. Infrared Laser Range, 1\u201310."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2417","DOI":"10.1109\/TGRS.2012.2210901","article-title":"A change detection approach to flood mapping in urban areas using TerraSAR-X","volume":"51","author":"Giustarini","year":"2013","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1109\/TGRS.2014.2312393","article-title":"Automatic Construction of 3-D Building Model From Airborne LiDAR Data Through 2-D Snake Algorithm","volume":"53","author":"Yan","year":"2015","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1109\/JSTARS.2011.2168195","article-title":"Morphological building\/shadow index for building extraction from high-resolution imagery over urban areas","volume":"5","author":"Huang","year":"2011","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/j.isprsjprs.2017.06.005","article-title":"Automatic building extraction from LiDAR data fusion of point and grid-based features","volume":"130","author":"Du","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_9","first-page":"137","article-title":"A building extraction approach for Airborne Laser Scanner data utilizing the Object Based Image Analysis paradigm","volume":"52","author":"Tomljenovic","year":"2016","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.isprsjprs.2018.08.009","article-title":"Extraction of residential building instances in suburban areas from mobile LiDAR data","volume":"144","author":"Xia","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2081","DOI":"10.1109\/JSTARS.2020.2992298","article-title":"Automatic building extraction via adaptive iterative segmentation with LiDAR data and high spatial resolution imagery fusion","volume":"13","author":"Chen","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1864","DOI":"10.1109\/JSTARS.2015.2470547","article-title":"A novel building and tree detection method from LiDAR data and aerial images","volume":"9","author":"Zarea","year":"2015","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_13","first-page":"904","article-title":"An inverse synthetic aperture lidar imaging algorithm","volume":"40","author":"Yang","year":"2010","journal-title":"Laser Infrared"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Ji, S., Shen, Y., Lu, M., and Zhang, Y. (2019). Building instance change detection from large-scale aerial images using convolutional neural networks and simulated samples. Remote Sens., 11.","DOI":"10.3390\/rs11111343"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/j.isprsjprs.2021.05.002","article-title":"Object-level change detection with a dual correlation attention-guided detector","volume":"177","author":"Zhang","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lee, Y., and Park, J. (2020, January 14\u201319). CenterMask: Real-Time Anchor-Free Instance Segmentation. Proceedings of the CVPR 2020: Computer Vision and Pattern Recognition, Virtual, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01392"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wu, T., Hu, Y., Peng, L., and Chen, R. (2020). Improved anchor-free instance segmentation for building extraction from high-resolution remote sensing images. Remote Sens., 12.","DOI":"10.3390\/rs12182910"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yuan, Q., Shafri, H.Z.M., Alias, A.H., and Hashim, S.J.B. (2021). Multi-scale semantic feature optimization and fusion network for building extraction using high-resolution aerial images and LiDAR data. Remote Sens., 13.","DOI":"10.3390\/rs13132473"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path Aggregation Network for Instance Segmentation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). FCOS: Fully Convolutional One-Stage Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_21","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2020). Objects as points. arXiv."},{"key":"ref_22","first-page":"17721","article-title":"Solov2: Dynamic and fast instance segmentation","volume":"33","author":"Wang","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","unstructured":"Bolya, D., Zhou, C., Xiao, F., and Lee, Y.J. (November, January 27). Yolact: Real-time instance segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Li, Y., Qi, H., Dai, J., Ji, X., and Wei, Y. (2017, January 21\u201326). Fully convolutional instance-aware semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.472"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Dai, J., He, K., and Sun, J. (, January 27\u201330). Instance-aware semantic segmentation via multi-task network cascades. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016.","DOI":"10.1109\/CVPR.2016.343"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"107194","DOI":"10.1016\/j.compeleceng.2021.107194","article-title":"A fast instance segmentation with one-stage multi-task deep neural network for autonomous driving","volume":"93","author":"Tseng","year":"2021","journal-title":"Comput. Electr. Eng."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Bischke, B., Helber, P., Folz, J., Borth, D., and Dengel, A. (2019, January 22\u201325). Multi-task learning for segmentation of building footprints with deep neural networks. Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803050"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1983","DOI":"10.1007\/s11554-020-01007-5","article-title":"Joint multi-task cascade for instance segmentation","volume":"17","author":"Wen","year":"2020","journal-title":"J. Real-Time Image Process."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yoo, J.H., Kim, Y., Kim, J., and Choi, J.W. (2020, January 23\u201328). 3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58583-9_43"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum pointnets for 3d object detection from rgb-d data. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1016\/j.isprsjprs.2021.12.007","article-title":"CMGFNet: A deep cross-modal gated fusion network for building extraction from very high-resolution remote sensing images","volume":"184","author":"Hosseinpour","year":"2022","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Cao, Z., Diao, W., Sun, X., Lyu, X., Yan, M., and Fu, K. (2021). C3net: Cross-modal feature recalibrated, cross-scale semantic aggregated and compact network for semantic segmentation of multi-modal high-resolution aerial images. Remote Sens., 13.","DOI":"10.3390\/rs13030528"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, W., and Neumann, U. (2018, January 8\u201314). Depth-aware cnn for rgb-d segmentation. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01252-6_9"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"7012","DOI":"10.1109\/TIP.2020.3028289","article-title":"DPANet: Depth potentiality-aware gated attention network for RGB-D salient object detection","volume":"30","author":"Chen","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Li, D., Hu, J., Wang, C., Li, X., She, Q., Zhu, L., and Chen, Q. (2021, January 20\u201325). Involution: Inverting the inherence of convolution for visual recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01214"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 22\u201329). Deformable convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_42","unstructured":"(2022, July 30). Available online: https:\/\/earthexplorer.usgs.gov\/."},{"key":"ref_43","unstructured":"(2022, July 30). Available online: https:\/\/coast.noaa.gov\/."},{"key":"ref_44","unstructured":"(2022, July 30). Available online: https:\/\/www.cloudcompare.org."},{"key":"ref_45","unstructured":"Glorot, X., and Bengio, Y. (2010, January 13\u201315). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/19\/4920\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:45:17Z","timestamp":1760143517000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/19\/4920"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,1]]},"references-count":45,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["rs14194920"],"URL":"https:\/\/doi.org\/10.3390\/rs14194920","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,1]]}}}