{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T21:00:24Z","timestamp":1774126824216,"version":"3.50.1"},"reference-count":47,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2020,3,19]],"date-time":"2020-03-19T00:00:00Z","timestamp":1584576000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the National Key R&D Program of China under Grant","award":["2017-YFB0502700"],"award-info":[{"award-number":["2017-YFB0502700"]}]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61501098"],"award-info":[{"award-number":["61501098"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the High-Resolution Earth Observation Youth Foundation","award":["GFZX04061502"],"award-info":[{"award-number":["GFZX04061502"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Instance segmentation in high-resolution (HR) remote sensing imagery is one of the most challenging tasks and is more difficult than object detection and semantic segmentation tasks. It aims to predict class labels and pixel-wise instance masks to locate instances in an image. However, there are rare methods currently suitable for instance segmentation in the HR remote sensing images. Meanwhile, it is more difficult to implement instance segmentation due to the complex background of remote sensing images. In this article, a novel instance segmentation approach of HR remote sensing imagery based on Cascade Mask R-CNN is proposed, which is called a high-quality instance segmentation network (HQ-ISNet). 
In this scheme, the HQ-ISNet exploits a HR feature pyramid network (HRFPN) to fully utilize multi-level feature maps and maintain HR feature maps for remote sensing images\u2019 instance segmentation. Next, to refine mask information flow between mask branches, the instance segmentation network version 2 (ISNetV2) is proposed to promote further improvements in mask prediction accuracy. Then, we construct a new, more challenging dataset based on the synthetic aperture radar (SAR) ship detection dataset (SSDD) and the Northwestern Polytechnical University very-high-resolution 10-class geospatial object detection dataset (NWPU VHR-10) for remote sensing images instance segmentation which can be used as a benchmark for evaluating instance segmentation algorithms in the high-resolution remote sensing images. Finally, extensive experimental analyses and comparisons on the SSDD and the NWPU VHR-10 dataset show that (1) the HRFPN makes the predicted instance masks more accurate, which can effectively enhance the instance segmentation performance of the high-resolution remote sensing imagery; (2) the ISNetV2 is effective and promotes further improvements in mask prediction accuracy; (3) our proposed framework HQ-ISNet is effective and more accurate for instance segmentation in the remote sensing imagery than the existing algorithms.<\/jats:p>","DOI":"10.3390\/rs12060989","type":"journal-article","created":{"date-parts":[[2020,3,19]],"date-time":"2020-03-19T10:01:35Z","timestamp":1584612095000},"page":"989","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":120,"title":["HQ-ISNet: High-Quality Instance Segmentation for Remote Sensing Imagery"],"prefix":"10.3390","volume":"12","author":[{"given":"Hao","family":"Su","sequence":"first","affiliation":[{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, 
China"}]},{"given":"Shunjun","family":"Wei","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"given":"Shan","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"given":"Jiadian","family":"Liang","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"given":"Jun","family":"Shi","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"given":"Xiaoling","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,3,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"8983","DOI":"10.1109\/TGRS.2019.2923988","article-title":"Dense Attention Pyramid Networks for Multi-Scale Ship Detection in SAR Images","volume":"57","author":"Cui","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"6699","DOI":"10.1109\/TGRS.2018.2841808","article-title":"Vehicle instance segmentation from aerial image and video using a multitask learning residual fully convolutional network","volume":"56","author":"Mou","year":"2018","journal-title":"IEEE Trans. 
Geosci. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Su, H., Wei, S., Yan, M., Wang, C., Shi, J., and Zhang, X. (August, January 28). Object Detection and Instance Segmentation in Remote Sensing Imagery Based on Precise Mask R-CNN. Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8898573"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wang, C., Zhang, H., Dong, Y., and Wei, S. (2019). Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery. Remote Sens., 11.","DOI":"10.3390\/rs11050531"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhang, J., Lin, S., Ding, L., and Bruzzone, L. (2020). Multi-Scale Context Aggregation for Semantic Segmentation of Remote Sensing Images. Remote Sens., 12.","DOI":"10.3390\/rs12040701"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ma, H., Liu, Y., Ren, Y., and Yu, J. (2020). Detection of Collapsed Buildings in Post-Earthquake Remote Sensing Images Based on the Improved YOLOv3. Remote Sens., 12.","DOI":"10.3390\/rs12010044"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Gong, Y., Xiao, Z., Tan, X., Sui, H., Xu, C., Duan, H., and Li, D. (2019). Context-Aware Convolutional Neural Network for Object Detection in VHR Remote Sensing Imagery. IEEE Trans. Geosci. 
Remote Sens.","DOI":"10.1109\/TGRS.2019.2930246"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"9820","DOI":"10.1109\/TGRS.2019.2929598","article-title":"Multi-Layer Abstraction Saliency for Airport Detection in SAR Images","volume":"57","author":"Liu","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wei, S., Su, H., Ming, J., Wang, C., Yan, M., Kumar, D., Shi, J., and Zhang, X. (2020). Precise and Robust Ship Detection for High-Resolution SAR Imagery Based on HR-SDNet. Remote Sens., 12.","DOI":"10.3390\/rs12010167"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.isprsjprs.2018.04.003","article-title":"Multi-scale object detection in remote sensing imagery with convolutional neural networks","volume":"145","author":"Deng","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"8333","DOI":"10.1109\/TGRS.2019.2920534","article-title":"DRBox-v2: An Improved Detector with Rotatable Boxes for Target Detection in SAR Images","volume":"57","author":"An","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Xiao, X., Zhou, Z., Wang, B., Li, L., and Miao, L. (2019). Ship Detection under Complex Backgrounds Based on Accurate Rotated Anchor Boxes from Paired Semantic Segmentation. Remote Sens., 11.","DOI":"10.3390\/rs11212506"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1100","DOI":"10.1109\/TGRS.2018.2864716","article-title":"Buildings Detection in VHR SAR Images Using Fully Convolution Neural Networks","volume":"57","author":"Shahzad","year":"2018","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1633","DOI":"10.1109\/JSTARS.2018.2810320","article-title":"Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images","volume":"11","author":"Chen","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3252","DOI":"10.1109\/JSTARS.2018.2860989","article-title":"Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module","volume":"11","author":"Yu","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Peng, C., Li, Y., Jiao, L., Chen, Y., and Shang, R. (2019). Densely Based Multi-Scale and Multi-Modal Fully Convolutional Networks for High-Resolution Remote-Sensing Image Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.","DOI":"10.1109\/JSTARS.2019.2906387"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"7503","DOI":"10.1109\/TGRS.2019.2913861","article-title":"Dynamic multicontext segmentation of remote sensing images based on convolutional networks","volume":"57","author":"Nogueira","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., and Loy, C.C. (2019, January 16\u201320). Hybrid task cascade for instance segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00511"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2019). Cascade R-CNN: High Quality Object Detection and Instance Segmentation. arXiv.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Huang, Z., Huang, L., Gong, Y., Huang, C., and Wang, X. (2019, January 16\u201320). Mask scoring r-cnn. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00657"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, J., Qu, C., and Shao, J. (2017, January 13\u201314). Ship detection in SAR images based on an improved faster R-CNN. Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China.","DOI":"10.1109\/BIGSARDATA.2017.8124934"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_26","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. 
Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_28","unstructured":"Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A.C. (2017). Dssd: Deconvolutional single shot detector. arXiv."},{"key":"ref_29","unstructured":"Li, Z., and Zhou, F. (2017). FSSD: Feature fusion single shot multibox detector. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_33","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, MIT PRESS."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201323). 
Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Hermans, A., Papandreou, G., Schroff, F., Wang, P., and Adam, H. (2018, January 18\u201323). Masklab: Instance segmentation by refining object detection with semantic and direction features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00422"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2978","DOI":"10.1109\/TPAMI.2017.2775623","article-title":"Proposal-free network for instance-level object segmentation","volume":"40","author":"Liang","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Bai, M., and Urtasun, R. (2017, January 21\u201326). Deep watershed transform for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.305"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019). Deep high-resolution representation learning for human pose estimation. arXiv.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_41","unstructured":"Sun, K., Zhao, Y., Jiang, B., Cheng, T., Xiao, B., Liu, D., and Wang, J. (2019). High-Resolution Representations for Labeling Pixels and Regions. arXiv."},{"key":"ref_42","unstructured":"Wada, K. (2018, July 20). 
labelme: Image Polygonal Annotation with Python. Available online: https:\/\/github.com\/wkentaro\/labelme."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the 13th European Conference, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_44","unstructured":"Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., and Zhang, Z. (2019). MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_46","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 8\u201316). Identity mappings in deep residual networks. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"766","DOI":"10.1109\/LGRS.2010.2047242","article-title":"Flexible dynamic block adaptive quantization for Sentinel-1 SAR missions","volume":"7","author":"Attema","year":"2010","journal-title":"IEEE Geosci. Remote Sens. 
Lett."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/6\/989\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:09:46Z","timestamp":1760173786000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/6\/989"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,19]]},"references-count":47,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2020,3]]}},"alternative-id":["rs12060989"],"URL":"https:\/\/doi.org\/10.3390\/rs12060989","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,19]]}}}