{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,7]],"date-time":"2026-07-07T15:56:10Z","timestamp":1783439770711,"version":"3.54.6"},"reference-count":51,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2020,5,29]],"date-time":"2020-05-29T00:00:00Z","timestamp":1590710400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61703240"],"award-info":[{"award-number":["61703240"]}]},{"name":"National Natural Science Foundation of China","award":["61673244"],"award-info":[{"award-number":["61673244"]}]},{"name":"Key R&D projects of Shandong province of China","award":["2019JZZY010130"],"award-info":[{"award-number":["2019JZZY010130"]}]},{"name":"National Key R&D Program of China","award":["2018YFB1305300"],"award-info":[{"award-number":["2018YFB1305300"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>With the advantage of high maneuverability, Unmanned Aerial Vehicles (UAVs) have been widely deployed in vehicle monitoring and controlling. However, processing the images captured by UAVs for extracting vehicle information is hindered by some challenges including arbitrary orientations, huge scale variations and partial occlusion. In seeking to address these challenges, we propose a novel Multi-Scale and Occlusion Aware Network (MSOA-Net) for UAV based vehicle segmentation, which consists of two parts including a Multi-Scale Feature Adaptive Fusion Network (MSFAF-Net) and a Regional Attention based Triple Head Network (RATH-Net). In MSFAF-Net, a self-adaptive feature fusion module is proposed, which can adaptively aggregate hierarchical feature maps from multiple levels to help the Feature Pyramid Network (FPN) deal with the scale change of vehicles. 
The RATH-Net with a self-attention mechanism is proposed to guide the location-sensitive sub-networks to enhance the vehicle of interest and suppress background noise caused by occlusions. In this study, we release a large comprehensive UAV based vehicle segmentation dataset (UVSD), which is the first public dataset for UAV based vehicle detection and segmentation. Experiments are conducted on the challenging UVSD dataset. Experimental results show that the proposed method is efficient in detecting and segmenting vehicles, and outperforms the compared state-of-the-art works.<\/jats:p>","DOI":"10.3390\/rs12111760","type":"journal-article","created":{"date-parts":[[2020,6,2]],"date-time":"2020-06-02T09:19:27Z","timestamp":1591089567000},"page":"1760","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["Multi-Scale and Occlusion Aware Network for Vehicle Detection and Segmentation on UAV Aerial Images"],"prefix":"10.3390","volume":"12","author":[{"given":"Wang","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Control Science and Engineering, Shandong University, Jinan 250061, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5516-2486","authenticated-orcid":false,"given":"Chunsheng","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Control Science and Engineering, Shandong University, Jinan 250061, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Faliang","family":"Chang","sequence":"additional","affiliation":[{"name":"School of Control Science and Engineering, Shandong University, Jinan 250061, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ye","family":"Song","sequence":"additional","affiliation":[{"name":"School of Control Science and Engineering, Shandong University, Jinan 250061, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2020,5,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kanistras, K., Martins, G., Rutherford, M.J., and Valavanis, K.P. (2013, January 28\u201331). A survey of unmanned aerial vehicles (UAVs) for traffic monitoring. Proceedings of the International Conference on Unmanned Aircraft Systems (ICUAS), Atlanta, GA, USA.","DOI":"10.1109\/ICUAS.2013.6564694"},{"key":"ref_2","unstructured":"Zhu, P., Wen, L., Bian, X., Ling, H., and Hu, Q. (2018). Vision meets drones: A challenge. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_5","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 8\u201314). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lu, X., Li, B., Yue, Y., Li, Q., and Yan, J. (2019, January 16\u201320). Grid R-CNN. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00754"},{"key":"ref_7","unstructured":"Li, Z., Peng, C., Yu, G., Zhang, X., Deng, Y., and Sun, J. (2017). Light-Head R-CNN: In Defense of Two-Stage Object Detector. 
arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_9","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). Fcos: Fully convolutional one-stage object detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Jiang, Y., Zhu, X., Wang, X., Yang, S., Li, W., Wang, H., Fu, P., and Luo, Z. (2017). R2cnn: Rotational region cnn for orientation robust scene text detection. arXiv.","DOI":"10.1109\/ICPR.2018.8545598"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3111","DOI":"10.1109\/TMM.2018.2818020","article-title":"Arbitrary-oriented scene text detection via rotation proposals","volume":"20","author":"Ma","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yang, X., Sun, H., Fu, K., Yang, J., Sun, X., Yan, M., and Guo, Z. (2018). Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes Based on Multiscale Rotation Dense Feature Pyramid Networks. Remote Sens., 10.","DOI":"10.3390\/rs10010132"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Xia, G., Bai, X., Ding, J., Zhu, Z., Belongie, S.J., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201322). DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_14","unstructured":"Wang, P., Jiao, B., Yang, L., Yang, Y., Zhang, S., Wei, W., and Zhang, Y. (November, January 27). 
Vehicle Re-Identification in Aerial Imagery: Dataset and Approach. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, J., Ding, J., Guo, H., Cheng, W., Pan, T., and Yang, W. (2019). Mask OBB: A Semantic Attention-Based Mask Oriented Bounding Box Representation for Multi-Category Object Detection in Aerial Images. Remote Sens., 11.","DOI":"10.3390\/rs11242930"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201322). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_18","unstructured":"Liang, T., Wang, Y., Zhao, Q., Zhang, H., Tang, Z., and Ling, H. (2019). MFPN: A Novel Mixture Feature Pyramid Network of Multiple Architectures for Object Detection. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE international conference on computer vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, C., Guo, Y., Li, S., and Chang, F. (2019). ACFBased Region Proposal Extraction for YOLOv3 Network Towards High-Performance Cyclist Detection in High Resolution Images. 
Sensors, 19.","DOI":"10.3390\/s19122671"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Huang, Z., Huang, L., Gong, Y., Huang, C., and Wang, X. (2019, January 16\u201320). Mask scoring r-cnn. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00657"},{"key":"ref_22","unstructured":"Chen, X., Girshick, R.B., He, K., and Doll\u00e1r, P. (November, January 27). TensorMask: A Foundation for Dense Object Segmentation. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_23","unstructured":"Bolya, D., Zhou, C., Xiao, F., and Lee, Y.J. (November, January 27). YOLACT: Real-time instance segmentation. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wang, X., Kong, T., Shen, C., Jiang, Y., and Li, L. (2019). SOLO: Segmenting Objects by Locations. arXiv.","DOI":"10.1007\/978-3-030-58523-5_38"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Xie, E., Sun, P., Song, X., Wang, W., Liu, X., Liang, D., Shen, C., and Luo, P. (2019). PolarMask: Single Shot Instance Segmentation with Polar Representation. arXiv.","DOI":"10.1109\/CVPR42600.2020.01221"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lee, Y., and Park, J. (2019). CenterMask: Real-Time Anchor-Free Instance Segmentation. arXiv.","DOI":"10.1109\/CVPR42600.2020.01392"},{"key":"ref_27","unstructured":"Ying, H., Huang, Z., Liu, S., Shao, T., and Zhou, K. (2019). EmbedMask: Embedding Coupling for One-stage Instance Segmentation. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Miyoshi, G.T., Arruda, M.S., Osco, L.P., Marcato Junior, J., Gon\u00e7alves, D.N., Imai, N.N., Tommaselli, A.M.G., Honkavaara, E., and Gon\u00e7alves, W.N. (2020). 
A Novel Deep Learning Method to Identify Single Tree Species in UAV-Based Hyperspectral Images. Remote Sens., 12.","DOI":"10.3390\/rs12081294"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Bozcan, I., and Kayacan, E. (2020). AU-AIR: A Multi-modal Unmanned Aerial Vehicle Dataset for Low Altitude Traffic Surveillance. arXiv.","DOI":"10.1109\/ICRA40945.2020.9196845"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Hsieh, M.R., Lin, Y.L., and Hsu, W.H. (2017, January 22\u201329). Drone-based object counting by spatially regularized regional proposal network. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.446"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Robicquet, A., Sadeghian, A., Alahi, A., and Savarese, S. (2016, January 8\u201316). Learning social etiquette: Human trajectory understanding in crowded scenes. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_33"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Mueller, M., Smith, N., and Ghanem, B. (2016, January 8\u201316). A benchmark and simulator for uav tracking. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_27"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., Zhang, W., Huang, Q., and Tian, Q. (2018, January 8\u201314). The unmanned aerial vehicle benchmark: Object detection and tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_23"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Barekatain, M., Mart\u00ed, M., Shih, H., Murray, S., Nakayama, K., Matsuo, Y., and Prendinger, H. (2017, January 21\u201326). 
Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.267"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Li, S., and Yeung, D.Y. (2017, January 4\u20139). Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11205"},{"key":"ref_36","unstructured":"(2020, May 21). DJI Matrice 200. Available online: https:\/\/www.dji.com\/be\/matrice-200-series."},{"key":"ref_37","unstructured":"Yang, X., Yang, J., Yan, J., Zhang, Y., Zhang, T., Guo, Z., Sun, X., and Fu, K. (November, January 27). SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"3377","DOI":"10.1109\/TGRS.2019.2954328","article-title":"FMSSD: Feature-Merged Single-Shot Detection for Multiscale Objects in Large-Scale Remote Sensing Imagery","volume":"58","author":"Wang","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1109\/TITS.2018.2859348","article-title":"Hybrid Cascade Structure for License Plate Detection in Large Visual Surveillance Scenes","volume":"20","author":"Liu","year":"2019","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Qiu, H., Li, H., Wu, Q., Meng, F., Ngan, K.N., and Shi, H. (2019). A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for Object Detection in Remote Sensing Images. Remote Sens., 11.","DOI":"10.3390\/rs11131594"},{"key":"ref_41","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). 
Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_42","unstructured":"Jie, H., Li, S., and Gang, S. (2018, January 18\u201322). Squeeze-and-Excitation Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201322). Non-local Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 13\u201315). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_46","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). PyTorch: An imperative style, high-performance deep learning library. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Bolya, D., Zhou, C., Xiao, F., and Lee, Y.J. (2019). YOLACT++: Better Real-time Instance Segmentation. arXiv.","DOI":"10.1109\/ICCV.2019.00925"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Milz, S., R\u00fcdiger, T., and S\u00fcss, S. (2018, January 8\u201314). 
Aerial GANeration: Towards Realistic Data Augmentation Using Conditional GANs. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-11012-3_5"},{"key":"ref_49","unstructured":"Hong, S., Kang, S., and Cho, D. (November, January 27). Patch-Level Augmentation for Object Detection in Aerial Images. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Shrivastava, A., Gupta, A., and Girshick, R. (2016, January 27\u201330). Training region-based object detectors with online hard example mining. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.89"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Fu, K., Chen, Z., Zhang, Y., and Sun, X. (2019). Enhanced Feature Representation in Detection for Optical Remote Sensing Images. Remote Sens., 11.","DOI":"10.3390\/rs11182095"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/11\/1760\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:33:52Z","timestamp":1760175232000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/11\/1760"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,29]]},"references-count":51,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,6]]}},"alternative-id":["rs12111760"],"URL":"https:\/\/doi.org\/10.3390\/rs12111760","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,29]]}}}