{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,21]],"date-time":"2025-12-21T21:16:45Z","timestamp":1766351805081,"version":"build-2065373602"},"reference-count":63,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2020,12,25]],"date-time":"2020-12-25T00:00:00Z","timestamp":1608854400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2019YFC0121502","2017YFB1302400"],"award-info":[{"award-number":["2019YFC0121502","2017YFB1302400"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Most scenes in practical applications are dynamic scenes containing moving objects, so accurately segmenting moving objects is crucial for many computer vision applications. In order to efficiently segment all the moving objects in the scene, regardless of whether the object has a predefined semantic label, we propose a two-level nested octave U-structure network with a multi-scale attention mechanism, called U2-ONet. U2-ONet takes two RGB frames, the optical flow between these frames, and the instance segmentation of the frames as inputs. Each stage of U2-ONet is filled with the newly designed octave residual U-block (ORSU block) to enhance the ability to obtain more contextual information at different scales while reducing the spatial redundancy of the feature maps. In order to efficiently train the multi-scale deep network, we introduce a hierarchical training supervision strategy that calculates the loss at each level while adding knowledge-matching loss to keep the optimization consistent. The experimental results show that the proposed U2-ONet method can achieve a state-of-the-art performance in several general moving object segmentation datasets.<\/jats:p>","DOI":"10.3390\/rs13010060","type":"journal-article","created":{"date-parts":[[2020,12,27]],"date-time":"2020-12-27T20:52:21Z","timestamp":1609102341000},"page":"60","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["U2-ONet: A Two-Level Nested Octave U-Structure Network with a Multi-Scale Attention Mechanism for Moving Object Segmentation"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9207-2076","authenticated-orcid":false,"given":"Chenjie","family":"Wang","sequence":"first","affiliation":[{"name":"State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4422-6584","authenticated-orcid":false,"given":"Chengyuan","family":"Li","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8943-079X","authenticated-orcid":false,"given":"Jun","family":"Liu","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, China"}]},{"given":"Bin","family":"Luo","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5901-8932","authenticated-orcid":false,"given":"Xin","family":"Su","sequence":"additional","affiliation":[{"name":"School of Remote Sensing and Information Engineering, Wuhan University, 129 Luoyu Road, Wuhan 430079, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0165-0561","authenticated-orcid":false,"given":"Yajun","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, China"}]},{"given":"Yan","family":"Gao","sequence":"additional","affiliation":[{"name":"Zhuhai Da Hengqin Science and Technology Development Co., Ltd., Unit 1, 33 Haihe Street, Hengqin New Area, Zhuhai 519031, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,25]]},"reference":[{"key":"ref_1","first-page":"37","article-title":"Visual SLAM and structure from motion in dynamic environments: A survey","volume":"51","author":"Saputra","year":"2018","journal-title":"ACM Comput. Surv. (CSUR)"},{"doi-asserted-by":"crossref","unstructured":"Runz, M., Buffier, M., and Agapito, L. (2018, January 16\u201320). Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects. Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany.","key":"ref_2","DOI":"10.1109\/ISMAR.2018.00024"},{"doi-asserted-by":"crossref","unstructured":"Wang, R., Wan, W., Wang, Y., and Di, K. (2019). A New RGB-D SLAM Method with Moving Object Detection for Dynamic Indoor Scenes. Remote. Sens., 11.","key":"ref_3","DOI":"10.3390\/rs11101143"},{"doi-asserted-by":"crossref","unstructured":"Wang, Z., Zhang, Q., Li, J., Zhang, S., and Liu, J. (2019). A Computationally Efficient Semantic SLAM Solution for Dynamic Scenes. Remote Sens., 11.","key":"ref_4","DOI":"10.3390\/rs11111363"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"89777","DOI":"10.1109\/ACCESS.2019.2927211","article-title":"Distractor-Aware Visual Tracking by Online Siamese Network","volume":"7","author":"Zha","year":"2019","journal-title":"IEEE Access"},{"key":"ref_6","first-page":"156","article-title":"Motion Perception in Reinforcement Learning with Dynamic Objects","volume":"87","author":"Amiranashvili","year":"2018","journal-title":"Conf. Robot. Learn. (CoRL)"},{"doi-asserted-by":"crossref","unstructured":"Shah, S., Dey, D., Lovett, C., and Kapoor, A. (2017). AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. Field and Service Robotics, Springer.","key":"ref_7","DOI":"10.1007\/978-3-319-67361-5_40"},{"doi-asserted-by":"crossref","unstructured":"Baradel, F., Wolf, C., Mille, J., and Taylor, G.W. (2018, January 18\u201322). Glimpse Clouds: Human Activity Recognition From Unstructured Feature Points. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","key":"ref_8","DOI":"10.1109\/CVPR.2018.00056"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1109\/TMM.2014.2298377","article-title":"An Advanced Moving Object Detection Algorithm for Automatic Traffic Monitoring in Real-World Limited Bandwidth Networks","volume":"16","author":"Chen","year":"2014","journal-title":"IEEE Trans. Multimed."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.cviu.2013.11.009","article-title":"Robust PCA via Principal Component Pursuit: A review for a comparative evaluation in video surveillance","volume":"122","author":"Bouwmans","year":"2014","journal-title":"Comput. Vis. Image Underst."},{"doi-asserted-by":"crossref","unstructured":"Wang, C., Luo, B., Zhang, Y., Zhao, Q., Yin, L., Wang, W., Su, X., Wang, Y., and Li, C. (2020). DymSLAM:4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation. arXiv.","key":"ref_11","DOI":"10.1109\/LRA.2020.3045647"},{"doi-asserted-by":"crossref","unstructured":"Zhao, X., Qin, Q., and Luo, B. (2019). Motion Segmentation Based on Model Selection in Permutation Space for RGB Sensors. Sensors, 19.","key":"ref_12","DOI":"10.3390\/s19132936"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"432","DOI":"10.1109\/LSP.2017.2777997","article-title":"Permutation preference based alternate sampling and clustering for motion segmentation","volume":"25","author":"Zhang","year":"2017","journal-title":"IEEE Signal Process. Lett."},{"doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","key":"ref_14","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer.","key":"ref_16","DOI":"10.1007\/978-3-319-24574-4_28"},{"unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv.","key":"ref_17"},{"doi-asserted-by":"crossref","unstructured":"Bideau, P., RoyChowdhury, A., Menon, R.R., and Learned-Miller, E. (2018, January 18\u201323). The best of both worlds: Combining cnns and geometric constraints for hierarchical motion segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_18","DOI":"10.1109\/CVPR.2018.00060"},{"doi-asserted-by":"crossref","unstructured":"Xie, C., Xiang, Y., Harchaoui, Z., and Fox, D. (2019, January 15\u201320). Object discovery in videos as foreground motion clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_19","DOI":"10.1109\/CVPR.2019.01023"},{"doi-asserted-by":"crossref","unstructured":"Dave, A., Tokmakov, P., and Ramanan, D. (2019, January 27\u201328). Towards segmenting anything that moves. Proceedings of the IEEE International Conference on Computer Vision Workshops, Seoul, Korea.","key":"ref_20","DOI":"10.1109\/ICCVW.2019.00187"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"5557","DOI":"10.1109\/TIP.2020.2984893","article-title":"Motion Segmentation of RGB-D Sequences: Combining Semantic and Motion Information Using Statistical Inference","volume":"29","author":"Muthu","year":"2020","journal-title":"IEEE Trans. Image Process."},{"unstructured":"Chen, Y., Fan, H., Xu, B., Yan, Z., Kalantidis, Y., Rohrbach, M., Yan, S., and Feng, J. (November, January 27). Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","key":"ref_22"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"107404","DOI":"10.1016\/j.patcog.2020.107404","article-title":"U2-Net: Going deeper with nested U-structure for salient object detection","volume":"106","author":"Qin","year":"2020","journal-title":"Pattern Recognit."},{"doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and So Kweon, I. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","key":"ref_24","DOI":"10.1007\/978-3-030-01234-2_1"},{"doi-asserted-by":"crossref","unstructured":"Papazoglou, A., and Ferrari, V. (2013, January 2\u20138). Fast Object Segmentation in Unconstrained Video. Proceedings of the 2013 IEEE International Conference on Computer Vision, Darling Harbour, Sydney, Australia.","key":"ref_25","DOI":"10.1109\/ICCV.2013.223"},{"key":"ref_26","first-page":"8","article-title":"Video Segmentation by Non-Local Consensus voting","volume":"2","author":"Faktor","year":"2014","journal-title":"BMVC"},{"unstructured":"Wang, W., Shen, J., and Porikli, F. (2015, January 7\u201312). Saliency-aware geodesic video object segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","key":"ref_27"},{"doi-asserted-by":"crossref","unstructured":"Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., and Sorkine-Hornung, A. (2016, January 27\u201330). A benchmark dataset and evaluation methodology for video object segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","key":"ref_28","DOI":"10.1109\/CVPR.2016.85"},{"doi-asserted-by":"crossref","unstructured":"Wang, W., Song, H., Zhao, S., Shen, J., Zhao, S., Hoi, S.C., and Ling, H. (2019, January 15\u201320). Learning unsupervised video object segmentation through visual attention. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_29","DOI":"10.1109\/CVPR.2019.00318"},{"unstructured":"Wang, W., Lu, X., Shen, J., Crandall, D.J., and Shao, L. (November, January 27). Zero-shot video object segmentation via attentive graph neural networks. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","key":"ref_30"},{"doi-asserted-by":"crossref","unstructured":"Lu, X., Wang, W., Ma, C., Shen, J., Shao, L., and Porikli, F. (2019, January 15\u201320). See more, know more: Unsupervised video object segmentation with co-attention siamese networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_31","DOI":"10.1109\/CVPR.2019.00374"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3083","DOI":"10.1109\/TMM.2019.2918730","article-title":"Automatic Video Object Segmentation Based on Visual and Motion Saliency","volume":"21","author":"Peng","year":"2019","journal-title":"IEEE Trans. Multimed."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1934","DOI":"10.1109\/TMM.2018.2890361","article-title":"Multilevel Model for Video Object Segmentation Based on Supervision Optimization","volume":"21","author":"Chen","year":"2019","journal-title":"IEEE Trans. Multimed."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1109\/TIP.2019.2930152","article-title":"Unsupervised online video object segmentation with motion property understanding","volume":"29","author":"Zhuo","year":"2019","journal-title":"IEEE Trans. Image Process."},{"doi-asserted-by":"crossref","unstructured":"Yang, Z., Wei, Y., and Yang, Y. (2020). Collaborative video object segmentation by foreground-background integration. arXiv.","key":"ref_35","DOI":"10.1007\/978-3-030-58558-7_20"},{"doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","key":"ref_36","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_38","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1016\/j.patrec.2020.01.024","article-title":"BshapeNet: Object detection and instance segmentation with bounding shape masks","volume":"131","author":"Kang","year":"2020","journal-title":"Pattern Recognit. Lett."},{"doi-asserted-by":"crossref","unstructured":"Peng, S., Jiang, W., Pi, H., Li, X., Bao, H., and Zhou, X. (2020, January 14\u201319). Deep Snake for Real-Time Instance Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_40","DOI":"10.1109\/CVPR42600.2020.00856"},{"unstructured":"Hurtik, P., Molek, V., Hula, J., Vajgl, M., Vlasanek, P., and Nejezchleba, T. (2020). Poly-YOLO: Higher speed, more precise detection and instance segmentation for YOLOv3. arXiv.","key":"ref_41"},{"doi-asserted-by":"crossref","unstructured":"Kong, S., and Fowlkes, C.C. (2018, January 18\u201323). Recurrent pixel embedding for instance grouping. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_42","DOI":"10.1109\/CVPR.2018.00940"},{"doi-asserted-by":"crossref","unstructured":"Neven, D., Brabandere, B.D., Proesmans, M., and Gool, L.V. (2019, January 15\u201320). Instance segmentation by jointly optimizing spatial embeddings and clustering bandwidth. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_43","DOI":"10.1109\/CVPR.2019.00904"},{"unstructured":"Ying, H., Huang, Z., Liu, S., Shao, T., and Zhou, K. (2019). Embedmask: Embedding coupling for one-stage instance segmentation. arXiv.","key":"ref_44"},{"doi-asserted-by":"crossref","unstructured":"Chen, L., Strauch, M., and Merhof, D. (2019). Instance Segmentation of Biomedical Images with an Object-Aware Embedding Learned with Local Constraints. International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer.","key":"ref_45","DOI":"10.1007\/978-3-030-32239-7_50"},{"unstructured":"Xu, X., Cheong, L.F., and Li, Z. (2019). 3D Rigid Motion Segmentation with Mixed and Unknown Number of Models. IEEE Trans. Pattern Anal. Mach. Intell.","key":"ref_46"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1393","DOI":"10.1109\/TIP.2010.2042647","article-title":"Multibody structure-and-motion segmentation by branch-and-bound model selection","volume":"19","author":"Thakoor","year":"2010","journal-title":"IEEE Trans. Image Process."},{"doi-asserted-by":"crossref","unstructured":"Zhao, Q., Zhang, Y., Qin, Q., and Luo, B. (2020). Quantized Residual Preference Based Linkage Clustering for Model Selection and Inlier Segmentation in Geometric Multi-Model Fitting. Sensors, 20.","key":"ref_48","DOI":"10.3390\/s20133806"},{"doi-asserted-by":"crossref","unstructured":"Sultana, M., Mahmood, A., and Jung, S.K. (2020). Unsupervised Moving Object Detection in Complex Scenes Using Adversarial Regularizations. IEEE Trans. Multimed., 1.","key":"ref_49","DOI":"10.1109\/TMM.2020.3006419"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"2688","DOI":"10.1109\/TIP.2018.2795740","article-title":"Submodular trajectories for better motion segmentation in videos","volume":"27","author":"Shen","year":"2018","journal-title":"IEEE Trans. Image Process."},{"doi-asserted-by":"crossref","unstructured":"Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., and Brox, T. (2017, January 21\u201326). Flownet 2.0: Evolution of optical flow estimation with deep networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","key":"ref_51","DOI":"10.1109\/CVPR.2017.179"},{"doi-asserted-by":"crossref","unstructured":"Li, C., Luo, B., Hong, H., Su, X., Wang, Y., Liu, J., Wang, C., Zhang, J., and Wei, L. (2020). Object Detection Based on Global-Local Saliency Constraint in Aerial Images. Remote Sens., 12.","key":"ref_52","DOI":"10.3390\/rs12091435"},{"unstructured":"Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., and Tu, Z. (2015). Deeply-supervised nets. Artificial Intelligence and Statistics.","key":"ref_53"},{"doi-asserted-by":"crossref","unstructured":"Li, D., and Chen, Q. (2020, January 13\u201319). Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_54","DOI":"10.1109\/CVPR42600.2020.00766"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/0734-189X(85)90016-7","article-title":"Topological structural analysis of digitized binary images by border following","volume":"30","author":"Suzuki","year":"1985","journal-title":"Comput. Vis. Graph. Image Process."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1109\/TPAMI.2013.242","article-title":"Segmentation of moving objects by long term video analysis","volume":"36","author":"Ochs","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"unstructured":"Pont-Tuset, J., Perazzi, F., Caelles, S., Arbel\u00e1ez, P., Sorkine-Hornung, A., and Van Gool, L. (2017). The 2017 davis challenge on video object segmentation. arXiv.","key":"ref_57"},{"doi-asserted-by":"crossref","unstructured":"Xu, N., Yang, L., Fan, Y., Yue, D., Liang, Y., Yang, J., and Huang, T. (2018). Youtube-vos: A large-scale video object segmentation benchmark. arXiv.","key":"ref_58","DOI":"10.1007\/978-3-030-01228-1_36"},{"doi-asserted-by":"crossref","unstructured":"Siam, M., Mahgoub, H., Zahran, M., Yogamani, S., Jagersand, M., and El-Sallab, A. (2018, January 4\u20137). Modnet: Motion and appearance based moving object detection network for autonomous driving. Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA.","key":"ref_59","DOI":"10.1109\/ITSC.2018.8569744"},{"doi-asserted-by":"crossref","unstructured":"Rashed, H., Ramzy, M., Vaquero, V., El Sallab, A., Sistu, G., and Yogamani, S. (2019, January 27\u201328). Fusemodnet: Real-time camera and lidar based moving object detection for robust low-light autonomous driving. Proceedings of the IEEE International Conference on Computer Vision Workshops, Seoul, Korea.","key":"ref_60","DOI":"10.1109\/ICCVW.2019.00293"},{"unstructured":"Bideau, P., and Learned-Miller, E. (2016). A detailed rubric for motion segmentation. arXiv.","key":"ref_61"},{"doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","key":"ref_62","DOI":"10.1109\/ICCV.2015.123"},{"doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","key":"ref_63","DOI":"10.1109\/CVPR.2017.660"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/1\/60\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:46:20Z","timestamp":1760179580000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/1\/60"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,25]]},"references-count":63,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["rs13010060"],"URL":"https:\/\/doi.org\/10.3390\/rs13010060","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2020,12,25]]}}}