{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T06:56:36Z","timestamp":1764053796195,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T00:00:00Z","timestamp":1648598400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["Grant 61901439"],"award-info":[{"award-number":["Grant 61901439"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Stereomatching plays an essential role in 3D reconstruction using very-high-resolution (VHR) remote sensing images. However, it still faces unignorable challenges due to the multi-scale objects in large scenes and the multi-modality probability distribution in challenging regions, especially the occluded and textureless areas. Accurate disparity estimation in stereo matching for multi-scale objects has become a hard but crucial task. In this paper, to tackle these problems, we design a novel confidence-aware unimodal cascade and fusion pyramid network for stereo matching. The fused cost volume from the coarsest scale is used to generate the initial disparity map, and then the learnable confidence maps are generated to construct the unimodal cost distributions, which are used to narrow down the next-stage disparity search range. Moreover, we design a cross-scale interaction aggregation module to leverage multi-scale information. Both smooth-L1 loss and stereo focal loss are applied to regularize the disparity map and unimodal cost distribution, respectively. 
Compared to two state-of-the-art stereo matching networks, extensive experimental results show that our proposed network outperforms them in terms of average endpoint error (EPE) and the fraction of erroneous pixels (D1).<\/jats:p>","DOI":"10.3390\/rs14071667","type":"journal-article","created":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T21:28:39Z","timestamp":1648675719000},"page":"1667","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Confidence-Aware Cascade Network for Multi-Scale Stereo Matching of Very-High-Resolution Remote Sensing Images"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5740-5197","authenticated-orcid":false,"given":"Rongshu","family":"Tao","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Huairou District, Beijing 101408, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2063-9816","authenticated-orcid":false,"given":"Yuming","family":"Xiang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Huairou District, Beijing 101408, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Hongjian","family":"You","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, 
China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Huairou District, Beijing 101408, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"7133","DOI":"10.1109\/TGRS.2018.2848725","article-title":"Weakly Supervised Semantic Segmentation for Joint Key Local Structure Localization and Classification of Aurora Image","volume":"56","author":"Niu","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_2","unstructured":"Chen, H., Lin, M., Zhang, H., Yang, G., Xia, G.S., Zheng, X., and Zhang, L. (2019, July 28\u2013August 2). Multi-Level Fusion of the Multi-Receptive Fields Contextual Networks and Disparity Network for Pairwise Semantic Stereo. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Chen, C., Seff, A., Kornhauser, A.L., and Xiao, J. (2015, December 7\u201313). DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.312"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Schmid, K., Tomic, T., Ruess, F., Hirschm\u00fcller, H., and Suppa, M. (2013, November 3\u20137). Stereo vision based indoor\/outdoor navigation for flying robots. Proceedings of the 2013 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan.","DOI":"10.1109\/IROS.2013.6696922"},{"key":"ref_5","unstructured":"Engel, J., St\u00fcckler, J., and Cremers, D. (2015, September 28\u2013October 2). Large-scale direct SLAM with stereo cameras. 
Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.patrec.2019.05.011","article-title":"A benchmark image dataset for industrial tools","volume":"125","author":"Luo","year":"2019","journal-title":"Pattern Recognit. Lett."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1023\/A:1014573219977","article-title":"A taxonomy and evaluation of dense two-frame stereo correspondence algorithms","volume":"47","author":"Scharstein","year":"2002","journal-title":"Int. J. Comput. Vis."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Sun, J., Shum, H.Y., and Zheng, N. (2002, May 28\u201331). Stereo Matching Using Belief Propagation. Proceedings of the European Conference on Computer Vision, Copenhagen, Denmark.","DOI":"10.1007\/3-540-47967-8_34"},{"key":"ref_9","unstructured":"Kolmogorov, V., and Zabih, R. (2001, July 7\u201314). Computing visual correspondence with occlusions using graph cuts. Proceedings of the IEEE International Conference on Computer Vision, Vancouver, BC, Canada."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1109\/TPAMI.2006.70","article-title":"Adaptive support-weight approach for correspondence search","volume":"28","author":"Yoon","year":"2006","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Rhemann, C., Hosni, A., Bleyer, M., Rother, C., and Gelautz, M. (2011, June 20\u201325). Fast cost-volume filtering for visual correspondence and beyond. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995372"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Min, D., Lu, J., and Do, M.N. (2011, November 6\u201313). A revisit to cost aggregation in stereo matching: How far can we reduce its computational redundancy? 
Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126416"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hermann, S., Klette, R., and Destefanis, E. (2009). Inclusion of a second-order prior into semi-global matching. Pacific-Rim Symposium on Image and Video Technology, Springer.","DOI":"10.1007\/978-3-540-92957-4_55"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhu, K., d\u2019Angelo, P., and Butenuth, M. (2011). A performance study on different stereo matching costs using airborne image sequences and satellite images. Lecture Notes in Computer Science, Proceedings of the ISPRS Conference on Photogrammetric Image Analysis, Munich, Germany, 5\u20137 October 2011, Springer.","DOI":"10.1007\/978-3-642-24393-6_14"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/TPAMI.2007.1166","article-title":"Stereo processing by semiglobal matching and mutual information","volume":"30","author":"Hirschm\u00fcller","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","first-page":"2287","article-title":"Stereo matching by training a convolutional neural network to compare image patches","volume":"17","author":"\u017dbontar","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Luo, W., Schwing, A.G., and Urtasun, R. (2016, June 27\u201330). Efficient deep learning for stereo matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.614"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Guney, F., and Geiger, A. (2015, June 7\u201312). Displets: Resolving stereo ambiguities using object knowledge. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299044"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Seki, A., and Pollefeys, M. (2017, July 21\u201326). SGM-Nets: Semi-global matching with neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.703"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., and Brox, T. (2016, June 27\u201330). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.438"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., and Di Stefano, L. (2019, June 16\u201320). Real-Time Self-Adaptive Deep Stereo. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00028"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Xu, H., and Zhang, J. (2020, June 13\u201319). AANet: Adaptive Aggregation Network for Efficient Stereo Matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00203"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., and Bry, A. (2017, October 22\u201329). End-to-End Learning of Geometry and Context for Deep Stereo Regression. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.17"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chang, J.R., and Chen, Y.S. (2018, June 18\u201323). Pyramid Stereo Matching Network. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00567"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhang, F., Prisacariu, V.A., Yang, R., and Torr, P.H.S. (2019, June 16\u201320). GA-Net: Guided Aggregation Net for End-To-End Stereo Matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00027"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Guo, X., Yang, K., Yang, W., Wang, X., and Li, H. (2019, June 16\u201320). Group-Wise Correlation Stereo Network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00339"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sun, D., Yang, X., Liu, M.Y., and Kautz, J. (2018, June 18\u201322). PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00931"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Gu, X., Fan, Z., Zhu, S., Dai, Z., Tan, F., and Tan, P. (2020, June 13\u201319). Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00257"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Cheng, S., Xu, Z., Zhu, S., Li, Z., Li, L.E., Ramamoorthi, R., and Su, H. (2020, June 13\u201319). Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00260"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Shen, Z., Dai, Y., and Rao, Z. (2021, June 20\u201325). CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01369"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Chen, Y., Bai, X., Yu, S., Yu, K., Li, Z., and Yang, K. (2020, February 7\u201312). Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6991"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Tao, R., Xiang, Y., and You, H. (2020). An Edge-Sense Bidirectional Pyramid Network for Stereo Matching of VHR Remote Sensing Images. Remote Sens., 12.","DOI":"10.3390\/rs12244025"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Osco, L.P., Junior, J.M., Ramos, A.P.M., de Castro Jorge, L.A., Fatholahi, S.N., de Andrade Silva, J., Matsubara, E.T., Pistori, H., Gon\u00e7alves, W.N., and Li, J. (2021). A Review on Deep Learning in UAV Remote Sensing. Int. J. Appl. Earth Obs. Geoinf., 102.","DOI":"10.1016\/j.jag.2021.102456"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/MGRS.2019.2893783","article-title":"2019 Data Fusion Contest [Technical Committees]","volume":"7","author":"Le Saux","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Bosch, M., Foster, K., Christie, G., Wang, S., Hager, G.D., and Brown, M. (2019, January 7\u201311). Semantic Stereo for Incidental Satellite Images. Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Waikoloa Village, HI, USA.","DOI":"10.1109\/WACV.2019.00167"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, K., Fang, Y., Min, D., Sun, L., Yang, S., Yan, S., and Tian, Q. (2014, June 23\u201328). Cross-Scale Cost Aggregation for Stereo Matching. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.206"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yang, G., Manela, J., Happold, M., and Ramanan, D. (2019, June 16\u201320). Hierarchical Deep Stereo Matching on High-Resolution Images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00566"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, October 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yang, J., Mao, W., Alvarez, J.M., and Liu, M. (2020, June 13\u201319). Cost Volume Pyramid Based Depth Inference for Multi-View Stereo. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00493"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Xu, Q., and Tao, W. (2019, June 16\u201320). Multi-Scale Geometric Consistency Guided Multi-View Stereo. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00563"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, June 16\u201320). Deep High-Resolution Representation Learning for Human Pose Estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, December 7\u201313). Fast R-CNN. 
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, October 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/7\/1667\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:46:56Z","timestamp":1760136416000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/7\/1667"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,30]]},"references-count":43,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["rs14071667"],"URL":"https:\/\/doi.org\/10.3390\/rs14071667","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2022,3,30]]}}}