{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T03:03:09Z","timestamp":1771038189925,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,1,18]],"date-time":"2024-01-18T00:00:00Z","timestamp":1705536000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41971363"],"award-info":[{"award-number":["41971363"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2022YFB3903705"],"award-info":[{"award-number":["2022YFB3903705"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["202202AF080004"],"award-info":[{"award-number":["202202AF080004"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["41971363"],"award-info":[{"award-number":["41971363"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFB3903705"],"award-info":[{"award-number":["2022YFB3903705"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of 
China","doi-asserted-by":"publisher","award":["202202AF080004"],"award-info":[{"award-number":["202202AF080004"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Major Science and Technology Projects of Yunnan Province","award":["41971363"],"award-info":[{"award-number":["41971363"]}]},{"name":"Major Science and Technology Projects of Yunnan Province","award":["2022YFB3903705"],"award-info":[{"award-number":["2022YFB3903705"]}]},{"name":"Major Science and Technology Projects of Yunnan Province","award":["202202AF080004"],"award-info":[{"award-number":["202202AF080004"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>With the development of remote sensing satellite technology for Earth observation, remote sensing stereo images have been used for three-dimensional reconstruction in various fields, such as urban planning and construction. However, remote sensing images often contain noise, occluded regions, untextured areas, and repeated textures, which can lead to reduced accuracy in stereo matching and affect the quality of 3D reconstruction results. To reduce the impact of complex scenes in remote sensing images on stereo matching and to ensure both speed and accuracy, we propose a new end-to-end stereo matching network based on convolutional neural networks (CNNs). The proposed stereo matching network can learn features at different scales from the original images and construct cost volumes with varying scales to obtain richer scale information. Additionally, when constructing the cost volume, we introduce negative disparity to adapt to the common occurrence of both negative and non-negative disparities in remote sensing stereo image pairs. For cost aggregation, we employ a 3D convolution-based encoder\u2013decoder structure that allows the network to adaptively aggregate information. 
Before feature aggregation, we also introduce an attention module to retain more valuable feature information, enhance feature representation, and obtain a higher-quality disparity map. By training on the publicly available US3D dataset, we obtain an accuracy of 1.115 pixels in end-point error (EPE) and 5.32% in the error pixel ratio (D1) on the test dataset, and the inference speed is 92 ms. Comparing our model with existing state-of-the-art models, we achieve higher accuracy, and the network is beneficial for the three-dimensional reconstruction of remote sensing images.<\/jats:p>","DOI":"10.3390\/rs16020387","type":"journal-article","created":{"date-parts":[[2024,1,18]],"date-time":"2024-01-18T06:41:22Z","timestamp":1705560082000},"page":"387","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Stereo Matching Method for Remote Sensing Images Based on Attention and Scale Fusion"],"prefix":"10.3390","volume":"16","author":[{"given":"Kai","family":"Wei","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Xiaoxia","family":"Huang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4481-1871","authenticated-orcid":false,"given":"Hongga","family":"Li","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Niu, J., Song, R., and Li, Y. (2006, January 20\u201323). A Stereo Matching Method Based on Kernel Density Estimation. 
Proceedings of the 2006 IEEE International Conference on Information Acquisition, Weihai, China.","DOI":"10.1109\/ICIA.2006.306019"},{"key":"ref_2","unstructured":"Sonka, M., Hlavac, V., and Boyle, R. (2013). Image Processing, Analysis and Machine Vision, Springer."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Suliman, A., Zhang, Y., and Al-Tahir, R. (2016, January 10\u201315). Enhanced disparity maps from multi-view satellite images. Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China.","DOI":"10.1109\/IGARSS.2016.7729608"},{"key":"ref_4","unstructured":"Scharstein, D. (2001, January 9\u201310). A taxonomy and evaluation of dense two-frame stereo correspondence. Proceedings of the IEEE Workshop on Stereo and Multi-Baseline Vision, Kauai, HI, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zabih, R., and Woodfill, J. (1994, January 2\u20136). Non-parametric local transforms for computing visual correspondence. Proceedings of the Computer Vision\u2014ECCV\u201994: Third European Conference on Computer Vision, Stockholm, Sweden.","DOI":"10.1007\/BFb0028345"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1431","DOI":"10.1109\/TIP.2008.925372","article-title":"Cost aggregation and occlusion handling with WLS in stereo matching","volume":"17","author":"Min","year":"2008","journal-title":"IEEE Trans. Image Process."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1109\/TPAMI.1985.4767639","article-title":"Stereo by intra- and inter-scanline search using dynamic programming","volume":"7","author":"Ohta","year":"1985","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","unstructured":"Hong, L., and Chen, G. (2004, June 27\u2013July 2). Segment-based stereo matching using graph cuts. 
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"787","DOI":"10.1109\/TPAMI.2003.1206509","article-title":"Stereo matching using belief propagation","volume":"25","author":"Sun","year":"2003","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_10","unstructured":"Hirschmuller, H. (2005, January 20\u201325). Accurate and efficient stereo processing by semi-global matching and mutual information. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_12","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zbontar, J., and LeCun, Y. (2015, June 7\u201312). Computing the stereo matching cost with a convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298767"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chen, Z., Sun, X., Wang, L., Yu, Y., and Huang, C. (2015, January 7\u201313). A deep visual correspondence embedding model for stereo matching costs. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.117"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Batsos, K., and Mordohai, P. (2018, January 5\u20138). Recresnet: A recurrent residual cnn architecture for disparity map enhancement. 
Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00036"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., and Brox, T. (2016, January 27\u201330). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.438"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., and Bry, A. (2017, January 22\u201329). End-to-end learning of geometry and context for deep stereo regression. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.17"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Khamis, S., Fanello, S., Rhemann, C., Kowdle, A., Valentin, J., and Izadi, S. (2018, January 8\u201314). Stereonet: Guided hierarchical refinement for real-time edge-aware depth prediction. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01267-0_35"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chang, J.-R., and Chen, Y.-S. (2018, January 18\u201323). Pyramid stereo matching network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00567"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Guo, X., Yang, K., Yang, W., Wang, X., and Li, H. 
(2019, January 15\u201320). Group-wise correlation stereo network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00339"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yang, G., Manela, J., Happold, M., and Ramanan, D. (2019, January 15\u201320). Hierarchical deep stereo matching on high-resolution images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00566"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"4025","DOI":"10.3390\/rs12244025","article-title":"An edge-sense bidirectional pyramid network for stereo matching of vhr remote sensing images","volume":"12","author":"Tao","year":"2020","journal-title":"Remote Sens."},{"key":"ref_24","first-page":"102456","article-title":"A review on deep learning in UAV remote sensing","volume":"102","author":"Osco","year":"2021","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_25","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1109\/LSP.2020.2973813","article-title":"A stereo attention module for stereo image super-resolution","volume":"27","author":"Ying","year":"2020","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1109\/TMM.2021.3050092","article-title":"Cross parallax attention network for stereo image super-resolution","volume":"24","author":"Chen","year":"2021","journal-title":"IEEE Trans. Multimed."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"5511118","DOI":"10.1109\/TGRS.2023.3272639","article-title":"AAtt-CNN: Automatical Attention-based Convolutional Neural Networks for Hyperspectral Image Classification","volume":"61","author":"Paoletti","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"5516817","DOI":"10.1109\/TGRS.2023.3295097","article-title":"Parameter-free attention network for spectral-spatial hyperspectral image classification","volume":"61","author":"Paoletti","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"6138","DOI":"10.1109\/TGRS.2020.3029527","article-title":"Bidirectional guided attention network for 3-D semantic detection of remote sensing images","volume":"59","author":"Rao","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rao, Z., Xiong, B., He, M., Dai, Y., He, R., Shen, Z., and Li, X. (2023, January 17\u201324). Masked representation learning for domain generalized stereo matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00526"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., and Hu, Q. (2020, January 13\u201319). ECA-Net: Efficient channel attention for deep convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Qin, Z., Zhang, P., Wu, F., and Li, X. (2021, January 10\u201317). Fcanet: Frequency channel attention networks. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00082"},{"key":"ref_35","unstructured":"Jaderberg, M., Simonyan, K., and Zisserman, A. (2015). Spatial transformer networks. arXiv."},{"key":"ref_36","unstructured":"Almahairi, A., Ballas, N., Cooijmans, T., Zheng, Y., Larochelle, H., and Courville, A. (2016, January 19\u201324). Dynamic capacity networks. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_37","unstructured":"Yang, L., Zhang, R.-Y., Li, L., and Xie, X. (2021, January 18\u201324). Simam: A simple, parameter-free attention module for convolutional neural networks. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/MGRS.2019.2893783","article-title":"2019 data fusion contest [technical committees]","volume":"7","author":"Yokoya","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Bosch, M., Foster, K., Christie, G., Wang, S., Hager, G.D., and Brown, M. (2019, January 7\u201311). Semantic stereo for incidental satellite images. Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2019.00167"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1016\/j.isprsjprs.2022.04.020","article-title":"HMSM-Net: Hierarchical multi-scale matching network for disparity estimation of high-resolution satellite stereo images","volume":"188","author":"He","year":"2022","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"6953","DOI":"10.1007\/s40747-023-01106-3","article-title":"DSC-MVSNet: Attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo","volume":"9","author":"Zhang","year":"2023","journal-title":"Complex Intell. Syst."},{"key":"ref_43","unstructured":"Tulyakov, S., Ivanov, A., and Fleuret, F. (2018). Practical deep stereo (pds): Toward applications-friendly deep stereo matching. arXiv."},{"key":"ref_44","unstructured":"Chen, C., Chen, X., and Cheng, H. (2019, October 27\u2013November 2). On the over-smoothing problem of cnn based disparity estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/2\/387\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:49:43Z","timestamp":1760104183000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/2\/387"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,18]]},"references-count":44,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,1]]}},"alternative-id":["rs16020387"],"URL":"https:\/\/doi.org\/10.3390\/rs16020387","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,18]]}}}