{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T16:49:52Z","timestamp":1770742192125,"version":"3.49.0"},"reference-count":46,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,20]],"date-time":"2022-12-20T00:00:00Z","timestamp":1671494400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62176199"],"award-info":[{"award-number":["62176199"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61805189"],"award-info":[{"award-number":["61805189"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Road segmentation from remote sensing images is an important task in many applications. However, due to the high density of roads and the complex background, the roads are often occluded by trees. This makes accurate road segmentation a challenging task. Most existing road segmentation networks rely on convolutions with small kernels; however, these methods often cannot obtain satisfactory results because the long-range dependencies are not captured and the intrinsic relationships between feature maps at different scales are not fully exploited. In this paper, a deep neural network based on a cross-scale axial attention mechanism is proposed to address this problem. This model enables low-resolution features to aggregate global contextual information from high-resolution features. 
Among them, the axial attention mechanism realizes global attention by using vertical and horizontal attention sequentially. With this strategy, the dense long-range dependencies can be captured with extremely low computational cost. The cross-scale mechanism enables the model to effectively combine the high-resolution fine-grained features and the low-resolution coarse-grained features. The proposed method enables the network to propagate the information without losing details. Our method achieves IoUs of 58.98 and 65.28 on the Massachusetts Roads dataset and DeepGlobe dataset and outperforms other methods.<\/jats:p>","DOI":"10.3390\/rs15010003","type":"journal-article","created":{"date-parts":[[2022,12,20]],"date-time":"2022-12-20T05:08:54Z","timestamp":1671512934000},"page":"3","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["CSANet: Cross-Scale Axial Attention Network for Road Segmentation"],"prefix":"10.3390","volume":"15","author":[{"given":"Xianghai","family":"Cao","sequence":"first","affiliation":[{"name":"School of Artificial of Intelligence, Xidian University, Xi\u2019an 710071, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kai","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Artificial of Intelligence, Xidian University, Xi\u2019an 710071, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Licheng","family":"Jiao","sequence":"additional","affiliation":[{"name":"School of Artificial of Intelligence, Xidian University, Xi\u2019an 710071, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1097","DOI":"10.1109\/LRA.2021.3056344","article-title":"icurb: Imitation learning-based detection of road curbs using aerial images for autonomous 
driving","volume":"6","author":"Xu","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"7248","DOI":"10.1109\/LRA.2021.3097512","article-title":"Topo-boundary: A benchmark dataset on topological road-boundary detection using aerial images for autonomous driving","volume":"6","author":"Xu","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Steger, C., Glock, C., Eckstein, W., Mayer, H., and Radig, B. (1995). Model-based road extraction from images. Automatic Extraction of Man-Made Objects from Aerial and Space Images, Springer.","DOI":"10.1007\/978-3-0348-9242-1_26"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1946","DOI":"10.1109\/JSTARS.2015.2449296","article-title":"Road extraction from very high resolution remote sensing optical images based on texture analysis and beamlet transform","volume":"9","author":"Sghaier","year":"2015","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"M\u00e1ttyus, G., Luo, W., and Urtasun, R. (2017, January 22\u201329). Deeproadmapper: Extracting road topology from aerial images. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.372"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Batra, A., Singh, S., Pang, G., Basu, S., Jawahar, C., and Paluri, M. (2019, January 16\u201317). Improved road connectivity by joint learning of orientation and segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01063"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Mosinska, A., Marquez-Neila, P., Kozi\u0144ski, M., and Fua, P. (2018, January 18\u201323). Beyond the pixel-wise loss for topology-aware delineation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00331"},{"key":"ref_8","first-page":"5614115","article-title":"Split Depth-wise Separable Graph-Convolution Network for Road Extraction in Complex Environments from High-resolution Remote-Sensing Images","volume":"60","author":"Zhou","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1016\/j.ins.2020.05.062","article-title":"Global context based automatic road segmentation via dilated convolutional neural network","volume":"535","author":"Lan","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.isprsjprs.2019.10.001","article-title":"Spatial information inference net: Road extraction using road-specific contextual information","volume":"158","author":"Tao","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1867","DOI":"10.1109\/LGRS.2018.2864342","article-title":"Road segmentation in SAR satellite images with deep fully convolutional neural networks","volume":"15","author":"Henry","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1109\/TGRS.2020.3016086","article-title":"Road segmentation for remote sensing images using adversarial spatial pyramid networks","volume":"59","author":"Shamsolmoali","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1080\/2150704X.2018.1557791","article-title":"A Y-Net deep learning method for road segmentation using high-resolution visible remote sensing images","volume":"10","author":"Li","year":"2019","journal-title":"Remote Sens. Lett."},{"key":"ref_14","unstructured":"Yu, F., and Koltun, V. 
(2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wegner, J.D., Montoya-Zegarra, J.A., and Schindler, K. (2013, January 23\u201328). A higher-order CRF model for road network extraction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.222"},{"key":"ref_16","first-page":"1","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_18","unstructured":"Mnih, V. (2013). Machine Learning for Aerial Image Labeling, University of Toronto."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., and Raskar, R. (2018, January 18\u201322). Deepglobe 2018: A challenge to parse the earth through satellite images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00031"},{"key":"ref_20","first-page":"5521213","article-title":"Hyperspectral Imagery Classification Based on Contrastive Learning","volume":"60","author":"Hou","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1016\/j.isprsjprs.2022.06.008","article-title":"UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery","volume":"190","author":"Wang","year":"2022","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zhang, Y., Liu, S., Shi, J., Loy, C.C., Lin, D., and Jia, J. (2018, January 8\u201314). Psanet: Point-wise spatial attention network for scene parsing. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_17"},{"key":"ref_25","unstructured":"Yuan, Y., Huang, L., Guo, J., Zhang, C., Chen, X., and Wang, J. (2018). Ocnet: Object context network for scene parsing. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018, January 18\u201323). Learning a discriminative feature network for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00199"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J., Lin, S., Wei, F., and Hu, H. (2019, January 27\u201328). Gcnet: Non-local networks meet squeeze-excitation networks and beyond. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Chi, L., Yuan, Z., Mu, Y., and Wang, C. (2020, January 13\u201319). Non-local neural networks with grouped bilinear attentional transforms. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01182"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yin, M., Yao, Z., Cao, Y., Li, X., Zhang, Z., Lin, S., and Hu, H. (2020, January 23\u201328). Disentangled non-local neural networks. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58555-6_12"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., and Chen, L.C. (2020, January 23\u201328). Axial-deeplab: Stand-alone axial-attention for panoptic segmentation. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58548-8_7"},{"key":"ref_31","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (November, January 27). Ccnet: Criss-cross attention for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Hou, Q., Zhou, D., and Feng, J. (2021, January 20\u201325). Coordinate attention for efficient mobile network design. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Luu, H.M., and Park, S.H. (2021). Extending nn-UNet for brain tumor segmentation. arXiv.","DOI":"10.1007\/978-3-031-09002-8_16"},{"key":"ref_34","unstructured":"Valanarasu, J.M.J., Oza, P., Hacihaliloglu, I., and Patel, V.M. 
(October, January 27). Medical transformer: Gated axial-attention for medical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lou, A., Guan, S., and Loew, M. (2021). CaraNet: Context Axial Reverse Attention Network for Segmentation of Small Medical Objects. arXiv.","DOI":"10.1117\/12.2611802"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lou, A., and Loew, M. (2021, January 19\u201322). Cfpnet: Channel-wise feature pyramid for real-time semantic segmentation. Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Virtual.","DOI":"10.1109\/ICIP42928.2021.9506485"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_38","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_39","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (July, January 26). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A.A. (2017, January 4\u20139). Inception-v4, inception-resnet and the impact of residual connections on learning. 
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_41","unstructured":"Mnih, V., and Hinton, G.E. (July, January 26). Learning to label aerial images from noisy data. Proceedings of the 29th International Conference on Machine Learning (ICML-12), Edinburgh, Scotland."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_45","unstructured":"Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E. (2016). Enet: A deep neural network architecture for real-time semantic segmentation. arXiv."},{"key":"ref_46","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. 
arXiv."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/1\/3\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:44:38Z","timestamp":1760147078000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/1\/3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,20]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["rs15010003"],"URL":"https:\/\/doi.org\/10.3390\/rs15010003","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,20]]}}}