{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T15:56:26Z","timestamp":1774367786318,"version":"3.50.1"},"reference-count":54,"publisher":"MDPI AG","issue":"24","license":[{"start":{"date-parts":[[2022,12,12]],"date-time":"2022-12-12T00:00:00Z","timestamp":1670803200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Chinese Academy of Sciences Project","award":["CXJJ-20S017"],"award-info":[{"award-number":["CXJJ-20S017"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Planetary rover systems need to perform terrain segmentation to identify feasible driving areas and surrounding obstacles, a task that falls within the research area of semantic segmentation. Recently, deep learning (DL)-based methods have been proposed and have achieved strong performance in semantic segmentation. However, due to the on-board processor platform\u2019s strict constraints on computational complexity and power consumption, existing DL approaches are nearly impossible to deploy on board under the burden of extensive computation and large model size. To fill this gap, this paper studies effective and efficient Martian terrain segmentation solutions suitable for on-board deployment. In this article, we propose a lightweight ViT-based terrain segmentation method, namely, SegMarsViT. In the encoder part, the mobile vision transformer (MViT) block in the backbone extracts local\u2013global spatial features and captures multiscale contextual information concurrently. In the decoder part, the cross-scale feature fusion modules (CFF) further integrate hierarchical context information, and the compact feature aggregation module (CFA) combines multi-level feature representations. 
Moreover, we evaluate the proposed method on three public datasets: AI4Mars, MSL-Seg, and S5Mars. Extensive experiments demonstrate that the proposed SegMarsViT achieves 68.4%, 78.22%, and 67.28% mIoU on AI4Mars-MSL, MSL-Seg, and S5Mars, respectively, at a speed of 69.52 FPS.<\/jats:p>","DOI":"10.3390\/rs14246297","type":"journal-article","created":{"date-parts":[[2022,12,13]],"date-time":"2022-12-13T03:32:32Z","timestamp":1670902352000},"page":"6297","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["SegMarsViT: Lightweight Mars Terrain Segmentation Network for Autonomous Driving in Planetary Exploration"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9630-4249","authenticated-orcid":false,"given":"Yuqi","family":"Dai","sequence":"first","affiliation":[{"name":"National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Tie","family":"Zheng","sequence":"additional","affiliation":[{"name":"National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1834-7849","authenticated-orcid":false,"given":"Changbin","family":"Xue","sequence":"additional","affiliation":[{"name":"National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Li","family":"Zhou","sequence":"additional","affiliation":[{"name":"National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,12]]},"reference":[{"key":"ref_1","unstructured":"Cakir, S., Gau\u00df, M., H\u00e4ppeler, K., Ounajjar, Y., Heinle, F., and Marchthaler, R. (2022). 
Semantic Segmentation for Autonomous Driving: Model Evaluation, Dataset Generation, Perspective Comparison, and Real-Time Capability. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Csurka, G., and Perronnin, F. (2008, January 1). A Simple High Performance Approach to Semantic Segmentation. Proceedings of the BMVC, Leeds, UK.","DOI":"10.5244\/C.22.22"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Corso, J.J., Yuille, A., and Tu, Z. (2008, January 23\u201328). Graph-Shifts: Natural Image Labeling by Dynamic Hierarchical Computing. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587490"},{"key":"ref_4","unstructured":"Holder, C.J., and Shafique, M. (2022). On Efficient Real-Time Semantic Segmentation: A Survey. 19. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017). A Review on Deep Learning Techniques Applied to Semantic Segmentation 2017. arXiv.","DOI":"10.1016\/j.asoc.2018.05.018"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation, Springer.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"McGlinchy, J., Johnson, B., Muller, B., Joseph, M., and Diaz, J. (August, January 28). Application of UNet Fully Convolutional Neural Network to Impervious Surface Segmentation in Urban Environment from High Resolution Satellite Imagery. 
Proceedings of the IGARSS 2019\u20142019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8900453"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sun, J., Shen, J., Wang, X., Mao, Z., and Ren, J. (2022). Bi-Unet: A Dual Stream Network for Real-Time Highway Surface Segmentation. IEEE Trans. Intell. Veh., 15.","DOI":"10.1109\/TIV.2022.3216734"},{"key":"ref_10","unstructured":"Chattopadhyay, S., and Basak, H. (2020). Multi-Scale Attention u-Net (Msaunet): A Modified u-Net Architecture for Scene Segmentation. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Chu, Z., Tian, T., Feng, R., and Wang, L. (August, January 28). Sea-Land Segmentation with Res-UNet and Fully Connected CRF. Proceedings of the IGARSS 2019\u20142019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8900625"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Rothrock, B., Kennedy, R., Cunningham, C., Papon, J., Heverly, M., and Ono, M. (2016, January 13\u201316). SPOC: Deep Learning-Based Terrain Classification for Mars Rover Missions. Proceedings of the AIAA SPACE 2016, American Institute of Aeronautics and Astronautics, Long Beach, CA, USA.","DOI":"10.2514\/6.2016-5539"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Iwashita, Y., Nakashima, K., Stoica, A., and Kurazume, R. (2019, January 28\u201330). Tu-Net and Tdeeplab: Deep Learning-Based Terrain Classification Robust to Illumination Changes, Combining Visible and Thermal Imagery. Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA.","DOI":"10.1109\/MIPR.2019.00057"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liu, H., Yao, M., Xiao, X., and Cui, H. (2022). A Hybrid Attention Semantic Segmentation Network for Unstructured Terrain on Mars. 
Acta Astronaut., in press.","DOI":"10.1016\/j.actaastro.2022.08.002"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"41766","DOI":"10.1109\/ACCESS.2022.3167763","article-title":"Benchmark Analysis of Semantic Segmentation Algorithms for Safe Planetary Landing Site Selection","volume":"10","author":"Claudet","year":"2022","journal-title":"IEEE Access"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, W., Lin, L., Fan, Z., and Liu, J. (2022). Semi-Supervised Learning for Mars Imagery Classification and Segmentation. arXiv.","DOI":"10.1109\/ICIP42928.2021.9506533"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Goh, E., Chen, J., and Wilson, B. (2022). Mars Terrain Segmentation with Less Labels. arXiv.","DOI":"10.1109\/AERO53065.2022.9843245"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, J., Lin, L., Fan, Z., Wang, W., and Liu, J. (2022). S5Mars: Self-Supervised and Semi-Supervised Learning for Mars Segmentation. arXiv.","DOI":"10.1109\/ICIP42928.2021.9506533"},{"key":"ref_19","first-page":"3152587","article-title":"A Stepwise Domain Adaptive Segmentation Network with Covariate Shift Alleviation for Remote Sensing Imagery","volume":"60","author":"Li","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Swan, R.M., Atha, D., Leopold, H.A., Gildner, M., Oij, S., Chiu, C., and Ono, M. (2021, January 19\u201325). AI4MARS: A Dataset for Terrain-Aware Autonomous Driving on Mars. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA.","DOI":"10.1109\/CVPRW53098.2021.00226"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Dai, Y., Xue, C., and Zhou, L. (2022). Visual Saliency Guided Perceptual Adaptive Quantization Based on HEVC Intra-Coding for Planetary Images. 
PLoS ONE, 19.","DOI":"10.1371\/journal.pone.0263729"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Tian, Y., Chen, F., Wang, H., and Zhang, S. (2020, January 16). Real-Time Semantic Segmentation Network Based on Lite Reduced Atrous Spatial Pyramid Pooling Module Group. Proceedings of the 2020 5th International Conference on Control, Robotics and Cybernetics (CRC), Wuhan, China.","DOI":"10.1109\/CRC51253.2020.9253492"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, R., Tao, F., Liu, X., Na, J., Leng, H., Wu, J., and Zhou, T. (2022). RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens., 14.","DOI":"10.3390\/rs14133109"},{"key":"ref_24","unstructured":"Li, G., Yun, I., Kim, J., and Kim, J. (2019). DABNet: Depth-Wise Asymmetric Bottleneck for Real-Time Semantic Segmentation. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Mehta, S., Rastegari, M., Caspi, A., Shapiro, L., and Hajishirzi, H. (2018, January 8\u201314). Espnet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_34"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Siam, M., Gamal, M., Abdel-Razek, M., Yogamani, S., and Jagersand, M. (2018, January 7\u201310). Rtseg: Real-Time Semantic Segmentation Comparative Study. Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece.","DOI":"10.1109\/ICIP.2018.8451495"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1109\/LSP.2021.3051845","article-title":"EACNet: Enhanced Asymmetric Convolution for Real-Time Semantic Segmentation","volume":"28","author":"Li","year":"2021","journal-title":"IEEE Signal Process. 
Lett."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018, January 8\u201314). Bisenet: Bilateral Segmentation Network for Real-Time Semantic Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1007\/s11263-021-01515-2","article-title":"BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation","volume":"129","author":"Yu","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_30","unstructured":"Yang, Y., Jiao, L., Liu, X., Liu, F., Yang, S., Feng, Z., and Tang, X. (2022). Transformers Meet Visual Learning Understanding: A Comprehensive Review. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ye, L., Rochan, M., Liu, Z., and Wang, Y. (2019, January 15\u201320). Cross-Modal Self-Attention Network for Referring Image Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01075"},{"key":"ref_32","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2021). Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021). Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. arXiv.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H.S. (2021). Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. 
arXiv.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Huang, H., Xie, S., Lin, L., Iwamoto, Y., Han, X.-H., Chen, Y.-W., and Tong, R. (2022, January 23\u201329). ScaleFormer: Revisiting the Transformer-Based Backbones from a Scale-Wise Perspective for Medical Image Segmentation. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, Vienna, Austria.","DOI":"10.24963\/ijcai.2022\/135"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Shi, W., Xu, J., and Gao, P. (2022). SSformer: A Lightweight Transformer for Semantic Segmentation. arXiv.","DOI":"10.1109\/MMSP55362.2022.9949177"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 19\u201325). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_38","first-page":"12077","article-title":"SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers","volume":"14","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1016\/j.isprsjprs.2022.06.008","article-title":"UNetFormer: An UNet-like Transformer for Efficient Semantic Segmentation of Remotely Sensed Urban Scene Imagery","volume":"190","author":"Wang","year":"2022","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Liu, Z., Ning, J., Cao, Y., Wei, Y., Zhang, Z., Lin, S., and Hu, H. (2022, January 18\u201324). Video Swin Transformer. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00320"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H.R., and Xu, D. (2022, January 18\u201324). Unetr: Transformers for 3d Medical Image Segmentation. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, New Orleans, LA, USA.","DOI":"10.1109\/WACV51458.2022.00181"},{"key":"ref_42","unstructured":"Zhou, H.-Y., Guo, J., Zhang, Y., Yu, L., Wang, L., and Yu, Y. (2021). Nnformer: Interleaved Transformer for Volumetric Segmentation. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018, January 18\u201323). Mobilenetv2: Inverted Residuals and Linear Bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_44","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021). An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"10261","DOI":"10.1109\/TPAMI.2021.3134684","article-title":"MobileSal: Extremely Efficient RGB-D Salient Object Detection","volume":"44","author":"Wu","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"3125","DOI":"10.1109\/TIP.2022.3164550","article-title":"EDN: Salient Object Detection via Extremely-Downsampled Network","volume":"31","author":"Wu","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_47","unstructured":"Contributors, Mms (2022, May 18). 
MMSegmentation: Openmmlab Semantic Segmentation Toolbox and Benchmark. Available online: https:\/\/github.com\/open-mmlab\/mmsegmentation."},{"key":"ref_48","first-page":"1","article-title":"Pytorch: An Imperative Style, High-Performance Deep Learning Library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Mishra, P., and Sarawadekar, K. (2019, January 17\u201320). Polynomial Learning Rate Policy with Warm Restart for Deep Neural Network. Proceedings of the TENCON 2019\u20142019 IEEE Region 10 Conference (TENCON), Kochi, India.","DOI":"10.1109\/TENCON.2019.8929465"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Strudel, R., Garcia, R., Laptev, I., and Schmid, C. (2021, January 19\u201325). Segmenter: Transformer for Semantic Segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00717"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zhang, Y., Liu, S., Shi, J., Loy, C.C., Lin, D., and Jia, J. (2018, January 8\u201314). Psanet: Point-Wise Spatial Attention Network for Scene Parsing. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_17"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Yu, W., Luo, M., Zhou, P., Si, C., Zhou, Y., Wang, X., Feng, J., and Yan, S. (2022, January 18\u201324). MetaFormer Is Actually What You Need for Vision. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01055"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/24\/6297\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:39:56Z","timestamp":1760146796000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/24\/6297"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,12]]},"references-count":54,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["rs14246297"],"URL":"https:\/\/doi.org\/10.3390\/rs14246297","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,12]]}}}