{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T20:13:17Z","timestamp":1770754397678,"version":"3.50.0"},"reference-count":73,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2024,11,2]],"date-time":"2024-11-02T00:00:00Z","timestamp":1730505600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2023YFC3806000"],"award-info":[{"award-number":["2023YFC3806000"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2023YFC3806002"],"award-info":[{"award-number":["2023YFC3806002"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["61936014"],"award-info":[{"award-number":["61936014"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2021SHZDZX0100"],"award-info":[{"award-number":["2021SHZDZX0100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["22511105300"],"award-info":[{"award-number":["22511105300"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Natural Science Foundation of China","award":["2023YFC3806000"],"award-info":[{"award-number":["2023YFC3806000"]}]},{"name":"National Natural Science Foundation of 
China","award":["2023YFC3806002"],"award-info":[{"award-number":["2023YFC3806002"]}]},{"name":"National Natural Science Foundation of China","award":["61936014"],"award-info":[{"award-number":["61936014"]}]},{"name":"National Natural Science Foundation of China","award":["2021SHZDZX0100"],"award-info":[{"award-number":["2021SHZDZX0100"]}]},{"name":"National Natural Science Foundation of China","award":["22511105300"],"award-info":[{"award-number":["22511105300"]}]},{"name":"Shanghai Municipal Science and Technology Major Project","award":["2023YFC3806000"],"award-info":[{"award-number":["2023YFC3806000"]}]},{"name":"Shanghai Municipal Science and Technology Major Project","award":["2023YFC3806002"],"award-info":[{"award-number":["2023YFC3806002"]}]},{"name":"Shanghai Municipal Science and Technology Major Project","award":["61936014"],"award-info":[{"award-number":["61936014"]}]},{"name":"Shanghai Municipal Science and Technology Major Project","award":["2021SHZDZX0100"],"award-info":[{"award-number":["2021SHZDZX0100"]}]},{"name":"Shanghai Municipal Science and Technology Major Project","award":["22511105300"],"award-info":[{"award-number":["22511105300"]}]},{"name":"Shanghai Science and Technology Innovation Action Plan Project","award":["2023YFC3806000"],"award-info":[{"award-number":["2023YFC3806000"]}]},{"name":"Shanghai Science and Technology Innovation Action Plan Project","award":["2023YFC3806002"],"award-info":[{"award-number":["2023YFC3806002"]}]},{"name":"Shanghai Science and Technology Innovation Action Plan Project","award":["61936014"],"award-info":[{"award-number":["61936014"]}]},{"name":"Shanghai Science and Technology Innovation Action Plan Project","award":["2021SHZDZX0100"],"award-info":[{"award-number":["2021SHZDZX0100"]}]},{"name":"Shanghai Science and Technology Innovation Action Plan Project","award":["22511105300"],"award-info":[{"award-number":["22511105300"]}]},{"name":"Fundamental Research Funds for the Central 
Universities","award":["2023YFC3806000"],"award-info":[{"award-number":["2023YFC3806000"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2023YFC3806002"],"award-info":[{"award-number":["2023YFC3806002"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["61936014"],"award-info":[{"award-number":["61936014"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2021SHZDZX0100"],"award-info":[{"award-number":["2021SHZDZX0100"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["22511105300"],"award-info":[{"award-number":["22511105300"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Recently, significant advancements have been made in 3D point cloud analysis by leveraging transformer architecture in 3D space. However, it remains challenging to effectively implement local and global learning within irregular and sparse structures of 3D point clouds. This paper presents the Adaptive Interaction Transformer (AIFormer), a novel hierarchical transformer architecture designed to enhance 3D point cloud analysis by fusing local and global features through the adaptive interaction of features. Specifically, AIFormer mainly consists of several stacked AIFormer Blocks. Each AIFormer module employs the Local Relation Aggregation Module and the Global Context Aggregation Module, respectively, to extract local details of relationships within the reference point and long-range dependencies between reference points. Then, the local and global features are fused using the Adaptive Interaction Module for adaptive interaction to optimize the point representation. Additionally, the AIFormer Block further designs geometric relation functions and contextual relative semantic encoding to enhance local and global feature extraction capabilities, respectively. 
Extensive experiments on three popular 3D point cloud datasets verify that AIFormer achieves state-of-the-art or comparable performances. Our comprehensive ablation study further validates the effectiveness and soundness of the AIFormer design.<\/jats:p>","DOI":"10.3390\/rs16214103","type":"journal-article","created":{"date-parts":[[2024,11,4]],"date-time":"2024-11-04T09:52:54Z","timestamp":1730713974000},"page":"4103","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["AIFormer: Adaptive Interaction Transformer for 3D Point Cloud Understanding"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-4223-910X","authenticated-orcid":false,"given":"Xutao","family":"Chu","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Tongji University, Shanghai 201804, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6109-2522","authenticated-orcid":false,"given":"Shengjie","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Tongji University, Shanghai 201804, China"}]},{"given":"Hongwei","family":"Dai","sequence":"additional","affiliation":[{"name":"School of Computer Engineering, Jiangsu Ocean University, Lianyungang 222005, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,11,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Milioto, A., Vizzo, I., Behley, J., and Stachniss, C. (2019, January 4\u20138). RangeNet++: Fast and Accurate LiDAR Semantic Segmentation. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), The Venetian Macao, Macau, China.","DOI":"10.1109\/IROS40897.2019.8967762"},{"key":"ref_2","unstructured":"Li, Y., Bu, R., Sun, M., Wu, W., Di, X., and Chen, B. (2018, January 2\u20138). PointCNN: Convolution On X-Transformed Points. 
Proceedings of the 31st Advances in Neural Information Processing Systems (NeurIPS), Montr\u00e9al, QC, Canada."},{"key":"ref_3","unstructured":"Jiang, L., Zhao, H., Liu, S., Shen, X., Fu, C.-W., and Jia, J. (November, January 27). Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation. Proceedings of the 17th IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_4","first-page":"1","article-title":"Dynamic Graph CNN for Learning on Point Clouds","volume":"38","author":"Wang","year":"2019","journal-title":"ACM Trans. Graph."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Tang, H., Liu, Z., Zhao, S., Lin, Y., Lin, J., Wang, H., and Han, S. (2020, January 23\u201328). Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution. Proceedings of the European Conference on Computer Vision (ECCV), Virtual.","DOI":"10.1007\/978-3-030-58604-1_41"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Choy, C., Gwak, J., and Savarese, S. (2019, January 15\u201320). 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00319"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Graham, B., Engelcke, M., and van der Maaten, L. (2018, January 18\u201322). 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00961"},{"key":"ref_8","unstructured":"Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015, January 7\u201312). 3D ShapeNets: A Deep Representation for Volumetric Shapes. 
Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA."},{"key":"ref_9","unstructured":"Qi, C.R., Su, H., Kaichun, M., and Guibas, L.J. (2017, January 21\u201326). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA."},{"key":"ref_10","first-page":"8338","article-title":"Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling","volume":"44","author":"Hu","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","unstructured":"Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., and Guibas, L.J. (November, January 27). KPConv: Flexible and Deformable Convolution for Point Clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wu, W., Qi, Z., and Fuxin, L. (2019, January 15\u201320). PointConv: Deep Convolutional Networks on 3D Point Clouds. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00985"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Yan, X., Zheng, C., Li, Z., Wang, S., and Cui, S. (2020, January 13\u201319). PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks with Adaptive Sampling. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00563"},{"key":"ref_14","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 3\u20137). An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. 
Proceedings of the International Conference on Learning Representations (ICLR), Virtual."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s41095-021-0229-5","article-title":"PCT: Point Cloud Transformer","volume":"7","author":"Guo","year":"2021","journal-title":"Comp. Vis. Media"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhao, H., Jiang, L., Jia, J., Torr, P., and Koltun, V. (2021, January 10\u201317). Point Transformer. Proceedings of the 19th IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01595"},{"key":"ref_17","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017, January 4\u20139). PointNet++: Deep hierarchical feature learning on point sets in a metric space. Proceedings of the 30th Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, C., Wan, H., Shen, X., and Wu, Z. (2021). PVT: Point-Voxel Transformer for Point Cloud Learning. arXiv.","DOI":"10.1002\/int.23073"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Park, C., Jeong, Y., Cho, M., and Park, J. (2022, January 18\u201324). Fast Point Transformer. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01644"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lai, X., Liu, J., Jiang, L., Wang, L., Zhao, H., Liu, S., Qi, X., and Jia, J. (2022, January 18\u201324). Stratified Transformer for 3D Point Cloud Segmentation. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00831"},{"key":"ref_21","unstructured":"Duan, L., Zhao, S., Xue, N., Gong, M., Xia, G.-S., and Tao, D. (2023, January 10\u201316). 
ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding. Proceedings of the 36th Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Li, X.-L., Guo, M.-H., Mu, T.-J., Martin, R.R., and Hu, S.-M. (2023, January 20\u201322). Long Range Pooling for 3D Large-Scale Scene Understanding. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00993"},{"key":"ref_23","unstructured":"Qiu, H., Yu, B., and Tao, D. (2023). Collect-and-Distribute Transformer for 3D Point Cloud Analysis. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"He, Y., Yu, H., Yang, Z., Liu, X., Sun, W., and Mian, A. (2024). Full Point Encoding for Local Feature Aggregation in 3-D Point Clouds. IEEE Trans. Neural Netw. Learn. Syst., early access.","DOI":"10.1109\/TNNLS.2024.3409891"},{"key":"ref_25","unstructured":"Li, H., Zheng, T., Chi, Z., Yang, Z., Wang, W., Wu, B., Lin, B., and Cai, D. (2023). APPT: Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"4338","DOI":"10.1109\/TPAMI.2020.3005434","article-title":"Deep Learning for 3D Point Clouds: A Survey","volume":"43","author":"Guo","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhu, X., Zhou, H., Wang, T., Hong, F., Ma, Y., Li, W., Li, H., and Lin, D. (2021, January 20\u201325). Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation. 
Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00981"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yan, X., Gao, J., Li, J., Zhang, R., Li, Z., Huang, R., and Cui, S. (2021, January 2\u20139). Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion. Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), Virtually.","DOI":"10.1609\/aaai.v35i4.16419"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chen, Y., Liu, J., Zhang, X., Qi, X., and Jia, J. (2023, January 20\u201322). LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01296"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Xu, C., Wu, B., Wang, Z., Zhan, W., Vajda, P., Keutzer, K., and Tomizuka, M. (2020, January 23\u201328). SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Virtual.","DOI":"10.1007\/978-3-030-58604-1_1"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"3169","DOI":"10.1007\/s10489-024-05302-7","article-title":"DFAMNet: Dual Fusion Attention Multi-Modal Network for Semantic Segmentation on LiDAR Point Clouds","volume":"54","author":"Li","year":"2024","journal-title":"Appl. Intell."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Puy, G., Boulch, A., and Marlet, R. (2023, January 2\u20136). Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation. 
Proceedings of the 20th IEEE\/CVF International Conference on Computer Vision (ICCV), Paris, France.","DOI":"10.1109\/ICCV51070.2023.00313"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, Z., David, P., Yue, X., Xi, Z., Gong, B., and Foroosh, H. (2020, January 13\u201319). PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00962"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lin, Y., Yan, Z., Huang, H., Du, D., Liu, L., Cui, S., and Han, X. (2020, January 13\u201319). FPConv: Learning Local Flattening for Point Convolution. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00435"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Hu, W., Zhao, H., Jiang, L., Jia, J., and Wong, T.-T. (2021, January 20\u201325). Bidirectional Projection Network for Cross Dimension Scene Understanding. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01414"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Peng, B., Wu, X., Jiang, L., Chen, Y., Zhao, H., Tian, Z., and Jia, J. (2024). OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation. arXiv.","DOI":"10.1109\/CVPR52733.2024.02013"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"31767","DOI":"10.1109\/ACCESS.2023.3262560","article-title":"3D Point Cloud Semantic Segmentation System Based on Lightweight FPConv","volume":"11","author":"Fan","year":"2023","journal-title":"IEEE Access"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gong, J., Xu, J., Tan, X., Song, H., Qu, Y., Xie, Y., and Ma, L. (2021, January 20\u201325). 
Omni-Supervised Point Cloud Segmentation via Gradual Receptive Field Component Reasoning. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01150"},{"key":"ref_39","unstructured":"Qian, G., Li, Y., Peng, H., Mai, J., Hammoud, H., Elhoseiny, M., and Ghanem, B. (December, January 28). PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies. Proceedings of the 35th Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Kang, X., Wang, C., and Chen, X. (2023). Region-Enhanced Feature Learning for Scene Semantic Segmentation. IEEE Trans. Multimed., early access.","DOI":"10.1109\/TMM.2023.3342718"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"9374","DOI":"10.1109\/TPAMI.2023.3238516","article-title":"AGConv: Adaptive Graph Convolution on 3D Point Clouds","volume":"45","author":"Wei","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_42","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the 30th Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3505244","article-title":"Transformers in Vision: A Survey","volume":"54","author":"Khan","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_44","unstructured":"Han, Q., Fan, Z., Dai, Q., Sun, L., Cheng, M.-M., Liu, J., and Wang, J. (2022, January 25\u201329). On the Connection between Local Attention and Dynamic Depth-Wise Convolution. 
Proceedings of the International Conference on Learning Representations (ICLR), Virtual."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Chen, Q., Wu, Q., Wang, J., Hu, Q., Hu, T., Ding, E., Cheng, J., and Wang, J. (2022, January 18\u201324). MixFormer: Mixing Features across Windows and Dimensions. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00518"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the 19th IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Gu, J., Kwon, H., Wang, D., Ye, W., Li, M., Chen, Y.-H., Lai, L., Chandra, V., and Pan, D.Z. (2022, January 18\u201324). Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01178"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 10\u201317). Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. Proceedings of the 19th IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik, A., and Li, Y. (2022, January 23\u201327). MaxViT: Multi-Axis Vision Transformer. 
Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20053-3_27"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Dong, X., Bao, J., Chen, D., Zhang, W., Yu, N., Yuan, L., Chen, D., and Guo, B. (2022, January 18\u201324). CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01181"},{"key":"ref_51","unstructured":"Chu, X., Tian, Z., Wang, Y., Zhang, B., Ren, H., Wei, X., Xia, H., and Shen, C. (2021, January 6\u201314). Twins: Revisiting the Design of Spatial Attention in Vision Transformers. Proceedings of the 34th Advances in Neural Information Processing Systems (NeurIPS), Virtual."},{"key":"ref_52","unstructured":"Li, W., Wang, X., Xia, X., Wu, J., Li, J., Xiao, X., Zheng, M., and Wen, S. (2022). SepViT: Separable Vision Transformer. arXiv."},{"key":"ref_53","unstructured":"Fan, Q., Huang, H., Zhou, X., and He, R. (2024, January 9\u201315). Lightweight Vision Transformer with Bidirectional Interaction. Proceedings of the 37th Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada."},{"key":"ref_54","unstructured":"Lahoud, J., Cao, J., Khan, F.S., Cholakkal, H., Anwer, R.M., Khan, S., and Yang, M.-H. (2024). 3D Vision with Transformers: A Survey. arXiv."},{"key":"ref_55","unstructured":"Lu, D., Xie, Q., Wei, M., Xu, L., and Li, J. (2022). Transformers in 3D Point Clouds: A Survey. arXiv."},{"key":"ref_56","unstructured":"Wu, X., Lao, Y., Jiang, L., Liu, X., and Zhao, H. (December, January 28). Point Transformer V2: Grouped Vector Attention and Partition-Based Pooling. Proceedings of the 35th Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Liu, Z., Yang, X., Tang, H., Yang, S., and Han, S. 
(2023, January 20\u201322). FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00122"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Fan, L., Pang, Z., Zhang, T., Wang, Y.-X., Zhao, H., Wang, F., Wang, N., and Zhang, Z. (2022, January 18\u201324). Embracing Single Stride 3D Object Detector with Sparse Transformer. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00827"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Xiang, P., Wen, X., Liu, Y.-S., Zhang, H., Fang, Y., and Han, Z. (2023, January 20\u201322). Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/ICCV51070.2023.01634"},{"key":"ref_60","unstructured":"Yang, Y., Guo, Y., Xiong, J., Liu, Y., Pan, H., Wang, P., Tong, X., and Guo, B. (2023). Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Hui, L., Yang, H., Cheng, M., Xie, J., and Yang, J. (2021, January 20\u201325). Pyramid Point Cloud Transformer for Large-Scale Place Recognition. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00604"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Yang, J., Zhang, Q., Ni, B., Li, L., Liu, J., Zhou, M., and Tian, Q. (2019, January 15\u201320). Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling. 
Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00344"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Ai, D., Xu, C., Zhang, X., Ai, Y., Bai, Y., and Liu, Y. (2023, January 24\u201326). ASSA-Net: Semantic Segmentation Network for Point Clouds Based on Adaptive Sampling and Self-Attention. Proceedings of the 2023 5th International Conference on Natural Language Processing (ICNLP), Guangzhou, China.","DOI":"10.1109\/ICNLP58431.2023.00018"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Zhang, C., Wan, H., Shen, X., and Wu, Z. (2022, January 18\u201324). PatchFormer: An Efficient Point Transformer with Patch Attention. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01150"},{"key":"ref_65","unstructured":"Yang, X., Jin, M., He, W., and Chen, Q. (2023). PointCAT: Cross-Attention Transformer for Point Cloud. arXiv."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"4985","DOI":"10.1109\/TCSVT.2023.3247506","article-title":"LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers","volume":"33","author":"Huang","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Armeni, I., Sener, O., Zamir, A.R., Jiang, H., Brilakis, I., Fischer, M., and Savarese, S. (July, January 26). 3D Semantic Parsing of Large-Scale Indoor Spaces. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.170"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Rozenberszki, D., Litany, O., and Dai, A. (2022, January 23\u201327). Language-Grounded Indoor 3D Semantic Segmentation in the Wild. 
Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19827-4_8"},{"key":"ref_69","unstructured":"Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., and Gall, J. (November, January 27). SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. Proceedings of the 17th IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Tang, L., Zhan, Y., Chen, Z., Yu, B., and Tao, D. (2022, January 18\u201324). Contrastive Boundary Learning for Point Cloud Segmentation. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00830"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Wu, X., Jiang, L., Wang, P.-S., Liu, Z., Liu, X., Qiao, Y., Ouyang, W., He, T., and Zhao, H. (2024, January 17\u201321). Point Transformer V3: Simpler, Faster, Stronger. Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00463"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"4952","DOI":"10.1109\/TIP.2022.3190709","article-title":"SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation","volume":"31","author":"Tao","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Kong, L., Liu, Y., Chen, R., Ma, Y., Zhu, X., Li, Y., Hou, Y., Qiao, Y., and Liu, Z. (2023, January 2\u20136). Rethinking Range View Representation for LiDAR Segmentation. 
Proceedings of the 20th IEEE\/CVF International Conference on Computer Vision (ICCV), Paris, France.","DOI":"10.1109\/ICCV51070.2023.00028"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/21\/4103\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:27:28Z","timestamp":1760113648000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/21\/4103"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,2]]},"references-count":73,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2024,11]]}},"alternative-id":["rs16214103"],"URL":"https:\/\/doi.org\/10.3390\/rs16214103","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,2]]}}}