{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,5]],"date-time":"2026-04-05T21:49:05Z","timestamp":1775425745381,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T00:00:00Z","timestamp":1649894400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61871376"],"award-info":[{"award-number":["61871376"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Semantic segmentation is an important component in understanding the 3D point cloud scene. Whether we can effectively obtain local and global contextual information from points is of great significance in improving the performance of 3D point cloud semantic segmentation. In this paper, we propose a self-attention feature extraction module: the local transformer structure. By stacking the encoder layer composed of this structure, we can extract local features while preserving global connectivity. The structure can automatically learn each point feature from its neighborhoods and is invariant to different point orders. We designed two unique key matrices, each of which focuses on the feature similarities and geometric structure relationships between the points to generate attention weight matrices. Additionally, the cross-skip selection of neighbors is used to obtain larger receptive fields for each point without increasing the number of calculations required, and can therefore better deal with the junction between multiple objects. When the new network was verified on the S3DIS, the mean intersection over union was 69.1%, and the segmentation accuracies on the complex outdoor scene datasets Semantic3D and SemanticKITTI were 94.3% and 87.8%, respectively, which demonstrate the effectiveness of the proposed methods.<\/jats:p>","DOI":"10.3390\/info13040198","type":"journal-article","created":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T21:44:06Z","timestamp":1649972646000},"page":"198","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Local Transformer Network on 3D Point Cloud Semantic Segmentation"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7657-9141","authenticated-orcid":false,"given":"Zijun","family":"Wang","sequence":"first","affiliation":[{"name":"Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Yun","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Lifeng","family":"An","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China"}]},{"given":"Jian","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China"}]},{"given":"Haiyang","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"11748","DOI":"10.1109\/JSEN.2020.3035632","article-title":"BiLuNetICP: A Deep Neural Network for Object Semantic Segmentation and 6D Pose Recognition","volume":"21","author":"Tran","year":"2021","journal-title":"IEEE Sens. J."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"113816","DOI":"10.1016\/j.eswa.2020.113816","article-title":"Self-Driving Cars: A Survey","volume":"165","author":"Claudine","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Cortinhal, T., Tzelepis, G., and Aksoy, E.E. (2020). SalsaNext: Fast, Uncertainty-Aware Semantic Segmentation of LiDAR Point Clouds. arXiv.","DOI":"10.1007\/978-3-030-64559-5_16"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, Z., David, P., Yue, X., Xi, Z., Gong, B., and Foroosh, H. (2020, January 16\u201318). PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00962"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Rao, Y., Lu, J., and Zhou, J. (2019, January 27\u201328). Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seoul, Korea.","DOI":"10.1109\/CVPR.2019.00054"},{"key":"ref_6","unstructured":"Gerdzhev, M., Razani, R., Taghavi, E., and Liu, B. (June, January 30). TORNADO-Net: MulTiview tOtal vaRiatioN semAntic segmentation with Diamond inception module. Proceedings of the IEEE international Conference on Robotics and Automation, Xi\u2019an, China."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhou, Z., Zhang, Y., and Foroosh, H. (2021, January 19\u201325). Panoptic-PolarNet: Proposal-Free LIDAR Point Cloud Panoptic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01299"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhao, H., Jiang, L., Jia, J., Torr, P., and Koltun, V. (2020). Point Transformer. arXiv.","DOI":"10.1109\/ICCV48922.2021.01595"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Milioto, A., Vizzo, I., Behley, J., and Stachniss, C. (2019, January 4\u20138). RangeNet ++: Fast and Accurate LiDAR Semantic Segmentation. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Macau, China.","DOI":"10.1109\/IROS40897.2019.8967762"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wu, B., Wan, A., Yue, X., and Keutzer, K. (2018, January 21\u201326). SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud. Proceedings of the International Conference on Robotics and Automation, Orlando, FL, USA.","DOI":"10.1109\/ICRA.2018.8462926"},{"key":"ref_11","unstructured":"Liong, V.E., Nguyen, T.N.T., Widjaja, S., Sharma, D., and Chong, Z.J. (2020). AMVNet: Assertion-based Multi-View Fusion Network for LiDAR Semantic Segmentation. arXiv."},{"key":"ref_12","unstructured":"Maturana, D., and Scherer, S. (October, January 28). VoxNet: A 3D Convolutional Neural Network for real-time object recognition. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Graham, B., Engelcke, M., and Maaten, L.V.D. (2018, January 18\u201322). 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00961"},{"key":"ref_14","unstructured":"Qi, C.R., Su, H., and Mo, K. (2017, January 21\u201326). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_15","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017, January 4\u20137). PointNet++: Deep hierarchical feature learning on point sets in a metric space. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wu, W., Qi, Z., and Li, F. (2019, January 16\u201320). PointConv: Deep Convolutional Networks on 3D Point Clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00985"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhao, H., Jiang, L., and Fu, C. (2019, January 16\u201320). PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00571"},{"key":"ref_18","unstructured":"Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., and Guibas, L.J. (November, January 27). KPConv: Flexible and Deformable Convolution for Point Clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hu, Q., Yang, B., Xie, L., Rosa, S., Guo, Y., Wang, Z., Trigoni, A., and Markham, A. (2020, January 14\u201319). RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01112"},{"key":"ref_20","first-page":"1","article-title":"Dynamic Graph CNN for Learning on Point Clouds","volume":"149","author":"Wang","year":"2019","journal-title":"ACM Transact. Graph."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, L., Huang, Y., Hou, Y., Zhang, S., and Shan, J. (2019, January 16\u201320). Graph Attention Convolution for Point Cloud Semantic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01054"},{"key":"ref_22","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 3\u20137). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_23","unstructured":"Li, Y., Zhang, K., and Gao, J. (2021). LocalViT: Bringing Locality to Vision Transformers. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Cho, K., Merrienboer, B.V., G\u00fcl\u00e7ehre, \u00c7., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014, January 25\u201329). Learning Phrase Representations using RNN Encoder\u2013Decoder for Statistical Machine Translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, Qatar.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_25","unstructured":"Armeni, I., Sax, S., Zamir, A.R., and Savarese, S. (2017). Joint 2D-3D-semantic data for indoor scene understanding. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Hackel, T., Savimov, N., LADICKY, L., Wegner, J.D., Schindler, K., and Pollefeys, M. (2017). Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark. arXiv.","DOI":"10.5194\/isprs-annals-IV-1-W1-91-2017"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., and Gall, J. (2019, January 27\u201328). SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00939"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tatarchenko, M., Park, J., Koltun, V., and Zhou, Q. (2018, January 18\u201322). Tangent Convolutions for Dense Prediction in 3D. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00409"},{"key":"ref_29","unstructured":"Li, Y., Bu, R., Sun, M., Wu, W., Di, X., and Chen, B. (2018). PointCNN: Convolution On X-Transformed Points. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Landrieu, L., and Simonovsky, M. (2018, January 18\u201322). Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00479"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Qiu, S., Anwar, S., and Barnes, N. (2021, January 19\u201325). Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00180"},{"key":"ref_32","unstructured":"Boulch, A., Puy, G., and Marlet, R. (December, January 30). FKAConv: Feature-Kernel Alignment for Point Cloud Convolution. Proceedings of the Asian Conference on Computer Vision, Cham, Switzerland."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Fan, S., Dong, Q., Zhu, F., Lv, Y., Ye, P., and Wang, F.Y. (2021, January 19\u201325). SCF-Net: Learning Spatial Contextual Features for Large-Scale Point Cloud Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01427"},{"key":"ref_34","unstructured":"Zhang, Z., Hua, B.S., and Yeung, S.K. (November, January 27). ShellNet: Efficient Point Cloud Convolutional Neural Networks using Concentric Shells Statistics. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Truong, G., Gilani, S.Z., Islam, S.M.S., and Suter, D. (2019, January 2\u20134). Fast Point Cloud Registration using Semantic Segmentation. Proceedings of the Digital Image Computing: Techniques and Applications, Perth, Australia.","DOI":"10.1109\/DICTA47822.2019.8945870"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gong, J., Xu, J., Tan, X., Song, H., Qu, Y., Xie, Y., and Ma, L. (2021, January 19\u201325). Omni-supervised Point Cloud Segmentation via Gradual Receptive Field Component Reasoning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01150"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wu, B., Zhou, X., Zhao, S., Yue, X., and Keutzer, K. (2019, January 20\u201324). SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud. Proceedings of the International Conference on Robotics and Automation, Montreal, Canada.","DOI":"10.1109\/ICRA.2019.8793495"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/4\/198\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:54:16Z","timestamp":1760136856000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/4\/198"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,14]]},"references-count":37,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["info13040198"],"URL":"https:\/\/doi.org\/10.3390\/info13040198","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,14]]}}}