{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T10:12:23Z","timestamp":1781518343739,"version":"3.54.1"},"reference-count":36,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2024,3,7]],"date-time":"2024-03-07T00:00:00Z","timestamp":1709769600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62363025"],"award-info":[{"award-number":["62363025"]}]},{"name":"National Natural Science Foundation of China","award":["2022-2-58"],"award-info":[{"award-number":["2022-2-58"]}]},{"name":"National Natural Science Foundation of China","award":["23YFFA0064"],"award-info":[{"award-number":["23YFFA0064"]}]},{"name":"Science and Technology Program of Lanzhou","award":["62363025"],"award-info":[{"award-number":["62363025"]}]},{"name":"Science and Technology Program of Lanzhou","award":["2022-2-58"],"award-info":[{"award-number":["2022-2-58"]}]},{"name":"Science and Technology Program of Lanzhou","award":["23YFFA0064"],"award-info":[{"award-number":["23YFFA0064"]}]},{"name":"Key R&D plan of Science and Technology Plan of Gansu Province\u2014Social Development Field Project","award":["62363025"],"award-info":[{"award-number":["62363025"]}]},{"name":"Key R&D plan of Science and Technology Plan of Gansu Province\u2014Social Development Field Project","award":["2022-2-58"],"award-info":[{"award-number":["2022-2-58"]}]},{"name":"Key R&D plan of Science and Technology Plan of Gansu Province\u2014Social Development Field Project","award":["23YFFA0064"],"award-info":[{"award-number":["23YFFA0064"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>In this study, we introduce a novel framework for the semantic segmentation of point clouds in autonomous driving 
scenarios, termed PVI-Net. This framework uniquely integrates three different data perspectives\u2014point clouds, voxels, and distance maps\u2014executing feature extraction through three parallel branches. Throughout this process, we ingeniously design a point cloud\u2013voxel cross-attention mechanism and a multi-perspective feature fusion strategy for point images. These strategies facilitate information interaction across different feature dimensions of perspectives, thereby optimizing the fusion of information from various viewpoints and significantly enhancing the overall performance of the model. The network employs a U-Net structure and residual connections, effectively merging and encoding information to improve the precision and efficiency of semantic segmentation. We validated the performance of PVI-Net on the SemanticKITTI and nuScenes datasets. The results demonstrate that PVI-Net surpasses most of the previous methods in various performance metrics.<\/jats:p>","DOI":"10.3390\/info15030148","type":"journal-article","created":{"date-parts":[[2024,3,7]],"date-time":"2024-03-07T08:59:37Z","timestamp":1709801977000},"page":"148","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["PVI-Net: Point\u2013Voxel\u2013Image Fusion for Semantic Segmentation of Point Clouds in Large-Scale Autonomous Driving Scenarios"],"prefix":"10.3390","volume":"15","author":[{"given":"Zongshun","family":"Wang","sequence":"first","affiliation":[{"name":"School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4627-6112","authenticated-orcid":false,"given":"Ce","family":"Li","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jialin","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhiqiang","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Limei","family":"Xiao","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2024,3,7]]},"reference":[{"key":"ref_1","first-page":"32398","article-title":"Let images give you more: Point cloud cross-modal training for shape analysis","volume":"35","author":"Yan","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yan, X., Gao, J., Zheng, C., Zheng, C., Zhang, R., Cui, S., and Li, Z. (2022, October 23\u201327). 2dpass: 2d priors assisted semantic segmentation on lidar point clouds. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19815-1_39"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wei, Y., Zhao, L., Zheng, W., Zhu, Z., Zhou, J., and Lu, J. (2023, October 4\u20136). Surroundocc: Multi-camera 3d occupancy prediction for autonomous driving. Proceedings of the IEEE\/CVF International Conference on Computer Vision 2023, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01986"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ottonelli, S., Spagnolo, P., Mazzeo, P.L., and Leo, M. (2013, February 25\u201328). 
Improved video segmentation with color and depth using a stereo camera. Proceedings of the IEEE International Conference on Industrial Technology 2013, Cape Town, South Africa.","DOI":"10.1109\/ICIT.2013.6505832"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Yang, B., Wang, B., and Li, B. (2023, June 17\u201324). GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01690"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Xia, Y., Gladkova, M., Wang, R., Li, Q., Stilla, U., Henriques, J.F., and Cremers, D. (2023, October 4\u20136). CASSPR: Cross Attention Single Scan Place Recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision 2023, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00777"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Fan, S., Dong, Q., Zhu, F., Lv, Y., Ye, P., and Wang, F.Y. (2021, June 20\u201325). SCF-Net: Learning spatial contextual features for large-scale point cloud segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01427"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"6835","DOI":"10.1109\/TCSVT.2022.3171968","article-title":"Psnet: Fast data structuring for hierarchical deep learning on point cloud","volume":"32","author":"Li","year":"2022","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Nie, D., Lan, R., Wang, L., and Ren, X. (2022, June 18\u201324). Pyramid architecture for multi-scale processing in point cloud segmentation. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01677"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1016\/j.neunet.2018.09.001","article-title":"Dgcnn: A convolutional neural network over large-scale labeled graphs","volume":"108","author":"Phan","year":"2018","journal-title":"Neural Netw."},{"key":"ref_11","unstructured":"Yuan, W., Gu, X., Li, H., Dong, Z., and Zhu, S. (2023). Monocular Scene Reconstruction with 3D SDF Transformers. arXiv."},{"key":"ref_12","unstructured":"Cui, M., Long, J., Feng, M., Li, B., and Kai, H. (2023, February 7\u201314). OctFormer: Efficient octree-based transformer for point cloud compression with local enhancement. Proceedings of the AAAI Conference on Artificial Intelligence 2023, Washington, DC, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Fei, J., Chen, W., Heidenreich, P., Wirges, S., and Stiller, C. (2020, September 14\u201316). SemanticVoxels: Sequential fusion for 3D pedestrian detection using LiDAR point cloud and semantic segmentation. Proceedings of the 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Virtual.","DOI":"10.1109\/MFI49285.2020.9235240"},{"key":"ref_14","unstructured":"Park, C., Jeong, Y., Cho, M., and Park, J. (2024, January 01). Efficient Point Transformer for Large-Scale 3D Scene Understanding. Available online: https:\/\/openreview.net\/forum?id=3SUToIxuIT3."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, H., Shi, C., Shi, S., Lei, M., Wang, S., He, D., Schiele, B., and Wang, L. (2023, June 17\u201324). Dsvt: Dynamic sparse voxel transformer with rotated sets. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01299"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Milioto, A., Vizzo, I., Behley, J., and Stachniss, C. (2019, November 3\u20138). Rangenet++: Fast and accurate lidar semantic segmentation. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8967762"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wu, B., Wan, A., Yue, X., and Keutzer, K. (2018, May 21\u201325). Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8462926"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, Z., Tang, H., Amini, A., Yang, X., Mao, H., Rus, D.L., and Han, S. (2023, May 29\u2013June 2). Bevfusion: Multi-task multi-sensor fusion with unified bird\u2019s-eye view representation. Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK.","DOI":"10.1109\/ICRA48891.2023.10160968"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Shen, Y., Li, H., Zhao, X., Yang, M., Tan, W., Pu, S., and Mao, H. (2022, October 8\u201312). Maff-net: Filter false positive for 3d vehicle detection with multi-modal adaptive feature fusion. Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China.","DOI":"10.1109\/ITSC55140.2022.9922104"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Afham, M., Dissanayake, I., Dissanayake, D., Dharmasiri, A., Thilakarathna, K., and Rodrigo, R. (2022, June 18\u201324). 
Crosspoint: Self-supervised cross-modal contrastive learning for 3d point cloud understanding. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00967"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, A., Zhang, K., Zhang, R., Wang, Z., Lu, Y., Guo, Y., and Zhang, S. (2023, June 17\u201324). Pimae: Point cloud and image interactive masked autoencoders for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00512"},{"key":"ref_22","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, July 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Hu, Q., Yang, B., Xie, L., Rosa, S., Guo, Y., Wang, Z., Trigoni, N., and Markham, A. (2020, June 13\u201319). Randla-net: Efficient semantic segmentation of large-scale point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01112"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., and Guibas, L.J. (2019, October 27\u2013November 2). Kpconv: Flexible and deformable convolution for point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision 2019, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00651"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Xu, C., Wu, B., Wang, Z., Zhan, W., Vajda, P., Keutzer, K., and Tomizuka, M. (2020, August 23\u201328). Squeezesegv3: Spatially-adaptive convolution for efficient point-cloud segmentation. 
Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58604-1_1"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Cortinhal, T., Tzelepis, G., and Erdal Aksoy, E. (2020, October 5\u20137). Salsanext: Fast, uncertainty-aware semantic segmentation of lidar point clouds. Proceedings of the Advances in Visual Computing: 15th International Symposium, ISVC 2020, San Diego, CA, USA.","DOI":"10.1007\/978-3-030-64559-5_16"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, Z., David, P., Yue, X., Xi, Z., Gong, B., and Foroosh, H. (2020, June 13\u201319). Polarnet: An improved grid representation for online lidar point clouds semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00962"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Choy, C., Gwak, J., and Savarese, S. (2019, June 15\u201320). 4d spatio-temporal convnets: Minkowski convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00319"},{"key":"ref_29","unstructured":"Zhou, H., Zhu, X., Song, X., Ma, Y., Wang, Z., Li, H., and Lin, D. (2020). Cylinder3d: An effective 3d framework for driving-scene lidar semantic segmentation. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Cheng, R., Razani, R., Taghavi, E., Li, E., and Liu, B. (2021, June 20\u201325). 2-s3net: Attentive feature fusion with adaptive feature selection for sparse semantic segmentation network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01236"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, F., Fang, J., Wah, B., and Torr, P. (2020, August 23\u201328). 
Deep fusionnet for point cloud semantic segmentation. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58586-0_38"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Gerdzhev, M., Razani, R., Taghavi, E., and Bingbing, L. (2021, May 30\u2013June 5). Tornado-net: Multiview total variation semantic segmentation with diamond inception module. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9562041"},{"key":"ref_33","unstructured":"Liong, V.E., Nguyen, T.N.T., Widjaja, S., Sharma, D., and Chong, Z.J. (2020). Amvnet: Assertion-based multi-view fusion network for lidar semantic segmentation. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Axelsson, M., Holmberg, M., Serra, S., Ovren, H., and Tulldahl, M. (2021, June 20\u201325). Semantic labeling of lidar point clouds for UAV applications. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops 2021, Nashville, TN, USA.","DOI":"10.1109\/CVPRW53098.2021.00487"},{"key":"ref_35","first-page":"8552","article-title":"Pvnas: 3d neural architecture search with point-voxel convolution","volume":"44","author":"Liu","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Xu, J., Zhang, R., Dou, J., Zhu, Y., Sun, J., and Pu, S. (2021, October 10\u201317). Rpvnet: A deep and efficient range-point-voxel fusion network for lidar point cloud segmentation. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01572"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/3\/148\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:10:29Z","timestamp":1760105429000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/3\/148"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,7]]},"references-count":36,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,3]]}},"alternative-id":["info15030148"],"URL":"https:\/\/doi.org\/10.3390\/info15030148","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,7]]}}}