{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T18:48:26Z","timestamp":1760986106660,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,3,24]],"date-time":"2022-03-24T00:00:00Z","timestamp":1648080000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Deep point cloud neural networks have achieved promising performance in remote sensing applications, and the prevalence of Transformer in natural language processing and computer vision is in stark contrast to underexplored point-based methods. In this paper, we propose an effective transformer-based network for point cloud learning. To better learn global and local information, we propose a group-in-group relation-based transformer architecture to learn the relationships between point groups to model global information and between points within each group to model local semantic information. To further enhance the local feature representation, we propose a Radius Feature Abstraction (RFA) module to extract radius-based density features characterizing the sparsity of local point clouds. 
Extensive evaluations on public benchmark datasets demonstrate the effectiveness and competitive performance of our proposed method on point cloud classification and part segmentation.<\/jats:p>","DOI":"10.3390\/rs14071563","type":"journal-article","created":{"date-parts":[[2022,3,24]],"date-time":"2022-03-24T23:31:43Z","timestamp":1648164703000},"page":"1563","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning"],"prefix":"10.3390","volume":"14","author":[{"given":"Shaolei","family":"Liu","sequence":"first","affiliation":[{"name":"Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai 200030, China"},{"name":"Digital Medical Research Center, School of Basic Medical Science, Fudan University, Shanghai 200032, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1204-0942","authenticated-orcid":false,"given":"Kexue","family":"Fu","sequence":"additional","affiliation":[{"name":"Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai 200030, China"},{"name":"Digital Medical Research Center, School of Basic Medical Science, Fudan University, Shanghai 200032, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Manning","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai 200030, China"},{"name":"Digital Medical Research Center, School of Basic Medical Science, Fudan University, Shanghai 200032, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhijian","family":"Song","sequence":"additional","affiliation":[{"name":"Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai 200030, China"},{"name":"Digital Medical Research 
Center, School of Basic Medical Science, Fudan University, Shanghai 200032, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Wan, J., Xie, Z., Xu, Y., Zeng, Z., Yuan, D., and Qiu, Q. (2021). DGANet: A Dilated Graph Attention-Based Network for Local Feature Extraction on 3D Point Clouds. Remote Sens., 13.","DOI":"10.3390\/rs13173484"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wu, W., Xie, Z., Xu, Y., Zeng, Z., and Wan, J. (2021). Point Projection Network: A Multi-View-Based Point Completion Network with Encoder-Decoder Architecture. Remote Sens., 13.","DOI":"10.3390\/rs13234917"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The kitti vision benchmark suite. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Nezhadarya, E., Taghavi, E., Razani, R., Liu, B., and Luo, J. (2020, January 14\u201319). Adaptive hierarchical down-sampling for point cloud classification. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01297"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Park, Y., Lepetit, V., and Woo, W. (2008, January 15\u201318). Multiple 3d object tracking for augmented reality. 
Proceedings of the 2008 7th IEEE\/ACM International Symposium on Mixed and Augmented Reality, Cambridge, UK.","DOI":"10.1109\/ISMAR.2008.4637336"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"4338","DOI":"10.1109\/TPAMI.2020.3005434","article-title":"Deep learning for 3d point clouds: A survey","volume":"43","author":"Guo","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","unstructured":"Liu, X., Han, Z., Liu, Y.S., and Zwicker, M. (February, January 27). Point2sequence: Learning the shape representation of 3d point clouds with an attention-based sequence to sequence network. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Duan, Y., Zheng, Y., Lu, J., Zhou, J., and Tian, Q. (2019, January 15\u201320). Structural relational reasoning of point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00104"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Maturana, D., and Scherer, S. (October, January 28). Voxnet: A 3d convolutional neural network for real-time object recognition. Proceedings of the 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany.","DOI":"10.1109\/IROS.2015.7353481"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. (2015, January 7\u201313). Multi-view convolutional neural networks for 3d shape recognition. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.114"},{"key":"ref_11","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. 
Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA."},{"key":"ref_12","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv."},{"key":"ref_13","first-page":"820","article-title":"Pointcnn: Convolution on x-transformed points","volume":"31","author":"Li","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wu, W., Qi, Z., and Fuxin, L. (2019, January 15\u201320). Pointconv: Deep convolutional networks on 3d point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00985"},{"key":"ref_15","unstructured":"Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., and Guibas, L.J. (November, January 27). Kpconv: Flexible and deformable convolution for point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1016\/j.neunet.2018.09.001","article-title":"Dgcnn: A convolutional neural network over large-scale labeled graphs","volume":"108","author":"Phan","year":"2018","journal-title":"Neural Netw."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Jiang, X., and Ma, X. (2019, January 10\u201312). Dynamic graph CNN with attention module for 3D hand pose estimation. Proceedings of the International Symposium on Neural Networks, Moscow, Russia.","DOI":"10.1007\/978-3-030-22796-8_10"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Pan, X., Xia, Z., Song, S., Li, L.E., and Huang, G. (2021, January 20\u201325). 3d object detection with pointformer. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00738"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., and Li, H. (2020, January 14\u201319). Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ye, M., Xu, S., and Cao, T. (2020, January 14\u201319). Hvnet: Hybrid voxel network for lidar based 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00170"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s41095-021-0229-5","article-title":"PCT: Point cloud transformer","volume":"7","author":"Guo","year":"2021","journal-title":"Comput. Vis. Media"},{"key":"ref_22","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_23","unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020). Language models are few-shot learners. arXiv."},{"key":"ref_24","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_25","unstructured":"Han, K., Xiao, A., Wu, E., Guo, J., Xu, C., and Wang, Y. (2021). Transformer in transformer. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Feng, Y., Zhang, Z., Zhao, X., Ji, R., and Gao, Y. 
(2018, January 18\u201322). GVCNN: Group-view convolutional neural networks for 3D shape recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00035"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5526","DOI":"10.1109\/TIP.2016.2609814","article-title":"Multi-view 3D object retrieval with deep embedding network","volume":"25","author":"Guo","year":"2016","journal-title":"IEEE Trans. Image Process."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Gadelha, M., Wang, R., and Maji, S. (2018, January 8\u201314). Multiresolution tree networks for 3d point cloud processing. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_7"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Riegler, G., Osman Ulusoy, A., and Geiger, A. (2017, January 21\u201326). Octnet: Learning deep 3d representations at high resolutions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.701"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Jiang, L., Zhao, H., Liu, S., Shen, X., Fu, C.W., and Jia, J. (2019, January 27\u201328). Hierarchical point-edge interaction network for point cloud semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.01053"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yan, X., Zheng, C., Li, Z., Wang, S., and Cui, S. (2020, January 13\u201319). Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00563"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liu, Y., Fan, B., Xiang, S., and Pan, C. 
(2019, January 15\u201320). Relation-shape convolutional neural network for point cloud analysis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00910"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3197517.3201301","article-title":"Point convolutional neural networks by extension operators","volume":"37","author":"Atzmon","year":"2018","journal-title":"ACM Trans. Graph."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Hua, B.S., Tran, M.K., and Yeung, S.K. (2018, January 18\u201322). Pointwise convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00109"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, Z.H., Huang, S.Y., and Wang, Y.C.F. (2020, January 14\u201319). Convolution in the cloud: Learning deformable kernels in 3d graph convolution networks for point cloud analysis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00187"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lin, Y., Yan, Z., Huang, H., Du, D., Liu, L., Cui, S., and Han, X. (2020, January 13\u201319). Fpconv: Learning local flattening for point convolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00435"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Verma, N., Boyer, E., and Verbeek, J. (2018, January 18\u201322). Feastnet: Feature-steered graph convolutions for 3d shape analysis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00275"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, C., Samari, B., and Siddiqi, K. 
(2018, January 8\u201314). Local spectral graph convolution for point set feature learning. Proceedings of the European conference on computer vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01225-0_4"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Te, G., Hu, W., Zheng, A., and Guo, Z. (2018, January 22\u201326). Rgcnn: Regularized graph cnn for point cloud segmentation. Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea.","DOI":"10.1145\/3240508.3240621"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Klokov, R., and Lempitsky, V. (2017, January 21\u201326). Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Honolulu, HI, USA.","DOI":"10.1109\/ICCV.2017.99"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Li, J., Chen, B.M., and Lee, G.H. (2018, January 18\u201322). So-net: Self-organizing network for point cloud analysis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00979"},{"key":"ref_42","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_43","unstructured":"Hendrycks, D., and Gimpel, K. (2016). Gaussian error linear units (gelus). arXiv."},{"key":"ref_44","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (2016). Layer normalization. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2980179.2980238","article-title":"A scalable active framework for region annotation in 3d shape collections","volume":"35","author":"Yi","year":"2016","journal-title":"ACM Trans. 
Graph."},{"key":"ref_46","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Xu, Y., Fan, T., Xu, M., Zeng, L., and Qiao, Y. (2018, January 8\u201314). Spidercnn: Deep learning on point sets with parameterized convolutional filters. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01237-3_6"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/7\/1563\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:42:13Z","timestamp":1760136133000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/7\/1563"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,24]]},"references-count":47,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["rs14071563"],"URL":"https:\/\/doi.org\/10.3390\/rs14071563","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2022,3,24]]}}}