{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,18]],"date-time":"2026-02-18T23:51:36Z","timestamp":1771458696388,"version":"3.50.1"},"reference-count":45,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2024,8,19]],"date-time":"2024-08-19T00:00:00Z","timestamp":1724025600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Key Research and Development Program of Shaanxi Province","award":["2019GY215"],"award-info":[{"award-number":["2019GY215"]}]},{"name":"Key Research and Development Program of Shaanxi Province","award":["2021ZDLSF06-04"],"award-info":[{"award-number":["2021ZDLSF06-04"]}]},{"name":"Key Research and Development Program of Shaanxi Province","award":["2024SF-YBXM-681"],"award-info":[{"award-number":["2024SF-YBXM-681"]}]},{"name":"Key Research and Development Program of Shaanxi Province","award":["61701403"],"award-info":[{"award-number":["61701403"]}]},{"name":"Key Research and Development Program of Shaanxi Province","award":["61806164"],"award-info":[{"award-number":["61806164"]}]},{"name":"National Natural Science Foundation of China","award":["2019GY215"],"award-info":[{"award-number":["2019GY215"]}]},{"name":"National Natural Science Foundation of China","award":["2021ZDLSF06-04"],"award-info":[{"award-number":["2021ZDLSF06-04"]}]},{"name":"National Natural Science Foundation of China","award":["2024SF-YBXM-681"],"award-info":[{"award-number":["2024SF-YBXM-681"]}]},{"name":"National Natural Science Foundation of China","award":["61701403"],"award-info":[{"award-number":["61701403"]}]},{"name":"National Natural Science Foundation of China","award":["61806164"],"award-info":[{"award-number":["61806164"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Self-supervised learning has made significant progress in point cloud processing. Currently, the primary tasks of self-supervised learning, which include point cloud reconstruction and representation learning, are trained separately due to their structural differences. This separation inevitably leads to increased training costs and neglects the potential for mutual assistance between tasks. In this paper, a self-supervised method named PointUR-RL is introduced, which integrates point cloud reconstruction and representation learning. The method features two key components: a variable masked autoencoder (VMAE) and contrastive learning (CL). The VMAE is capable of processing input point cloud blocks with varying masking ratios, ensuring seamless adaptation to both tasks. Furthermore, CL is utilized to enhance the representation learning capabilities and improve the separability of the learned representations. Experimental results confirm the effectiveness of the method in training and its strong generalization ability for downstream tasks. Notably, high-accuracy classification and high-quality reconstruction have been achieved with the public datasets ModelNet and ShapeNet, with competitive results also obtained with the ScanObjectNN real-world dataset.<\/jats:p>","DOI":"10.3390\/rs16163045","type":"journal-article","created":{"date-parts":[[2024,8,19]],"date-time":"2024-08-19T10:11:28Z","timestamp":1724062288000},"page":"3045","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["PointUR-RL: Unified Self-Supervised Learning Method Based on Variable Masked Autoencoder for Point Cloud Reconstruction and Representation Learning"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6218-5715","authenticated-orcid":false,"given":"Kang","family":"Li","sequence":"first","affiliation":[{"name":"School of Information Science and Technology, Northwest University, Xi\u2019an 710127, China"}]},{"given":"Qiuquan","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northwest University, Xi\u2019an 710127, China"}]},{"given":"Haoyu","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northwest University, Xi\u2019an 710127, China"}]},{"given":"Shibo","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northwest University, Xi\u2019an 710127, China"}]},{"given":"He","family":"Tian","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northwest University, Xi\u2019an 710127, China"}]},{"given":"Ping","family":"Zhou","sequence":"additional","affiliation":[{"name":"Emperor Qin Shihuang\u2019s Mausoleum Site Museum, Key Scientific Research Base of Ancient Polychrome Pottery Conservation, Xi\u2019an 710600, China"}]},{"given":"Xin","family":"Cao","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northwest University, Xi\u2019an 710127, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,19]]},"reference":[{"key":"ref_1","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4338","DOI":"10.1109\/TPAMI.2020.3005434","article-title":"Deep learning for 3d point clouds: A survey","volume":"43","author":"Guo","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zhang, R., Tan, J., Cao, Z., Xu, L., Liu, Y., Si, L., and Sun, F. (IEEE Trans. Multimed., 2024). Part-Aware Correlation Networks for Few-shot Learning, IEEE Trans. Multimed., Early Access.","DOI":"10.1109\/TMM.2024.3394681"},{"key":"ref_4","first-page":"857","article-title":"Self-supervised learning: Generative or contrastive","volume":"35","author":"Liu","year":"2021","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1109\/MSP.2017.2765202","article-title":"Generative adversarial networks: An overview","volume":"35","author":"Creswell","year":"2018","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"6735","DOI":"10.1109\/TCSVT.2023.3289142","article-title":"Differential feature awareness network within antagonistic learning for infrared-visible object detection","volume":"34","author":"Zhang","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_7","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-encoding variational bayes. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ye, M., Zhang, X., Yuen, P.C., and Chang, S.-F. (2019, January 15\u201320). Unsupervised embedding learning via invariant and spreading instance feature. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00637"},{"key":"ref_9","unstructured":"Achlioptas, P., Diamanti, O., Mitliagkas, I., and Guibas, L. (2017). Representation learning and adversarial generation of 3d point clouds. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Poursaeed, O., Jiang, T., Qiao, H., Xu, N., and Kim, V.G. (2020, January 25\u201328). Self-supervised learning of point clouds via orientation estimation. Proceedings of the 2020 International Conference on 3D Vision (3DV), Fukuoka, Japan.","DOI":"10.1109\/3DV50981.2020.00112"},{"key":"ref_11","unstructured":"Li, R., Li, X., Fu, C.-W., Cohen-Or, D., and Heng, P.-A. (November, January 27). Pu-gan: A point cloud upsampling adversarial network. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sarmad, M., Lee, H.J., and Kim, Y.M. (2019, January 15\u201320). Rl-gan-net: A reinforcement learning agent controlled gan network for real-time point cloud shape completion. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00605"},{"key":"ref_13","unstructured":"Yang, G., Huang, X., Hao, Z., Liu, M.-Y., Belongie, S., and Hariharan, B. (November, January 27). Pointflow: 3d point cloud generation with continuous normalizing flows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, T., Chang, H., Mishra, S., Zhang, H., Katabi, D., and Krishnan, D. (2023, January 17\u201324). Mage: Masked generative encoder to unify representation learning and image synthesis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00213"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"11321","DOI":"10.1109\/TPAMI.2023.3262786","article-title":"Unsupervised point cloud representation learning with deep neural networks: A survey","volume":"45","author":"Xiao","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Eckart, B., Yuan, W., Liu, C., and Kautz, J. (2021, January 21\u201324). Self-supervised learning on 3d point clouds by learning discrete generative models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00815"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Chhipa, P.C., Upadhyay, R., Saini, R., Lindqvist, L., Nordenskjold, R., Uchida, S., and Liwicki, M. (2022, January 23\u201327). Depth contrast: Self-supervised pretraining on 3dpm images for mining material classification. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25082-8_14"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Afham, M., Dissanayake, I., Dissanayake, D., Dharmasiri, A., Thilakarathna, K., and Rodrigo, R. (2022, January 18\u201324). Crosspoint: Self-supervised cross-modal contrastive learning for 3d point cloud understanding. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00967"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Huang, S., Xie, Y., Zhu, S.-C., and Zhu, Y. (2021, January 11\u201317). Spatio-temporal self-supervised representation learning for 3d point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00647"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, K., Xiao, A., Zhang, X., Lu, S., and Shao, L. (2023, January 18\u201322). Fac: 3d representation learning via foreground aware feature contrast. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00914"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, H., Liu, Q., Yue, X., Lasenby, J., and Kusner, M.J. (2021, January 11\u201317). Unsupervised point cloud pre-training via occlusion completion. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00964"},{"key":"ref_22","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Yu, X., Tang, L., Rao, Y., Huang, T., Zhou, J., and Lu, J. (2022, January 18\u201324). Point-bert: Pre-training 3d point cloud transformers with masked point modeling. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01871"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Pang, Y., Wang, W., Tay, F.E., Liu, W., Tian, Y., and Yuan, L. (2022, January 23\u201327). Masked autoencoders for point cloud self-supervised learning. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20086-1_35"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wu, X., Jiang, L., Wang, P.-S., Liu, Z., Liu, X., Qiao, Y., Ouyang, W., He, T., and Zhao, H. (2024, January 17\u201321). Point Transformer V3: Simpler Faster Stronger. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00463"},{"key":"ref_26","first-page":"1","article-title":"Dynamic graph cnn for learning on point clouds","volume":"38","author":"Wang","year":"2019","journal-title":"ACM Trans. Graph. (tog)"},{"key":"ref_27","unstructured":"Chen, X., Liu, Z., Xie, S., and He, K. (2024). Deconstructing denoising diffusion models for self-supervised learning. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1007\/s11263-023-01852-4","article-title":"Context autoencoder for self-supervised representation learning","volume":"132","author":"Chen","year":"2024","journal-title":"Int. J. Comput. Vis."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"He, K., Chen, X., Xie, S., Li, Y., Doll\u00e1r, P., and Girshick, R. (2022, January 18\u201324). Masked autoencoders are scalable vision learners. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., and Hu, H. (2022, January 18\u201324). Simmim: A simple framework for masked image modeling. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00943"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Li, Z., Gao, Z., Tan, C., Ren, B., Yang, L.T., and Li, S.Z. (2024, January 17\u201321). General Point Model Pretraining with Autoencoding and Autoregressive. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01980"},{"key":"ref_32","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1016\/j.aiopen.2022.10.001","article-title":"A survey of transformers","volume":"3","author":"Lin","year":"2022","journal-title":"AI Open"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1109\/TMM.2021.3070138","article-title":"Deep-IRTarget: An automatic target detector in infrared imagery using dual-domain feature extraction and allocation","volume":"24","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Multimed."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"7478","DOI":"10.1109\/TNNLS.2022.3227717","article-title":"A survey of visual transformers","volume":"35","author":"Liu","year":"2023","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3505244","article-title":"Transformers in vision: A survey","volume":"54","author":"Khan","year":"2022","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s41095-021-0229-5","article-title":"Pct: Point cloud transformer","volume":"7","author":"Guo","year":"2021","journal-title":"Comput. Vis. Media"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Pan, X., Xia, Z., Song, S., Li, L.E., and Huang, G. (2021, January 21\u201324). 3d object detection with pointformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00738"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhao, H., Jiang, L., Jia, J., Torr, P.H., and Koltun, V. (2021, January 11\u201317). Point transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01595"},{"key":"ref_40","unstructured":"Zhang, Y., Lin, J., Li, R., Jia, K., and Zhang, L. (2022). Point-MA2E: Masked and Affine Transformed AutoEncoder for Self-supervised Point Cloud Learning. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Kolodiazhnyi, M., Vorontsova, A., Konushin, A., and Rukhovich, D. (2024, January 17\u201321). Oneformer3d: One transformer for unified point cloud segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01979"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020, January 13\u201319). Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"ref_43","unstructured":"Li, Y., Bu, R., Sun, M., Wu, W., Di, X., and Chen, B. (2018). Pointcnn: Convolution on x-transformed points. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1943","DOI":"10.1109\/TMM.2021.3074240","article-title":"Geometric back-projection network for point cloud classification","volume":"24","author":"Qiu","year":"2021","journal-title":"IEEE Trans. Multimed."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"4436","DOI":"10.1109\/TIP.2021.3072214","article-title":"Pra-net: Point relation-aware network for 3d point cloud analysis","volume":"30","author":"Cheng","year":"2021","journal-title":"IEEE Trans. Image Process."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/3045\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:39:03Z","timestamp":1760110743000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/3045"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,19]]},"references-count":45,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["rs16163045"],"URL":"https:\/\/doi.org\/10.3390\/rs16163045","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,19]]}}}