{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:13:12Z","timestamp":1760145192544,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T00:00:00Z","timestamp":1719273600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["92367301","92267201","52275493"],"award-info":[{"award-number":["92367301","92267201","52275493"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Three-dimensional (3D) single-object tracking (3D SOT) is a fundamental yet not well-solved problem in 3D vision, where the complexity of feature matching and the sparsity of point clouds pose significant challenges. To handle abrupt changes in appearance features and sparse point clouds, we propose a novel 3D SOT network, dubbed CDTracker. It leverages both cosine similarity and an attention mechanism to enhance the robustness of feature matching. By combining similarity embedding and attention assignment, CDTracker performs template and search area feature matching in a coarse-to-fine manner. Additionally, CDTracker addresses the problem of sparse point clouds, which commonly leads to inaccurate tracking. It incorporates relatively dense sampling based on the concept of point cloud segmentation to retain more target points, leading to improved localization accuracy. Extensive experiments on both the KITTI and Waymo datasets demonstrate clear improvements in CDTracker over its competitors.<\/jats:p>","DOI":"10.3390\/rs16132322","type":"journal-article","created":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T12:46:56Z","timestamp":1719319616000},"page":"2322","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["CDTracker: Coarse-to-Fine Feature Matching and Point Densification for 3D Single-Object Tracking"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6245-0846","authenticated-orcid":false,"given":"Yuan","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1825-0393","authenticated-orcid":false,"given":"Chenghan","family":"Pu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9094-2111","authenticated-orcid":false,"given":"Yu","family":"Qi","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6627-0076","authenticated-orcid":false,"given":"Jianping","family":"Yang","sequence":"additional","affiliation":[{"name":"College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2311-9787","authenticated-orcid":false,"given":"Xiang","family":"Wu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-0242-7052","authenticated-orcid":false,"given":"Muyuan","family":"Niu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0429-490X","authenticated-orcid":false,"given":"Mingqiang","family":"Wei","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,6,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zheng, C., Yan, X., Zhang, H., Wang, B., Cheng, S., Cui, S., and Li, Z. (2022, January 18\u201324). Beyond 3d siamese tracking: A motion-centric paradigm for 3d single object tracking in point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00794"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Giancola, S., Zarzar, J., and Ghanem, B. (2019, January 15\u201320). Leveraging shape completion for 3d siamese tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00145"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Qi, H., Feng, C., Cao, Z., Zhao, F., and Xiao, Y. (2020, January 13\u201319). P2b: Point-to-box network for 3d object tracking in point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00636"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zheng, C., Yan, X., Gao, J., Zhao, W., Zhang, W., Li, Z., and Cui, S. (2021, January 11\u201317). Box-aware feature enhancement for single object tracking on point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01295"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhou, C., Luo, Z., Luo, Y., Liu, T., Pan, L., Cai, Z., Zhao, H., and Lu, S. (2022, January 18\u201324). Pttr: Relational 3d point cloud object tracking with transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00834"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Shan, J., Zhou, S., Fang, Z., and Cui, Y. (October, January 27). Ptt: Point-track-transformer module for 3d single object tracking in point clouds. Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic.","DOI":"10.1109\/IROS51168.2021.9636821"},{"key":"ref_7","first-page":"28714","article-title":"3D Siamese voxel-to-BEV tracker for sparse point clouds","volume":"34","author":"Hui","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Hui, L., Wang, L., Tang, L., Lan, K., Xie, J., and Yang, J. (2022). 3d siamese transformer network for single object tracking on point clouds. Computer Vision\u2013ECCV 2022, Proceedings of the 17th European Conference, Tel Aviv, Israel, 23\u201327 October 2022, Springer. Proceedings, Part II.","DOI":"10.1007\/978-3-031-20086-1_17"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhao, K., Zhao, H., Wang, Z., Peng, J., and Hu, Z. (2023). Object Preserving Siamese Network for Single Object Tracking on Point Clouds. arXiv.","DOI":"10.1109\/TMM.2023.3306490"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Xu, T.X., Guo, Y.C., Lai, Y.K., and Zhang, S.H. (2023, January 17\u201324). CXTrack: Improving 3D point cloud tracking with contextual information. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00111"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? the kitti vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., and Caine, B. (2020, January 13\u201319). Scalability in perception for autonomous driving: Waymo open dataset. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00252"},{"key":"ref_13","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, MIT."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Litany, O., He, K., and Guibas, L.J. (2019, January 15\u201320). Deep hough voting for 3d object detection in point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00937"},{"key":"ref_15","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Luo, Z., Zhou, C., Pan, L., Zhang, G., Liu, T., Luo, Y., Zhao, H., Liu, Z., and Lu, S. (2022). Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer. arXiv.","DOI":"10.1109\/CVPR52688.2022.00834"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Nie, J., He, Z., Yang, Y., Gao, M., and Zhang, J. (2023, January 7\u201314). Glt-t: Global-local transformer voting for 3d single object tracking in point clouds. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v37i2.25287"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Nie, J., He, Z., Yang, Y., Bao, Z., Gao, M., and Zhang, J. (2023, January 19\u201325). OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, Macao, China.","DOI":"10.24963\/ijcai.2023\/143"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"5543","DOI":"10.1109\/TITS.2023.3243470","article-title":"A lightweight and detector-free 3d single object tracker on point clouds","volume":"24","author":"Xia","year":"2023","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_20","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_21","first-page":"1","article-title":"Dynamic graph cnn for learning on point clouds","volume":"38","author":"Wang","year":"2019","journal-title":"ACM Trans. Graph."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1016\/j.gmod.2011.03.002","article-title":"Graph-based representations of point clouds","volume":"73","author":"Natali","year":"2011","journal-title":"Graph. Model."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 15\u201320). Pointpillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201322). Voxelnet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, R., Wu, J., Luo, Y., and Xu, G. (2024). PointMM: Point Cloud Semantic Segmentation CNN under Multi-Spatial Feature Encoding and Multi-Head Attention Pooling. Remote Sens., 16.","DOI":"10.3390\/rs16071246"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Shi, M., Zhang, F., Chen, L., Liu, S., Yang, L., and Zhang, C. (2024). Position-Feature Attention Network-Based Approach for Semantic Segmentation of Urban Building Point Clouds from Airborne Array Interferometric SAR. Remote Sens., 16.","DOI":"10.3390\/rs16071141"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Chen, X., Li, D., Liu, M., and Jia, J. (2023). CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation. Remote Sens., 15.","DOI":"10.3390\/rs15184455"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Quan, H., Lai, H., Gao, G., Ma, J., Li, J., and Chen, D. (2024). Pairwise CNN-Transformer Features for Human\u2013Object Interaction Detection. Entropy, 26.","DOI":"10.3390\/e26030205"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Gong, H., Mu, T., Li, Q., Dai, H., Li, C., He, Z., Wang, W., Han, F., Tuniyazi, A., and Li, H. (2022). Swin-transformer-enabled YOLOv5 with attention mechanism for small object detection on satellite images. Remote Sens., 14.","DOI":"10.3390\/rs14122861"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., and Lu, H. (2021, January 20\u201325). Transformer tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yang, J., Pan, Z., Liu, Y., Niu, B., and Lei, B. (2023). Single object tracking in satellite videos based on feature enhancement and multi-level matching strategy. Remote Sens., 15.","DOI":"10.3390\/rs15174351"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhao, H., Jiang, L., Jia, J., Torr, P.H., and Koltun, V. (2021, January 11\u201317). Point transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01595"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Estrella-Ibarra, L.F., Le\u00f3n-Cuevas, A.d., and Tovar-Arriaga, S. (2024). Nested Contrastive Boundary Learning: Point Transformer Self-Attention Regularization for 3D Intracranial Aneurysm Segmentation. Technologies, 12.","DOI":"10.3390\/technologies12030028"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Mao, J., Xue, Y., Niu, M., Bai, H., Feng, J., Liang, X., Xu, H., and Xu, C. (2021, January 11\u201317). Voxel transformer for 3d object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00315"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Pan, X., Xia, Z., Song, S., Li, L.E., and Huang, G. (2021, January 20\u201325). 3d object detection with pointformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00738"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"2137","DOI":"10.1109\/TPAMI.2016.2516982","article-title":"A novel performance evaluation methodology for single-target trackers","volume":"38","author":"Kristan","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_40","unstructured":"Yang, Y., Deng, Y., Nie, J., and Zhang, J. (2023). BEVTrack: A Simple Baseline for Point Cloud Tracking in Bird\u2019s-Eye-View. arXiv."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/13\/2322\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:04:16Z","timestamp":1760108656000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/13\/2322"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,25]]},"references-count":40,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["rs16132322"],"URL":"https:\/\/doi.org\/10.3390\/rs16132322","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2024,6,25]]}}}