{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:01:52Z","timestamp":1750309312585,"version":"3.41.0"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,2,16]],"date-time":"2024-02-16T00:00:00Z","timestamp":1708041600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100010574","name":"China Academy of Space Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100010574","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Sen. Netw."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>It is an important aspect to fully leverage complementary sensors of images and point clouds for objects classification and six-dimensional (6D) pose estimation tasks. Prior works extract objects category from a single sensor such as RGB camera or LiDAR, limiting their robustness in the event that a key sensor is severely blocked or fails. In this work, we present a robust objects classification and 6D object pose estimation strategy by dual fusion of image and point cloud data. Instead of solely relying on 3D proposals or mature 2D object detectors, our model deeply integrates 2D and 3D information of heterogeneous data sources by a robustness dual fusion network and an attention-based nonlinear fusion function Attn-fun(.), achieving efficiency as well as high accuracy classification for even missed some data sources. Then, our method is also able to precisely estimate the transformation matrix between two input objects by minimizing the feature difference to achieve 6D object pose estimation, even under strong noise or with outliers. We deploy our proposed method not only to ModelNet40 datasets but also to a real fusion vision rotating platform for tracking objects in outer space based on the estimated pose.<\/jats:p>","DOI":"10.1145\/3639705","type":"journal-article","created":{"date-parts":[[2024,1,5]],"date-time":"2024-01-05T20:13:12Z","timestamp":1704485592000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Robust Classification and 6D Pose Estimation by Sensor Dual Fusion of Image and Point Cloud Data"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3757-9269","authenticated-orcid":false,"given":"Yaming","family":"Xu","sequence":"first","affiliation":[{"name":"Harbin Institute of Technolog (School of Astronautics), China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-0054-967X","authenticated-orcid":false,"given":"Yan","family":"Wang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technolog (School of Astronautics), China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0100-5655","authenticated-orcid":false,"given":"Boliang","family":"Li","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technolog (School of Astronautics), China"}]}],"member":"320","published-online":{"date-parts":[[2024,2,16]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/IV47402.2020.9304694"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00733"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.269"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2017.09.038"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.3023541"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10095599"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2929170"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2018.2822828"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21918"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00030"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3211006"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018376"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00356"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00268"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01138"},{"key":"e_1_3_1_17_2","article-title":"A comprehensive survey on point cloud registration","author":"Huang Xiaoshui","year":"2021","unstructured":"Xiaoshui Huang, Guofeng Mei, Jian Zhang, and Rana Abbas. 2021. A comprehensive survey on point cloud registration. arXiv:2103.02690. Retrieved from https:\/\/arxiv.org\/abs\/2103.02690","journal-title":"arXiv:2103.02690"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018505"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364920987859"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00831"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00997"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s43154-020-00021-6"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00504"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01298"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00959"},{"key":"e_1_3_1_26_2","first-page":"3744","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Lee Juho","year":"2019","unstructured":"Juho Lee, Yoonho Lee, Jungtaek Kim, Adam Kosiorek, Seungjin Choi, and Yee Whye Teh. 2019. Set transformer: A framework for attention-based permutation-invariant neural networks. In Proceedings of the International Conference on Machine Learning. PMLR, 3744\u20133753."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00979"},{"key":"e_1_3_1_28_2","first-page":"820","article-title":"PointCNN: Convolution on x-transformed points","volume":"31","author":"Li Yangyan","year":"2018","unstructured":"Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. 2018. PointCNN: Convolution on x-transformed points. Adv. Neural Inf. Process. Syst. 31 (2018), 820\u2013830.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2019.2958517"},{"key":"e_1_3_1_30_2","article-title":"Deep learning for LiDAR point clouds in autonomous driving: A review","author":"Li Ying","year":"2020","unstructured":"Ying Li, Lingfei Ma, Zilong Zhong, Fei Liu, Michael A. Chapman, Dongpu Cao, and Jonathan Li. 2020. Deep learning for LiDAR point clouds in autonomous driving: A review. IEEE Trans. Neural Netw. Learn. Syst. 32, 8 (2020), 3412\u20133432.","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3181597"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2023.3244348"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00294"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3251021"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00319"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00469"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58580-8_31"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00937"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00102"},{"key":"e_1_3_1_41_2","first-page":"652","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R.","year":"2017","unstructured":"Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 652\u2013660."},{"key":"e_1_3_1_42_2","article-title":"PointNet++: Deep hierarchical feature learning on point sets in a metric space","volume":"30","author":"Qi Charles Ruizhongtai","year":"2017","unstructured":"Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 30 (2017).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_43_2","article-title":"Stand-alone self-attention in vision models","volume":"32","author":"Ramachandran Prajit","year":"2019","unstructured":"Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, and Jon Shlens. 2019. Stand-alone self-attention in vision models. Adv. Neural Inf. Process. Syst. 32 (2019).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00086"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00178"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/IVS.2019.8813895"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00109"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58604-1_41"},{"key":"e_1_3_1_49_2","first-page":"24261","article-title":"MLP-Mixer: An all-MLP architecture for vision","volume":"34","author":"Tolstikhin Ilya O.","year":"2021","unstructured":"Ilya O. Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, et\u00a0al. 2021. MLP-Mixer: An all-MLP architecture for vision. Adv. Neural Inf. Process. Syst. 34 (2021), 24261\u201324272.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_50_2","first-page":"4227","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Tremblay Jonathan","year":"2020","unstructured":"Jonathan Tremblay, Stephen Tyree, Terry Mosier, and Stan Birchfield. 2020. Indirect object-to-robot pose estimation from an external monocular RGB camera. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems. IEEE, 4227\u20134234."},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00346"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3326362"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICESIT53460.2021.9696941"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00114"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00985"},{"key":"e_1_3_1_57_2","first-page":"1912","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Wu Zhirong","year":"2015","unstructured":"Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1912\u20131920."},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00033"},{"key":"e_1_3_1_59_2","article-title":"PairConnect: A compute-efficient MLP alternative to attention","author":"Xu Zhaozhuo","year":"2021","unstructured":"Zhaozhuo Xu, Minghao Yan, Junyan Zhang, and Anshumali Shrivastava. 2021. PairConnect: A compute-efficient MLP alternative to attention. arXiv:2106.08235. Retrieved from https:\/\/arxiv.org\/abs\/2106.08235","journal-title":"arXiv:2106.08235"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00344"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00760"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1093\/aob\/mcg029"},{"key":"e_1_3_1_63_2","article-title":"Rethinking token-mixing MLP for MLP-based vision backbone","author":"Yu Tan","year":"2021","unstructured":"Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, and Ping Li. 2021. Rethinking token-mixing MLP for MLP-based vision backbone. arXiv:2106.14882. Retrieved from https:\/\/arxiv.org\/abs\/2106.14882","journal-title":"arXiv:2106.14882"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2020.2987728"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00472"},{"key":"e_1_3_1_66_2","article-title":"Deformable DETR: Deformable transformers for end-to-end object detection","author":"Zhu Xizhou","year":"2021","unstructured":"Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. 2021. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv:2010.04159. Retrieved from https:\/\/arxiv.org\/abs\/2010.04159","journal-title":"arXiv:2010.04159"}],"container-title":["ACM Transactions on Sensor Networks"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639705","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3639705","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:49Z","timestamp":1750291429000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639705"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,16]]},"references-count":65,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3639705"],"URL":"https:\/\/doi.org\/10.1145\/3639705","relation":{},"ISSN":["1550-4859","1550-4867"],"issn-type":[{"type":"print","value":"1550-4859"},{"type":"electronic","value":"1550-4867"}],"subject":[],"published":{"date-parts":[[2024,2,16]]},"assertion":[{"value":"2022-10-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-30","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-02-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}