{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T16:12:58Z","timestamp":1758125578013,"version":"3.44.0"},"reference-count":38,"publisher":"Emerald","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9,19]]},"abstract":"<jats:sec>\n                  <jats:title>Purpose<\/jats:title>\n                  <jats:p>6D pose estimation is a crucial branch of robot vision. However, the authors find that the performance of most existing methods is unsatisfactory because they fail to make full use of the complementarity between the appearance and geometry information of the object, fail to explore in depth how features from different regions contribute to pose estimation, and fail to exploit the invariance of the geometric structure of keypoints. This paper aims to design a high-precision 6D pose estimation method based on these insights.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Design\/methodology\/approach<\/jats:title>\n                  <jats:p>First, a multi-scale cross-attention-based feature fusion module (MCFF) is designed to aggregate appearance and geometry information by exploring the correlations between appearance features and geometry features in different regions. Second, the authors build a multi-query regional-attention-based feature differentiation module (MRFD) to learn the contribution of each region to each keypoint. 
Finally, a geometric enhancement mechanism (GEM) is designed to use structural information to predict keypoints and to optimize both the pose and the keypoints in the inference phase.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Findings<\/jats:title>\n                  <jats:p>Experiments on several benchmarks and on a real robot show that the proposed method performs better than existing methods. Ablation studies demonstrate the effectiveness of each module of the authors\u2019 method.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Originality\/value<\/jats:title>\n                  <jats:p>A high-precision 6D pose estimation method is proposed by studying the relationship between appearance and geometry across different object parts and the geometric invariance of keypoints, which is of great significance for various robot applications.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1108\/ir-08-2024-0366","type":"journal-article","created":{"date-parts":[[2025,2,5]],"date-time":"2025-02-05T04:12:46Z","timestamp":1738728766000},"page":"581-590","source":"Crossref","is-referenced-by-count":0,"title":["Attention-based object pose estimation with feature fusion and geometry enhancement"],"prefix":"10.1108","volume":"52","author":[{"given":"Shuai","family":"Yang","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology School of Mechatronics Engineering, Harbin, and State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bin","family":"Wang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology School of Mechatronics Engineering, Harbin, and State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, 
China","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junyuan","family":"Tao","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology School of Electronics and Information Engineering, Harbin, China","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhe","family":"Ruan","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology School of Mechatronics Engineering, Harbin, China","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hong","family":"Liu","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology State Key Laboratory of Robotics and System, Harbin, China","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","published-online":{"date-parts":[[2025,2,7]]},"reference":[{"key":"2025091707163023800_ref001","doi-asserted-by":"publisher","first-page":"6793","DOI":"10.1109\/CVPR52688.2022.00668","article-title":"Ove6d: object viewpoint encoding for depth-based 6d object pose estimation","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cai","year":"2022"},{"key":"2025091707163023800_ref002","doi-asserted-by":"publisher","first-page":"3150","DOI":"10.1109\/CVPR42600.2020.00322","article-title":"Reconstruct locally, localize globally: a model free method for object pose estimation","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cai","year":"2020"},{"issue":"5","key":"2025091707163023800_ref003","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1109\/34.1000236","article-title":"Mean shift: a robust approach toward feature space analysis","volume":"24","author":"Comaniciu","year":"2002","journal-title":"IEEE Transactions on Pattern Analysis and Machine 
Intelligence"},{"key":"2025091707163023800_ref004","doi-asserted-by":"publisher","first-page":"11444","DOI":"10.1109\/CVPR42600.2020.01146","article-title":"GraspNet-1Billion: a large-scale benchmark for general object grasping","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Fang","year":"2020"},{"issue":"7","key":"2025091707163023800_ref005","doi-asserted-by":"publisher","first-page":"3358","DOI":"10.1109\/TCSVT.2022.3233191","article-title":"The 6D pose estimation of the aircraft using geometric property","volume":"33","author":"Fu","year":"2023","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"2025091707163023800_ref006","doi-asserted-by":"publisher","first-page":"11081","DOI":"10.1109\/ICRA48506.2021.9561475","article-title":"Cloudaae: learning 6d object pose regression with on-line data synthesis on point clouds","volume-title":"Proceedings of IEEE International Conference on Robotics and Automation","author":"Gao","year":"2021"},{"key":"2025091707163023800_ref007","doi-asserted-by":"publisher","first-page":"3643","DOI":"10.1109\/ICRA40945.2020.9197461","article-title":"6d object pose regression via supervised learning on point clouds","volume-title":"Proceedings of IEEE International Conference on Robotics and Automation","author":"Gao","year":"2020"},{"key":"2025091707163023800_ref008","doi-asserted-by":"publisher","first-page":"5072","DOI":"10.1109\/TIP.2021.3078109","article-title":"Efficient center voting for object detection and 6D pose estimation in 3D point cloud","volume":"30","author":"Guo","year":"2021","journal-title":"IEEE Transactions on Image Processing"},{"key":"2025091707163023800_ref009","doi-asserted-by":"publisher","first-page":"4831","DOI":"10.1109\/CVPR52729.2023.00468","article-title":"Shape-Constraint recurrent flow for 6D object pose 
estimation","author":"Hai","year":"2023"},{"key":"2025091707163023800_ref010","doi-asserted-by":"publisher","first-page":"3002","DOI":"10.1109\/CVPR46437.2021.00302","article-title":"Ffb6d: a full flow bidirectional fusion network for 6d pose estimation","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"He","year":"2021"},{"issue":"5","key":"2025091707163023800_ref011","doi-asserted-by":"publisher","first-page":"3198","DOI":"10.1109\/TMECH.2021.3109344","article-title":"A generative feature-to-image robotic vision framework for 6d pose measurement of metal parts","volume":"27","author":"He","year":"2022","journal-title":"IEEE\/ASME Transactions on Mechatronics"},{"key":"2025091707163023800_ref012","doi-asserted-by":"publisher","first-page":"35103","DOI":"10.48550\/arXiv.2301.07673","article-title":"Onepose++: keypoint-free one-shot object pose estimation without CAD models","author":"He","year":"2022","journal-title":"Proceedings of Advances in Neural Information Processing Systems"},{"key":"2025091707163023800_ref013","doi-asserted-by":"publisher","first-page":"11632","DOI":"10.1109\/CVPR42600.2020.01165","article-title":"Pvn3d: a deep point-wise 3d keypoints voting network for 6dof pose estimation","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"He","year":"2020"},{"issue":"4","key":"2025091707163023800_ref014","doi-asserted-by":"publisher","first-page":"8980","DOI":"10.1109\/LRA.2022.3189158","article-title":"Voting and attention-based pose relation learning for object pose estimation from 3d point clouds","volume":"7","author":"Hoang","year":"2022","journal-title":"IEEE Robotics and Automation Letters"},{"key":"2025091707163023800_ref015","doi-asserted-by":"publisher","first-page":"11108","DOI":"10.1109\/CVPR42600.2020.01112","article-title":"Randla-net: efficient semantic segmentation of large-scale point clouds","volume-title":"Proceedings of 
IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Hu","year":"2020"},{"issue":"1","key":"2025091707163023800_ref016","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1109\/LRA.2022.3222998","article-title":"Ambiguity-aware multi-object pose optimization for visually-assisted robot manipulation","volume":"8","author":"Jeon","year":"2023","journal-title":"IEEE Robotics and Automation Letters"},{"key":"2025091707163023800_ref017","doi-asserted-by":"publisher","first-page":"11164","DOI":"10.1109\/CVPR52688.2022.01089","article-title":"Uni6d: a unified cnn framework without projection breakdown for 6d pose estimation","author":"Jiang","year":"2022"},{"issue":"3","key":"2025091707163023800_ref018","doi-asserted-by":"publisher","first-page":"6526","DOI":"10.1109\/LRA.2022.3174261","article-title":"E2EK: end-to-end regression network based on keypoint for 6D pose estimation","volume":"7","author":"Lin","year":"2022","journal-title":"IEEE Robotics and Automation Letters"},{"key":"2025091707163023800_ref019","doi-asserted-by":"publisher","first-page":"127652","DOI":"10.1016\/j.neucom.2024.127652","article-title":"Transpose: 6d object pose estimation with geometry-aware transformer","volume":"589","author":"Lin","year":"2024","journal-title":"Neurocomputing"},{"issue":"8","key":"2025091707163023800_ref020","doi-asserted-by":"publisher","first-page":"8203","DOI":"10.1109\/TIE.2022.3212422","article-title":"A robust Pixel-Wise prediction network with applications to industrial robotic grasping","volume":"70","author":"Liu","year":"2023","journal-title":"IEEE Transactions on Industrial Electronics"},{"issue":"6","key":"2025091707163023800_ref021","doi-asserted-by":"publisher","first-page":"3212","DOI":"10.1109\/TPAMI.2020.3047388","article-title":"Pvnet: pixel-wise voting network for 6dof pose estimation","volume":"44","author":"Peng","year":"2022","journal-title":"IEEE Transactions on Pattern Analysis and Machine 
Intelligence"},{"issue":"3","key":"2025091707163023800_ref022","doi-asserted-by":"publisher","first-page":"1515","DOI":"10.1109\/LRA.2023.3240362","article-title":"i2c-net: using instance-level neural networks for monocular category-level 6D pose estimation","volume":"8","author":"Remus","year":"2023","journal-title":"IEEE Robotics and Automation Letters"},{"key":"2025091707163023800_ref023","doi-asserted-by":"publisher","first-page":"4937","DOI":"10.1109\/CVPR42600.2020.00499","article-title":"Superglue: learning feature matching with graph neural networks","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Sarlin","year":"2020"},{"key":"2025091707163023800_ref024","doi-asserted-by":"publisher","first-page":"15217","DOI":"10.1109\/CVPR46437.2021.01497","article-title":"Stablepose: learning 6d object poses from geometrically stable patches","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Shi","year":"2021"},{"key":"2025091707163023800_ref025","doi-asserted-by":"publisher","first-page":"6728","DOI":"10.1109\/CVPR52688.2022.00662","article-title":"ZebraPose: coarse to fine surface encoding for 6DoF object pose estimation","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Su","year":"2022"},{"key":"2025091707163023800_ref026","doi-asserted-by":"publisher","first-page":"6815","DOI":"10.1109\/CVPR52688.2022.00670","article-title":"Onepose: one-shot object pose estimation without cad models","author":"Sun","year":"2022"},{"issue":"9","key":"2025091707163023800_ref027","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1109\/TCSVT.2019.2950449","article-title":"3D mapping and 6D pose computation for real time augmented reality on cylindrical objects","volume":"30","author":"Tang","year":"2020","journal-title":"IEEE Transactions on Circuits and Systems for Video 
Technology"},{"key":"2025091707163023800_ref028","doi-asserted-by":"publisher","first-page":"14540","DOI":"10.1109\/CVPR42600.2020.01455","article-title":"Morefusion: multi-object reasoning for 6d pose estimation from volumetric fusion","author":"Wada","year":"2020"},{"issue":"3","key":"2025091707163023800_ref029","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3136301","article-title":"Occlusion-aware self-supervised monocular 6D object pose estimation","volume":"46","author":"Wang","year":"2021","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2025091707163023800_ref030","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TIM.2023.3236334","article-title":"A geometry-enhanced 6D pose estimation network with incomplete shape recovery for industrial parts","volume":"72","author":"Wang","year":"2023","journal-title":"IEEE Transactions on Instrumentation and Measurement"},{"key":"2025091707163023800_ref031","doi-asserted-by":"publisher","first-page":"3338","DOI":"10.1109\/CVPR.2019.00346","article-title":"Densefusion: 6d object pose estimation by iterative dense fusion","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang","year":"2019"},{"key":"2025091707163023800_ref032","doi-asserted-by":"publisher","first-page":"606","DOI":"10.1109\/CVPR52729.2023.00066","article-title":"BundleSDF: neural 6-DoF tracking and 3D reconstruction of unknown objects","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wen","year":"2023"},{"issue":"2","key":"2025091707163023800_ref033","doi-asserted-by":"publisher","first-page":"1840","DOI":"10.1109\/LRA.2021.3136873","article-title":"Panet: a pixel-level attention network for 6d pose estimation with embedding vector features","volume":"7","author":"Xie","year":"2022","journal-title":"IEEE Robotics and Automation Letters"},{"article-title":"Megapose: 6d pose 
estimation of novel objects via render & compare","year":"2022","author":"Labb\u00e9","key":"2025091707163023800_ref034"},{"key":"2025091707163023800_ref035","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TIM.2022.3150568","article-title":"EANet: edge-attention 6D pose estimation network for texture-less objects","volume":"71","author":"Zhang","year":"2022","journal-title":"IEEE Transactions on Instrumentation and Measurement"},{"key":"2025091707163023800_ref036","first-page":"14045","article-title":"Learning symmetry-aware geometry correspondences for 6d object pose estimation","author":"Zhao","year":"2023"},{"key":"2025091707163023800_ref037","doi-asserted-by":"publisher","first-page":"1630","DOI":"10.1109\/TMM.2020.3001533","article-title":"A novel depth and color feature fusion framework for 6d object pose estimation","volume":"23","author":"Zhou","year":"2020","journal-title":"IEEE Transactions on Multimedia"},{"key":"2025091707163023800_ref038","doi-asserted-by":"publisher","first-page":"13921","DOI":"10.1109\/ICCV51070.2023.01284","article-title":"Deep fusion transformer network with weighted vector-wise keypoints voting for robust 6D object pose estimation","volume-title":"Proceedings of IEEE\/CVF International Conference on Computer Vision","author":"Zhou","year":"2023"}],"container-title":["Industrial Robot: the international journal of robotics research and 
application"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/IR-08-2024-0366\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/ir\/article-pdf\/52\/4\/581\/10293971\/ir-08-2024-0366en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ir\/article-pdf\/52\/4\/581\/10293971\/ir-08-2024-0366en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T11:16:40Z","timestamp":1758107800000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ir\/article\/52\/4\/581\/1253066\/Attention-based-object-pose-estimation-with"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,7]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,9,19]]}},"URL":"https:\/\/doi.org\/10.1108\/ir-08-2024-0366","relation":{},"ISSN":["0143-991X","1758-5791"],"issn-type":[{"type":"print","value":"0143-991X"},{"type":"electronic","value":"1758-5791"}],"subject":[],"published":{"date-parts":[[2025,2,7]]}}}