{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T20:36:46Z","timestamp":1772051806207,"version":"3.50.1"},"reference-count":38,"publisher":"Emerald","issue":"4","license":[{"start":{"date-parts":[[2020,4,20]],"date-time":"2020-04-20T00:00:00Z","timestamp":1587340800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IR"],"published-print":{"date-parts":[[2020,4,20]]},"abstract":"<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title>\n<jats:p>This paper aims to design a deep neural network for object instance segmentation and six-dimensional (6D) pose estimation in cluttered scenes and to apply the proposed method to real-world robotic autonomous grasping of household objects.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title>\n<jats:p>A novel deep learning method is proposed for instance segmentation and 6D pose estimation in cluttered scenes. An iterative pose refinement network is integrated with the main network to obtain more robust final pose estimates for robotic applications. To train the network, a technique is presented that quickly generates abundant annotated synthetic data, consisting of RGB-D images and object masks, without any hand-labeling. For robotic grasping, offline grasp planning based on the eigengrasp planner is performed and combined with online object pose estimation.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Findings<\/jats:title>\n<jats:p>Experiments on standard pose benchmark data sets showed that the method achieves better pose estimation accuracy and time efficiency than state-of-the-art methods with depth-based ICP refinement. 
The proposed method is also evaluated on a seven-DOF Kinova Jaco robot with an Intel RealSense RGB-D camera; the grasping results illustrate that the method is accurate and robust enough for real-world robotic applications.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title>\n<jats:p>A novel 6D pose estimation network based on the instance segmentation framework is proposed, and a neural network-based iterative pose refinement module is integrated into the method. The proposed method exhibits satisfactory pose estimation accuracy and time efficiency for robotic grasping.<\/jats:p>\n<\/jats:sec>","DOI":"10.1108\/ir-12-2019-0259","type":"journal-article","created":{"date-parts":[[2020,4,27]],"date-time":"2020-04-27T05:13:51Z","timestamp":1587964431000},"page":"593-606","source":"Crossref","is-referenced-by-count":10,"title":["Deep instance segmentation and 6D object pose estimation in cluttered scenes for robotic autonomous grasping"],"prefix":"10.1108","volume":"47","author":[{"given":"Yongxiang","family":"Wu","sequence":"first","affiliation":[]},{"given":"Yili","family":"Fu","sequence":"additional","affiliation":[]},{"given":"Shuguo","family":"Wang","sequence":"additional","affiliation":[]}],"member":"140","reference":[{"key":"key2020070908325836400_ref001","article-title":"Mask r-cnn for object detection and instance segmentation on Keras and TensorFlow","year":"2017"},{"key":"key2020070908325836400_ref002","first-page":"586","article-title":"Method for registration of 3-D shapes","volume-title":"Sensor Fusion IV: Control Paradigms and Data Structures","year":"1992"},{"issue":"2","key":"key2020070908325836400_ref003","first-page":"289","article-title":"Data-driven grasp synthesis\u2014a survey","volume":"30","year":"2013","journal-title":"IEEE Transactions on Robotics"},{"key":"key2020070908325836400_ref004","first-page":"3364","article-title":"Uncertainty-driven 6D pose estimation of objects and scenes 
from a single RGB image","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","year":"2016"},{"key":"key2020070908325836400_ref005","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1109\/ICAR.2015.7251504","article-title":"The ycb object and model set: towards common benchmarks for manipulation research","volume-title":"2015 international conference on advanced robotics (ICAR)","year":"2015"},{"key":"key2020070908325836400_ref006","doi-asserted-by":"crossref","first-page":"3270","DOI":"10.1109\/IROS.2007.4399227","article-title":"Dimensionality reduction for hand-independent dexterous robotic grasping","volume-title":"2007 IEEE\/RSJ International Conference on Intelligent Robots and Systems","year":"2007"},{"key":"key2020070908325836400_ref007","doi-asserted-by":"crossref","first-page":"7283","DOI":"10.1109\/ICRA.2019.8793744","article-title":"Segmenting unknown 3d objects from real depth images using mask r-cnn trained on synthetic data","volume-title":"2019 International Conference on Robotics and Automation (ICRA)","year":"2019"},{"key":"key2020070908325836400_ref008","article-title":"Deep-6dpose: recovering 6d object pose from a single RGB image","year":"2018"},{"key":"key2020070908325836400_ref009","first-page":"2758","article-title":"Flownet: learning optical flow with convolutional networks","volume-title":"Proceedings of the IEEE international conference on computer vision","year":"2015"},{"issue":"4","key":"key2020070908325836400_ref010","first-page":"2257","article-title":"A markerless human-robot interface using particle filter and Kalman filter for dual robots","volume":"62","year":"2014","journal-title":"IEEE Transactions on Industrial Electronics"},{"issue":"10","key":"key2020070908325836400_ref011","doi-asserted-by":"crossref","first-page":"5411","DOI":"10.1109\/TIE.2014.2301728","article-title":"Human-manipulator interface based on multisensory process via Kalman 
filters","volume":"61","year":"2014","journal-title":"IEEE Transactions on Industrial Electronics"},{"issue":"2","key":"key2020070908325836400_ref012","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1109\/TII.2016.2526674","article-title":"Markerless human-manipulator interface using leap motion with interval Kalman filter and improved particle filter","volume":"12","year":"2016","journal-title":"IEEE Transactions on Industrial Informatics"},{"key":"key2020070908325836400_ref013","first-page":"2961","article-title":"Mask r-CNN","volume-title":"Proceedings of the IEEE international conference on computer vision","year":"2017"},{"key":"key2020070908325836400_ref014","first-page":"834","article-title":"Going further with point pair features","volume-title":"European conference on computer vision","year":"2016"},{"key":"key2020070908325836400_ref015","first-page":"548","article-title":"Model-based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes","volume-title":"Asian conference on computer vision","year":"2012"},{"key":"key2020070908325836400_ref016","first-page":"1","article-title":"Grasping known objects with humanoid robots: a box-based approach","volume-title":"2009 International Conference on Advanced Robotics","year":"2009"},{"key":"key2020070908325836400_ref017","first-page":"1521","article-title":"SSD-6D: making RGB-based 3D detection and 6D pose estimation great again","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","year":"2017"},{"key":"key2020070908325836400_ref018","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","year":"2012"},{"key":"key2020070908325836400_ref019","first-page":"954","article-title":"Learning analysis-by-synthesis for 6D pose estimation in RGB-D images","volume-title":"Proceedings of the IEEE International Conference on Computer 
Vision","year":"2015"},{"key":"key2020070908325836400_ref020","first-page":"254","article-title":"A unified framework for multi-view multi-class object pose estimation","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","year":"2018"},{"key":"key2020070908325836400_ref021","first-page":"683","article-title":"Deepim: deep iterative matching for 6d pose estimation","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","year":"2018"},{"key":"key2020070908325836400_ref022","doi-asserted-by":"crossref","first-page":"3629","DOI":"10.1109\/ICRA.2019.8794435","article-title":"Pointnetgpd: detecting grasp configurations from point sets","volume-title":"2019 International Conference on Robotics and Automation (ICRA)","year":"2019"},{"issue":"2","key":"key2020070908325836400_ref023","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","year":"2004","journal-title":"International Journal of Computer Vision"},{"key":"key2020070908325836400_ref024","doi-asserted-by":"crossref","first-page":"1150","DOI":"10.1109\/ICCV.1999.790410","article-title":"Object recognition from local scale-invariant features","volume-title":"Proceedings of the seventh IEEE international conference on computer vision","year":"1999"},{"key":"key2020070908325836400_ref025","article-title":"Dex-net 2.0: deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics","year":"2017"},{"key":"key2020070908325836400_ref026","first-page":"462","article-title":"Global hypothesis generation for 6D object pose estimation","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","year":"2017"},{"issue":"4","key":"key2020070908325836400_ref027","first-page":"110","article-title":"Graspit! 
a versatile simulator for robotic grasping","volume":"11","year":"2004","journal-title":"IEEE Robotics & Automation Magazine"},{"key":"key2020070908325836400_ref028","doi-asserted-by":"crossref","first-page":"5663","DOI":"10.1109\/IROS.2006.282367","article-title":"Integrated grasp planning and visual object localization for a humanoid robot with five-fingered hands","volume-title":"2006 IEEE\/RSJ International Conference on Intelligent Robots and Systems","year":"2006"},{"key":"key2020070908325836400_ref029","article-title":"Closing the loop for robotic grasping: a real-time, generative grasp synthesis approach","year":"2018"},{"key":"key2020070908325836400_ref030","first-page":"3828","article-title":"BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","year":"2017"},{"key":"key2020070908325836400_ref031","first-page":"779","article-title":"You only look once: unified, real-time object detection","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","year":"2016"},{"issue":"4","key":"key2020070908325836400_ref032","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1137\/1006093","article-title":"On the parametrization of the three-dimensional rotation group","volume":"6","year":"1964","journal-title":"SIAM Review"},{"key":"key2020070908325836400_ref033","first-page":"462","article-title":"Latent-class hough forests for 3D object detection and pose estimation","volume-title":"European Conference on Computer Vision","year":"2014"},{"key":"key2020070908325836400_ref034","first-page":"292","article-title":"Real-time seamless single shot 6d object pose prediction","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern 
Recognition","year":"2018"},{"key":"key2020070908325836400_ref035","first-page":"3343","article-title":"Densefusion: 6d object pose estimation by iterative dense fusion","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","year":"2019"},{"key":"key2020070908325836400_ref036","first-page":"2642","article-title":"Normalized object coordinate space for category-level 6d object pose and size estimation","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","year":"2019"},{"key":"key2020070908325836400_ref037","article-title":"Posecnn: a convolutional neural network for 6d object pose estimation in cluttered scenes","year":"2017"},{"issue":"3","key":"key2020070908325836400_ref038","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1108\/IR-12-2015-0222","article-title":"Ensuring safety in a human-robot coexisting environment based on two-level protection","volume":"43","year":"2016","journal-title":"Industrial Robot: An International Journal"}],"container-title":["Industrial Robot: the international journal of robotics research and 
application"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/IR-12-2019-0259\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/IR-12-2019-0259\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T21:40:38Z","timestamp":1753393238000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/ir\/article\/47\/4\/593-606\/187083"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,20]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,4,20]]}},"alternative-id":["10.1108\/IR-12-2019-0259"],"URL":"https:\/\/doi.org\/10.1108\/ir-12-2019-0259","relation":{},"ISSN":["0143-991X","0143-991X"],"issn-type":[{"value":"0143-991X","type":"print"},{"value":"0143-991X","type":"print"}],"subject":[],"published":{"date-parts":[[2020,4,20]]}}}