{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:54:19Z","timestamp":1760234059575,"version":"build-2065373602"},"reference-count":33,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2021,3,24]],"date-time":"2021-03-24T00:00:00Z","timestamp":1616544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, a manipulation planning method for object re-orientation based on semantic segmentation keypoint detection is proposed for robot manipulator which is able to detect and re-orientate the randomly placed objects to a specified position and pose. There are two main parts: (1) 3D keypoint detection system; and (2) manipulation planning system for object re-orientation. In the 3D keypoint detection system, an RGB-D camera is used to obtain the information of the environment and can generate 3D keypoints of the target object as inputs to represent its corresponding position and pose. This process simplifies the 3D model representation so that the manipulation planning for object re-orientation can be executed in a category-level manner by adding various training data of the object in the training phase. In addition, 3D suction points in both the object\u2019s current and expected poses are also generated as the inputs of the next operation stage. During the next stage, Mask Region-Convolutional Neural Network (Mask R-CNN) algorithm is used for preliminary object detection and object image. The highest confidence index image is selected as the input of the semantic segmentation system in order to classify each pixel in the picture for the corresponding pack unit of the object. In addition, after using a convolutional neural network for semantic segmentation, the Conditional Random Fields (CRFs) method is used to perform several iterations to obtain a more accurate result of object recognition. When the target object is segmented into the pack units of image process, the center position of each pack unit can be obtained. Then, a normal vector of each pack unit\u2019s center points is generated by the depth image information and pose of the object, which can be obtained by connecting the center points of each pack unit. In the manipulation planning system for object re-orientation, the pose of the object and the normal vector of each pack unit are first converted into the working coordinate system of the robot manipulator. Then, according to the current and expected pose of the object, the spherical linear interpolation (Slerp) algorithm is used to generate a series of movements in the workspace for object re-orientation on the robot manipulator. In addition, the pose of the object is adjusted on the z-axis of the object\u2019s geodetic coordinate system based on the image features on the surface of the object, so that the pose of the placed object can approach the desired pose. Finally, a robot manipulator and a vacuum suction cup made by the laboratory are used to verify that the proposed system can indeed complete the planned task of object re-orientation.<\/jats:p>","DOI":"10.3390\/s21072280","type":"journal-article","created":{"date-parts":[[2021,3,24]],"date-time":"2021-03-24T21:36:51Z","timestamp":1616621811000},"page":"2280","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Manipulation Planning for Object Re-Orientation Based on Semantic Segmentation Keypoint Detection"],"prefix":"10.3390","volume":"21","author":[{"given":"Ching-Chang","family":"Wong","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Tamkang University, New Taipei City 25137, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Li-Yu","family":"Yeh","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Tamkang University, New Taipei City 25137, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chih-Cheng","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Tamkang University, New Taipei City 25137, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chi-Yi","family":"Tsai","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Tamkang University, New Taipei City 25137, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hisasuki","family":"Aoyama","sequence":"additional","affiliation":[{"name":"Department of Mechanical and Intelligent Systems Engineering, University of Electro-Communications, Tokyo 182-8585, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,24]]},"reference":[{"key":"ref_1","first-page":"1","article-title":"Going deeper with convolutions","volume":"7\u201312","author":"Christian","year":"2015","journal-title":"IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recogn."},{"key":"ref_2","first-page":"481","article-title":"Very deep convolutional neural networks for robust speech recognition","volume":"1","author":"Yanmin","year":"2017","journal-title":"IEEE Workshop Spok. Lang. Technol."},{"key":"ref_3","unstructured":"Ross, G., Jeff, D., Trevor, D., and Jitendra, M. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recogn., 580\u2013587."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015). Fast R-CNN. IEEE Int. Conf. Comput. Vis., 1440\u20131448.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_5","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Neural Inform. Process. Syst., 91\u201399."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017). Mask R-CNN. IEEE Int. Conf. Comput. Vis., 2980\u20132988.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recogn., 779\u2013788.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017). YOLO9000: Better, faster, stronger. IEEE Conf. Comput. Vis. Pattern Recogn., 187\u2013213.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_9","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016). SSD: Single shot multibox detector. Eu. Conf. Comput. Vis., 21\u201337.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Jiang, P., Ishihara, Y., Sugiyama, N., Oaki, J., Tokura, S., Sugahara, A., and Ogawa, A. (2020). Depth image\u2013based deep learning of grasp planning for textureless planar-faced objects in vision-guided robotic bin-picking. IEEE Sens. J., 20.","DOI":"10.3390\/s20030706"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"9370","DOI":"10.1109\/JSEN.2018.2870957","article-title":"Visual object recognition and pose estimation based on a deep semantic segmentation network","volume":"18","author":"Lin","year":"2018","journal-title":"IEEE Sen. J."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1108\/IR-12-2019-0259","article-title":"Deep instance segmentation and 6D object pose estimation in cluttered scenes for robotic autonomous grasping","volume":"47","author":"Wu","year":"2020","journal-title":"Ind. Robot"},{"key":"ref_14","unstructured":"Manuelli, L., Gao, W., Florence, P., and Tedrake, R. (2019). kPAM: KeyPoint affordances for category-level robotic manipulation. arXiv."},{"key":"ref_15","first-page":"536","article-title":"Integral human pose regression","volume":"11210","author":"Sun","year":"2018","journal-title":"Comput. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Semochkin, A.N., Zabihifar, S., and Efimov, A.R. (2019, January 7\u201310). Object grasping and manipulating according to user-defined method using key-points. Proceedings of the IEEE International Conference on Developments in eSystems Engineering, Kazan, Russia.","DOI":"10.1109\/DeSE.2019.00089"},{"key":"ref_17","unstructured":"Vecerik, M., Regli, J.-B., Sushkov, O., Barker, D., Pevceviciute, R., Roth\u00f6rl, T., Schuster, C., Hadsell, R., Agapito, L., and Scholz, J. (2020). S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via Multi-View Consistency. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Newbury, R., He, K., Cosgun, A., and Drummond, T. (2020). Learning to place objects onto flat surfaces in human-preferred orientations. arXiv.","DOI":"10.1109\/LRA.2021.3068122"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Mahler, J., Liang, J., Niyaz, S., Laskey, M., Doan, R., Liu, X., Ojea, J.A., and Goldberg, K. (2017). Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. Robot. Sci. Syst., 58\u201372.","DOI":"10.15607\/RSS.2017.XIII.058"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Morrison, D., Leitner, J., and Corke, P. (2018). Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. Robot. Sci. Syst., 21\u201331.","DOI":"10.15607\/RSS.2018.XIV.021"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wada, K., Okada, K., and Inaba, M. (2019). Joint learning of instance and semantic segmentation for robotic pick-and-place with heavy occlusions in clutter. Int. Conf. Robot. Autom., 9558\u20139564.","DOI":"10.1109\/ICRA.2019.8793783"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1101","DOI":"10.1007\/s10514-018-9781-y","article-title":"A regrasp planning component for object reorientation","volume":"43","author":"Wan","year":"2019","journal-title":"Autonom. Robot."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wan, W., Mason, M.T., Fukui, R., and Kuniyoshi, Y. (2015). Improving regrasp algorithms to analyze the utility of work surfaces in a workcell. IEEE Int. Conf. Robot. Autom., 4326\u20134333.","DOI":"10.1109\/ICRA.2015.7139796"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ali, A., and Lee, J.Y. (2020). Integrated motion planning for assembly task with part manipulation using re-grasping. Appl. Sci., 10.","DOI":"10.3390\/app10030749"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Nguyen, A., Kanoulas, D., Caldwell, D.G., and Tsagarakis, N.G. (2016, January 9\u201314). Preparatory object reorientation for task-oriented grasping. Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Daejeon, Korea.","DOI":"10.1109\/IROS.2016.7759156"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Do, T.T., Nguyen, A., and Reid, I. (2018, January 21\u201325). AffordanceNet: An end-to-end deep learning approach for object affordance detection. Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, QLD, Australia.","DOI":"10.1109\/ICRA.2018.8460902"},{"key":"ref_27","unstructured":"Lai, Y.-C. (2020). Task-Oriented Grasping and Tool Manipulation for Dual-Arm Robot (In Chinese). [Ph.D. Thesis, Tamkang University]."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Qin, Z., Fang, K., Zhu, Y., Li, F., and Savarese, S. (2019). KETO: Learning keypoint representations for tool manipulation. arXiv.","DOI":"10.1109\/ICRA40945.2020.9196971"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Shoemake, K. (1985, January 22\u201326). Animating rotation with quaternion curves. Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA.","DOI":"10.1145\/325334.325242"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2014). Fully convolutional networks for semantic segmentation. IEEE Conf. Comput. Vis. Pattern Recogn., 3431\u20133440.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"26871","DOI":"10.1109\/ACCESS.2021.3056903","article-title":"Motion planning for dual-arm robot based on soft actor-critic","volume":"9","author":"Wong","year":"2021","journal-title":"IEEE Access."},{"key":"ref_33","unstructured":"(2018). Intel\u00ae RealSense\u2122 Depth Module D400 Series Custom Calibration, Intel Corporation."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/7\/2280\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:40:38Z","timestamp":1760161238000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/7\/2280"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,24]]},"references-count":33,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2021,4]]}},"alternative-id":["s21072280"],"URL":"https:\/\/doi.org\/10.3390\/s21072280","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,3,24]]}}}