{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T16:29:12Z","timestamp":1771950552946,"version":"3.50.1"},"reference-count":55,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2023,11,5]],"date-time":"2023-11-05T00:00:00Z","timestamp":1699142400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Portugal 2020, under the Competitiveness and Internationalization Operational Program, the Lisbon Regional Operational Program and by the European Regional Development Fund","award":["POCI-01-0247-FEDER-046103"],"award-info":[{"award-number":["POCI-01-0247-FEDER-046103"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Recent advances in the field of collaborative robotics aim to endow industrial robots with prediction and anticipation abilities. In many shared tasks, the robot\u2019s ability to accurately perceive and recognize the objects being manipulated by the human operator is crucial to make predictions about the operator\u2019s intentions. In this context, this paper proposes a novel learning-based framework to enable an assistive robot to recognize the object grasped by the human operator based on the pattern of the hand and finger joints. The framework combines the strengths of the commonly available software MediaPipe in detecting hand landmarks in an RGB image with a deep multi-class classifier that predicts the manipulated object from the extracted keypoints. This study focuses on the comparison between two deep architectures, a convolutional neural network and a transformer, in terms of prediction accuracy, precision, recall and F1-score. We test the performance of the recognition system on a new dataset collected with different users and in different sessions. 
The results demonstrate the effectiveness of the proposed methods, while providing valuable insights into the factors that limit the generalization ability of the models.<\/jats:p>","DOI":"10.3390\/s23218989","type":"journal-article","created":{"date-parts":[[2023,11,5]],"date-time":"2023-11-05T07:35:06Z","timestamp":1699169706000},"page":"8989","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Recognition of Grasping Patterns Using Deep Learning for Human\u2013Robot Collaboration"],"prefix":"10.3390","volume":"23","author":[{"given":"Pedro","family":"Amaral","sequence":"first","affiliation":[{"name":"Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6191-0727","authenticated-orcid":false,"given":"Filipe","family":"Silva","sequence":"additional","affiliation":[{"name":"Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1283-7388","authenticated-orcid":false,"given":"V\u00edtor","family":"Santos","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering (DEM), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2023,11,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"26754","DOI":"10.1109\/ACCESS.2017.2773127","article-title":"Working Together: A Review on Safe Human-Robot Collaboration in Industrial Environments","volume":"5","author":"Becerra","year":"2017","journal-title":"IEEE Access"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1016\/j.mechatronics.2018.02.009","article-title":"Survey on human\u2013robot collaboration in industrial settings: Safety, intuitive interfaces and applications","volume":"55","author":"Villani","year":"2018","journal-title":"Mechatronics"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"957","DOI":"10.1007\/s10514-017-9677-2","article-title":"Progress and Prospects of the Human-Robot Collaboration","volume":"42","author":"Ajoudani","year":"2018","journal-title":"Auton. Robot."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Matheson, E., Minto, R., Zampieri, E.G.G., Faccio, M., and Rosati, G. (2019). Human-Robot Collaboration in Manufacturing Applications: A Review. Robotics, 8.","DOI":"10.3390\/robotics8040100"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1109\/TSMC.2020.3041231","article-title":"Survey of Human-Robot Collaboration in Industrial Settings: Awareness, Intelligence, and Compliance","volume":"51","author":"Kumar","year":"2021","journal-title":"IEEE Trans. Syst. Man Cybern. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Castro, A., Silva, F., and Santos, V. (2021). Trends of human-robot collaboration in industry contexts: Handover, learning, and metrics. 
Sensors, 21.","DOI":"10.3390\/s21124113"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1016\/j.mechatronics.2018.08.006","article-title":"Seamless human robot collaborative assembly\u2014An automotive case study","volume":"55","author":"Michalos","year":"2018","journal-title":"Mechatronics"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"3881","DOI":"10.1007\/s00170-019-03790-3","article-title":"Towards seamless human robot collaboration: Integrating multimodal interaction","volume":"105","author":"Papanastasiou","year":"2019","journal-title":"Int. J. Adv. Manuf. Technol."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1109\/THMS.2019.2904558","article-title":"Evaluating Fluency in Human\u2013Robot Collaboration","volume":"49","author":"Hoffman","year":"2019","journal-title":"IEEE Trans. Hum.-Mach. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"953","DOI":"10.1007\/s10514-018-9756-z","article-title":"Special issue on learning for human\u2013robot collaboration","volume":"42","author":"Rozo","year":"2018","journal-title":"Auton. Robot."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"5089","DOI":"10.1080\/00207543.2020.1722324","article-title":"Towards augmenting cyber-physical-human collaborative cognition for human-automation interaction in complex manufacturing and operational environments","volume":"58","author":"Jiao","year":"2020","journal-title":"Int. J. Prod. Res."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Hoffman, G., and Breazeal, C. (2007, January 10\u201312). Effects of anticipatory action on human-robot teamwork efficiency, fluency, and perception of team. Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction, Arlington, VA, USA.","DOI":"10.1145\/1228716.1228718"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1016\/S0079-6123(09)01307-7","article-title":"Perceiving the intentions of others: How do skilled performers make anticipation judgments?","volume":"174","author":"Williams","year":"2009","journal-title":"Prog. Brain Res."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Huang, C.M., and Mutlu, B. (2016, January 7\u201310). Anticipatory robot control for efficient human-robot collaboration. Proceedings of the 2016 11th ACM\/IEEE International Conference on Human-Robot Interaction (HRI), Christchurch, New Zealand.","DOI":"10.1109\/HRI.2016.7451737"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"4132","DOI":"10.1109\/LRA.2018.2861569","article-title":"Action anticipation: Reading the intentions of humans and robots","volume":"3","author":"Duarte","year":"2018","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.3389\/fpsyg.2015.01049","article-title":"Using gaze patterns to predict task intent in collaboration","volume":"6","author":"Huang","year":"2015","journal-title":"Front. Psychol."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"G\u00f6r\u00fcr, O.C., Rosman, B., Sivrikaya, F., and Albayrak, S. (2018, January 5\u20138). Social cobots: Anticipatory decision-making for collaborative robots incorporating unexpected human behaviors. Proceedings of the 2018 ACM\/IEEE International Conference on Human-Robot Interaction, Chicago, IL, USA.","DOI":"10.1145\/3171221.3171256"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gkioxari, G., Girshick, R., Doll\u00e1r, P., and He, K. 
(2018, January 18\u201322). Detecting and recognizing human-object interactions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00872"},{"key":"ref_19","unstructured":"Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.L., Yong, M., and Lee, J. (2019, January 17). Mediapipe: A framework for perceiving and processing reality. Proceedings of the Third Workshop on Computer Vision for AR\/VR at IEEE Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA."},{"key":"ref_20","first-page":"9","article-title":"Activity theory as a potential framework for human-computer interaction research","volume":"1744","author":"Kuutti","year":"1996","journal-title":"Context Consciousness Act. Theory Hum.-Comput. Interact."},{"key":"ref_21","unstructured":"Taubin, G., and Cooper, D.B. (1992). Geometric Invariance in Computer Vision, MIT Press."},{"key":"ref_22","unstructured":"Singh, S. (1998, January 23\u201325). Color-Based Moment Invariants for Viewpoint and Illumination Independent Recognition of Planar Color Patterns. Proceedings of the International Conference on Advances in Pattern Recognition, Plymouth, UK."},{"key":"ref_23","unstructured":"Sarfraz, M. (2006, January 5\u20137). Object Recognition Using Moments: Some Experiments and Observations. Proceedings of the Geometric Modeling and Imaging\u2013New Trends (GMAI\u201906), London, UK."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1016\/j.neucom.2020.01.085","article-title":"Recent advances in deep learning for object detection","volume":"396","author":"Wu","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Barabanau, I., Artemov, A., Burnaev, E., and Murashkin, V. (2020, January 27\u201329). Monocular 3D Object Detection via Geometric Reasoning on Keypoints. Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020)\u2014Volume 5: VISAPP. INSTICC, Valletta, Malta.","DOI":"10.5220\/0009102506520659"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/JPROC.2020.3004555","article-title":"A Comprehensive Survey on Transfer Learning","volume":"109","author":"Zhuang","year":"2021","journal-title":"Proc. IEEE"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zimmermann, C., Welschehold, T., Dornhege, C., Burgard, W., and Brox, T. (2018, January 21\u201325). 3D Human Pose Estimation in RGBD Images for Robotic Task Learning. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8462833"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1016\/j.jmsy.2022.07.006","article-title":"A sensor-to-pattern calibration framework for multi-modal industrial collaborative cells","volume":"64","author":"Rato","year":"2022","journal-title":"J. Manuf. 
Syst."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"102053","DOI":"10.1016\/j.displa.2021.102053","article-title":"Review of multi-view 3D object recognition methods based on deep learning","volume":"69","author":"Qi","year":"2021","journal-title":"Displays"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chao, Y.W., Liu, Y., Liu, X., Zeng, H., and Deng, J. (2018, January 12\u201315). Learning to detect human-object interactions. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00048"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Cao, Z., Radosavovic, I., Kanazawa, A., and Malik, J. (2021, January 11\u201317). Reconstructing hand-object interactions in the wild. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01219"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, S., Jiang, H., Xu, J., Liu, S., and Wang, X. (2021, January 20\u201325). Semi-supervised 3d hand-object poses estimation with interactions in time. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01445"},{"key":"ref_34","unstructured":"Gupta, S., and Malik, J. (2015). Visual semantic role labeling. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhuang, B., Wu, Q., Shen, C., Reid, I., and van den Hengel, A. (2018, January 2\u20137). HCVRD: A benchmark for large-scale human-centered visual relationship detection. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12260"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/TPAMI.2015.2430335","article-title":"Anticipating human activities using object affordances for reactive robotic response","volume":"38","author":"Koppula","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Hayes, B., and Shah, J.A. (June, January 29). Interpretable models for fast activity recognition and anomaly explanation during collaborative robotics tasks. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989778"},{"key":"ref_38","unstructured":"Furnari, A., and Farinella, G.M. (November, January 27). What would you expect? Anticipating egocentric actions with rolling-unrolling lstms and modality attention. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1423","DOI":"10.1109\/TMM.2019.2943753","article-title":"Interact as you intend: Intention-driven human-object interaction detection","volume":"22","author":"Xu","year":"2019","journal-title":"IEEE Trans. Multimed."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"8116","DOI":"10.1109\/TIP.2021.3113114","article-title":"Action anticipation using pairwise human-object interactions and transformers","volume":"30","author":"Roy","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_41","first-page":"1","article-title":"What is that in your hand? 
Recognizing grasped objects via forearm electromyography sensing","volume":"Volume 2","author":"Fan","year":"2018","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.ijhcs.2010.09.003","article-title":"Object interaction detection using hand posture cues in an office setting","volume":"69","author":"Paulson","year":"2011","journal-title":"Int. J. Hum.-Comput. Stud."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1016\/j.ijhcs.2013.01.002","article-title":"Automatic recognition of object size and shape via user-dependent measurements of the grasping hand","volume":"71","author":"Vatavu","year":"2013","journal-title":"Int. J. Hum.-Comput. Stud."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1109\/THMS.2015.2470657","article-title":"The grasp taxonomy of human grasp types","volume":"46","author":"Feix","year":"2015","journal-title":"IEEE Trans. Hum.-Mach. Syst."},{"key":"ref_45","unstructured":"MacKenzie, C.L., and Iberall, T. (1994). The Grasping Hand, Elsevier."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1109\/TOH.2014.2326871","article-title":"Analysis of human grasping behavior: Object characteristics and grasp type","volume":"7","author":"Feix","year":"2014","journal-title":"IEEE Trans. Haptics"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Puhlmann, S., Heinemann, F., Brock, O., and Maertens, M. (2016, January 9\u201314). A compact representation of human single-object grasping. Proceedings of the 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea.","DOI":"10.1109\/IROS.2016.7759308"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"990","DOI":"10.3389\/fpsyg.2018.00990","article-title":"Reach-to-grasp movements: A multimodal techniques study","volume":"9","author":"Betti","year":"2018","journal-title":"Front. Psychol."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1080\/00222895.2017.1327407","article-title":"Shaping of reach-to-grasp kinematics by intentions: A meta-analysis","volume":"50","author":"Egmose","year":"2018","journal-title":"J. Mot. Behav."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Valkov, D., Kockwelp, P., Daiber, F., and Kr\u00fcger, A. (2023, January 23\u201328). Reach Prediction using Finger Motion Dynamics. Proceedings of the Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany.","DOI":"10.1145\/3544549.3585773"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., Berger, E., Wheeler, R., and Ng, A.Y. (2009, January 12\u201317). ROS: An open-source Robot Operating System. Proceedings of the ICRA Workshop on Open Source Software, Kobe, Japan.","DOI":"10.1109\/MRA.2010.936956"},{"key":"ref_52","unstructured":"Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.L., and Grundmann, M. (2020). Mediapipe hands: On-device real-time hand tracking. arXiv."},{"key":"ref_53","unstructured":"Amprimo, G., Masi, G., Pettiti, G., Olmo, G., Priano, L., and Ferraris, C. (2023). Hand tracking for clinical applications: Validation of the Google MediaPipe Hand (GMH) and the depth-enhanced GMH-D frameworks. 
arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Amprimo, G., Ferraris, C., Masi, G., Pettiti, G., and Priano, L. (2022, January 10\u201316). Gmh-d: Combining google mediapipe and rgb-depth cameras for hand motor skills remote assessment. Proceedings of the 2022 IEEE International Conference on Digital Health (ICDH), Barcelona, Spain.","DOI":"10.1109\/ICDH55609.2022.00029"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"180101","DOI":"10.1038\/sdata.2018.101","article-title":"Human grasping database for activities of daily living with depth, color and kinematic data streams","volume":"5","author":"Saudabayev","year":"2018","journal-title":"Sci. Data"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/21\/8989\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:17:55Z","timestamp":1760131075000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/21\/8989"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,5]]},"references-count":55,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["s23218989"],"URL":"https:\/\/doi.org\/10.3390\/s23218989","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,5]]}}}