{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T10:15:09Z","timestamp":1772619309934,"version":"3.50.1"},"reference-count":22,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T00:00:00Z","timestamp":1772150400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>In Industry 5.0, the transition from fixed traditional automation to flexible human\u2013robot collaboration (HRC) requires interfaces that are both intuitive and efficient. This paper introduces a novel, multimodal control system for autonomous object handling, specifically designed to enhance natural user interaction in dynamic work environments. The system integrates a 6-Degrees of Freedom (DoF) collaborative robot (UR5e) with a hand-eye RGB-D vision system to achieve robust autonomy. The core technical contribution lies in a vision pipeline utilizing deep learning for object detection and point cloud processing for accurate 6D pose estimation, enabling advanced tasks such as human-aware object handover directly onto the operator\u2019s hand. Crucially, an Automatic Speech Recognition (ASR) module is incorporated, providing a Natural Language Understanding (NLU) layer that allows operators to issue real-time commands for task modification, error correction, and object selection. Experimental results demonstrate that this multimodal approach offers a streamlined workflow aiming to improve operational flexibility compared to traditional HMIs, while enhancing the perceived naturalness of the collaborative task. The system establishes a framework for highly responsive and intuitive human\u2013robot workspaces, advancing the state of the art in natural interaction for collaborative object manipulation.<\/jats:p>","DOI":"10.3390\/robotics15030049","type":"journal-article","created":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T17:05:16Z","timestamp":1772211916000},"page":"49","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Collaborative Robotic System for Autonomous Object Handling with Natural User Interaction"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2085-0892","authenticated-orcid":false,"given":"Federico","family":"Neri","sequence":"first","affiliation":[{"name":"DIISM\u2014Department of Industrial Engineering and Mathematical Sciences, Polytechnic University of Marche, 60131 Ancona, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6650-1786","authenticated-orcid":false,"given":"Gaetano","family":"Lettera","sequence":"additional","affiliation":[{"name":"DIISM\u2014Department of Industrial Engineering and Mathematical Sciences, Polytechnic University of Marche, 60131 Ancona, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7820-2890","authenticated-orcid":false,"given":"Giacomo","family":"Palmieri","sequence":"additional","affiliation":[{"name":"DIISM\u2014Department of Industrial Engineering and Mathematical Sciences, Polytechnic University of Marche, 60131 Ancona, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4065-3212","authenticated-orcid":false,"given":"Massimo","family":"Callegari","sequence":"additional","affiliation":[{"name":"DIISM\u2014Department of Industrial Engineering and Mathematical Sciences, Polytechnic University of Marche, 60131 Ancona, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Mourtzis, D. (2024). Challenges and opportunities of the transition from Industry 4.0 to Industry 5.0. Manufacturing from Industry 4.0 to Industry 5.0, Elsevier.","DOI":"10.1016\/B978-0-443-13924-6.00004-1"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Antonaci, F.G., Olivetti, E.C., Marcolin, F., Castiblanco Jimenez, I.A., Eynard, B., Vezzetti, E., and Moos, S. (2024). Workplace well-being in industry 5.0: A worker-centered systematic review. Sensors, 24.","DOI":"10.3390\/s24175473"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"102937","DOI":"10.1016\/j.rcim.2024.102937","article-title":"Reviewing human-robot collaboration in manufacturing: Opportunities and challenges in the context of industry 5.0","volume":"93","author":"Dhanda","year":"2025","journal-title":"Robot. Comput.-Integr. Manuf."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Forlini, M., Neri, F., Carbonari, L., Callegari, M., and Palmieri, G. (2024). Enhanced Human-Robot Collaboration through AI Tools and Collision Avoidance Control. Proceedings of the 2024 20th IEEE\/ASME International Conference on Mechatronic and Embedded Systems and Applications (MESA), IEEE.","DOI":"10.1109\/MESA61532.2024.10704917"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Costanzo, M., De Maria, G., and Natale, C. (2021). Handover control for human-robot and robot-robot collaboration. Front. Robot. AI, 8.","DOI":"10.3389\/frobt.2021.672995"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Duarte, L., Polito, M., Gastaldi, L., Neto, P., and Pastorelli, S. (2024). Demonstration of real-time event camera to collaborative robot communication. Proceedings of the International Conference of IFToMM ITALY, Springer.","DOI":"10.1007\/978-3-031-64553-2_41"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Costanzo, M., Natale, C., and Selvaggio, M. (2023). Visual and Haptic Cues for Human-Robot Handover*. Proceedings of the 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), IEEE.","DOI":"10.1109\/RO-MAN57019.2023.10309480"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kim, M., Yang, S., Kim, B., Kim, J., and Kim, D. (2024). Human-to-Robot Handover Based on Reinforcement Learning. Sensors, 24.","DOI":"10.3390\/s24196275"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Salinas-Mart\u00ednez, \u00c1.G., Cunill\u00e9-Rodr\u00edguez, J., Aquino-L\u00f3pez, E., and Garc\u00eda-Moreno, A.I. (2024). Multimodal Human-Robot Interaction Using Gestures and Speech: A Case Study for Printed Circuit Board Manufacturing. J. Manuf. Mater. Process., 8.","DOI":"10.3390\/jmmp8060274"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"101007","DOI":"10.1115\/1.4054297","article-title":"Real-Time Multi-Modal Human-Robot Collaboration Using Gestures and Speech","volume":"144","author":"Chen","year":"2022","journal-title":"J. Manuf. Sci. Eng."},{"key":"ref_11","unstructured":"Yu, P., Abuduweili, A., Liu, R., and Liu, C. (2024). Robustifying Long-term Human-Robot Collaboration through a Multimodal and Hierarchical Framework. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"110851","DOI":"10.1016\/j.knosys.2023.110851","article-title":"Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization","volume":"277","author":"Kheddar","year":"2023","journal-title":"Knowl.-Based Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1007\/s42524-025-4136-9","article-title":"Vision-language model-based human-robot collaboration for smart manufacturing: A state-of-the-art survey","volume":"12","author":"Fan","year":"2025","journal-title":"Front. Eng. Manag."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1108\/01439910910980213","article-title":"A robotic system based on fuzzy visual servoing for handling flexible sheets lying on a table","volume":"36","author":"Zacharia","year":"2009","journal-title":"Ind. Rob."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"3996","DOI":"10.1109\/LRA.2023.3279614","article-title":"Impact of ros 2 node composition in robotic systems","volume":"8","author":"Macenski","year":"2023","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_16","first-page":"102","article-title":"Real-time data exchange (RTDE) robot control integration for Fabrication Information Modeling","volume":"40","author":"Slepicka","year":"2023","journal-title":"Proceedings of the ISARC, Proceedings of the International Symposium on Automation and Robotics in Construction"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"4817","DOI":"10.1109\/TASE.2023.3302951","article-title":"Evaluating the efficiency of voice control as human machine interface in production","volume":"21","author":"Norda","year":"2023","journal-title":"IEEE Trans. Autom. Sci. Eng."},{"key":"ref_18","first-page":"1","article-title":"Speech recognition utilizing deep learning: A systematic review of the latest developments","volume":"14","author":"Sharrab","year":"2024","journal-title":"Hum.-Centric Comput. Inf. Sci."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Pallavoor, A., Jalan, A., Ballapur, S.C., Kiran, S., Anantharaman, P., and Shylaja, S. (2024). Gesture and Speech Recognition for Real-Time Multi-modal Human\u2013Robot Interaction Using Deep Learning Based Approach. Proceedings of the International Conference on Computing and Machine Learning, Springer.","DOI":"10.1007\/978-981-97-7571-2_20"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"3713","DOI":"10.1007\/s11042-022-13428-4","article-title":"Natural language processing: State of the art, current trends and challenges","volume":"82","author":"Khurana","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Thakkar, H., and Manimaran, A. (2023). Comprehensive examination of instruction-based language models: A comparative analysis of mistral-7b and llama-2-7b. Proceedings of the 2023 International Conference on Emerging Research in Computational Science (ICERCS), IEEE.","DOI":"10.1109\/ICERCS57948.2023.10434081"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Neri, F., Forlini, M., Scoccia, C., Palmieri, G., and Callegari, M. (2023). Experimental evaluation of collision avoidance techniques for collaborative robots. Appl. Sci., 13.","DOI":"10.3390\/app13052944"}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/15\/3\/49\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T05:35:39Z","timestamp":1772602539000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/15\/3\/49"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,27]]},"references-count":22,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["robotics15030049"],"URL":"https:\/\/doi.org\/10.3390\/robotics15030049","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,27]]}}}