{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T07:17:04Z","timestamp":1769757424463,"version":"3.49.0"},"reference-count":49,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2020,8,24]],"date-time":"2020-08-24T00:00:00Z","timestamp":1598227200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>The recent spread of low-cost and high-quality RGB-D and infrared sensors has supported the development of Natural User Interfaces (NUIs) in which the interaction is carried without the use of physical devices such as keyboards and mouse. In this paper, we propose a NUI based on dynamic hand gestures, acquired with RGB, depth and infrared sensors. The system is developed for the challenging automotive context, aiming at reducing the driver\u2019s distraction during the driving activity. Specifically, the proposed framework is based on a multimodal combination of Convolutional Neural Networks whose input is represented by depth and infrared images, achieving a good level of light invariance, a key element in vision-based in-car systems. We test our system on a recent multimodal dataset collected in a realistic automotive setting, placing the sensors in an innovative point of view, i.e., in the tunnel console looking upwards. The dataset consists of a great amount of labelled frames containing 12 dynamic gestures performed by multiple subjects, making it suitable for deep learning-based approaches. In addition, we test the system on a different well-known public dataset, created for the interaction between the driver and the car. Experimental results on both datasets reveal the efficacy and the real-time performance of the proposed method.<\/jats:p>","DOI":"10.3390\/informatics7030031","type":"journal-article","created":{"date-parts":[[2020,8,25]],"date-time":"2020-08-25T09:24:56Z","timestamp":1598347496000},"page":"31","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":28,"title":["Multimodal Hand Gesture Classification for the Human\u2013Car Interaction"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8908-6485","authenticated-orcid":false,"given":"Andrea","family":"D\u2019Eusanio","sequence":"first","affiliation":[{"name":"AIRI\u2014Artificial Intelligence Research and Innovation Center, University of Modena and Reggio Emilia, 41125 Modena, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3095-3294","authenticated-orcid":false,"given":"Alessandro","family":"Simoni","sequence":"additional","affiliation":[{"name":"DIEF\u2014Department of Engineering \u201cEnzo Ferrari\u201d, University of Modena and Reggio Emilia, 41125 Modena, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9821-2014","authenticated-orcid":false,"given":"Stefano","family":"Pini","sequence":"additional","affiliation":[{"name":"DIEF\u2014Department of Engineering \u201cEnzo Ferrari\u201d, University of Modena and Reggio Emilia, 41125 Modena, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2441-7524","authenticated-orcid":false,"given":"Guido","family":"Borghi","sequence":"additional","affiliation":[{"name":"AIRI\u2014Artificial Intelligence Research and Innovation Center, University of Modena and Reggio Emilia, 41125 Modena, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1046-6870","authenticated-orcid":false,"given":"Roberto","family":"Vezzani","sequence":"additional","affiliation":[{"name":"AIRI\u2014Artificial Intelligence Research and Innovation Center, University of Modena and Reggio Emilia, 41125 Modena, Italy"},{"name":"DIEF\u2014Department of Engineering \u201cEnzo Ferrari\u201d, University of Modena and Reggio Emilia, 41125 Modena, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2239-283X","authenticated-orcid":false,"given":"Rita","family":"Cucchiara","sequence":"additional","affiliation":[{"name":"DIEF\u2014Department of Engineering \u201cEnzo Ferrari\u201d, University of Modena and Reggio Emilia, 41125 Modena, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2020,8,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Borghi, G., Vezzani, R., and Cucchiara, R. (2016, January 4\u20138). Fast gesture recognition with multiple stream discrete HMMs on 3D skeletons. Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico.","DOI":"10.1109\/ICPR.2016.7899766"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Vidakis, N., Syntychakis, M., Triantafyllidis, G., and Akoumianakis, D. (August, January 30). Multimodal natural user interaction for multiple applications: The gesture\u2014Voice example. Proceedings of the 2012 International Conference on Telecommunications and Multimedia (TEMU), Chania, Greece.","DOI":"10.1109\/TEMU.2012.6294720"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Saba, E.N., Larson, E.C., and Patel, S.N. (2012, January 12\u201314). Dante vision: In-air and touch gesture sensing for natural surface interaction with combined depth and thermal cameras. Proceedings of the 2012 IEEE International Conference on Emerging Signal Processing Applications, Las Vegas, NV, USA.","DOI":"10.1109\/ESPA.2012.6152472"},{"key":"ref_4","unstructured":"Liu, W. (2010, January 17\u201319). Natural user interface-next mainstream product user interface. Proceedings of the 2010 IEEE 11th International Conference on Computer-Aided Industrial Design & Conceptual Design 1, Yiwu, China."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Rodr\u00edguez, N.D., Wikstr\u00f6m, R., Lilius, J., Cu\u00e9llar, M.P., and Flores, M.D.C. (2013). Understanding movement and interaction: An ontology for Kinect-based 3D depth sensors. Ubiquitous Computing and Ambient Intelligence. Context-Awareness and Context-Driven Interaction, Springer.","DOI":"10.1007\/978-3-319-03176-7_33"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Boulabiar, M.I., Burger, T., Poirier, F., and Coppin, G. (2011, January 9\u201314). A low-cost natural user interaction based on a camera hand-gestures recognizer. Proceedings of the International Conference on Human-Computer Interaction, Orlando, FL, USA.","DOI":"10.1007\/978-3-642-21605-3_24"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Villaroman, N., Rowe, D., and Swan, B. (2011, January 20\u201322). Teaching natural user interaction using OpenNI and the Microsoft Kinect sensor. Proceedings of the 2011 Conference on Information Technology Education, New York, NY, USA.","DOI":"10.1145\/2047594.2047654"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Marin, G., Dominio, F., and Zanuttigh, P. (2014, January 27\u201330). Hand gesture recognition with leap motion and kinect devices. Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France.","DOI":"10.1109\/ICIP.2014.7025313"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mazzini, L., Franco, A., and Maltoni, D. (2019, January 9\u201313). Gesture Recognition by Leap Motion Controller and LSTM Networks for CAD-oriented Interfaces. Proceedings of the International Conference on Image Analysis and Processing, Trento, Italy.","DOI":"10.1007\/978-3-030-30642-7_17"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2213","DOI":"10.2105\/AJPH.2009.187179","article-title":"Trends in fatalities from distracted driving in the United States, 1999 to 2008","volume":"100","author":"Wilson","year":"2010","journal-title":"Am. J. Public Health"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1109\/TITS.2010.2092770","article-title":"Driver inattention monitoring system for intelligent vehicles: A review","volume":"12","author":"Dong","year":"2011","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1016\/0001-4575(93)90020-W","article-title":"The effect of cellular phone use upon driver attention","volume":"25","author":"McKnight","year":"1993","journal-title":"Accid. Anal. Prev."},{"key":"ref_13","unstructured":"Ranney, T.A., Garrott, W.R., and Goodman, M.J. (2001). NHTSA Driver Distraction Research: Past, Present, and Future, SAE. SAE Technical Paper."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Borghi, G., Gasparini, R., Vezzani, R., and Cucchiara, R. (2017, January 11\u201314). Embedded recurrent network for head pose estimation in car. Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA.","DOI":"10.1109\/IVS.2017.7995922"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1016\/j.aap.2006.08.013","article-title":"An on-road assessment of cognitive distraction: Impacts on drivers\u2019 visual behavior and braking performance","volume":"39","author":"Harbluk","year":"2007","journal-title":"Accid. Anal. Prev."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1037\/1076-898X.9.2.119","article-title":"Mental workload while driving: Effects on visual search, discrimination, and decision making","volume":"9","author":"Recarte","year":"2003","journal-title":"J. Exp. Psychol. Appl."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.ssci.2011.07.008","article-title":"Examining the relationship between driver distraction and driving errors: A discussion of theory, studies and methods","volume":"50","author":"Young","year":"2012","journal-title":"Saf. Sci."},{"key":"ref_18","first-page":"24","article-title":"Investigating the role of fatigue, sleep and sleep disorders in commercial vehicle crashes: A systematic review","volume":"22","author":"Sharwood","year":"2011","journal-title":"J. Australas. Coll. Road Saf."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Borghi, G., Frigieri, E., Vezzani, R., and Cucchiara, R. (2018, January 15\u201319). Hands on the wheel: A dataset for driver hand detection and tracking. Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi\u2019an, China.","DOI":"10.1109\/FG.2018.00090"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Manganaro, F., Pini, S., Borghi, G., Vezzani, R., and Cucchiara, R. (2019, January 9\u201313). Hand Gestures for the Human-Car Interaction: The Briareo dataset. Proceedings of the International Conference on Image Analysis and Processing, Trento, Italy.","DOI":"10.1007\/978-3-030-30645-8_51"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Molchanov, P., Yang, X., Gupta, S., Kim, K., Tyree, S., and Kautz, J. (2016, January 27\u201330). Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.456"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_23","unstructured":"Weissmann, J., and Salomon, R. (1999, January 10\u201316). Gesture recognition for virtual reality applications using data gloves and neural networks. Proceedings of the IJCNN\u201999, International Joint Conference on Neural Networks, Proceedings (Cat. No. 99CH36339), Washington, DC, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"724","DOI":"10.1109\/TNSRE.2019.2905658","article-title":"Hand gesture recognition and finger angle estimation via wrist-worn modified barometric pressure sensing","volume":"27","author":"Shull","year":"2019","journal-title":"IEEE Trans. Neural Syst. Rehabil. Eng."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","volume":"77","author":"Rabiner","year":"1989","journal-title":"Proc. IEEE"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1583","DOI":"10.1109\/TPAMI.2016.2537340","article-title":"Deep dynamic neural networks for multimodal gesture segmentation and recognition","volume":"38","author":"Wu","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 13\u201316). Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Molchanov, P., Gupta, S., Kim, K., and Kautz, J. (2015, January 7\u201312). Hand gesture recognition with 3D convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301342"},{"key":"ref_30","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_31","unstructured":"Graves, A., and Schmidhuber, J. (2009, January 7\u201310). Offline handwriting recognition with multidimensional recurrent neural networks. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2368","DOI":"10.1109\/TITS.2014.2337331","article-title":"Hand gesture recognition in real time for automotive interfaces: A multimodal vision-based approach and evaluations","volume":"15","author":"Trivedi","year":"2014","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Miao, Q., Li, Y., Ouyang, W., Ma, Z., Xu, X., Shi, W., and Cao, X. (2017, January 22\u201329). Multimodal gesture recognition based on the resc3d network. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.360"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Cho, K., Van Merri\u00ebnboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Boulahia, S.Y., Anquetil, E., Multon, F., and Kulpa, R. (December, January 28). Dynamic hand gesture recognition based on 3D pattern assembled trajectories. Proceedings of the 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), Montreal, QC, Canada.","DOI":"10.1109\/IPTA.2017.8310146"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Escalera, S., Bar\u00f3, X., Gonzalez, J., Bautista, M.A., Madadi, M., Reyes, M., Ponce-L\u00f3pez, V., Escalante, H.J., Shotton, J., and Guyon, I. (2014, January 6\u201312). Chalearn looking at people challenge 2014: Dataset and results. Proceedings of the Workshop at the ECCV, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-16178-5_32"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_39","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). Pytorch: An imperative style, high-performance deep learning library. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Robbins, H., and Monro, S. (1951). A stochastic approximation method. Ann. Math. Stat., 400\u2013407.","DOI":"10.1214\/aoms\/1177729586"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1214\/aoms\/1177729392","article-title":"Stochastic estimation of the maximum of a regression function","volume":"23","author":"Kiefer","year":"1952","journal-title":"Ann. Math. Stat."},{"key":"ref_42","unstructured":"Sutskever, I., Martens, J., Dahl, G., and Hinton, G. (2013, January 17\u201319). On the importance of initialization and momentum in deep learning. Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_43","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_44","unstructured":"Zhang, Z., and Sabuncu, M. (2018, January 3\u20138). Generalized cross entropy loss for training deep neural networks with noisy labels. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_45","unstructured":"Simonyan, K., and Zisserman, A. (2014, January 8\u201313). Two-stream convolutional networks for action recognition in videos. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Pini, S., Ahmed, O.B., Cornia, M., Baraldi, L., Cucchiara, R., and Huet, B. (2017, January 13\u201317). Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild. Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK.","DOI":"10.1145\/3136755.3143006"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Gao, Q., Ogenyi, U.E., Liu, J., Ju, Z., and Liu, H. (2019, January 11\u201313). A two-stream CNN framework for American sign language recognition based on multimodal data fusion. Proceedings of the UK Workshop on Computational Intelligence, Portsmouth, UK.","DOI":"10.1007\/978-3-030-29933-0_9"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1109\/MSP.2017.2738401","article-title":"Deep multimodal learning: A survey on recent advances and trends","volume":"34","author":"Ramachandram","year":"2017","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Sarbolandi, H., Lefloch, D., and Kolb, A. (2015). Kinect range sensing: Structured-light versus Time-of-Flight Kinect. Computer Vision and Image Understanding, Elsevier.","DOI":"10.1016\/j.cviu.2015.05.006"}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/7\/3\/31\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:06:02Z","timestamp":1760177162000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/7\/3\/31"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,24]]},"references-count":49,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2020,9]]}},"alternative-id":["informatics7030031"],"URL":"https:\/\/doi.org\/10.3390\/informatics7030031","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,24]]}}}