{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T19:37:50Z","timestamp":1781811470684,"version":"3.54.5"},"reference-count":54,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2020,4,23]],"date-time":"2020-04-23T00:00:00Z","timestamp":1587600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"The European Commission Marie Sk\u0142odowska-Curie SMOOTH project","award":["H2020-MSCA-RISE-2016-734875"],"award-info":[{"award-number":["H2020-MSCA-RISE-2016-734875"]}]},{"name":"CCCDI \u2013 UEFISCDI Multi-MonD2 Project, Multi-Agent Intelligent Systems Platform for Water Quality Monitoring on the Romanian Danube and Danube Delta","award":["PN-III-P1-1.2-PCCDI2017-0637\/33PCCDI\/01.03.2018"],"award-info":[{"award-number":["PN-III-P1-1.2-PCCDI2017-0637\/33PCCDI\/01.03.2018"]}]},{"name":"Contract no. 22 PCCDI \/2018, within PNCDI III.","award":["PN-III-P1-1.2-PCCDI-2017-0086"],"award-info":[{"award-number":["PN-III-P1-1.2-PCCDI-2017-0086"]}]},{"name":"Yanshan University: \u201cJoint Laboratory of Intelligent Rehabilitation Robot\u201d","award":["KY201501009, Collaborative research agreement between Yanshan University, China and Romanian Academy by IMSAR, RO"],"award-info":[{"award-number":["KY201501009, Collaborative research agreement between Yanshan University, China and Romanian Academy by IMSAR, RO"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The interaction between humans and an NAO robot using deep convolutional neural networks (CNN) is presented in this paper based on an innovative end-to-end pipeline method that applies two optimized CNNs, one for face recognition (FR) and another one for the facial expression recognition (FER) in order to obtain real-time inference speed for the entire process. Two different models for FR are considered, one known to be very accurate, but has low inference speed (faster region-based convolutional neural network), and one that is not as accurate but has high inference speed (single shot detector convolutional neural network). For emotion recognition transfer learning and fine-tuning of three CNN models (VGG, Inception V3 and ResNet) has been used. The overall results show that single shot detector convolutional neural network (SSD CNN) and faster region-based convolutional neural network (Faster R-CNN) models for face detection share almost the same accuracy: 97.8% for Faster R-CNN on PASCAL visual object classes (PASCAL VOCs) evaluation metrics and 97.42% for SSD Inception. In terms of FER, ResNet obtained the highest training accuracy (90.14%), while the visual geometry group (VGG) network had 87% accuracy and Inception V3 reached 81%. The results show improvements over 10% when using two serialized CNN, instead of using only the FER CNN, while the recent optimization model, called rectified adaptive moment optimization (RAdam), lead to a better generalization and accuracy improvement of 3%-4% on each emotion recognition CNN.<\/jats:p>","DOI":"10.3390\/s20082393","type":"journal-article","created":{"date-parts":[[2020,4,23]],"date-time":"2020-04-23T10:46:22Z","timestamp":1587638782000},"page":"2393","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":87,"title":["Facial Expressions Recognition for Human\u2013Robot Interaction Using Deep Convolutional Neural Networks with Rectified Adam Optimizer"],"prefix":"10.3390","volume":"20","author":[{"given":"Daniel Octavian","family":"Melinte","sequence":"first","affiliation":[{"name":"Department of Robotics and Mechatronics, Romanian Academy Institute of Solid Mechanics, 010141 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Luige","family":"Vladareanu","sequence":"additional","affiliation":[{"name":"Department of Robotics and Mechatronics, Romanian Academy Institute of Solid Mechanics, 010141 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2020,4,23]]},"reference":[{"key":"ref_1","unstructured":"Lopez-Rincon, A. (March, January 27). Emotion recognition using facial expressions in children using the NAO Robot. Proceedings of the International Conference on Electronics, Communications and Computers (CONIELECOMP), Cholula, Mexico."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Faria, D.R., Vieira, M., and Faria, F.C. (2017, January 21\u201323). Towards the development of affective facial expression recognition for human-robot interaction. Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments, Island of Rhodes, Greece.","DOI":"10.1145\/3056540.3076199"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1007\/s11263-017-1055-1","article-title":"From facial expression recognition to interpersonal relation prediction","volume":"126","author":"Zhang","year":"2018","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhao, X., Liang, X., Liu, L., Li, T., Han, Y., Vasconcelos, N., and Yan, S. (2016, January 11\u201314). Peak-piloted deep network for facial expression recognition. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_27"},{"key":"ref_5","unstructured":"Ding, H., Zhou, S.K., and Chellappa, R. (June, January 30). Facenet2expnet: Regularizing a deep face recognition net for expression recognition. Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition, Washington, DC, USA."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ng, H.W., Nguyen, V.D., Vonikakis, V., and Winkler, S. (2015, January 9\u201313). Deep learning for emotion recognition on small datasets using transfer learning. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA.","DOI":"10.1145\/2818346.2830593"},{"key":"ref_7","first-page":"16","article-title":"Convolutional neural network for facial expression recognition","volume":"36","author":"Lu","year":"2016","journal-title":"J. Nanjing Univ. Posts Telecommun."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zeng, J., Shan, S., and Chen, X. (2018, January 8\u201314). Facial expression recognition with inconsistently annotated datasets. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_14"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Levi, G., and Hassner, T. (2015, January 9\u201313). Emotion recognition in the wild via convolutional neural networks and mapped binary patterns. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA.","DOI":"10.1145\/2818346.2830587"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.procs.2016.07.233","article-title":"Automatic facial expression recognition using DCNN","volume":"93","author":"Mayya","year":"2016","journal-title":"Procedia Comput. Sci."},{"key":"ref_11","unstructured":"Masi, I., Wu, Y., Hassner, T., and Natarajan, P. (November, January 29). Deep face recognition: A survey. Proceedings of the 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Paran\u00e1, Brazil."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., and Matthews, I. (2010, January 13\u201318). The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543262"},{"key":"ref_13","unstructured":"Lyons, M., Akamatsu, S., Kamachi, M., and Gyoba, J. (1998, January 14\u201316). Coding facial expressions with gabor wavelets. Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., and Lee, D.H. (2013, January 3\u20137). Challenges in representation learning: A report on three machine learning contests. Proceedings of the International Conference on Neural Information Processing, Daegu, Korea.","DOI":"10.1007\/978-3-642-42051-1_16"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/TAFFC.2017.2740923","article-title":"Affectnet: A database for facial expression, valence, and arousal computing in the wild","volume":"10","author":"Mollahosseini","year":"2017","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_16","unstructured":"Pantic, M., Valstar, M., Rademaker, R., and Maat, L. (2005, January 6\u201310). Web-based database for facial expression analysis. Proceedings of the IEEE international Conference on Multimedia and Expo, London, UK."},{"key":"ref_17","unstructured":"Valstar, M., and Pantic, M. (June, January 30). Induced disgust, happiness and surprise: An addition to the mmi facial expression database. Proceedings of the 3rd Intern. Workshop on EMOTION (satellite of LREC): Corpora for Research on Emotion and Affect, Valetta, Malta."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/MMUL.2012.26","article-title":"Collecting large, richly annotated facial-expression databases from movies","volume":"1","author":"Dhall","year":"2012","journal-title":"IEEE Multimed."},{"key":"ref_19","first-page":"2","article-title":"The Karolinska directed emotional faces (KDEF)","volume":"91","author":"Lundqvist","year":"1998","journal-title":"CD ROM Dep. Clin. Neurosci. Psychol. Sect. Karolinska Inst."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yang, H., Ciftci, U., and Yin, L. (2018, January 18\u201322). Facial expression recognition by de-expression residue learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00231"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Hamester, D., Barros, P., and Wermter, S. (2015, January 12\u201317). Face expression recognition with a 2-channel convolutional neural network. Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland.","DOI":"10.1109\/IJCNN.2015.7280539"},{"key":"ref_22","unstructured":"Pramerdorfer, C., and Kampel, M. (2016). Facial expression recognition using convolutional neural networks: State of the art. arXiv."},{"key":"ref_23","unstructured":"Tang, Y. (2013). Deep learning using linear support vector machines. arXiv."},{"key":"ref_24","unstructured":"Kim, B.-K., Dong, S.-Y., Roh, J., Kim, G., and Lee, S.-Y. (July, January 26). Fusing aligned and non-aligned face information for automatic affect recognition in the wild: A deep learning approach. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA."},{"key":"ref_25","unstructured":"Minaee, S., and Abdolrashidi, A. (2019). Deep-emotion: Facial expression recognition using attentional convolutional network. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"24321","DOI":"10.1109\/ACCESS.2019.2900231","article-title":"HERO: Human emotions recognition for realizing intelligent Internet of Things","volume":"7","author":"Hua","year":"2019","journal-title":"IEEE Access"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Connie, T., Al-Shabi, M., Cheah, W.P., and Goh, M. (2017, January 20\u201322). Facial expression recognition using a hybrid CNN\u2013SIFT aggregator. Proceedings of the International Workshop on Multi-Disciplinary Trends in Artificial Intelligence, Gadong, Brunei.","DOI":"10.1007\/978-3-319-69456-6_12"},{"key":"ref_28","unstructured":"(2019, August 30). Emotion-Compilation. Available online: https:\/\/www.kaggle.com\/qnkhuat\/emotion-compilation."},{"key":"ref_29","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_31","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Taigman, Y., Yang, M., Ranzato, M.A., and Wolf, L. (2014, January 23\u201328). Deepface: Closing the gap to human-level performance in face verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.220"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Schroff, F., Kalenichenko, D., and Philbin, J. (2015, January 7\u201312). Facenet: A unified embedding for face recognition and clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"ref_34","first-page":"6","article-title":"Deep face recognition","volume":"1","author":"Parkhi","year":"2015","journal-title":"BMVC"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., and Song, L. (2017, January 21\u201326). Sphereface: Deep hypersphere embedding for face recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.713"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gal, I.A., Bucur, D., and Vladareanu, L. (2018). DSmT decision-making algorithms for finding grasping configurations of robot dexterous hands. Symmetry, 10.","DOI":"10.3390\/sym10060198"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yan, H., Wang, H., Vladareanu, L., Lin, M., Vladareanu, V., and Li, Y. (2019). Detection of Participation and Training Task Difficulty Applied to the Multi-Sensor Systems of Rehabilitation Robots. Sensors, 19.","DOI":"10.3390\/s19214681"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Feng, Y., Wang, H., Vladareanu, L., Chen, Z., and Jin, D. (2019). New Motion Intention Acquisition Method of Lower Limb Rehabilitation Robot Based on Static Torque Sensors. Sensors, 19.","DOI":"10.3390\/s19153439"},{"key":"ref_39","first-page":"267","article-title":"Research on upper limb biomechanical system","volume":"7","author":"Iliescu","year":"2019","journal-title":"Period. Eng. Nat. Sci."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Wang, H., Zhang, D., Lu, H., Feng, Y., Xu, P., Mihai, R.V., and Vladareanu, L. (2015, January 22\u201324). Active training research of a lower limb rehabilitation robot based on constrained trajectory. Proceedings of the IEEE International Conference on Advanced Mechatronic Systems (ICAMechS), Beijing, China.","DOI":"10.1109\/ICAMechS.2015.7287123"},{"key":"ref_41","first-page":"9","article-title":"Generalization of Neutrosophic Rings and Neutrosophic Fields","volume":"5","author":"Ali","year":"2014","journal-title":"Neutrosophic Sets Syst."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Smarandache, F., and Vl\u0103d\u0103reanu, L. (2011, January 8\u201310). Applications of neutrosophic logic to robotics. Proceedings of the IEEE International Conference on Granular Computing, Kaohsiung, Taiwan.","DOI":"10.1109\/GRC.2011.6122666"},{"key":"ref_43","first-page":"43","article-title":"Theory and Application of Extension Hybrid Force-Position Control in Robotics","volume":"76","author":"Vladareanu","year":"2014","journal-title":"Univ. Politeh. Buchar. Sci. Bull.-Ser. A-Appl. Math. Phys."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.procs.2015.09.115","article-title":"The optimization of intelligent control interfaces using Versatile Intelligent Portable Robot Platform","volume":"65","author":"Vladareanu","year":"2015","journal-title":"Procedia Comput. Sci."},{"key":"ref_45","unstructured":"Vladareanu, L., Tont, G., Ion, I., Velea, L.M., Gal, A., and Melinte, O. (2010, January 16\u201319). Fuzzy dynamic modeling for walking modular robot control. Proceedings of the 9th International Conference on Application of Electrical Engineering, Prague, Czech Republic."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"409","DOI":"10.24846\/v24i4y201505","article-title":"Versatile Intelligent Portable Robot Platform applied to dynamic control of the walking robots","volume":"24","author":"Vladareanu","year":"2015","journal-title":"Stud. Inform. Control"},{"key":"ref_47","unstructured":"Vladareanu, L., Tont, G., Vladareanu, V., Smarandache, F., and Capitanu, L. (2012, January 18\u201321). The navigation mobile robot systems using Bayesian approach through the virtual projection method. Proceedings of the IEEE the 2012 International Conference on Advanced Mechatronic Systems, Tokyo, Japan."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_49","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, Mit Press."},{"key":"ref_50","unstructured":"Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., and Duerig, T. (2018). The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv."},{"key":"ref_51","unstructured":"(2019, September 10). Open Images Dataset V6. Available online: https:\/\/storage.googleapis.com\/openimages\/web\/download_v4.html."},{"key":"ref_52","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (July, January 26). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_53","unstructured":"Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., and Han, J. (2019). On the variance of the adaptive learning rate and beyond. arXiv, Available online: http:\/\/doc.aldebaran.com\/1-14\/index.html."},{"key":"ref_54","unstructured":"(2020, January 20). NAO Software 1.14.5 Documentation. Available online: http:\/\/doc.aldebaran.com\/1-14\/index.html."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/8\/2393\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:23:33Z","timestamp":1760361813000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/8\/2393"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,23]]},"references-count":54,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2020,4]]}},"alternative-id":["s20082393"],"URL":"https:\/\/doi.org\/10.3390\/s20082393","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,23]]}}}