{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T16:03:21Z","timestamp":1764691401278,"version":"3.46.0"},"reference-count":30,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,1]],"date-time":"2025-12-01T00:00:00Z","timestamp":1764547200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Effective communication between deaf\u2013mute and visually impaired individuals remains a challenge in the fields of human\u2013computer interaction and accessibility technology. Current solutions mostly rely on single-modal recognition, which often leads to issues such as semantic ambiguity and loss of emotional information. To address these challenges, this study proposes a lightweight multimodal fusion framework that combines gestures and micro-expressions, which are then processed through a recognition network and a speech synthesis module. The core innovations of this research are as follows: (1) a lightweight YOLOv5s improvement structure that integrates residual modules and efficient downsampling modules, which reduces the model complexity and computational overhead while maintaining high accuracy; (2) a multimodal fusion method based on an attention mechanism, which adaptively and efficiently integrates complementary information from gestures and micro-expressions, significantly improving the semantic richness and accuracy of joint recognition; (3) an end-to-end real-time system that outputs the visual recognition results through a high-quality text-to-speech module, completing the closed-loop from \u201cvisual signal\u201d to \u201cspeech feedback\u201d. We conducted evaluations on the publicly available hand gesture dataset HaGRID and a curated micro-expression image dataset. The results show that, for the joint gesture and micro-expression tasks, our proposed multimodal recognition system achieves a multimodal joint recognition accuracy of 95.3%, representing a 4.5% improvement over the baseline model. The system was evaluated in a locally deployed environment, achieving a real-time processing speed of 22 FPS, with a speech output latency below 0.8 s. The mean opinion score (MOS) reached 4.5, demonstrating the effectiveness of the proposed approach in breaking communication barriers between the hearing-impaired and visually impaired populations.<\/jats:p>","DOI":"10.3390\/computation13120277","type":"journal-article","created":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T15:31:17Z","timestamp":1764689477000},"page":"277","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Research on YOLOv5s-Based Multimodal Assistive Gesture and Micro-Expression Recognition with Speech Synthesis"],"prefix":"10.3390","volume":"13","author":[{"given":"Xiaohua","family":"Li","sequence":"first","affiliation":[{"name":"School of Engineering, King Mongkut\u2019s Institute of Technology Ladkrabang, Bangkok 10520, Thailand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1451-4278","authenticated-orcid":false,"given":"Chaiyan","family":"Jettanasen","sequence":"additional","affiliation":[{"name":"School of Engineering, King Mongkut\u2019s Institute of Technology Ladkrabang, Bangkok 10520, Thailand"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"135373","DOI":"10.1109\/ACCESS.2025.3593428","article-title":"Survey on Hand Gesture Recognition from Visual Input","volume":"13","author":"Linardakis","year":"2025","journal-title":"IEEE Access"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Priyaa, V.G., and Rajeswari, A. (2025, January 27\u201328). Indian Sign Language Recognition Using Weighted Motion History RGB Images Through Modified LeNet-5. Proceedings of the 2025 International Conference on Advancements in Power, Communication and Intelligent Systems (APCI), Kannur, India.","DOI":"10.1109\/APCI65531.2025.11137097"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Bhoyar, C., Jain, G., Singh, G., Bhargava, H., and Jain, S. (2025, January 11\u201313). Sign Language Interpreter using Long Short-Term Memory (LSTM). Proceedings of the 2025 4th International Conference on Advances in Computing, Communication, Embedded and Secure Systems (ACCESS), Ernakulam, India.","DOI":"10.1109\/ACCESS65134.2025.11135621"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Tanzeela, S., Magotra, A., and Singh, S. (2025, January 5\u20137). A Real Time Deep Learning Model for Sign Language Interpretation. Proceedings of the 2025 International Conference on Electronics, AI and Computing (EAIC), Jalandhar, India.","DOI":"10.1109\/EAIC66483.2025.11101507"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Paulino, N., Oliveira, M., Ribeiro, F., Outeiro, L., and Pessoa, L.M. (2025, January 3\u20136). Human Activity Recognition with a Reconfigurable Intelligent Surface for Wi-Fi 6E. Proceedings of the 2025 Joint European Conference on Networks and Communications & 6G Summit (EuCNC\/6G Summit), Poznan, Poland.","DOI":"10.1109\/EuCNC\/6GSummit63408.2025.11036889"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"123258","DOI":"10.1016\/j.eswa.2024.123258","article-title":"Spatial\u2013temporal feature-based end-to-end Fourier network for 3D sign language recognition","volume":"248","author":"Abdullahi","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1109\/TMM.2021.3070438","article-title":"A comprehensive study on deep learning-based methods for sign language recognition","volume":"24","author":"Adaloglou","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ahmed, I.T., Gwad, W.H., Hammad, B.T., and Alkayal, E. (2025). Enhancing hand gesture image recognition by integrating various feature groups. Technologies, 13.","DOI":"10.3390\/technologies13040164"},{"key":"ref_9","unstructured":"Al-Barham, M., Alsharkawi, A., Al-Yaman, M., Al-Fetyani, M., Elnagar, A., SaAleek, A.A., and Al-Odat, M. (2023). RGB Arabic alphabets sign language dataset. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Al Farid, F., Hashim, N., Abdullah, J., Bhuiyan, M.R., Isa, W.N.S.M., Uddin, J., Haque, M.A., and Husen, M.N. (2022). A structured and methodological review on vision-based hand gesture recognition system. J. Imag., 8.","DOI":"10.3390\/jimaging8060153"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Alabduallah, B., Al Dayil, R., Alkharashi, A., and Alneil, A.A. (2025). Innovative hand pose based sign language recognition using hybrid Metaheuristic optimization algorithms with deep learning model for hearing impaired persons. Sci. Rep., 15.","DOI":"10.1038\/s41598-025-93559-4"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"7609","DOI":"10.1007\/s00521-024-09503-6","article-title":"Real-time sign language recognition based on YOLO algorithm","volume":"36","author":"Alaftekin","year":"2024","journal-title":"Neural Comput. Appl."},{"key":"ref_13","first-page":"100504","article-title":"A survey on sign language literature","volume":"14","author":"Alaghband","year":"2023","journal-title":"Mach. Learn Appl."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2014","DOI":"10.1007\/s12559-023-10174-z","article-title":"Snapture\u2014A novel neural architecture for combined static and dynamic hand gesture recognition","volume":"15","author":"Ali","year":"2023","journal-title":"Cognit. Comput."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Andriluka, M., Iqbal, U., Insafutdinov, E., Pishchulin, L., Milan, A., Gall, J., and Schiele, B. (2018, January 18\u201323). PoseTrack: A benchmark for human pose estimation and tracking. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00542"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"44858","DOI":"10.1109\/ACCESS.2024.3471613","article-title":"Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism","volume":"13","author":"Zhang","year":"2025","journal-title":"IEEE Access"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Nwosu, K.C., Gaines, K., Abdulgader, M., and Hu, Y.-H. (2024, January 17\u201319). Enhancing Patient-Provider Interaction by Leveraging AI for Facial Emotion Recognition in Healthcare. Proceedings of the 2024 International Conference on Computer and Applications (ICCA), Cairo, Egypt.","DOI":"10.1109\/ICCA62237.2024.10927903"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Taco-Jimenez, A., and Suni-Lopez, F. (2024, January 6\u20138). Self-Adaptation of Software Services Based on User Profile. Proceedings of the 2024 IEEE XXXI International Conference on Electronics, Electrical Engineering and Computing (INTERCON), Lima, Peru.","DOI":"10.1109\/INTERCON63140.2024.10833493"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Sutar, M.B., and Ambhaikar, A. (2023, January 17\u201319). A Comparative Study on Deep Facial Expression Recognition. Proceedings of the 2023 7th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India.","DOI":"10.1109\/ICICCS56967.2023.10142703"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Agarwal, A., and Susan, S. (2023, January 3\u20135). Emotion Recognition from Masked Faces using Inception-v3. Proceedings of the 2023 5th International Conference on Recent Advances in Information Technology (RAIT), Dhanbad, India.","DOI":"10.1109\/RAIT57693.2023.10126777"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Stanciu, L., and Albu, A. (2019, January 21\u201323). Analysis on Emotion Detection and Recognition Methods using Facial Microexpressions. A Review. Proceedings of the 2019 E-Health and Bioengineering Conference (EHB), Iasi, Romania.","DOI":"10.1109\/EHB47216.2019.8969925"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"100372","DOI":"10.1016\/j.imu.2020.100372","article-title":"Development of a Real-Time Emotion Recognition System Using Facial Expressions and EEG based on machine learning and deep neural network methods","volume":"20","author":"Hassouneh","year":"2020","journal-title":"Inform. Med. Unlocked"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ulrich, L., Carmassi, G., Garelli, P., Lo Presti, G., Ramondetti, G., Marullo, G., Innocente, C., and Vezzetti, E. (2024). SIGNIFY: Leveraging Machine Learning and Gesture Recognition for Sign Language Teaching Through a Serious Game. Future Internet, 16.","DOI":"10.3390\/fi16120447"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Pranav, E., Kamal, S., Chandran, C.S., and Supriya, M.H. (2020, January 6\u20137). Facial Emotion Recognition Using Deep Convolutional Neural Network. Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India.","DOI":"10.1109\/ICACCS48705.2020.9074302"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ranjan, R., and Sahana, B.C. (2019, January 10\u201312). An Efficient Facial Feature Extraction Method Based Supervised Classification Model for Human Facial Emotion Identification. Proceedings of the 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Ajman, United Arab Emirates.","DOI":"10.1109\/ISSPIT47144.2019.9001839"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wu, T., Fu, S., and Yang, G. (2012, January 11\u201314). Survey of the Facial Expression Recognition Research. Proceedings of the Advances in Brain Inspired Cognitive Systems (BICS 2012), Shenyang, China.","DOI":"10.1007\/978-3-642-31561-9_44"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Thakur, N., Cui, S., Khanna, K., Knieling, V., Duggal, Y.N., and Shao, M. (2023). Investigation of the Gender-Specific Discourse about Online Learning during COVID-19 on Twitter Using Sentiment Analysis, Subjectivity Analysis, and Toxicity Analysis. Computers, 12.","DOI":"10.20944\/preprints202310.0157.v1"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1080\/02650487.2021.1997455","article-title":"Gender effects in influencer marketing: An experimental study on the efficacy of endorsements by same- vs. other-gender social media influencers on Instagram","volume":"41","author":"Hudders","year":"2021","journal-title":"Int. J. Advert."},{"key":"ref_29","unstructured":"Kapitanov, A., Kvanchiani, K., Nagaev, A., Kraynov, R., and Makhliarchuk, A. (2024, January 3\u20138). HaGRID\u2014HAnd Gesture Recognition Image Dataset. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA."},{"key":"ref_30","unstructured":"Forster, J., Schmidt, C., Hoyoux, T., Koller, O., Zelle, U., Piater, J., and Ney, H. (2012, January 23\u201325). RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus. Proceedings of the Eighth International Conference on Language Resources and Evaluation, Istanbul, Turkey."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/12\/277\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T16:00:26Z","timestamp":1764691226000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/12\/277"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,1]]},"references-count":30,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["computation13120277"],"URL":"https:\/\/doi.org\/10.3390\/computation13120277","relation":{},"ISSN":["2079-3197"],"issn-type":[{"type":"electronic","value":"2079-3197"}],"subject":[],"published":{"date-parts":[[2025,12,1]]}}}