{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T16:41:54Z","timestamp":1771951314120,"version":"3.50.1"},"reference-count":24,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,2,5]],"date-time":"2025-02-05T00:00:00Z","timestamp":1738713600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Emotion recognition in speech has gained increasing relevance in recent years, enabling more personalized interactions between users and automated systems. This paper presents the development of a dataset of features obtained from RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) to classify emotions in speech. The paper highlights audio processing techniques such as silence removal and framing to extract features from the recordings. The features are extracted from the audio signals using spectral techniques, time-domain analysis, and the discrete wavelet transform. The resulting dataset is used to train a neural network and the support vector machine learning algorithm. Cross-validation is employed for model training. The developed models were optimized using a software package that performs hyperparameter tuning to improve results. Finally, the emotional classification outcomes were compared. 
The results showed an emotion classification accuracy of 0.654 for the perceptron neural network and 0.724 for the support vector machine algorithm, demonstrating satisfactory performance in emotion classification.<\/jats:p>","DOI":"10.3390\/computation13020039","type":"journal-article","created":{"date-parts":[[2025,2,5]],"date-time":"2025-02-05T10:09:52Z","timestamp":1738750192000},"page":"39","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Developing a Dataset of Audio Features to Classify Emotions in Speech"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-7023-2921","authenticated-orcid":false,"given":"Alvaro A.","family":"Colunga-Rodriguez","sequence":"first","affiliation":[{"name":"Tecnol\u00f3gico Nacional de M\u00e9xico\/Matamoros, Heroica Matamoros 87490, Mexico"},{"name":"Tecnol\u00f3gico Nacional de M\u00e9xico\/Cenidet, Cuernavaca 62493, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1071-8599","authenticated-orcid":false,"given":"Alicia","family":"Mart\u00ednez-Rebollar","sequence":"additional","affiliation":[{"name":"Tecnol\u00f3gico Nacional de M\u00e9xico\/Cenidet, Cuernavaca 62493, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1466-7581","authenticated-orcid":false,"given":"Hugo","family":"Estrada-Esquivel","sequence":"additional","affiliation":[{"name":"Tecnol\u00f3gico Nacional de M\u00e9xico\/Cenidet, Cuernavaca 62493, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3195-9540","authenticated-orcid":false,"given":"Eddie","family":"Clemente","sequence":"additional","affiliation":[{"name":"Tecnol\u00f3gico Nacional de M\u00e9xico\/Cenidet, Cuernavaca 62493, 
Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1793-4921","authenticated-orcid":false,"given":"Odette A.","family":"Pliego-Mart\u00ednez","sequence":"additional","affiliation":[{"name":"Tecnol\u00f3gico Nacional de M\u00e9xico\/Cenidet, Cuernavaca 62493, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1080\/02699939208411068","article-title":"An argument for basic emotions","volume":"6","author":"Ekman","year":"1992","journal-title":"Cogn. Emot."},{"key":"ref_2","unstructured":"(2024, March 27). APA Dictionary of Psychology. Available online: https:\/\/dictionary.apa.org."},{"key":"ref_3","unstructured":"World Health Organization (2021). World Health Organization Suicide Worldwide in 2019: Global Health Estimates, World Health Organization."},{"key":"ref_4","unstructured":"(2024, March 19). INEGI Salud Mental. Available online: https:\/\/www.inegi.org.mx\/temas\/salud\/."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Chandraprabha, K.S., Shwetha, A.N., Kavitha, M., and Sumathi, R. (2021, January 4). Real Time-Employee Emotion Detection System (RtEED) Using Machine Learning. Proceedings of the 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Tirunelveli, India.","DOI":"10.1109\/ICICV50876.2021.9388510"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1040","DOI":"10.5829\/IJE.2023.36.06C.02","article-title":"Real Time Emotion Recognition with AD8232 ECG Sensor for Classwise Performance Evaluation of Machine Learning Methods","volume":"36","author":"Patil","year":"2023","journal-title":"Int. J. Eng."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Issa, D., Demirci, M.F., and Yazici, A. (2020). 
Speech emotion recognition with deep convolutional neural networks. Biomed. Signal Process. Control, 59.","DOI":"10.1016\/j.bspc.2020.101894"},{"key":"ref_8","unstructured":"Gaoxiang, C., Yuankai, Q., Liang, L., Beheshti, A., Zhedong, Z., Hengel, A., Ming-Hsuan, Y., Chenggang, Y., and Qingming, H. (2024, January 11). StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing. Proceedings of the Findings of the Association for Computational Linguistics, Bangkok, Thailand."},{"key":"ref_9","unstructured":"Henkel, A.P., Bromuri, S., and Waelbers, B.M.L. (2022, January 18\u201323). Comparing Neural Networks for Speech Emotion Recognition in Customer Service Interactions. Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Tariq, Z., Shah, S.K., and Lee, Y. (2019, January 9\u201312). Speech Emotion Detection using IoT based Deep Learning for Health Care. Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA.","DOI":"10.1109\/BigData47090.2019.9005638"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhili, S., and Xue, W. (2022, January 24). Real-Time Assessment of Student Concentration Based on Attention and Speech Emotion Recognition. Proceedings of the 2022 7th International Conference on Intelligent Informatics and Biomedical Science (ICIIBMS), Nara, Japan.","DOI":"10.1109\/ICIIBMS55689.2022.9971476"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"107172","DOI":"10.1016\/j.compeleceng.2021.107172","article-title":"A new proposed statistical feature extraction method in speech emotion recognition","volume":"93","author":"Abdulmohsin","year":"2021","journal-title":"Comput. Electr. Eng."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Caihua, C. (2019, January 12\u201314). Research on Multi-Modal Mandarin Speech Emotion Recognition Based on SVM. 
Proceedings of the 2019 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China.","DOI":"10.1109\/ICPICS47731.2019.8942545"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Livingstone, S.R., and Russo, F.A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A Dynamic, Multimodal Set of Facial and Vocal Expressions in North American English. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0196391"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2533","DOI":"10.1016\/j.procs.2023.01.227","article-title":"Speech Emotion Recognition System Using Gender Dependent Convolution Neural Network","volume":"218","author":"Singh","year":"2023","journal-title":"Procedia Comput. Sci."},{"key":"ref_16","unstructured":"Mahmud, M., Vassanelli, S., Kaiser, M.S., and Zhong, N. (2020). Speech Emotion Recognition in Neurological Disorders Using Convolutional Neural Network. Brain Informatics, Padua, Italy, 19 September 2020, Springer."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Luna-Jim\u00e9nez, C., Griol, D., Callejas, Z., Kleinlein, R., Montero, J.M., and Fern\u00e1ndez-Mart\u00ednez, F. (2021). Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors, 21.","DOI":"10.3390\/s21227665"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1007\/s10772-023-10037-w","article-title":"Application of Probabilistic Neural Network for Speech Emotion Recognition","volume":"27","author":"Deshmukh","year":"2023","journal-title":"Int. J. Speech Technol."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"41309","DOI":"10.1007\/s11042-022-12411-3",
"article-title":"Emotion Detection from Multilingual Audio Using Deep Analysis","volume":"81","author":"Bhattacharya","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"8663","DOI":"10.1007\/s11042-023-16036-y","article-title":"Machine Learning Approach of Speech Emotions Recognition Using Feature Fusion Technique","volume":"83","author":"Paul","year":"2024","journal-title":"Multimed. Tools Appl."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1007\/s12530-023-09550-9","article-title":"Speech Emotion Classification Using Feature-Level and Classifier-Level Fusion","volume":"15","author":"Mishra","year":"2024","journal-title":"Evol. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019, January 25). Optuna: A Next-Generation Hyperparameter Optimization Framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330701"},{"key":"ref_23","unstructured":"Bishop, C.M. (2006). Pattern Recognition and Machine Learning; Information Science and Statistics, Springer."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chen, Q., Tan, M., Qi, Y., Zhou, J., Li, Y., and Wu, Q. (2022, January 19\u201324). V2C: Visual Voice Cloning. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.02056"}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/2\/39\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:27:20Z","timestamp":1760027240000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/2\/39"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,5]]},"references-count":24,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["computation13020039"],"URL":"https:\/\/doi.org\/10.3390\/computation13020039","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,5]]}}}