{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,18]],"date-time":"2026-05-18T11:26:46Z","timestamp":1779103606645,"version":"3.51.4"},"reference-count":41,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,3,19]],"date-time":"2022-03-19T00:00:00Z","timestamp":1647648000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Science and ICT Korea","award":["IITP-2021-0-01835"],"award-info":[{"award-number":["IITP-2021-0-01835"]}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2021R1A2B5B03002118"],"award-info":[{"award-number":["NRF-2021R1A2B5B03002118"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by rendering machines to exhibit learning capabilities. However, identifying human emotions from speech with good performance is still challenging. With the advent of deep learning algorithms, this problem has been addressed recently. However, most research work in the past focused on feature extraction as only one method for training. In this research, we have explored two different methods of extracting features to address effective speech emotion recognition. Initially, two-way feature extraction is proposed by utilizing super convergence to extract two sets of potential features from the speech data. For the first set of features, principal component analysis (PCA) is applied to obtain the first feature set. Thereafter, a deep neural network (DNN) with dense and dropout layers is implemented. 
In the second approach, mel-spectrogram images are extracted from audio files, and the 2D images are given as input to the pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis over both the feature extraction methods with multiple algorithms and over two datasets are performed in this work. The RAVDESS dataset provided significantly better accuracy than using numeric features on a DNN.<\/jats:p>","DOI":"10.3390\/s22062378","type":"journal-article","created":{"date-parts":[[2022,3,20]],"date-time":"2022-03-20T21:37:17Z","timestamp":1647812237000},"page":"2378","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":109,"title":["Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7230-3869","authenticated-orcid":false,"given":"Apeksha","family":"Aggarwal","sequence":"first","affiliation":[{"name":"Department of Computer Science Engineering & Information Technology, Jaypee Institute of Information Technology, A 10, Sector 62, Noida 201307, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0258-6100","authenticated-orcid":false,"given":"Akshat","family":"Srivastava","sequence":"additional","affiliation":[{"name":"School of Computer Science Engineering and Technology, Bennett University, Plot Nos 8-11, TechZone 2, Greater Noida 201310, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ajay","family":"Agarwal","sequence":"additional","affiliation":[{"name":"Department of Information Technology, KIET Group of Institutions, Delhi-NCR, Meerut Road (NH-58), Ghaziabad 201206, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nidhi","family":"Chahal","sequence":"additional","affiliation":[{"name":"Nidhi Chahal, NIIT Limited, Gurugram 110019, 
India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6475-4491","authenticated-orcid":false,"given":"Dilbag","family":"Singh","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2537-2439","authenticated-orcid":false,"given":"Abeer Ali","family":"Alnuaim","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, College of Applied Studies and Community Services, King Saud University, P.O. Box 22459, Riyadh 11495, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5301-8878","authenticated-orcid":false,"given":"Aseel","family":"Alhadlaq","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, College of Applied Studies and Community Services, King Saud University, P.O. Box 22459, Riyadh 11495, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heung-No","family":"Lee","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"14","DOI":"10.3389\/fcomp.2020.00014","article-title":"Real-time speech emotion recognition using a pre-trained image classification network: Effects of bandwidth reduction and companding","volume":"2","author":"Lech","year":"2020","journal-title":"Front. Comput. 
Sci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"117327","DOI":"10.1109\/ACCESS.2019.2936124","article-title":"Speech emotion recognition using deep learning techniques: A review","volume":"7","author":"Khalil","year":"2019","journal-title":"IEEE Access"},{"key":"ref_3","first-page":"25170","article-title":"Speech Emotion Recognition using Neural Network and MLP Classifier","volume":"2020","author":"Joy","year":"2020","journal-title":"IJESC"},{"key":"ref_4","first-page":"4245","article-title":"Voice emotion recognition using CNN and decision tree","volume":"8","author":"Damodar","year":"2019","journal-title":"Int. J. Innov. Technol. Exp. Eng."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1007\/s10772-017-9396-2","article-title":"Vocal-based emotion recognition using random forests and decision tree","volume":"20","author":"Noroozi","year":"2017","journal-title":"Int. J. Speech Technol."},{"key":"ref_6","first-page":"148","article-title":"Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients","volume":"19","author":"Eom","year":"2021","journal-title":"J. Inf. Commun. Converg. Eng."},{"key":"ref_7","first-page":"228","article-title":"Modeling the Scheduling Problem in Cellular Manufacturing Systems Using Genetic Algorithm as an Efficient Meta-Heuristic Approach","volume":"1","author":"Rezaeipanah","year":"2021","journal-title":"J. Artif. Intell. Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1684017","DOI":"10.1155\/2022\/1684017","article-title":"A Novel Diabetes Healthcare Disease Prediction Framework Using Machine Learning Techniques","volume":"2022","author":"Krishnamoorthi","year":"2022","journal-title":"J. Healthc. 
Eng."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"5594267","DOI":"10.1155\/2021\/5594267","article-title":"A systematic review on harmony search algorithm: Theory, literature, and applications","volume":"2021","author":"Dubey","year":"2021","journal-title":"Math. Probl. Eng."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"4277436","DOI":"10.1155\/2022\/4277436","article-title":"AI-DRIVEN Novel Approach for Liver Cancer Screening and Prediction Using Cascaded Fully Convolutional Neural Network","volume":"2022","author":"Shukla","year":"2022","journal-title":"J. Healthc. Eng."},{"key":"ref_11","unstructured":"Weiqiao, Z., Yu, J., and Zou, Y. (2015, January 21\u201324). An experimental study of speech emotion recognition based on deep convolutional neural networks. Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Xi\u2019an, China."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kurpukdee, N., Kasuriya, S., Chunwijitra, V., Wutiwiwatchai, C., and Lamsrichan, P. (2017, January 7\u20139). A study of support vector machines for emotional speech recognition. Proceedings of the 2017 8th International Conference of Information and Communication Technology for Embedded Systems (IC-ICTES), Chonburi, Thailand.","DOI":"10.1109\/ICTEmSys.2017.7958773"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1049\/iet-syb.2019.0116","article-title":"Efficient prediction of drug\u2013drug interaction using deep learning models","volume":"14","author":"Shukla","year":"2020","journal-title":"IET Syst. Biol."},{"key":"ref_14","first-page":"23","article-title":"A Data Transmission Approach Based on Ant Colony Optimization and Threshold Proxy Re-encryption in WSNs","volume":"2","author":"Liu","year":"2022","journal-title":"J. Artif. Intell. 
Technol."},{"key":"ref_15","first-page":"9","article-title":"A survey of NISQ era hybrid quantum-classical machine learning research","volume":"2","year":"2022","journal-title":"J. Artif. Intell. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1109\/ACCESS.2021.3136251","article-title":"Bangla Speech Emotion Recognition and Cross-Lingual Study Using Deep CNN and BLSTM Networks","volume":"10","author":"Sultana","year":"2021","journal-title":"IEEE Access"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lee, K.H., Choi, H.K., and Jang, B.T. (2019, January 16\u201318). A study on speech emotion recognition using a deep neural network. Proceedings of the 2019 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Korea.","DOI":"10.1109\/ICTC46691.2019.8939830"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1080\/13682199.2018.1505327","article-title":"Parallel non-dominated sorting genetic algorithm-II-based image encryption technique","volume":"66","author":"Kaur","year":"2018","journal-title":"Imaging Sci. J."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Pandey, S., Shekhawat, H., and Prasanna, S. (2019, January 16\u201318). Deep Learning Techniques for Speech Emotion Recognition: A Review. Proceedings of the 2019 29th International Conference Radioelektronika (RADIOELEKTRONIKA), Pardubice, Czech Republic.","DOI":"10.1109\/RADIOELEK.2019.8733432"},{"key":"ref_20","first-page":"3097","article-title":"Emotion Identification from Raw Speech Signals Using DNNs","volume":"2018","author":"Sarma","year":"2018","journal-title":"Interspeech"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Li, P., Song, Y., McLoughlin, I.V., Guo, W., and Dai, L.R. (2018, January 2\u20136). An attention pooling based representation learning method for speech emotion recognition. 
Proceedings of the ISCA Conference, Los Angeles, CA, USA.","DOI":"10.21437\/Interspeech.2018-1242"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Palo, H., Mohanty, M.N., and Chandra, M. (2015). Use of different features for emotion recognition using MLP network. Computational Vision and Robotics, Springer.","DOI":"10.1007\/978-81-322-2196-8_2"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Neumann, M., and Vu, N.T. (2017). Attentive convolutional neural network based speech emotion recognition: A study on the impact of input features, signal length, and acted speech. arXiv.","DOI":"10.21437\/Interspeech.2017-917"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.neunet.2017.02.013","article-title":"Evaluating deep learning architectures for Speech Emotion Recognition","volume":"92","author":"Fayek","year":"2017","journal-title":"Neural Netw."},{"key":"ref_25","first-page":"152","article-title":"Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition","volume":"2018","author":"Luo","year":"2018","journal-title":"Interspeech"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Tzinis, E., and Potamianos, A. (2017, January 23\u201326). Segment-based speech emotion recognition using recurrent neural networks. Proceedings of the 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), San Antonio, TX, USA.","DOI":"10.1109\/ACII.2017.8273599"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Mirsamadi, S., Barsoum, E., and Zhang, C. (2017, January 5\u20139). Automatic speech emotion recognition using recurrent neural networks with local attention. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952552"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tao, F., and Liu, G. 
(2018, January 15\u201320). Advanced LSTM: A study about better time dependency modeling in emotion recognition. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461750"},{"key":"ref_29","first-page":"336","article-title":"High-level Feature Representation using Recurrent Neural Network for Speech Emotion Recognition","volume":"2015","author":"Lee","year":"2015","journal-title":"Interspeech"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Maimon, O., and Rokach, L. (2005). Decision Trees. Data Mining and Knowledge Discovery Handbook, Springer.","DOI":"10.1007\/b107408"},{"key":"ref_31","first-page":"272","article-title":"Random forests and decision trees","volume":"9","author":"Ali","year":"2012","journal-title":"Int. J. Comput. Sci. Issues (IJCSI)"},{"key":"ref_32","first-page":"26","article-title":"Multilayer Perceptron: Architecture Optimization and Training","volume":"4","author":"Ramchoun","year":"2016","journal-title":"Int. J. Interact. Multim. Artif. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_34","unstructured":"Lok, E.J. (2021, December 16). Toronto Emotional Speech Set (TESS). Available online: https:\/\/www.kaggle.com\/ejlok1\/toronto-emotional-speech-set-tess."},{"key":"ref_35","unstructured":"Livingstone, S.R. (2021, December 06). RAVDESS Emotional Speech Audio Emotional Speech Dataset. 
Available online: https:\/\/www.kaggle.com\/uwrfkaggler\/ravdess-emotional-speech-audio."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1049\/cit2.12042","article-title":"Performance analysis of machine learning algorithms on automated sleep staging feature sets","volume":"6","author":"Satapathy","year":"2021","journal-title":"CAAI Trans. Intell. Technol."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1049\/cit2.12025","article-title":"Deep imitation reinforcement learning for self-driving by vision","volume":"6","author":"Zou","year":"2021","journal-title":"CAAI Trans. Intell. Technol."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1049\/cit2.12044","article-title":"Image-denoising algorithm based on improved K-singular value decomposition and atom optimization","volume":"7","author":"Chen","year":"2022","journal-title":"CAAI Trans. Intell. Technol."},{"key":"ref_39","first-page":"526","article-title":"Speech Emotion Recognition \u2019in the Wild\u2019 Using an Autoencoder","volume":"2020","author":"Dissanayake","year":"2020","journal-title":"Interspeech"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Li, H., Ding, W., Wu, Z., and Liu, Z. (2020). Learning Fine-Grained Cross Modality Excitement for Speech Emotion Recognition. 
arXiv.","DOI":"10.21437\/Interspeech.2021-158"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"74539","DOI":"10.1109\/ACCESS.2021.3067460","article-title":"Head fusion: Improving the accuracy and robustness of speech emotion recognition on the IEMOCAP and RAVDESS dataset","volume":"9","author":"Xu","year":"2021","journal-title":"IEEE Access"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2378\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:39:32Z","timestamp":1760135972000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2378"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,19]]},"references-count":41,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["s22062378"],"URL":"https:\/\/doi.org\/10.3390\/s22062378","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,19]]}}}