{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T12:18:15Z","timestamp":1773317895002,"version":"3.50.1"},"publisher-location":"Cham","reference-count":34,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031333798","type":"print"},{"value":"9783031333804","type":"electronic"}],"license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,27]],"date-time":"2023-05-27T00:00:00Z","timestamp":1685145600000},"content-version":"vor","delay-in-days":146,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Emotion recognition (ER) from speech signals is a robust approach since it cannot be imitated like facial expression or text based sentiment analysis. Valuable information underlying the emotions are significant for human-computer interactions enabling intelligent machines to interact with sensitivity in the real world. Previous ER studies through speech signal processing have focused exclusively on associations between different signal mode decomposition methods and hidden informative features. However, improper decomposition parameter selections lead to informative signal component losses due to mode duplicating and mixing. In contrast, the current study proposes VGG-optiVMD, an empowered variational mode decomposition algorithm, to distinguish meaningful speech features and automatically select the number of decomposed modes and optimum balancing parameter for the data fidelity constraint by assessing their effects on the VGG16 flattening output layer. Various feature vectors were employed to train the VGG16 network on different databases and assess VGG-optiVMD reproducibility and reliability. One, two, and three-dimensional feature vectors were constructed by concatenating Mel-frequency cepstral coefficients, Chromagram, Mel spectrograms, Tonnetz diagrams, and spectral centroids. Results confirmed a synergistic relationship between the fine-tuning of the signal sample rate and decomposition parameters with classification accuracy, achieving state-of-the-art 96.09% accuracy in predicting seven emotions on the Berlin EMO-DB database.<\/jats:p>","DOI":"10.1007\/978-3-031-33380-4_17","type":"book-chapter","created":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T08:05:58Z","timestamp":1685088358000},"page":"219-231","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["An Extended Variational Mode Decomposition Algorithm Developed Speech Emotion Recognition Performance"],"prefix":"10.1007","author":[{"given":"David","family":"Hason Rudd","sequence":"first","affiliation":[]},{"given":"Huan","family":"Huo","sequence":"additional","affiliation":[]},{"given":"Guandong","family":"Xu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,27]]},"reference":[{"key":"17_CR1","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","DOI":"10.1007\/b104114","volume-title":"Advances in Multimedia Information Processing - PCM 2004","year":"2005","unstructured":"Aizawa, Kiyoharu, Nakamura, Yuichi, Satoh, Shin\u2019ichi (eds.): PCM 2004. LNCS, vol. 3331. Springer, Heidelberg (2005). https:\/\/doi.org\/10.1007\/b104114"},{"key":"17_CR2","doi-asserted-by":"crossref","unstructured":"Alshamsi, H., Kepuska, V., Alshamsi, H., Meng, H.: Automated facial expression and speech emotion recognition app development on smart phones using cloud computing. In: 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 730\u2013738. IEEE (2018)","DOI":"10.1109\/IEMCON.2018.8614831"},{"key":"17_CR3","doi-asserted-by":"crossref","unstructured":"Badshah, A.M., Ahmad, J., Rahim, N., Baik, S.W.: Speech emotion recognition from spectrograms with deep convolutional neural network. In: 2017 International Conference on Platform Technology and Service (PlatCon), pp. 1\u20135 (2017)","DOI":"10.1109\/PlatCon.2017.7883728"},{"issue":"5","key":"17_CR4","doi-asserted-by":"publisher","first-page":"5571","DOI":"10.1007\/s11042-017-5292-7","volume":"78","author":"AM Badshah","year":"2019","unstructured":"Badshah, A.M., Rahim, N.: Ullah: Deep features-based speech emotion recognition for smart affective services. Multimedia Tools and Applications 78(5), 5571\u20135589 (2019)","journal-title":"Multimedia Tools and Applications"},{"key":"17_CR5","unstructured":"Baevski, A., Zhou, Y., Mohamed, A., Auli, M.: wav2vec 2.0: A framework for self-supervised learning of speech representations. In: Advances in Neural Information Processing Systems, vol. 33, pp. 12449\u201312460 (2020)"},{"key":"17_CR6","doi-asserted-by":"crossref","unstructured":"Basharirad, B., Moradhaseli, M.: Speech emotion recognition methods: A literature review. In: AIP Conference Proceedings, vol. 1891, p. 020105. AIP Publishing LLC (2017)","DOI":"10.1063\/1.5005438"},{"key":"17_CR7","doi-asserted-by":"crossref","unstructured":"Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B., et al.: A database of german emotional speech. In: Interspeech. vol. 5, pp. 1517\u20131520 (2005)","DOI":"10.21437\/Interspeech.2005-446"},{"key":"17_CR8","doi-asserted-by":"publisher","DOI":"10.1016\/j.bspc.2020.102073","volume":"62","author":"VR Carvalho","year":"2020","unstructured":"Carvalho, V.R., Moraes, M.F., Braga, A.P., Mendes, E.M.: Evaluating five different adaptive decomposition methods for eeg signal seizure detection and classification. Biomed. Signal Process. Control 62, 102073 (2020)","journal-title":"Biomed. Signal Process. Control"},{"issue":"8","key":"17_CR9","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1007\/s00521-016-2712-y","volume":"29","author":"S Demircan","year":"2018","unstructured":"Demircan, S., Kahramanli, H.: Application of fuzzy c-means clustering algorithm to spectral features for emotion classification from speech. Neural Comput. Appl. 29(8), 59\u201366 (2018)","journal-title":"Neural Comput. Appl."},{"key":"17_CR10","doi-asserted-by":"crossref","unstructured":"Dendukuri, L.S., Hussain, S.J.: Emotional speech analysis and classification using variational mode decomposition. Int. J. Speech Technol, pp. 1\u201313 (2022)","DOI":"10.1007\/s10772-022-09970-z"},{"issue":"3","key":"17_CR11","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1109\/TSP.2013.2288675","volume":"62","author":"K Dragomiretskiy","year":"2013","unstructured":"Dragomiretskiy, K., Zosso, D.: Variational mode decomposition. IEEE Trans. Signal Process. 62(3), 531\u2013544 (2013)","journal-title":"IEEE Trans. Signal Process."},{"issue":"5","key":"17_CR12","doi-asserted-by":"publisher","first-page":"479","DOI":"10.3390\/e21050479","volume":"21","author":"N Hajarolasvadi","year":"2019","unstructured":"Hajarolasvadi, N., Demirel, H.: 3d cnn-based speech emotion recognition using k-means clustering and spectrograms. Entropy 21(5), 479\u2013495 (2019)","journal-title":"Entropy"},{"key":"17_CR13","doi-asserted-by":"crossref","unstructured":"Harte, C., Sandler, M., Gasser, M.: Detecting harmonic change in musical audio. In: Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia, pp. 21\u201326 (2006)","DOI":"10.1145\/1178723.1178727"},{"issue":"5","key":"17_CR14","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/BF00927673","volume":"4","author":"MR Hestenes","year":"1969","unstructured":"Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4(5), 303\u2013320 (1969)","journal-title":"J. Optim. Theory Appl."},{"key":"17_CR15","doi-asserted-by":"crossref","unstructured":"Huang, Z., Dong, M., Mao, Q., Zhan, Y.: Speech emotion recognition using cnn. In: Proceedings of the 22nd ACM International Conference Media, pp. 801\u2013804 (2014)","DOI":"10.1145\/2647868.2654984"},{"key":"17_CR16","doi-asserted-by":"publisher","first-page":"101894","DOI":"10.1016\/j.bspc.2020.101894","volume":"59","author":"D Issa","year":"2020","unstructured":"Issa, D., Demirci, M.F., Yazici, A.: Speech emotion recognition with deep convolutional neural networks. Biomed. Signal Process. Control 59, 101894\u2013101904 (2020)","journal-title":"Biomed. Signal Process. Control"},{"issue":"2","key":"17_CR17","doi-asserted-by":"publisher","first-page":"2035","DOI":"10.1109\/JSEN.2020.3020915","volume":"21","author":"SK Khare","year":"2020","unstructured":"Khare, S.K., Bajaj, V.: An evolutionary optimized variational mode decomposition for emotion recognition. IEEE Sens. J. 21(2), 2035\u20132042 (2020)","journal-title":"IEEE Sens. J."},{"issue":"1","key":"17_CR18","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1109\/T-AFFC.2011.15","volume":"3","author":"S Koelstra","year":"2011","unstructured":"Koelstra, S., Kolestra, S., et al.: Deap: a database for emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 3(1), 18\u201331 (2011)","journal-title":"IEEE Trans. Affect. Comput."},{"issue":"1","key":"17_CR19","doi-asserted-by":"publisher","first-page":"183","DOI":"10.3390\/s20010183","volume":"20","author":"S Kwon","year":"2019","unstructured":"Kwon, S.: A cnn-assisted enhanced audio signal processing for speech emotion recognition. Sensors 20(1), 183 (2019)","journal-title":"Sensors"},{"issue":"8","key":"17_CR20","doi-asserted-by":"publisher","first-page":"3245","DOI":"10.1007\/s00034-018-0804-x","volume":"37","author":"GJ Lal","year":"2018","unstructured":"Lal, G.J., Gopalakrishnan, E., Govind, D.: Epoch estimation from emotional speech signals using variational mode decomposition. Circ. Syst. Signal Process. 37(8), 3245\u20133274 (2018)","journal-title":"Circ. Syst. Signal Process."},{"issue":"5","key":"17_CR21","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0196391","volume":"13","author":"SR Livingstone","year":"2018","unstructured":"Livingstone, S.R., Russo, F.A.: The ryerson audio-visual database of emotional speech and song (ravdess): a dynamic, multimodal set of facial and vocal expressions in north american english. PLoS ONE 13(5), e0196391 (2018)","journal-title":"PLoS ONE"},{"key":"17_CR22","doi-asserted-by":"publisher","first-page":"125868","DOI":"10.1109\/ACCESS.2019.2938007","volume":"7","author":"H Meng","year":"2019","unstructured":"Meng, H., Yan, T., Yuan, F., Wei, H.: Speech emotion recognition from 3d log-mel spectrograms with deep learning network. IEEE access 7, 125868\u2013125881 (2019)","journal-title":"IEEE access"},{"key":"17_CR23","doi-asserted-by":"publisher","DOI":"10.1016\/j.jsv.2021.116370","volume":"512","author":"M Mousavi","year":"2021","unstructured":"Mousavi, M., Gandomi, A.H.: Structural health monitoring under environmental and operational variations using mcd prediction error. J. Sound Vib. 512, 116370 (2021)","journal-title":"J. Sound Vib."},{"key":"17_CR24","doi-asserted-by":"crossref","unstructured":"Pandey, P., Seeja, K.: Subject independent emotion recognition from eeg using vmd and deep learning. J. King Saud University-Comput. Inform. Sci. 34(4), 1730\u20131738 (2019)","DOI":"10.1016\/j.jksuci.2019.11.003"},{"issue":"1\u20132","key":"17_CR25","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1016\/S1071-5819(02)00141-6","volume":"59","author":"O Pierre-Yves","year":"2003","unstructured":"Pierre-Yves, O.: The production and recognition of emotions in speech: features and algorithms. Int. J. Hum Comput Stud. 59(1\u20132), 157\u2013183 (2003)","journal-title":"Int. J. Hum Comput Stud."},{"key":"17_CR26","doi-asserted-by":"crossref","unstructured":"Popova, A.S., Rassadin, A.G., Ponomarenko, A.A.: Emotion recognition in sound. In: International Conference on Neuroinformatics, pp. 117\u2013124 (2017)","DOI":"10.1007\/978-3-319-66604-4_18"},{"issue":"1","key":"17_CR27","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1007\/BF01580138","volume":"5","author":"RT Rockafellar","year":"1973","unstructured":"Rockafellar, R.T.: A dual approach to solving nonlinear programming problems by unconstrained optimization. Math. Program. 5(1), 354\u2013373 (1973)","journal-title":"Math. Program."},{"key":"17_CR28","doi-asserted-by":"publisher","unstructured":"Rudd, D.H., Huo, H., Xu, G.: Leveraged mel spectrograms using harmonic and percussive components in speech emotion recognition. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 392\u2013404. Springer (2022). https:\/\/doi.org\/10.1007\/978-3-031-05936-0_31","DOI":"10.1007\/978-3-031-05936-0_31"},{"issue":"3","key":"17_CR29","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","volume":"115","author":"O Russakovsky","year":"2015","unstructured":"Russakovsky, O., Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211\u2013252 (2015)","journal-title":"Int. J. Comput. Vision"},{"issue":"1","key":"17_CR30","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1109\/TAFFC.2015.2392101","volume":"6","author":"K Wang","year":"2015","unstructured":"Wang, K., An, N., Li, B.N., Zhang, Y., Li, L.: Speech emotion recognition using fourier parameters. IEEE Trans. Affect. Comput. 6(1), 69\u201375 (2015)","journal-title":"IEEE Trans. Affect. Comput."},{"issue":"5","key":"17_CR31","doi-asserted-by":"publisher","first-page":"768","DOI":"10.1016\/j.specom.2010.08.013","volume":"53","author":"S Wu","year":"2011","unstructured":"Wu, S., Falk, T.H., Chan, W.Y.: Automatic speech emotion recognition using modulation spectral features. Speech Commun. 53(5), 768\u2013785 (2011)","journal-title":"Speech Commun."},{"key":"17_CR32","doi-asserted-by":"crossref","unstructured":"Zamil, A.A.A., Hasan, S., Baki, S.M.J., Adam, J.M., Zaman, I.: Emotion detection from speech signals using voting mechanism on classified frames. In: 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), pp. 281\u2013285. IEEE (2019)","DOI":"10.1109\/ICREST.2019.8644168"},{"key":"17_CR33","doi-asserted-by":"crossref","unstructured":"Zhang, M., Hu, B., Zheng, X., Li, T.: A novel multidimensional feature extraction method based on vmd and wpd for emotion recognition. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1216\u20131220. IEEE (2020)","DOI":"10.1109\/BIBM49941.2020.9313220"},{"key":"17_CR34","doi-asserted-by":"publisher","first-page":"312","DOI":"10.1016\/j.bspc.2018.08.035","volume":"47","author":"J Zhao","year":"2019","unstructured":"Zhao, J., Mao, X., Chen, L.: Speech emotion recognition using deep 1d & 2d cnn lstm networks. Biomed. Signal Process. Control 47, 312\u2013323 (2019)","journal-title":"Biomed. Signal Process. Control"}],"container-title":["Lecture Notes in Computer Science","Advances in Knowledge Discovery and Data Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-33380-4_17","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,13]],"date-time":"2024-03-13T19:58:45Z","timestamp":1710359925000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-33380-4_17"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"ISBN":["9783031333798","9783031333804"],"references-count":34,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-33380-4_17","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023]]},"assertion":[{"value":"27 May 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"PAKDD","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Pacific-Asia Conference on Knowledge Discovery and Data Mining","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Osaka","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Japan","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2023","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"25 May 2023","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"28 May 2023","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"27","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"pakdd2023","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/pakdd2023.org\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Microsoft CMT","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"813","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"143","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"18% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.5","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"10","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}