{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T14:24:16Z","timestamp":1766067856829,"version":"3.37.3"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,6]],"date-time":"2024-01-06T00:00:00Z","timestamp":1704499200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,6]],"date-time":"2024-01-06T00:00:00Z","timestamp":1704499200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Shouted and normal speech classification plays an important role in many speech-related applications. The existing works are often based on magnitude-based features and ignore phase-based features, which are directly related to magnitude information. In this paper, the importance of phase-based features is explored for the detection of shouted speech. The novel contributions of this work are as follows. (1) Three phase-based features, namely, relative phase (RP), linear prediction analysis estimated speech-based RP (LPAES-RP) and linear prediction residual-based RP (LPR-RP) features, are explored for shouted and normal speech classification. (2) We propose a new RP feature, called the glottal source-based RP (GRP) feature. The main idea of the proposed GRP feature is to exploit the difference between RP and LPAES-RP features to detect shouted speech. (3) A score combination of phase- and magnitude-based features is also employed to further improve the classification performance. The proposed feature and combination are evaluated using the shouted normal electroglottograph speech (SNE-Speech) corpus. 
The experimental findings show that the RP, LPAES-RP, and LPR-RP features provide promising results for the detection of shouted speech. We also find that the proposed GRP feature can provide better results than those of the standard mel-frequency cepstral coefficient (MFCC) feature. Moreover, compared to using individual features, the score combination of the MFCC and RP\/LPAES-RP\/LPR-RP\/GRP features yields an improved detection performance. Performance analysis under noisy environments shows that the score combination of the MFCC and the RP\/LPAES-RP\/LPR-RP features gives more robust classification. These outcomes show the importance of RP features in distinguishing shouted speech from normal speech.<\/jats:p>","DOI":"10.1186\/s13636-023-00324-4","type":"journal-article","created":{"date-parts":[[2024,1,6]],"date-time":"2024-01-06T14:01:43Z","timestamp":1704549703000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Significance of relative phase features for shouted and normal speech 
classification"],"prefix":"10.1186","volume":"2024","author":[{"given":"Khomdet","family":"Phapatanaburi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4005-5036","authenticated-orcid":false,"given":"Longbiao","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Meng","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Seiichi","family":"Nakagawa","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Talit","family":"Jumphoo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peerapong","family":"Uthansakul","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,1,6]]},"reference":[{"key":"324_CR1","doi-asserted-by":"publisher","first-page":"1437","DOI":"10.1109\/5.628714","volume":"85","author":"J Campbell","year":"1997","unstructured":"J. Campbell, Speaker recognition: a tutorial. Proc. IEEE 85, 1437\u20131462 (1997)","journal-title":"Proc. IEEE"},{"key":"324_CR2","doi-asserted-by":"publisher","first-page":"1116","DOI":"10.1109\/JPROC.2012.2236631","volume":"101","author":"X He","year":"2013","unstructured":"X. He, L. Deng, Speech-centric information processing: an optimization-oriented approach. Proc. IEEE 101, 1116\u20131135 (2013)","journal-title":"Proc. IEEE"},{"key":"324_CR3","doi-asserted-by":"publisher","first-page":"745","DOI":"10.1109\/TASLP.2014.2304637","volume":"22","author":"J Li","year":"2014","unstructured":"J. Li, L. Deng, Y. Gong, R. Haeb-Umbach, An overview of noise-robust automatic speech recognition. IEEE\/ACM Trans. Audio Speech Lang. Process. 22, 745\u2013777 (2014)","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. 
Process."},{"key":"324_CR4","doi-asserted-by":"publisher","first-page":"2700","DOI":"10.1016\/j.sigpro.2008.05.012","volume":"88","author":"I Shahin","year":"2008","unstructured":"I. Shahin, Speaker identification in the shouted environment using suprasegmental hidden Markov models. Signal Process. 88, 2700\u20132708 (2008)","journal-title":"Signal Process."},{"key":"324_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.csl.2018.06.002","volume":"53","author":"E Jokinen","year":"2019","unstructured":"E. Jokinen, R. Saeidi, T. Kinnunen, P. Alku, Vocal effort compensation for MFCC feature extraction in a shouted versus normal speaker recognition task. Comput. Speech Lang. 53, 1\u201311 (2019)","journal-title":"Comput. Speech Lang."},{"key":"324_CR6","doi-asserted-by":"publisher","first-page":"2377","DOI":"10.1121\/1.4794394","volume":"133","author":"J Pohjalainen","year":"2013","unstructured":"J. Pohjalainen, T. Raitio, S. Yrttiaho, P. Alku, Detection of shouted speech in noise: human and machine. J. Acoust. Soc. Am. 133, 2377\u20132389 (2013)","journal-title":"J. Acoust. Soc. Am."},{"key":"324_CR7","doi-asserted-by":"publisher","first-page":"732","DOI":"10.1016\/j.specom.2012.01.002","volume":"54","author":"P Zelinka","year":"2012","unstructured":"P. Zelinka, M. Sigmund, J. Schimmel, Impact of vocal effort variability on automatic speech recognition. Speech Commun. 54, 732\u2013742 (2012)","journal-title":"Speech Commun."},{"key":"324_CR8","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1016\/j.specom.2018.05.004","volume":"101","author":"J Hansen","year":"2018","unstructured":"J. Hansen, H. Bo\u0159il, On the issues of intra-speaker variability and realism in speech, speaker, and language recognition tasks. Speech Commun. 101, 94\u2013108 (2018)","journal-title":"Speech Commun."},{"key":"324_CR9","doi-asserted-by":"crossref","unstructured":"S. Baghel, B. Khonglah, S. Prasanna, P. 
Guha, in Proceedings of IEEE Region 10 Conference (TENCON): 28-31 October 2016. Shouted\/normal speech classification using speech-specific features (IEEE, Jeju Island, 2016), pp. 1655\u20131659","DOI":"10.1109\/TENCON.2016.7848298"},{"key":"324_CR10","doi-asserted-by":"crossref","unstructured":"V. Mittal, A. Vuppala, in Proceedings of International Symposium on Chinese Spoken Language Processing (ISCSLP): 17-20 October 2016. Significance of automatic detection of vowel regions for automatic shout detection in continuous speech (IEEE, Tianjin, 2016), pp. 1\u20135","DOI":"10.1109\/ISCSLP.2016.7918393"},{"key":"324_CR11","doi-asserted-by":"publisher","first-page":"1543","DOI":"10.1121\/1.1911899","volume":"46","author":"J Brandt","year":"1969","unstructured":"J. Brandt, K. Ruder, T. Shipp, Vocal loudness and effort in continuous speech. J. Acoust. Soc. Am. 46, 1543\u20131548 (1969)","journal-title":"J. Acoust. Soc. Am."},{"key":"324_CR12","doi-asserted-by":"publisher","first-page":"3050","DOI":"10.1121\/1.4796110","volume":"13","author":"V Mittal","year":"2013","unstructured":"V. Mittal, B. Yegnanarayana, Effect of glottal dynamics in the production of shouted speech. J. Acoust. Soc. Am. 13, 3050\u20133061 (2013)","journal-title":"J. Acoust. Soc. Am."},{"key":"324_CR13","doi-asserted-by":"crossref","unstructured":"S. Baghel, P. Guha, in Proceedings of International Conference on Signal Processing and Communications (SPCOM): 16-19 July 2018. Excitation source feature for discriminating shouted and normal speech. (IEEE,\u00a0Bangalore, 2018), pp. 167\u2013171","DOI":"10.1109\/SPCOM.2018.8724482"},{"key":"324_CR14","doi-asserted-by":"crossref","unstructured":"S. Baghel, M. Bhattacharjee, S. Prasanna, P. Guha, in Proceedings of International Conference on Pattern Recognition and Machine Intelligence: 17-20 December 2019. Shouted and normal speech classification using 1D CNN. (Springer, Tezpur, 2019), pp. 
472\u2013480","DOI":"10.1007\/978-3-030-34872-4_52"},{"key":"324_CR15","doi-asserted-by":"crossref","unstructured":"T. Raitio, A. Suni, J. Pohjalainen, M. Airaksinen, M. Vainio, P. Alku, in Proceedings of the International Speech Communication Association (INTERSPEECH): 25-29 August 2013. Analysis and synthesis of shouted speech. (ISCA, Lyon, 2013), pp. 1544\u20131548","DOI":"10.21437\/Interspeech.2013-391"},{"key":"324_CR16","doi-asserted-by":"crossref","unstructured":"G. Degottex, J. Kane, T. Drugman, T. Raitio, S. Scherer, in Proceedings of IEEE international conference on acoustics, speech and signal processing (ICASSP): 4-6 May 2013. COVAREP-A collaborative voice analysis repository for speech technologies (IEEE, Florence, 2014), pp. 960\u2013964","DOI":"10.1109\/ICASSP.2014.6853739"},{"key":"324_CR17","doi-asserted-by":"publisher","first-page":"1250","DOI":"10.1121\/10.0000757","volume":"147","author":"S Baghel","year":"2020","unstructured":"S. Baghel, S. Prasanna, P.P. Guha, Exploration of excitation source information for shouted and normal speech classification. J. Acoust. Soc. Am. 147, 1250\u20131261 (2020)","journal-title":"J. Acoust. Soc. Am."},{"key":"324_CR18","doi-asserted-by":"crossref","unstructured":"N.N. Singh, R.R. Khan, R.R. Shree, MFCC and prosodic feature extraction techniques: a comparative study. Int. J. Comput. Appl. 54,\u00a09\u201313 (2012)","DOI":"10.5120\/8529-2061"},{"key":"324_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.specom.2016.04.002","volume":"81","author":"P Mowlaee","year":"2019","unstructured":"P. Mowlaee, R. Saeidi, Y. Stylianou, Advances in phase-aware signal processing in speech communication. Speech Commun. 81, 1\u201329 (2019)","journal-title":"Speech Commun."},{"key":"324_CR20","doi-asserted-by":"crossref","unstructured":"L. Guo, L. Wang, J. Dang, Z. Liu, H. Guan, in Proceedings of the First National Conference on Porous Sieves: 5-8 January 2020. 
Speaker-aware speech emotion recognition by fusing amplitude and phase information (Springer, Daejeon, 2020), pp. 14\u201325","DOI":"10.1007\/978-3-030-37731-1_2"},{"key":"324_CR21","doi-asserted-by":"publisher","first-page":"18865","DOI":"10.1007\/s11042-018-5686-1","volume":"77","author":"Z Oo","year":"2018","unstructured":"Z. Oo, L. Wang, K. Phapatanaburi, M. Iwahashi, S. Nakagawa, J. Dang, Phase and reverberation aware DNN for distant-talking speech enhancement. Multimed. Tools Appl. 77, 18865\u201318880 (2018)","journal-title":"Multimed. Tools Appl."},{"key":"324_CR22","doi-asserted-by":"crossref","unstructured":"Z. Oo, Y. Kawakami, L. Wang, S. Nakagawa, X. Xiao, M. Iwahashi, in Proceedings of the International Speech Communication Association (INTERSPEECH): 8-12 September 2016. DNN-Based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification (ISCA, San Francisco, 2016), pp. 2204\u20132208","DOI":"10.21437\/Interspeech.2016-717"},{"key":"324_CR23","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1109\/TASL.2006.876858","volume":"15","author":"R Hegde","year":"2007","unstructured":"R. Hegde, H. Murthy, V. Gadde, Significance of the modified group delay feature in speech recognition. IEEE Trans. Audio Speech Lang. Process. 15, 190\u2013202 (2007)","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"324_CR24","doi-asserted-by":"publisher","first-page":"1085","DOI":"10.1109\/TASL.2011.2172422","volume":"20","author":"S Nakagawa","year":"2012","unstructured":"S. Nakagawa, L. Wang, S. Ohtsuka, Speaker identification and verification by combining MFCC and phase information. IEEE Trans. Audio Speech Lang. Process. 20, 1085\u20131095 (2012)","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"324_CR25","doi-asserted-by":"crossref","unstructured":"L. Wang, Y. Yoshida, Y. Kawakami, S. Nakagawa, in Proceedings of the International Speech Communication Association (INTERSPEECH): 6-10 September 2015. 
Relative phase information for detecting human speech and spoofed speech (ISCA, Dresden, 2015), pp. 2092\u20132096","DOI":"10.21437\/Interspeech.2015-473"},{"key":"324_CR26","doi-asserted-by":"crossref","unstructured":"Z. Oo, L. Wang, K. Phapatanaburi, M. Liu, S. Nakagawa, M. Iwahashi, J. Dang, Replay attack detection with auditory filter-based relative phase features. EURASIP J. Audio Spee. 2019,\u00a01\u201311 (2019)","DOI":"10.1186\/s13636-019-0151-2"},{"key":"324_CR27","doi-asserted-by":"publisher","first-page":"183614","DOI":"10.1109\/ACCESS.2019.2960369","volume":"7","author":"K Phapatanaburi","year":"2019","unstructured":"K. Phapatanaburi, L. Wang, M. Iwahashi, S. Nakagawa, Replay attack detection using linear prediction analysis-based relative phase features. IEEE Access 7, 183614\u2013183625 (2019)","journal-title":"IEEE Access"},{"key":"324_CR28","doi-asserted-by":"crossref","unstructured":"L. Wang, K. Phapatanaburi, Z. Oo, S. Nakagawa, M. Iwahashi, J. Dang, in Proceedings of IEEE International Conference on Multimedia and Expo (ICME): 10-14 June 2017, ed. by Y. Smith. Phase aware deep neural network for noise robust voice activity detection (IEEE, Hong Kong, 2017) pp. 1087\u20131092","DOI":"10.1109\/ICME.2017.8019414"},{"key":"324_CR29","doi-asserted-by":"crossref","unstructured":"X. Zhang, J. Wu, in Proceedings of IEEE international conference on acoustics, speech and signal processing (ICASSP): 26-31 May 2013. Denoising deep neural networks based voice activity detection (IEEE, Vancouver, 2013), pp. 853\u2013857","DOI":"10.1109\/ICASSP.2013.6637769"},{"key":"324_CR30","doi-asserted-by":"crossref","unstructured":"L. Deng, Deep learning: from speech recognition to language and multimodal processing. APSIPA Trans. Signal Inf. Process. 5,\u00a01\u201315 (2016)","DOI":"10.1017\/ATSIP.2015.22"},{"key":"324_CR31","doi-asserted-by":"crossref","unstructured":"Hanil\u00e7i, C., Kinnunen, T., Sahidullah, M., Sizov, A. 
in Proceedings of the International Speech Communication Association: 6-10 September 2015 ed. by Y. Smith. Classifiers for synthetic speech detection: a comparison (ISCA, Dresden, 2015), pp. 2057\u20132061","DOI":"10.21437\/Interspeech.2015-466"},{"key":"324_CR32","doi-asserted-by":"crossref","unstructured":"H. Delgado, M. Todisco, M. Sahidullah, A. Sarkar, N. Evans, T. Kinnunen, Z. Tan, in Proceedings of IEEE Spoken Language Technology Workshop (SLT): 13-16 December 2016. Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification (IEEE, San Diego, 2016), pp. 179\u2013185","DOI":"10.1109\/SLT.2016.7846262"},{"key":"324_CR33","doi-asserted-by":"publisher","first-page":"845","DOI":"10.1007\/s12652-017-0482-8","volume":"8","author":"K Phapatanaburi","year":"2017","unstructured":"K. Phapatanaburi, L. Wang, Z. Oo, W. Li, S. Nakagawa, M. Iwahashi, Noise robust voice activity detection using joint phase and magnitude based feature enhancement. J. Amb. Intel. Hum. Comp. 8, 845\u2013859 (2017)","journal-title":"J. Amb. Intel. Hum. Comp."},{"key":"324_CR34","doi-asserted-by":"publisher","first-page":"1243","DOI":"10.1016\/j.specom.2006.06.002","volume":"48","author":"SM Prasanna","year":"2006","unstructured":"S.M. Prasanna, C.S. Gupta, B. Yegnanarayana, Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Commun. 48, 1243\u20131261 (2006)","journal-title":"Speech Commun."},{"key":"324_CR35","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1093\/biostatistics\/5.1.113","volume":"5","author":"C Moskowitz","year":"2004","unstructured":"C. Moskowitz, M. Pepe, Quantifying and comparing the predictive accuracy of continuous prognostic factors for binary outcomes. Biostatistics 5, 113\u2013127 (2004)","journal-title":"Biostatistics"},{"key":"324_CR36","doi-asserted-by":"crossref","unstructured":"Z. Chen, Z. Xie, W. Zhang, X. 
Xu, in Proceedings of the International Speech Communication Association (INTERSPEECH): 20-24 August 2017. ResNet and Model Fusion for Automatic Spoofing Detection (ISCA, Stockholm, 2017), pp. 102\u2013106","DOI":"10.21437\/Interspeech.2017-1085"},{"key":"324_CR37","doi-asserted-by":"crossref","unstructured":"L. Wang, K. Minami, K. Yamamoto, S. Nakagawa, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 15-19 March 2010. Speaker identification by combining MFCC and phase information in noisy environments (IEEE, Texas, 2018), pp. 4502\u20134505","DOI":"10.1109\/ICASSP.2010.5495586"},{"key":"324_CR38","unstructured":"A. Varga, H. Steeneken, D. Jones, The noisex-92 study on the effect of additive noise on automatic speech recognition system. Reports of NATO Research Study Group (RSG. 10) (1992)"},{"key":"324_CR39","doi-asserted-by":"crossref","unstructured":"R. Das, H. Li, in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): 12-15 November 2018. Instantaneous phase and excitation source features for detection of replay attacks (IEEE, Honolulu, 2018), pp. 151\u2013155","DOI":"10.23919\/APSIPA.2018.8659789"},{"key":"324_CR40","doi-asserted-by":"crossref","unstructured":"K. Srinivas, R. Das, H. Patil, in Proceedings of International Symposium on Chinese Spoken Language Processing (ISCSLP): 26-29 November 2018. Combining phase-based features for replay spoof detection system (IEEE, Taipei City, 2018), pp. 151\u2013155","DOI":"10.1109\/ISCSLP.2018.8706672"},{"key":"324_CR41","unstructured":"P. Alku, H. Pohjalainen, M. Airaksinen, in Proceedings of the Subsidia: Tools and Resources for Speech Sciences: 21\u201323 June 2017. Aalto Aparat-A freely available tool for glottal inverse filtering and voice source parameterization (Malaga), pp. 
1\u20138"}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-023-00324-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13636-023-00324-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-023-00324-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,6]],"date-time":"2024-01-06T14:03:54Z","timestamp":1704549834000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-023-00324-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,6]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["324"],"URL":"https:\/\/doi.org\/10.1186\/s13636-023-00324-4","relation":{},"ISSN":["1687-4722"],"issn-type":[{"type":"electronic","value":"1687-4722"}],"subject":[],"published":{"date-parts":[[2024,1,6]]},"assertion":[{"value":"5 February 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 December 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"2"}}