{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T23:45:44Z","timestamp":1771026344345,"version":"3.50.1"},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,1,13]],"date-time":"2020-01-13T00:00:00Z","timestamp":1578873600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,1,13]],"date-time":"2020-01-13T00:00:00Z","timestamp":1578873600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this paper, we use empirical mode decomposition and Hurst-based mode selection (EMDH) along with deep learning architecture using a convolutional neural network (CNN) to improve the recognition of dysarthric speech. The EMDH speech enhancement technique is used as a preprocessing step to improve the quality of dysarthric speech. Then, the Mel-frequency cepstral coefficients are extracted from the speech processed by EMDH to be used as input features to a CNN-based recognizer. The effectiveness of the proposed EMDH-CNN approach is demonstrated by the results obtained on the Nemours corpus of dysarthric speech. 
Compared to baseline systems that use Hidden Markov with Gaussian Mixture Models (HMM-GMMs) and a CNN without an enhancement module, the EMDH-CNN system increases the overall accuracy by 20.72% and 9.95%, respectively, using a <jats:italic>k<\/jats:italic>-fold cross-validation experimental setup.<\/jats:p>","DOI":"10.1186\/s13636-019-0169-5","type":"journal-article","created":{"date-parts":[[2020,1,13]],"date-time":"2020-01-13T14:02:49Z","timestamp":1578924169000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":47,"title":["Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural network"],"prefix":"10.1186","volume":"2020","author":[{"given":"Mohammed","family":"Sidi Yakoub","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sid-ahmed","family":"Selouani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brahim-Fares","family":"Zaidi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Asma","family":"Bouchair","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,1,13]]},"reference":[{"key":"169_CR1","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/B978-0-444-52901-5.00022-8","volume-title":"Neurological Rehabilitation","author":"Pam Enderby","year":"2013","unstructured":"P. Enderby, in Handbook of Clinical Neurology (110 ed.)Disorders of communication: Dysarthria (Elsevier B. V., 2013), pp. 273\u2013281. https:\/\/www.sciencedirect.com\/science\/article\/pii\/B9780444529015000228. https:\/\/doi.org\/10.1016\/B978-0-444-52901-5.00022-8."},{"issue":"8","key":"169_CR2","doi-asserted-by":"publisher","first-page":"741","DOI":"10.1016\/j.medengphy.2005.11.002","volume":"28","author":"P. D. 
Polur","year":"2006","unstructured":"P. D. Polur, G. E. Miller, Investigation of an hmm\/ann hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals. Med. Eng. Phys.28(8), 741\u2013748 (2006).","journal-title":"Med. Eng. Phys."},{"key":"169_CR3","doi-asserted-by":"publisher","unstructured":"M. Hasegawa-Johnson, J. Gunderson, A. Perlman, T. Huang, in 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Toulouse. Hmm-Based and Svm-Based Recognition of the Speech of Talkers With Spastic Dysarthria, (2006), pp. III-III. https:\/\/ieeexplore.ieee.org\/abstract\/document\/1660840. https:\/\/doi.org\/10.1109\/ICASSP.2006.1660840.","DOI":"10.1109\/ICASSP.2006.1660840"},{"key":"169_CR4","doi-asserted-by":"crossref","unstructured":"M. J. Kim, B. Cao, K. An, J. Wang, in Interspeech. Dysarthric speech recognition using convolutional lstm neural network, (2018), pp. 2948\u20132952. https:\/\/www.researchgate.net\/publication\/327350843_Dysarthric_Speech_Recognition_Using_Convolutional_LSTM_Neural_Network.","DOI":"10.21437\/Interspeech.2018-2250"},{"key":"169_CR5","unstructured":"S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, et al., The htk book (for htk version. 3.3), Cambridge University Engineering Department, 2005 (2006). http:\/\/htk.eng.cam.ac.uk\/docs\/docs.shtml."},{"key":"169_CR6","doi-asserted-by":"crossref","unstructured":"S. Oue, R. Marxer, F. Rudzicz, in Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies. Automatic dysfluency detection in dysarthric speech using deep belief networks, (2015), pp. 60\u201364. https:\/\/www.aclweb.org\/anthology\/W15-5111\/.","DOI":"10.18653\/v1\/W15-5111"},{"key":"169_CR7","unstructured":"B. Vachhani, C. Bhat, S. K. Kopparapu, in Interspeech. Data augmentation using healthy speech for dysarthric speech recognition, (2018), pp. 
471\u2013475. https:\/\/www.iscaspeech.org\/archive\/Interspeech_2018\/pdfs\/1751.pdf. https:\/\/www.semanticscholar.org\/paper\/Data-Augmentation-Using-Healthy-Speech-for-Speech-Vachhani-Bhat\/e98ea9dc73bf87e5509e987addf56b7006593ad7."},{"key":"169_CR8","doi-asserted-by":"crossref","unstructured":"C. Bhat, B. Das, B. Vachhani, S. K. Kopparapu, in Interspeech. Dysarthric speech recognition using time-delay neural network based denoising autoencoder, (2018), pp. 451\u2013455. https:\/\/www.iscaspeech.org\/archive\/Interspeech_2018\/pdfs\/1754.pdf. https:\/\/www.researchgate.net\/publication\/327389525_Dysarthric_Speech_Recognition_Using_Timedelay_Neural_Network_Based_Denoising_Autoencoder.","DOI":"10.21437\/Interspeech.2018-1754"},{"key":"169_CR9","unstructured":"S. Liu, S. Hu, X. Liu, H. Meng, On the use of pitch features for disordered speech recognition, (2019). https:\/\/www.iscaspeech.org\/archive\/Interspeech_2019\/pdfs\/2609.pdf. https:\/\/www.semanticscholar.org\/paper\/On-the-Use-of-Pitch-Features-for-Disordered-Speech-Liu-Hu\/26cd586e2e704cc46099e9af18c56b3d7419ec54."},{"key":"169_CR10","unstructured":"J. Wu, Application of artificial neural network on speech signal features for Parkinson\u2019s disease classification (2019). http:\/\/csusm-dspace.calstate.edu\/handle\/10211.3\/209929."},{"key":"169_CR11","unstructured":"S. Hu, S. Liu, H. F. Chang, M. Geng, J. Chen, T. K. H. Chung, J. Yu, K. H. Wong, X. Liu, H. Meng, The cuhk dysarthric speech recognition systems for English and Cantonese, (2019). https:\/\/www.iscaspeech.org\/archive\/Interspeech_2019\/pdfs\/8047.pdf. https:\/\/www.semanticscholar.org\/paper\/The-CUHK-Dysarthric-Speech-Recognition-Systems-for-Hu-Liu\/21f55376e76a0602525bdfbe54c00ff97226c30a."},{"issue":"6","key":"169_CR12","doi-asserted-by":"publisher","first-page":"4660","DOI":"10.1121\/1.4986746","volume":"141","author":"S. A. Borrie","year":"2017","unstructured":"S. A. Borrie, M. Baese-Berk, K. Van Engen, T. 
Bent, A relationship between processing speech in noise and dysarthric speech. J. Acoust. Soc. Am.141(6), 4660\u20134667 (2017).","journal-title":"J. Acoust. Soc. Am."},{"key":"169_CR13","doi-asserted-by":"crossref","first-page":"770","DOI":"10.1061\/TACEAT.0006518","volume":"116","author":"H. E. Hurst","year":"1951","unstructured":"H. E. Hurst, Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civil Eng.116:, 770\u2013799 (1951).","journal-title":"Trans. Amer. Soc. Civil Eng."},{"key":"169_CR14","unstructured":"B. B. Mandelbrot, R. L. Hudson, The (mis) Behaviour of Markets: a Fractal View of Risk, Ruin and Reward (Profile books, 2010). https:\/\/users.math.yale.edu\/~bbm3\/web_pdfs\/misbehaviorprelude.pdf."},{"issue":"5","key":"169_CR15","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1109\/TASLP.2014.2312541","volume":"22","author":"L. Zao","year":"2014","unstructured":"L. Zao, R. Coelho, P. Flandrin, Speech enhancement with EMD and Hurst-based mode selection. IEEE\/ACM Trans. Audio Speech Lang. Process.22(5), 899\u2013911 (2014).","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"issue":"3","key":"169_CR16","doi-asserted-by":"publisher","first-page":"931","DOI":"10.1109\/TSA.2005.858054","volume":"14","author":"R. Sant\u2019Ana","year":"2006","unstructured":"R. Sant\u2019Ana, R. Coelho, A. Alcaim, Text-independent speaker recognition based on the Hurst parameter and the multidimensional fractional brownian motion model. IEEE Trans. Audio Speech Lang. Process.14(3), 931\u2013940 (2006).","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"169_CR17","unstructured":"P. Burkert, F. Trier, M. Z. Afzal, A. Dengel, M. Liwicki, Dexpression: deep convolutional neural network for expression recognition. arXiv preprint arXiv:1509.05371 (2015)."},{"key":"169_CR18","doi-asserted-by":"crossref","unstructured":"X Menendez-Pidal, J. B. Polikoff, S. M. Peters, J. E. Leonzio, H. T. 
Bunnell, in Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP \u201996. The Nemours database of dysarthric speech (Philadelphia, 1996), pp. 1962\u20131965. https:\/\/ieeexplore.ieee.org\/abstract\/document\/608020.","DOI":"10.21437\/ICSLP.1996-503"},{"key":"169_CR19","unstructured":"F. Chollet, et al., Keras (2015). http:\/\/keras.io\/. https:\/\/keras.io\/getting-started\/faq\/#how-should-i-citekeras. https:\/\/keras.io\/getting-started\/faq\/\\#how-should-i-citekeras."},{"key":"169_CR20","doi-asserted-by":"publisher","unstructured":"K. T. Mengistu, F. Rudzicz, in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Adapting acoustic and lexical models to dysarthric speech (Prague, 2011), pp. 4924\u20134927. https:\/\/ieeexplore.ieee.org\/abstract\/document\/5947460. https:\/\/doi.org\/10.1109\/ICASSP.2011.5947460.","DOI":"10.1109\/ICASSP.2011.5947460"}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-019-0169-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13636-019-0169-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-019-0169-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,11]],"date-time":"2022-10-11T00:51:24Z","timestamp":1665449484000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-019-0169-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,13]]},"references-count":20,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"a
lternative-id":["169"],"URL":"https:\/\/doi.org\/10.1186\/s13636-019-0169-5","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,13]]},"assertion":[{"value":"19 August 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 December 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 January 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"1"}}