{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T22:46:50Z","timestamp":1767912410047,"version":"3.49.0"},"reference-count":28,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2021,9,27]],"date-time":"2021-09-27T00:00:00Z","timestamp":1632700800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Within the field of Automatic Speech Recognition (ASR) systems, facing impaired speech is a big challenge because standard approaches are ineffective in the presence of dysarthria. The first aim of our work is to confirm the effectiveness of a new speech analysis technique for speakers with dysarthria. This new approach exploits the fine-tuning of the size and shift parameters of the spectral analysis window used to compute the initial short-time Fourier transform, to improve the performance of a speaker-dependent ASR system. The second aim is to define if there exists a correlation among the speaker\u2019s voice features and the optimal window and shift parameters that minimises the error of an ASR system, for that specific speaker. For our experiments, we used both impaired and unimpaired Italian speech. Specifically, we used 30 speakers with dysarthria from the IDEA database and 10 professional speakers from the CLIPS database. Both databases are freely available. The results confirm that, if a standard ASR system performs poorly with a speaker with dysarthria, it can be improved by using the new speech analysis. Otherwise, the new approach is ineffective in cases of unimpaired and low impaired speech. Furthermore, there exists a correlation between some speaker\u2019s voice features and their optimal parameters.<\/jats:p>","DOI":"10.3390\/s21196460","type":"journal-article","created":{"date-parts":[[2021,9,27]],"date-time":"2021-09-27T22:16:38Z","timestamp":1632780998000},"page":"6460","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Optimising Speaker-Dependent Feature Extraction Parameters to Improve Automatic Speech Recognition Performance for People with Dysarthria"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6197-1642","authenticated-orcid":false,"given":"Marco","family":"Marini","sequence":"first","affiliation":[{"name":"Department of Information Engineering, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2312-6699","authenticated-orcid":false,"given":"Nicola","family":"Vanello","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5426-4974","authenticated-orcid":false,"given":"Luca","family":"Fanucci","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,27]]},"reference":[{"key":"ref_1","unstructured":"McNeil, M.R. (2009). Clinical Management of Sensorimotor Speech Disorders, Thieme."},{"key":"ref_2","unstructured":"Ballati, F., Corno, F., and De Russis, L. (2018). \u201cHey Siri, do you understand me?\u201d: Virtual Assistants and Dysarthria. Intelligent Environments 2018, IOS Press."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Gales, M., and Young, S. (2008). The Application of Hidden Markov Models in Speech Recognition, Publishers Inc.","DOI":"10.1561\/9781601981219"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Li, J., Yu, D., Huang, J.T., and Gong, Y. (2012, January 2\u20135). Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM. Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), Miami, FL, USA.","DOI":"10.1109\/SLT.2012.6424210"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ballati, F., Corno, F., and De Russis, L. (2018, January 22\u201324). Assessing virtual assistant capabilities with Italian dysarthric speech. Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility, Galway, Ireland.","DOI":"10.1145\/3234695.3236354"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1007\/s10579-011-9145-0","article-title":"The TORGO database of acoustic and articulatory speech from speakers with dysarthria","volume":"46","author":"Rudzicz","year":"2012","journal-title":"Lang. Resour. Eval."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Kim, H., Hasegawa-Johnson, M., Perlman, A., Gunderson, J., Huang, T.S., Watkin, K., and Frame, S. (2008, January 22\u201326). Dysarthric speech database for universal access research. Proceedings of the Ninth Annual Conference of the International Speech Communication Association, Brisbane, Australia.","DOI":"10.21437\/Interspeech.2008-480"},{"key":"ref_8","unstructured":"James, X.M.P., Polikoff, J.B., Peters, S.M., Leonzio, J.E., and Bunnell, H. (1996, January 3\u20136). The Nemours database of dysarthric speech. Proceedings of the Fourth International Conference on Spoken Language Processing, Philadelphia, PA, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mengistu, K.T., and Rudzicz, F. (2011, January 22\u201327). Adapting acoustic and lexical models to dysarthric speech. Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic.","DOI":"10.1109\/ICASSP.2011.5947460"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1109\/TNSRE.2018.2802914","article-title":"Improving acoustic models in TORGO dysarthric speech database","volume":"26","author":"Joy","year":"2018","journal-title":"IEEE Trans. Neural Syst. Rehabil. Eng."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Espana-Bonet, C., and Fonollosa, J.A. (2016). Automatic speech recognition with deep neural networks for impaired speech. International Conference on Advances in Speech and Language Technologies for Iberian Languages, Springer.","DOI":"10.1007\/978-3-319-49169-1_10"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Aizawa, K., Nakamura, Y., and Satoh, S. (2004). Advances in Multimedia Information Processing-PCM 2004: 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, 30 November\u20133 December 2004, Proceedings, Part II, Springer.","DOI":"10.1007\/b104117"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Marini, M., Meoni, G., Mulfari, D., Vanello, N., and Fanucci, L. (2020). Enabling Smart Home Voice Control for Italian People with Dysarthria: Preliminary Analysis of Frame Rate Effect on Speech Recognition. International Conference on Applications in Electronics Pervading Industry, Environment and Society, Springer.","DOI":"10.1007\/978-3-030-66729-0_13"},{"key":"ref_14","unstructured":"Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., and Schwarz, P. (2011). The Kaldi speech recognition toolkit. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, IEEE Signal Processing Society."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Marini, M., Vigan\u00f2, M., Corbo, M., Zettin, M., Simoncini, G., Fattori, B., D\u2019Anna, C., Donati, M., and Fanucci, L. (2021). IDEA: An Italian Dysarthric Speech Database. 2021 IEEE Spoken Language Technology Workshop (SLT), IEEE.","DOI":"10.1109\/SLT48900.2021.9383467"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Shahin, M., Ahmed, B., McKechnie, J., Ballard, K., and Gutierrez-Osuna, R. (2014, January 14\u201318). A comparison of GMM-HMM and DNN-HMM based pronunciation verification techniques for use in the assessment of childhood apraxia of speech. Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association, Singapore.","DOI":"10.21437\/Interspeech.2014-377"},{"key":"ref_17","unstructured":"Fukunaga, K. (2013). Introduction to Statistical Pattern Recognition, Elsevier."},{"key":"ref_18","unstructured":"Gopinath, R.A. (1998, January 12\u201315). Maximum likelihood modeling with Gaussian distributions for classification. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP\u201998 (Cat. No. 98CH36181), Washington, DC, USA."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1214\/aos\/1013699998","article-title":"The control of the false discovery rate in multiple testing under dependency","volume":"29","author":"Benjamini","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.anl.2014.11.001","article-title":"Assessment of voice quality: Current state-of-the-art","volume":"42","author":"Barsties","year":"2015","journal-title":"Auris Nasus Larynx"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1112","DOI":"10.1016\/j.protcy.2013.12.124","article-title":"Vocal acoustic analysis\u2013jitter, shimmer and hnr parameters","volume":"9","author":"Teixeira","year":"2013","journal-title":"Procedia Technol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1205","DOI":"10.1016\/j.procs.2017.05.092","article-title":"Vocal signal analysis in patients affected by Multiple Sclerosis","volume":"108","author":"Vizza","year":"2017","journal-title":"Procedia Comput. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/j.jvoice.2004.02.005","article-title":"Spectral moments of the long-term average spectrum: Sensitive indices of voice change after therapy?","volume":"19","author":"Tanner","year":"2005","journal-title":"J. Voice"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sun, X. (2002, January 13\u201317). Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio. Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA.","DOI":"10.1109\/ICASSP.2002.5743722"},{"key":"ref_25","first-page":"109","article-title":"Digital signal processing in the differential diagnosis of benign larynx diseases","volume":"16","author":"Zwetsch","year":"2006","journal-title":"Sci. Medica"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1002\/widm.2","article-title":"Robust statistics for outlier detection","volume":"1","author":"Rousseeuw","year":"2011","journal-title":"Wiley Interdiscip. Rev. Data Min. Knowl. Discov."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Feinberg, D.R., and Cook, O. VoiceLab: Automated Reproducible Acoustic Analysis. PsyArXiv, 2020.","DOI":"10.31234\/osf.io\/v5uxf"},{"key":"ref_28","unstructured":"Boersma, P. (2021, May 10). Praat: Doing Phonetics by Computer. Available online: http:\/\/www.praat.org\/."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/19\/6460\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:06:00Z","timestamp":1760166360000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/19\/6460"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,27]]},"references-count":28,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["s21196460"],"URL":"https:\/\/doi.org\/10.3390\/s21196460","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,27]]}}}