{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:51:25Z","timestamp":1760241085581,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2019,11,25]],"date-time":"2019-11-25T00:00:00Z","timestamp":1574640000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006769","name":"Russian Science Foundation","doi-asserted-by":"publisher","award":["16-15-00038"],"award-info":[{"award-number":["16-15-00038"]}],"id":[{"id":"10.13039\/501100006769","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003443","name":"Ministry of Education and Science of the Russian Federation","doi-asserted-by":"publisher","award":["8.9628.2017\/8.9"],"award-info":[{"award-number":["8.9628.2017\/8.9"]}],"id":[{"id":"10.13039\/501100003443","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>This paper discusses an approach for assessing the quality of speech while undergoing speech rehabilitation. One of the main reasons for speech quality decrease during the surgical treatment of vocal tract diseases is the loss of the vocal tract's parts and the disruption of its symmetry. In particular, one of the most common oncological diseases of the oral cavity is cancer of the tongue. During surgical treatment, a glossectomy is performed, which leads to the need for speech rehabilitation to eliminate the occurring speech defects, leading to a decrease in speech intelligibility. In this paper, we present an automated approach for conducting the speech quality evaluation. The approach relies on a convolutional neural network (CNN). 
The main idea of the approach is to train an individual neural network for a patient before having an operation to recognize typical sounding of phonemes for their speech. The neural network will thereby be able to evaluate the similarity between the patient's speech before and after the surgery. The recognition based on the full phoneme set and the recognition by groups of phonemes were considered. The correspondence of assessments obtained through the autorecognition approach with those from the human-based approach is shown. The automated approach is principally applicable to defining boundaries between phonemes. The paper shows that iterative training of the neural network and continuous updating of the training dataset gradually improve the ability of the CNN to define boundaries between different phonemes.<\/jats:p>","DOI":"10.3390\/sym11121447","type":"journal-article","created":{"date-parts":[[2019,11,25]],"date-time":"2019-11-25T05:52:58Z","timestamp":1574661178000},"page":"1447","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Evaluation of Speech Quality Through Recognition and Classification of Phonemes"],"prefix":"10.3390","volume":"11","author":[{"given":"Svetlana","family":"Pekarskikh","sequence":"first","affiliation":[{"name":"Tomsk State University of Control Systems and Radioelectronics, Tomsk 634050, Russia"}]},{"given":"Evgeny","family":"Kostyuchenko","sequence":"additional","affiliation":[{"name":"Tomsk State University of Control Systems and Radioelectronics, Tomsk 634050, Russia"}]},{"given":"Lidiya","family":"Balatskaya","sequence":"additional","affiliation":[{"name":"Tomsk State University of Control Systems and Radioelectronics, Tomsk 634050, Russia"},{"name":"Tomsk Cancer Research Institute, Tomsk 634050, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2019,11,25]]},"reference":[{"key":"ref_1","first-page":"e603","article-title":"Oral squamous 
cell carcinoma of tongue: Histological risk assessment. A pilot study","volume":"24","year":"2019","journal-title":"Med. Oral. Patol. Oral Cir. Bucal."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"719","DOI":"10.5631\/jibirin.97.719","article-title":"Partial glossectomy for early tongue cancer","volume":"97","author":"Kishimoto","year":"2004","journal-title":"Pract. Oto-Rhino-Laryngologica"},{"key":"ref_3","first-page":"116","article-title":"Structure and database of software for speech quality and intelligibility assessment in the process of rehabilitation after surgery in the treatment of cancers of the oral cavity and oropharynx, maxillofacial area","volume":"32","author":"Kostyuchenko","year":"2014","journal-title":"SPIIRAS Proc."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kostyuchenko, E., Ignatieva, D., Meshcheryakov, R., Pyatkov, A., Choynzonov, E., and Balatskaya, L. (2016, January 15\u201317). Model of system quality assessment pronouncing phonemes. Proceedings of the 2016 Dynamics of Systems, Mechanisms and Machines, Omsk, Russia.","DOI":"10.1109\/Dynamics.2016.7819016"},{"key":"ref_5","unstructured":"GOST R 50840-95 (1995). Speech Transmission over Varies Communication Channels. Techniques for Measurements of Speech Quality, Intelligibility and Voice Identification."},{"key":"ref_6","first-page":"341","article-title":"Praat, a system for doing phonetics by computer","volume":"5","author":"Boersma","year":"2002","journal-title":"Glot Int."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sj\u00f6lander, K., and Beskow, J. (2000, January 16\u201320). Wavesurfer\u2014An open source speech tool. 
Proceedings of the Sixth International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China.","DOI":"10.21437\/ICSLP.2000-849"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"052090","DOI":"10.1088\/1757-899X\/563\/5\/052090","article-title":"The Software System Implementation of Speech Command Recognizer under Intensive Background Nosie","volume":"563","author":"Song","year":"2019","journal-title":"IOP Conf. Ser. Mater. Sci. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Betkowska, A., Shinoda, K., and Furui, S. (2007). Robust speech recognition using factorial HMMs for home environments. Eurasip J. Adv. Signal Process., 20593.","DOI":"10.1155\/2007\/20593"},{"key":"ref_10","first-page":"264","article-title":"Performance of isolated and continuous digit recognition system using Kaldi toolkit","volume":"8","author":"Mahadevaswamy","year":"2019","journal-title":"Int. J. Recent Technol. Eng."},{"key":"ref_11","first-page":"22","article-title":"Creation and comparison of language and acoustic models using Kaldi for noisy and enhanced speech data","volume":"10","author":"Jayanna","year":"2018","journal-title":"Int. J. Intell. Syst. Appl."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"235","DOI":"10.2478\/jaiscr-2019-0006","article-title":"Performance Evaluation of Deep neural networks Applied to Speech Recognition: RNN, LSTM and GRU","volume":"9","author":"Shewalkar","year":"2019","journal-title":"J. Artif. Intell. Soft Comput. Res."},{"key":"ref_13","first-page":"1","article-title":"Phone recognition with hierarchical convolutional deep maxout networks","volume":"25","year":"2015","journal-title":"Eurasip J. Audio Speech Music Process."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"4813","DOI":"10.35940\/ijeat.F9110.088619","article-title":"ASR system for isolated words using ANN with back propagation and fuzzy based DWT","volume":"8","author":"Mendiratta","year":"2019","journal-title":"Int. 
J. Eng. Adv. Technol."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"James, P.E., Mun, H.K., and Vaithilingam, C.A. (2019). A hybrid spoken language processing system for smart device troubleshooting. Electronics, 8.","DOI":"10.3390\/electronics8060681"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Novoa, J., Wuth, J., Escudero, J.P., Fredes, J., Mahu, R., and Yoma, N.B. (2018, January 5\u20138). DNN-HMM based Automatic Speech Recognition for HRI Scenarios. Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction, Chicago, IL, USA.","DOI":"10.1145\/3171221.3171280"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, D., Wang, X., and Lv, S. (2019). An Overview of End-to-End Automatic Speech Recognition. Symmetry, 11.","DOI":"10.3390\/sym11081018"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Wang, D., Wang, X., and Lv, S. (2019). End-to-End Mandarin Speech Recognition Combining CNN and BLSTM. Symmetry, 11.","DOI":"10.3390\/sym11050644"},{"key":"ref_19","unstructured":"Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., and Povey, D. (2009). The HTK Book (v3.4), Engineering Department, Cambridge University."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Trmal, J., Wiesner, M., Peddinti, V., Zhang, X., Ghahremani, P., Wang, Y., Manohar, V., Xu, H., Povey, D., and Khudanpur, S. (2017, January 20\u201324). The Kaldi OpenKWS System: Improving low resource keyword search. Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Stockholm, Sweden.","DOI":"10.21437\/Interspeech.2017-601"},{"key":"ref_21","unstructured":"Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., and Schwarz, P. (2019, October 14). The Kaldi Speech Recognition Toolkit. 
Available online: https:\/\/infoscience.epfl.ch\/record\/192584."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1109\/29.45616","article-title":"An Overview of the SPHINX Speech Recognition System","volume":"38","author":"Lee","year":"1990","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"key":"ref_23","unstructured":"Kingma, D.P., and Ba, J.L. (2015, January 7\u20139). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015\u2014Conference Track Proceedings, San Diego, CA, USA."},{"key":"ref_24","unstructured":"(2019, October 14). Mathworks Homepage. Available online: https:\/\/www.mathworks.com\/."},{"key":"ref_25","unstructured":"(1981). IDEF: Integrated Computer-Aided Manufacturing (ICAM) Architecture, Part II (1981) Volume VI\u2014Function Modeling Manual, Wright-Patterson AFB. [3rd ed.]. USAF Report Number AFWAL-TR-81-4023."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Graves, A., Fernandez, S., Gomez, F., and Schmidhuber, J. (2006, January 25\u201329). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143891"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Scheidl, H., Fiel, S., and Sablatnig, R. (2018, January 5\u20138). Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm. Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR 2018), Niagara Falls, NY, USA.","DOI":"10.1109\/ICFHR-2018.2018.00052"},{"key":"ref_28","unstructured":"(2019, April 30). International Phonetic Association. 
Available online: https:\/\/www.internationalphoneticassociation.org."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2190","DOI":"10.1134\/S000511791412008X","article-title":"An automatic multimodal speech recognition system with audio and video information","volume":"75","author":"Karpov","year":"2014","journal-title":"Autom. Remote Control"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/12\/1447\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:37:15Z","timestamp":1760189835000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/12\/1447"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,25]]},"references-count":29,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2019,12]]}},"alternative-id":["sym11121447"],"URL":"https:\/\/doi.org\/10.3390\/sym11121447","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2019,11,25]]}}}