{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T16:30:48Z","timestamp":1778603448398,"version":"3.51.4"},"reference-count":24,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2019,9,19]],"date-time":"2019-09-19T00:00:00Z","timestamp":1568851200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004569","name":"Ministerstwo Nauki i Szkolnictwa Wy\u017cszego","doi-asserted-by":"publisher","award":["020\/RID\/2018\/19"],"award-info":[{"award-number":["020\/RID\/2018\/19"]}],"id":[{"id":"10.13039\/501100004569","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>This work presents a new approach to speech recognition, based on the specific coding of time and frequency characteristics of speech. The research proposed the use of convolutional neural networks because, as we know, they show high resistance to cross-spectral distortions and differences in the length of the vocal tract. Until now, two layers of time convolution and frequency convolution were used. A novel idea is to weave three separate convolution layers: traditional time convolution and the introduction of two different frequency convolutions (mel-frequency cepstral coefficients (MFCC) convolution and spectrum convolution). This application takes into account more details contained in the tested signal. Our idea assumes creating patterns for sounds in the form of RGB (Red, Green, Blue) images. The work carried out research for isolated words and continuous speech, for neural network structure. A method for dividing continuous speech into syllables has been proposed. 
This method can be used for symmetrical stereo sound.<\/jats:p>","DOI":"10.3390\/sym11091185","type":"journal-article","created":{"date-parts":[[2019,9,19]],"date-time":"2019-09-19T11:02:01Z","timestamp":1568890921000},"page":"1185","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["A Method of Speech Coding for Speech Recognition Using a Convolutional Neural Network"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9651-9525","authenticated-orcid":false,"given":"Mariusz","family":"Kubanek","sequence":"first","affiliation":[{"name":"Faculty of Mechanical Engineering and Computer Science, Institute of Computer and Information Sciences, Czestochowa University of Technology, Dabrowskiego 73, 42-201 Czestochowa, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3345-604X","authenticated-orcid":false,"given":"Janusz","family":"Bobulski","sequence":"additional","affiliation":[{"name":"Faculty of Mechanical Engineering and Computer Science, Institute of Computer and Information Sciences, Czestochowa University of Technology, Dabrowskiego 73, 42-201 Czestochowa, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5012-6563","authenticated-orcid":false,"given":"Joanna","family":"Kulawik","sequence":"additional","affiliation":[{"name":"Faculty of Mechanical Engineering and Computer Science, Institute of Computer and Information Sciences, Czestochowa University of Technology, Dabrowskiego 73, 42-201 Czestochowa, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,9,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1109\/TSA.2005.848882","article-title":"Active learning: Theory and applications to automatic speech 
recognition","volume":"13","author":"Riccardi","year":"2005","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Xiong, W., Wu, L., Alleva, F., Droppo, J., Huang, X., and Stolcke, A. (2018, January 15\u201320). The Microsoft 2017 Conversational Speech Recognition System. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461870"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1109\/TSA.2004.838537","article-title":"Unsupervised training of acoustic models for large vocabulary continuous speech recognition","volume":"13","author":"Wessel","year":"2005","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"974","DOI":"10.1109\/TASL.2009.2014894","article-title":"Approaches to Iterative Speech Feature Enhancement and Recognition","volume":"17","author":"Windmann","year":"2009","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/TASL.2011.2109382","article-title":"Acoustic Modeling Using Deep Belief Networks","volume":"20","author":"Mohamed","year":"2012","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Mitra, V., and Franco, H. (2015, January 13\u201317). Time-frequency convolutional networks for robust speech recognition. Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, USA.","DOI":"10.1109\/ASRU.2015.7404811"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yu, D., Seide, F., and Li, G. (2011, January 27\u201331). Conversational Speech Transcription Using Context-Dependent Deep Neural Networks. 
Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, Italy.","DOI":"10.21437\/Interspeech.2011-169"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"T\u00fcske, Z., Golik, P., and Schl\u00fcter, R. (2014, January 14\u201318). Acoustic modeling with deep neural networks using raw time signal for LVCSR. Proceedings of the 15th Annual Conference of the International Speech Communication Association, Singapore.","DOI":"10.21437\/Interspeech.2014-223"},{"key":"ref_9","unstructured":"Arisoy, E., Sainath, T.N., Kingsbury, B., and Ramabhadran, B. (2012). Deep Neural Network Language Models. Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, Association for Computational Linguistics."},{"key":"ref_10","first-page":"2422","article-title":"Automatic Speech Recognition using different Neural Network Architectures\u2014A Survey","volume":"7","author":"Lekshmi","year":"2016","journal-title":"Int. J. Comput. Sci. Inf. Technol."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, S.-X., Liu, C., Yao, K., and Gong, Y. (2015, January 19\u201324). Deep neural support vector machines for speech recognition. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Queensland, Australia.","DOI":"10.1109\/ICASSP.2015.7178777"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sainath, T.N., Mohamed, A.R., Kingsbury, B., and Ramabhadran, B. (2013, January 26\u201331). Deep convolutional neural networks for LVCSR. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6639347"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mitra, V., Wang, W., Franco, H., Lei, Y., Bartels, C., and Graciarena, M. (2014, January 14\u201318). 
Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions. Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association, Singapore.","DOI":"10.21437\/Interspeech.2014-224"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Pratap, V., Hannun, A., Xu, Q., Cai, J., Kahn, J., Synnaeve, G., Liptchinsky, V., and Collobert, R. (2018). wav2letter++: The Fastest Open-source Speech Recognition System. arXiv.","DOI":"10.1109\/ICASSP.2019.8683535"},{"key":"ref_15","unstructured":"de Andrade, D.C. (2018, December 27). Recognizing Speech Commands Using Recurrent Neural Networks with Attention. Available online: https:\/\/towardsdatascience.com\/recognizing-speech-commands-using-recurrent-neural-networks-with-attention-c2b2ba17c837."},{"key":"ref_16","first-page":"1257","article-title":"RA Robust Frequency-Domain Method For Estimation of Intended Fundamental Frequency In Voice Analysis","volume":"7","author":"Andrade","year":"2018","journal-title":"Int. J. Innov. Sci. Res."},{"key":"ref_17","unstructured":"Krishna Gouda, S., Kanetkar, S., Harrison, D., and Warmuth, M.K. (2018). Speech Recognition: Keyword Spotting Through Image Recognition. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Abdel-Hamid, O., Deng, L., and Yu, D. (2013). Exploring Convolutional Neural Network Structures and Optimization Techniques for Speech Recognition. 
Interspeech 2013, ISCA.","DOI":"10.21437\/Interspeech.2013-744"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2352","DOI":"10.1162\/neco_a_00990","article-title":"Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review","volume":"29","author":"Rawat","year":"2017","journal-title":"Neural Comput."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000006","article-title":"Learning Deep Architectures for AI","volume":"2","author":"Bengio","year":"2009","journal-title":"Found. Trends\u00ae Mach. Learn."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_22","first-page":"504","article-title":"Reducing the dimensionality of data with neural networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Sci. Am. Assoc. Adv. Sci."},{"key":"ref_23","first-page":"307","article-title":"Characteristics of the use of coupled hidden Markov models for audio-visual polish speech recognition","volume":"60","author":"Kubanek","year":"2012","journal-title":"Bull. Pol. Acad. Sci. Tech. Sci."},{"key":"ref_24","unstructured":"Kubanek, M., and Rydzek, S. (2008, January 22\u201326). A Hybrid Method of User Identification with Use Independent Speech and Facial Asymmetry. 
Proceedings of the 9th International Conference on Artificial Intelligence and Soft Computing (ICAISC 2008), Zakopane, Poland."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/9\/1185\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:21:44Z","timestamp":1760188904000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/9\/1185"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,19]]},"references-count":24,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2019,9]]}},"alternative-id":["sym11091185"],"URL":"https:\/\/doi.org\/10.3390\/sym11091185","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,19]]}}}