{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T18:36:20Z","timestamp":1767206180054,"version":"build-2238731810"},"reference-count":25,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T00:00:00Z","timestamp":1734480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005693","name":"Hainan University","doi-asserted-by":"crossref","award":["KYQD(ZR) 21014"],"award-info":[{"award-number":["KYQD(ZR) 21014"]}],"id":[{"id":"10.13039\/501100005693","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100005693","name":"Hainan University","doi-asserted-by":"crossref","award":["2021YFC3340800"],"award-info":[{"award-number":["2021YFC3340800"]}],"id":[{"id":"10.13039\/501100005693","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100005693","name":"Hainan University","doi-asserted-by":"crossref","award":["62177046"],"award-info":[{"award-number":["62177046"]}],"id":[{"id":"10.13039\/501100005693","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100005693","name":"Hainan University","doi-asserted-by":"crossref","award":["No.22YBA012"],"award-info":[{"award-number":["No.22YBA012"]}],"id":[{"id":"10.13039\/501100005693","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100005693","name":"Hainan University","doi-asserted-by":"crossref","award":["No.B23H10004"],"award-info":[{"award-number":["No.B23H10004"]}],"id":[{"id":"10.13039\/501100005693","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Plan project","award":["KYQD(ZR) 21014"],"award-info":[{"award-number":["KYQD(ZR) 21014"]}]},{"name":"National Key Research and Development Plan project","award":["2021YFC3340800"],"award-info":[{"award-number":["2021YFC3340800"]}]},{"name":"National Key Research and Development Plan project","award":["62177046"],"award-info":[{"award-number":["62177046"]}]},{"name":"National Key Research and Development Plan project","award":["No.22YBA012"],"award-info":[{"award-number":["No.22YBA012"]}]},{"name":"National Key Research and Development Plan project","award":["No.B23H10004"],"award-info":[{"award-number":["No.B23H10004"]}]},{"name":"National Natural Science Foundation project","award":["KYQD(ZR) 21014"],"award-info":[{"award-number":["KYQD(ZR) 21014"]}]},{"name":"National Natural Science Foundation project","award":["2021YFC3340800"],"award-info":[{"award-number":["2021YFC3340800"]}]},{"name":"National Natural Science Foundation project","award":["62177046"],"award-info":[{"award-number":["62177046"]}]},{"name":"National Natural Science Foundation project","award":["No.22YBA012"],"award-info":[{"award-number":["No.22YBA012"]}]},{"name":"National Natural Science Foundation project","award":["No.B23H10004"],"award-info":[{"award-number":["No.B23H10004"]}]},{"name":"High Performance Computing Center (HPC), Central South University","award":["KYQD(ZR) 21014"],"award-info":[{"award-number":["KYQD(ZR) 21014"]}]},{"name":"High Performance Computing Center (HPC), Central South University","award":["2021YFC3340800"],"award-info":[{"award-number":["2021YFC3340800"]}]},{"name":"High Performance Computing Center (HPC), Central South University","award":["62177046"],"award-info":[{"award-number":["62177046"]}]},{"name":"High Performance Computing Center (HPC), Central South 
University","award":["No.22YBA012"],"award-info":[{"award-number":["No.22YBA012"]}]},{"name":"High Performance Computing Center (HPC), Central South University","award":["No.B23H10004"],"award-info":[{"award-number":["No.B23H10004"]}]},{"name":"Social Science Foundation of Hunan Province","award":["KYQD(ZR) 21014"],"award-info":[{"award-number":["KYQD(ZR) 21014"]}]},{"name":"Social Science Foundation of Hunan Province","award":["2021YFC3340800"],"award-info":[{"award-number":["2021YFC3340800"]}]},{"name":"Social Science Foundation of Hunan Province","award":["62177046"],"award-info":[{"award-number":["62177046"]}]},{"name":"Social Science Foundation of Hunan Province","award":["No.22YBA012"],"award-info":[{"award-number":["No.22YBA012"]}]},{"name":"Social Science Foundation of Hunan Province","award":["No.B23H10004"],"award-info":[{"award-number":["No.B23H10004"]}]},{"name":"Key Laboratory of Seed Industry of Hainan Province","award":["KYQD(ZR) 21014"],"award-info":[{"award-number":["KYQD(ZR) 21014"]}]},{"name":"Key Laboratory of Seed Industry of Hainan Province","award":["2021YFC3340800"],"award-info":[{"award-number":["2021YFC3340800"]}]},{"name":"Key Laboratory of Seed Industry of Hainan Province","award":["62177046"],"award-info":[{"award-number":["62177046"]}]},{"name":"Key Laboratory of Seed Industry of Hainan Province","award":["No.22YBA012"],"award-info":[{"award-number":["No.22YBA012"]}]},{"name":"Key Laboratory of Seed Industry of Hainan Province","award":["No.B23H10004"],"award-info":[{"award-number":["No.B23H10004"]}]}],"content-domain":{"domain":["www.mdpi.com"],"crossmark-restriction":true},"short-container-title":["BDCC"],"abstract":"<jats:p>Speech recognition technology is an important branch in the field of artificial intelligence, aiming to transform human speech into computer-readable text information. However, speech recognition technology still faces many challenges, such as noise interference, and accent and speech rate differences. An aim of this paper is to explore a deep learning-based speech recognition method to improve the accuracy and robustness of speech recognition. Firstly, this paper introduces the basic principles of speech recognition and existing mainstream technologies, and then focuses on the deep learning-based speech recognition method. Through comparative experiments, it is found that the self-attention mechanism performs best in speech recognition tasks. In order to further improve speech recognition performance, this paper proposes a deep learning model based on the self-attention mechanism with DCNN-GRU. The model realizes the dynamic attention to an input speech by introducing the self-attention mechanism in a neural network model instead of an RNN and with a deep convolutional neural network, which improves the robustness and recognition accuracy of this model. This experiment uses 170 h of Chinese dataset AISHELL-1. Compared with the deep convolutional neural network, the deep learning model based on the self-attention mechanism with DCNN-GRU accomplishes a reduction of at least 6% in CER. Compared with a bidirectional gated recurrent neural network, the deep learning model based on the self-attention mechanism with DCNN-GRU accomplishes a reduction of 0.7% in CER. And finally, this experiment is performed on a test set analyzed the influencing factors affecting the CER. 
The experimental results show that this model exhibits good performance in various noise environments and accent conditions.<\/jats:p>","DOI":"10.3390\/bdcc8120195","type":"journal-article","created":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T11:17:22Z","timestamp":1734520642000},"page":"195","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Mandarin Recognition Based on Self-Attention Mechanism with Deep Convolutional Neural Network (DCNN)-Gated Recurrent Unit (GRU)"],"prefix":"10.3390","volume":"8","author":[{"given":"Xun","family":"Chen","sequence":"first","affiliation":[{"name":"School of Information and Communication Engineering, Hainan University, Haikou 570228, China"}]},{"given":"Chengqi","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, Hainan University, Haikou 570228, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3606-2900","authenticated-orcid":false,"given":"Chao","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Central South University, Changsha 410083, China"}]},{"given":"Qin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, Hainan University, Haikou 570228, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,12,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"101055","DOI":"10.1016\/j.csl.2019.101055","article-title":"A survey on automatic speech recognition systems for Portuguese language and its variations","volume":"62","year":"2020","journal-title":"Comput. Speech Lang."},{"key":"ref_2","first-page":"131","article-title":"Chinese Speech Recognition Technology Based on Neural Network","volume":"45","author":"Wei","year":"2022","journal-title":"J. Sichuan Norm. Univ. (Nat. Sci. Ed.)"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Meng, J., Zhang, J., and Zhao, H. (2012, January 17\u201319). Overview of the Speech Recognition Technology. Proceedings of the 2012 Fourth International Conference on Computational and Information Sciences, Chongqing, China.","DOI":"10.1109\/ICCIS.2012.202"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1071","DOI":"10.1007\/s11831-019-09344-w","article-title":"A Survey of Deep Learning and Its Applications: A New Paradigm to Machine Learning","volume":"27","author":"Dargan","year":"2020","journal-title":"Arch. Comput. Methods Eng."},{"key":"ref_5","first-page":"16","article-title":"A review on speech recognition technique","volume":"10","author":"Gaikwad","year":"2010","journal-title":"Int. J. Comput. Appl."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, D., Wang, X., and Lv, S. (2019). An overview of end-to-end automatic speech recognition. Symmetry, 11.","DOI":"10.3390\/sym11081018"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"113402","DOI":"10.1016\/j.eswa.2020.113402","article-title":"Optimisation of phonetic aware speech recognition through multi-objective evolutionary algorithms","volume":"153","author":"Bird","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_8","first-page":"153","article-title":"Research and application of deep recurrent neural networks based voiceprint recognition","volume":"36","author":"Yu","year":"2019","journal-title":"J. Appl. Res. 
Comput."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Goh, Y.H., Lau, K.X., and Lee, Y.K. (2019, January 24\u201325). Audio visual speech recognition system using recurrent neural network. Proceedings of the 2019 4th International Conference on Information Technology (InCIT), Bangkok, Thailand.","DOI":"10.1109\/INCIT.2019.8912049"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhou, P., Yang, W., Chen, W., Wang, Y., and Jia, J. (2019, January 12\u201317). Modality attention for end-to-end au-dio-visual speech recognition. Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8683733"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Xie, X., Liu, X., Lee, T., Hu, S., and Wang, L. (2019, January 12\u201317). BLHUC: Bayesian learning of hidden unit con-tributions for deep neural network speaker adaptation. Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682667"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"e8","DOI":"10.1561\/116.00000050","article-title":"Recent advances in end-to-end automatic speech recognition","volume":"11","author":"Li","year":"2022","journal-title":"Apsipa Trans. Signal Inf. Process."},{"key":"ref_13","first-page":"749","article-title":"A real-time end-to-end multilingual speech recognition architecture","volume":"9","author":"Eustis","year":"2014","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hu, S., Lam, M.W., Xie, X., Liu, S., Yu, J., Wu, X., Liu, X., and Meng, H. (2019, January 12\u201317). Bayesian and Gaussian process neural networks for large vocabulary continuous speech recognition. Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682487"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Joy, N., Oglic, D., Cvetkovic, Z., Bell, P., and Renals, S. (2020, January 25\u201329). Deep scattering power spectrum features for robust speech recognition. Proceedings of the International Speech Communication Association, Virtual.","DOI":"10.21437\/Interspeech.2020-2656"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Pham, N.Q., Nguyen, T.S., Niehues, J., M\u00fcller, M., St\u00fcker, S., and Waibel, A. (2019). Very Deep Self-Attention Networks for End-to-End Speech Recognition. arXiv.","DOI":"10.21437\/Interspeech.2019-2702"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/TVLSI.2019.2942267","article-title":"A training-efficient hybrid-structured deep neural network with reconfigurable memristive synapses","volume":"28","author":"Bai","year":"2020","journal-title":"IEEE Trans. Very Large Scale Integr. (VLSI) Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3425","DOI":"10.1007\/s41870-022-00907-y","article-title":"Combining audio and visual speech recog-nition using LSTM and deep convolutional neural network","volume":"14","author":"Shashidhar","year":"2022","journal-title":"Int. J. Inf. Tecnol."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Oglic, D., Cvetkovic, Z., Bell, P., and Renals, S. (2020, January 25\u201329). A deep 2D convolutional network for wave-form-based speech recognition. 
Proceedings of the International Speech Communication Association, Virtual.","DOI":"10.21437\/Interspeech.2020-1870"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Loweimi, E., Bell, P., and Renals, S. (2019, January 15\u201319). On learning interpretable CNNs with parametric modulated Kernel-based filters. Proceedings of the International Speech Communication Association, Graz, Austria.","DOI":"10.21437\/Interspeech.2019-1257"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13636-019-0169-5","article-title":"Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural networks","volume":"2020","author":"Yakoub","year":"2020","journal-title":"EURASIP J. Audio Speech Music Process."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"9089","DOI":"10.1007\/s00521-020-05672-2","article-title":"Deep neural network architectures for dysarthric speech analysis and recognition","volume":"33","author":"Zaidi","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1240","DOI":"10.1109\/JSTSP.2017.2763455","article-title":"Hybrid CTC\/attention architecture for end-to-end speech recognition","volume":"11","author":"Watanabe","year":"2017","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Martinez, B., Ma, P., Petridis, S., and Pantic, M. (2020, January 4\u20138). Lipreading using temporal convolutional networks. Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053841"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1007\/s12204-019-2147-6","article-title":"Joint CTC-Attention End-to-End Speech Recognition with a Triangle Recurrent Neural Network Encoder","volume":"25","author":"Zhu","year":"2020","journal-title":"J. Shanghai Jiaotong Univ. (Sci.)"}],"updated-by":[{"DOI":"10.3390\/bdcc9030060","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T00:00:00Z","timestamp":1734480000000}}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/12\/195\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,3]],"date-time":"2025-08-03T15:03:16Z","timestamp":1754233396000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/12\/195"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,18]]},"references-count":25,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["bdcc8120195"],"URL":"https:\/\/doi.org\/10.3390\/bdcc8120195","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,18]]}}}
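
The abstract above describes the model only at a high level: a deep convolutional front end, a self-attention block used in place of a recurrent encoder, and GRU layers feeding a character-level output. A minimal PyTorch sketch of that layout follows. It is illustrative only, not the authors' code; the layer sizes, the placeholder vocabulary size, and the CTC-style output head are assumptions, since the record does not specify the exact architecture.

import torch
import torch.nn as nn

class SelfAttentionDCNNGRU(nn.Module):
    """Sketch of a DCNN + self-attention + GRU acoustic model (assumed layout)."""
    def __init__(self, n_mels=80, vocab_size=4000, d_model=256, n_heads=4):
        # vocab_size is a placeholder; the record does not give the character set size.
        super().__init__()
        # Deep CNN front end: downsamples time and frequency by 4x each.
        self.dcnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(32 * (n_mels // 4), d_model)
        # Self-attention replaces a recurrent encoder: every frame can attend
        # to every other frame in the utterance ("dynamic attention").
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # GRU layers model the remaining sequential structure.
        self.gru = nn.GRU(d_model, d_model, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * d_model, vocab_size)  # per-frame character logits

    def forward(self, spec):                  # spec: (batch, time, n_mels)
        x = self.dcnn(spec.unsqueeze(1))      # -> (batch, 32, time/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x = self.proj(x)
        x, _ = self.attn(x, x, x)             # self-attention over frames
        x, _ = self.gru(x)
        return self.out(x)                    # suitable for a CTC-style loss

logits = SelfAttentionDCNNGRU()(torch.randn(2, 400, 80))
print(logits.shape)  # torch.Size([2, 100, 4000])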
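
The reported comparisons are stated in CER (character error rate). Assuming the standard edit-distance definition of CER, which the record does not spell out, a small self-contained helper might look like this; it is illustrative, not code from the paper.

def cer(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / len(reference)."""
    r, h = list(reference), list(hypothesis)
    # dp[i][j] holds the edit distance between r[:i] and h[:j].
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

print(cer("今天天气很好", "今天天七很好"))  # 1 substitution over 6 characters ≈ 0.167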