{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T07:02:18Z","timestamp":1777705338805,"version":"3.51.4"},"reference-count":22,"publisher":"SAGE Publications","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IFS"],"published-print":{"date-parts":[[2021,12,16]]},"abstract":"<jats:p>The Arabic language has a set of sound letters called diacritics, which play an essential role in the meaning of words and their articulation. A change in some diacritics leads to a change in the context of the sentence. However, the presence of these letters in the corpus transcription affects the accuracy of speech recognition. In this paper, we investigate the effect of diacritics on Arabic speech recognition based on end-to-end deep learning. The applied end-to-end approach includes CNN-LSTM and an attention-based technique provided by the state-of-the-art Espresso framework, which uses PyTorch. In addition, to the best of our knowledge, the CNN-LSTM approach with attention has not been used for Arabic automatic speech recognition (ASR). To fill this gap, this paper proposes a new approach based on CNN-LSTM with an attention-based method for Arabic ASR. The language model in this approach is trained using RNN-LM and LSTM-LM on the non-diacritized transcription of the speech corpus. The Standard Arabic Single Speaker Corpus (SASSC), after omitting the diacritics, is used to train and test the deep learning model. Experimental results show that removing diacritics decreased the out-of-vocabulary rate and the perplexity of the language model. In addition, the word error rate (WER) is significantly improved compared to diacritized data. 
The achieved average reduction in WER is 13.52%.<\/jats:p>","DOI":"10.3233\/jifs-202841","type":"journal-article","created":{"date-parts":[[2021,11,9]],"date-time":"2021-11-09T12:53:54Z","timestamp":1636462434000},"page":"6207-6219","source":"Crossref","is-referenced-by-count":13,"title":["Non-diacritized Arabic speech recognition based on CNN-LSTM and attention-based models"],"prefix":"10.1177","volume":"41","author":[{"given":"Hamzah A.","family":"Alsayadi","sequence":"first","affiliation":[{"name":"Computer Science Department, Faculty of Computer & Information Sciences, Ain Shams University, Egypt"},{"name":"Computer Science Department, Faculty of Sciences, Ibb University, Yemen"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abdelaziz A.","family":"Abdelhamid","sequence":"additional","affiliation":[{"name":"Computer Science Department, Faculty of Computer & Information Sciences, Ain Shams University, Egypt"},{"name":"College of Computing and Information Technology, Shaqra University, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Islam","family":"Hegazy","sequence":"additional","affiliation":[{"name":"Computer Science Department, Faculty of Computer & Information Sciences, Ain Shams University, Egypt"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zaki T.","family":"Fayed","sequence":"additional","affiliation":[{"name":"Computer Science Department, Faculty of Computer & Information Sciences, Ain Shams University, Egypt"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","reference":[{"issue":"1","key":"10.3233\/JIFS-202841_ref1","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1515\/comp-2019-0004","article-title":"Bidirectional deep architecture for arabic speech recognition","volume":"9","author":"Zerari","year":"2019","journal-title":"Open 
Computer Science"},{"key":"10.3233\/JIFS-202841_ref4","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1162\/tacl_a_00254","article-title":"Analysis methods in neural language processing: A survey","volume":"7","author":"Belinkov","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"issue":"1","key":"10.3233\/JIFS-202841_ref8","doi-asserted-by":"crossref","first-page":"1261","DOI":"10.1515\/jisys-2018-0372","article-title":"A hybrid of deep cnn and bidirectional lstm for automatic speech recognition","volume":"29","author":"Passricha","year":"2020","journal-title":"Journal of Intelligent Systems"},{"issue":"10","key":"10.3233\/JIFS-202841_ref9","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1109\/TASLP.2014.2339736","article-title":"Convolutional neural networks for speech recognition","volume":"22","author":"Abdel-Hamid","year":"2014","journal-title":"IEEE\/ACM Transactions on audio, speech, and language processing"},{"issue":"1","key":"10.3233\/JIFS-202841_ref10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.21608\/ijicis.2019.62603","article-title":"Preprocessing the egyptian arabic dialect for personality traits prediction","volume":"19","author":"Salim","year":"2019","journal-title":"International Journal of Intelligent Computing and Information Sciences"},{"key":"10.3233\/JIFS-202841_ref11","first-page":"1","article-title":"Arabic speech recognition: Challenges and state of the art","volume":"4","author":"Abdou","year":"2018","journal-title":"Comput Linguistics Speech Image Process Arabic Lang"},{"issue":"11","key":"10.3233\/JIFS-202841_ref13","doi-asserted-by":"crossref","first-page":"9043","DOI":"10.1007\/s13369-019-04024-0","article-title":"Diacritics effect on arabic speech recognition","volume":"44","author":"Alshayeji","year":"2019","journal-title":"Arabian Journal for Science and Engineering"},{"key":"10.3233\/JIFS-202841_ref15","first-page":"349","article-title":"The ibm gale arabic asr 
system, in IEEE, pp. IV\u20132007 IEEE International Conference on Acoustics","volume":"4","author":"Soltau","year":"2007","journal-title":"Speech and Signal Processing-ICASSP\u201907"},{"key":"10.3233\/JIFS-202841_ref18","unstructured":"Khatatneh K., et al., A novel arabic speech recognition method using neural networks and gaussian filtering, International Journal of Electrical, Electronics & Computer Systems 19(1) (2014)."},{"key":"10.3233\/JIFS-202841_ref22","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1142\/9789813229396_0011","article-title":"End-to-end lexicon free arabic speech recognition using recurrent neural networks","volume":"4","author":"Ahmed","year":"2018","journal-title":"Computational Linguistics, Speech And Image Processing For Arabic Language"},{"key":"10.3233\/JIFS-202841_ref23","doi-asserted-by":"crossref","first-page":"19143","DOI":"10.1109\/ACCESS.2019.2896880","article-title":"Speech recognition using deep neural networks: A systematic review","volume":"7","author":"Nassif","year":"2019","journal-title":"IEEE Access"},{"issue":"4","key":"10.3233\/JIFS-202841_ref24","doi-asserted-by":"crossref","first-page":"41","DOI":"10.21608\/ijicis.2016.19823","article-title":"Reducing error rate of deep learning using auto encoder and genetic algorithms","volume":"16","author":"Habeeb","year":"2016","journal-title":"International Journal of Intelligent Computing and Information Sciences"},{"issue":"2","key":"10.3233\/JIFS-202841_ref25","first-page":"1","article-title":"Architecture optimization model for the deep neural network","volume":"19","author":"Ukaoha","year":"2019","journal-title":"International Journal of Intelligent Computing and Information Sciences"},{"issue":"7","key":"10.3233\/JIFS-202841_ref26","doi-asserted-by":"crossref","first-page":"1387","DOI":"10.3390\/w11071387","article-title":"Application of long short-term memory (lstm) neural network for flood 
forecasting","volume":"11","author":"Le","year":"2019","journal-title":"Water"},{"issue":"8","key":"10.3233\/JIFS-202841_ref28","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.3390\/sym11081018","article-title":"An overview of end-to-end automatic speech recognition","volume":"11","author":"Wang","year":"2019","journal-title":"Symmetry"},{"key":"10.3233\/JIFS-202841_ref29","doi-asserted-by":"crossref","unstructured":"Wang S. and Li G., Overview of end-to-end speech recognition, Journal of Physics: Conference Series 1187(5) (2019).","DOI":"10.1088\/1742-6596\/1187\/5\/052068"},{"issue":"1","key":"10.3233\/JIFS-202841_ref30","doi-asserted-by":"crossref","first-page":"27","DOI":"10.21248\/jlcl.32.2017.213","article-title":"A survey and comparative study of arabic diacritization tools","volume":"32","author":"Hamed","year":"2017","journal-title":"J Lang Technol Comput Linguistics"},{"key":"10.3233\/JIFS-202841_ref31","doi-asserted-by":"crossref","unstructured":"Hadjir I., Abbache M. and Belkredim F.Z., An approach for arabic diacritization, in International Conference on Applications of Natural Language to Information Systems, Springer, 2019, pp. 
337\u2013344.","DOI":"10.1007\/978-3-030-23281-8_29"},{"issue":"8","key":"10.3233\/JIFS-202841_ref32","doi-asserted-by":"crossref","first-page":"2326","DOI":"10.3390\/s20082326","article-title":"Incorporating noise robustness in speech command recognition by noise augmentation of training data","volume":"20","author":"Pervaiz","year":"2020","journal-title":"Sensors"},{"issue":"4","key":"10.3233\/JIFS-202841_ref34","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1109\/TASSP.1980.1163420","article-title":"Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences","volume":"28","author":"Davis","year":"1980","journal-title":"IEEE Transactions on Acoustics, Speech and Signal Processing"},{"issue":"5-6","key":"10.3233\/JIFS-202841_ref37","doi-asserted-by":"crossref","first-page":"602","DOI":"10.1016\/j.neunet.2005.06.042","article-title":"Framewise phoneme classification with bidirectional lstm and other neural network architectures","volume":"18","author":"Graves","year":"2005","journal-title":"Neural networks"},{"issue":"2","key":"10.3233\/JIFS-202841_ref49","doi-asserted-by":"crossref","first-page":"49","DOI":"10.36478\/ajit.2019.49.56","article-title":"Large vocabulary arabic continuous speech recognition using tied states acoustic models","volume":"18","author":"Azim","year":"2019","journal-title":"Asian J Inf Technol"}],"container-title":["Journal of Intelligent & Fuzzy 
Systems"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/JIFS-202841","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:43:34Z","timestamp":1777455814000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/JIFS-202841"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,16]]},"references-count":22,"journal-issue":{"issue":"6"},"URL":"https:\/\/doi.org\/10.3233\/jifs-202841","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,16]]}}}