{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T20:22:01Z","timestamp":1781382121386,"version":"3.54.1"},"reference-count":47,"publisher":"Wiley","license":[{"start":{"date-parts":[[2024,4,29]],"date-time":"2024-04-29T00:00:00Z","timestamp":1714348800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Journal of Electrical and Computer Engineering"],"published-print":{"date-parts":[[2024,4,29]]},"abstract":"<jats:p>Automatic speech recognition (ASR) is a field of research that focuses on the ability of computers to process and interpret speech feedback from humans and to provide the highest degree of accuracy in recognition. Speech is one of the simplest ways to convey a message in a basic context, and ASR refers to the ability of machines to process and accept speech data from humans with the greatest degree of accuracy. As the human-to-machine interface continues to evolve, speech recognition is expected to become increasingly important. However, the Arabic language has distinct features that set it apart from other languages, such as the dialect and the pronunciation of words. Until now, insufficient attention has been devoted to continuous Arabic speech recognition research for independent speakers with a limited database. This research proposed two techniques for the recognition of Arabic speech. The first uses a combination of convolutional neural network (CNN) and long short-term memory (LSTM) encoders, and an attention-based decoder, and the second is based on the Sphinx-4 recognizer, which includes pocket sphinx, base sphinx, and sphinx train, with various types and number of features to be extracted (filter bank and mel frequency cepstral coefficients (MFCC)) based on the CMU Sphinx tool, which generates a language model for different sentences spoken by different speakers. These approaches were tested on a dataset containing 7\u2009hours of spoken Arabic from 11 Arab countries, covering the Levant, Gulf, and African regions, which make up the Arab world, and achieved promising results. CNN-LSTM achieved a word error rate (WER) of 3.63% using 120 features for filter bank and 4.04% WER using 39 features for MFCC, respectively, while the Sphinx-4 recognizer technique achieved 8.17% WER and an accuracy of 91.83% using 25 features for MFCC and 8 Gaussian mixtures, respectively, when tested on the same benchmark dataset.<\/jats:p>","DOI":"10.1155\/2024\/4976944","type":"journal-article","created":{"date-parts":[[2024,4,29]],"date-time":"2024-04-29T19:50:05Z","timestamp":1714420205000},"page":"1-11","source":"Crossref","is-referenced-by-count":3,"title":["Convolutional Neural Networks to Facilitate the Continuous Recognition of Arabic Speech with Independent Speakers"],"prefix":"10.1155","volume":"2024","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-6428-1890","authenticated-orcid":true,"given":"Sally A.","family":"Sayed","sequence":"first","affiliation":[{"name":"Department of Computer Science, Faculty of Computers & Artificial Intelligence, Fayoum University, El Fayoum 63514, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rania","family":"Ahmed Abdel Azeem Abul Seoud","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Faculty of Engineering, Fayoum University, El Fayoum 63514, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Howida Y.","family":"Abdel Naby","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Computers & Artificial Intelligence, Fayoum University, El Fayoum 63514, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"311","reference":[{"key":"1","doi-asserted-by":"publisher","DOI":"10.1155\/2023\/7398538"},{"key":"2","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-020-09505-5"},{"key":"3","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2004-495","volume-title":"Morphology-based Language Modeling for Arabic Speech Recognition","author":"D. Vergyri","year":"2004"},{"issue":"2","key":"4","article-title":"Investigation Arabic speech recognition using CMU sphinx system","volume":"6","author":"H. Satori","year":"2009","journal-title":"The International Arab Journal of Information Technology"},{"key":"5","first-page":"156","article-title":"Advances in dialectal Arabic speech recognition: a study using twitter to improve egyptian asr","author":"A. Ali"},{"key":"6","doi-asserted-by":"crossref","article-title":"Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera","author":"P. Cardinal","DOI":"10.21437\/Interspeech.2014-474"},{"key":"7","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2011.12.001"},{"key":"8","doi-asserted-by":"publisher","DOI":"10.21248\/jlcl.32.2017.213"},{"key":"9","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1007\/978-3-030-23281-8_29","article-title":"An approach for Arabic diacritization","volume-title":"Natural Language Processing and Information Systems: 24th International Conference on Applications of Natural Language to Information Systems, NLDB 2019","author":"I. Hadjir","year":"2019"},{"key":"10","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/6825555"},{"key":"11","doi-asserted-by":"publisher","DOI":"10.21437\/interspeech.2019-2599"},{"key":"12","doi-asserted-by":"crossref","article-title":"Exploring how deep neural networks form phonemic categories","author":"T. Nagamine","DOI":"10.21437\/Interspeech.2015-422"},{"key":"13","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2016-1406","volume-title":"On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models","author":"T. Nagamine","year":"2016"},{"key":"14","first-page":"389","article-title":"End-to-end speech recognition with word-based RNN language models","author":"T. Hori"},{"key":"15","doi-asserted-by":"publisher","DOI":"10.1186\/s13636-015-0058-5"},{"key":"16","first-page":"83","article-title":"The floating Arabic dictionary: an automatic method for updating a lexical database through the detection and lemmatization of unknown words","author":"M. Attia"},{"key":"17","first-page":"231","article-title":"End-to-end lexicon free Arabic speech recognition using recurrent neural networks","volume-title":"Computational Linguistics, Speech and Image Processing for Arabic Language","author":"A. Ahmed","year":"2019"},{"issue":"1","key":"18","article-title":"A novel Arabic Speech Recognition method using neural networks and Gaussian Filtering","volume":"19","author":"K. Khatatneh","year":"2014","journal-title":"International Journal of Electrical, Electronics and Computer Systems"},{"key":"19","first-page":"8435","article-title":"Morpheme-based feature-rich language models using deep neural networks for lvcsr of Egyptian Arabic","author":"A. E.-D. Mousa"},{"key":"20","first-page":"299","article-title":"Development of the MIT ASR system for the 2016 Arabic multi-genre broadcast challenge","author":"T. AlHanai"},{"key":"21","first-page":"525","article-title":"A complete KALDI recipe for building Arabic speech recognition systems","author":"A. Ali"},{"key":"22","first-page":"192","article-title":"Maxout based deep neural networks for Arabic phonemes recognition","author":"A. AbdAlmisreb"},{"key":"23","doi-asserted-by":"publisher","DOI":"10.36478\/ajit.2019.49.56"},{"key":"24","doi-asserted-by":"publisher","DOI":"10.1049\/sil2.12057"},{"key":"25","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/d18-2012"},{"key":"26","doi-asserted-by":"publisher","DOI":"10.1121\/1.425019"},{"issue":"6","key":"27","first-page":"479","article-title":"Review of feature extraction techniques in automatic speech recognition","volume":"2","author":"T. S. Shanthi","year":"2013","journal-title":"International Journal of Scientific Engineering and Technology"},{"key":"28","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/7186375"},{"key":"29","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-020-10073-7"},{"key":"30","doi-asserted-by":"publisher","DOI":"10.1155\/2019\/4203821"},{"key":"31","article-title":"End-to-end continuous speech recognition using attention-based recurrent nn: first results","author":"J. Chorowski","year":"2014"},{"key":"32","first-page":"4845","article-title":"Very deep convolutional networks for end-to-end speech recognition","author":"Y. Zhang"},{"key":"33","doi-asserted-by":"publisher","DOI":"10.21437\/interspeech.2018-1456"},{"key":"34","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1007\/11550907_126","article-title":"Bidirectional LSTM networks for improved phoneme classification and recognition","volume-title":"Artificial Neural Networks: Formal Models and Their Applications\u2013ICANN 2005: 15th International Conference","author":"A. Graves","year":"2005"},{"key":"35","article-title":"Scheduled sampling for sequence prediction with recurrent neural networks","volume":"28","author":"S. Bengio","year":"2015","journal-title":"Advances in Neural Information Processing Systems"},{"key":"36","first-page":"2818","article-title":"Rethinking the inception architecture for computer vision","author":"C. Szegedy"},{"key":"37","article-title":"Towards better decoding and language model integration in sequence to sequence models","author":"J. Chorowski","year":"2016"},{"key":"38","article-title":"Neural machine translation by jointly learning to align and translate","author":"D. Bahdanau","year":"2014"},{"key":"39","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/d15-1166"},{"key":"40","article-title":"Arabic speech recognition system using cmu-sphinx4","author":"H. Satori","year":"2007"},{"key":"41","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/1346543"},{"key":"42","first-page":"2","article-title":"The CMU SPHINX-4 speech recognition system","volume":"1","author":"P. Lamere","year":"2003","journal-title":"IEEE International Conference on acoustics, speech and signal processing (icassp 2003), hong kong"},{"key":"43","doi-asserted-by":"publisher","DOI":"10.9790\/9622-0703022024"},{"key":"44","article-title":"Database of Arabic sounds: sentences","author":"M. Alghamdi","year":"2003"},{"key":"45","article-title":"Phonetic inventory for an Arabic speech corpus","author":"N. Halabi","year":"2016"},{"key":"46","doi-asserted-by":"publisher","DOI":"10.1109\/itsim.2010.5561391"},{"issue":"1","key":"47","first-page":"84","article-title":"Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus","volume":"9","author":"M. A.-A. M. Abushariah","year":"2012","journal-title":"The International Arab Journal of Information Technology"}],"container-title":["Journal of Electrical and Computer Engineering"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/jece\/2024\/4976944.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/jece\/2024\/4976944.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/jece\/2024\/4976944.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,9]],"date-time":"2024-05-09T09:57:22Z","timestamp":1715248642000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/jece\/2024\/4976944\/"}},"subtitle":[],"editor":[{"given":"Mustafa","family":"Sameer","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2024,4,29]]},"references-count":47,"alternative-id":["4976944","4976944"],"URL":"https:\/\/doi.org\/10.1155\/2024\/4976944","relation":{},"ISSN":["2090-0155","2090-0147"],"issn-type":[{"value":"2090-0155","type":"electronic"},{"value":"2090-0147","type":"print"}],"subject":[],"published":{"date-parts":[[2024,4,29]]}}}