{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T12:38:32Z","timestamp":1772627912661,"version":"3.50.1"},"reference-count":48,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2019,1,1]],"date-time":"2019-01-01T00:00:00Z","timestamp":1546300800000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Nowadays, the real life constraints necessitates controlling modern machines using human intervention by means of sensorial organs. The voice is one of the human senses that can control\/monitor modern interfaces. In this context, Automatic Speech Recognition is principally used to convert natural voice into computer text as well as to perform an action based on the instructions given by the human. In this paper, we propose a general framework for Arabic speech recognition that uses Long Short-Term Memory (LSTM) and Neural Network (Multi-Layer Perceptron: MLP) classifier to cope with the nonuniform sequence length of the speech utterances issued fromboth feature extraction techniques, (1)Mel Frequency Cepstral Coefficients MFCC (static and dynamic features), (2) the Filter Banks (FB) coefficients. The neural architecture can recognize the isolated Arabic speech via classification technique. The proposed system involves, first, extracting pertinent features from the natural speech signal using MFCC (static and dynamic features) and FB. Next, the extracted features are padded in order to deal with the non-uniformity of the sequences length. 
Then, a deep recurrent LSTM or GRU (Gated Recurrent Unit) architecture is used to encode each sequence of MFCC\/FB features as a fixed-size vector that is then fed to a Multi-Layer Perceptron network (MLP) to perform the classification (recognition). The proposed system is assessed on two different databases: the first concerns spoken digit recognition, where a comparison with other related works in the literature is performed, whereas the second contains spoken TV commands. The obtained results show the superiority of the proposed approach.<\/jats:p>","DOI":"10.1515\/comp-2019-0004","type":"journal-article","created":{"date-parts":[[2019,4,21]],"date-time":"2019-04-21T09:01:12Z","timestamp":1555837272000},"page":"92-102","source":"Crossref","is-referenced-by-count":42,"title":["Bidirectional deep architecture for Arabic speech recognition"],"prefix":"10.1515","volume":"9","author":[{"given":"Naima","family":"Zerari","sequence":"first","affiliation":[{"name":"Laboratory of Automation and Manufacturing , Department of Industrial Engineering , University of Batna 2 Mostefa Ben Boulaid, Batna , 05000 , Algeria"}]},{"given":"Samir","family":"Abdelhamid","sequence":"additional","affiliation":[{"name":"Laboratory of Automation and Manufacturing , Department of Industrial Engineering , University of Batna 2 Mostefa Ben Boulaid, Batna , 05000 , Algeria"}]},{"given":"Hassen","family":"Bouzgou","sequence":"additional","affiliation":[{"name":"Department of Industrial Engineering , University of Batna 2 Mostefa Ben Boulaid, Batna , 05000 , Algeria"}]},{"given":"Christian","family":"Raymond","sequence":"additional","affiliation":[{"name":"INSA Rennes, IRISA\/INRIA , Rennes , France"}]}],"member":"374","published-online":{"date-parts":[[2019,4,20]]},"reference":[{"key":"2022042707443482322_j_comp-2019-0004_ref_001_w2aab3b7b3b1b6b1ab1ab1Aa","unstructured":"[1] Rabiner L. R., Juang B. 
H., Fundamentals of speech recognition, PTR Prentice Hall Englewood Cliffs, 1993"},{"key":"2022042707443482322_j_comp-2019-0004_ref_002_w2aab3b7b3b1b6b1ab1ab2Aa","unstructured":"[2] Jelinek F., Statistical methods for speech recognition, MIT press, 1997"},{"key":"2022042707443482322_j_comp-2019-0004_ref_003_w2aab3b7b3b1b6b1ab1ab3Aa","unstructured":"[3] Desai N., Dhameliya K., Desai V., Feature extraction and classification techniques for speech recognition: A review, International Journal of Emerging Technology and Advanced Engineering, 2013, 3(12), 367\u2013371"},{"key":"2022042707443482322_j_comp-2019-0004_ref_004_w2aab3b7b3b1b6b1ab1ab4Aa","unstructured":"[4] Ittichaichareon C., Suksri S., Yingthawornsuk T., Speech recognition using MFCC, International Conference on Computer Graphics, Simulation and Modeling, 2012, 28\u201329"},{"key":"2022042707443482322_j_comp-2019-0004_ref_005_w2aab3b7b3b1b6b1ab1ab5Aa","doi-asserted-by":"crossref","unstructured":"[5] Hochreiter S., Schmidhuber J., Long short-term memory, Neural computation, 1997, 9(8), 1735\u20131780","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"2022042707443482322_j_comp-2019-0004_ref_006_w2aab3b7b3b1b6b1ab1ab6Aa","doi-asserted-by":"crossref","unstructured":"[6] Lippmann R. P., Review of neural networks for speech recognition, Neural computation, 1989, 1(1), 1\u201338","DOI":"10.1162\/neco.1989.1.1.1"},{"key":"2022042707443482322_j_comp-2019-0004_ref_007_w2aab3b7b3b1b6b1ab1ab7Aa","unstructured":"[7] Juang B. H., Rabiner L. R., Automatic Speech Recognition \u2013 A Brief History of the Technology Development, Georgia Institute of Technology, Atlanta, Rutgers University and the University of California, Santa Barbara, 2005, 10.1016\/B0-08-044854-2\/00906-8"},{"key":"2022042707443482322_j_comp-2019-0004_ref_008_w2aab3b7b3b1b6b1ab1ab8Aa","unstructured":"[8] Anusuya M. A., Katti S. 
K., Speech recognition by machine, a review, arXiv preprint arXiv:1001.2267, 2010"},{"key":"2022042707443482322_j_comp-2019-0004_ref_009_w2aab3b7b3b1b6b1ab1ab9Aa","doi-asserted-by":"crossref","unstructured":"[9] Saeed K., Nammous M. K., A speech-and-speaker identification system: feature extraction, description, and classification of speech signal image, IEEE transactions on industrial electronics, 2007, 54(2), 887\u2013897","DOI":"10.1109\/TIE.2007.891647"},{"key":"2022042707443482322_j_comp-2019-0004_ref_010_w2aab3b7b3b1b6b1ab1ac10Aa","doi-asserted-by":"crossref","unstructured":"[10] Hammami N., Sellam M., Tree distribution classifier for automatic spoken Arabic digit recognition, IEEE International Conference for Internet Technology and Secured Transactions, 2009, 1\u20134","DOI":"10.1109\/ICITST.2009.5402575"},{"key":"2022042707443482322_j_comp-2019-0004_ref_011_w2aab3b7b3b1b6b1ab1ac11Aa","doi-asserted-by":"crossref","unstructured":"[11] Hammami N., Bedda M., Improved tree model for Arabic speech recognition, International Conference on Computer Science and Information Technology, 2010, (5), 521\u2013526","DOI":"10.1109\/ICCSIT.2010.5563892"},{"key":"2022042707443482322_j_comp-2019-0004_ref_012_w2aab3b7b3b1b6b1ab1ac12Aa","doi-asserted-by":"crossref","unstructured":"[12] Daqrouq K., Alfaouri M., Alkhateeb A., Khalaf E., Morfeq A., Wavelet LPC with neural network for spoken Arabic digits recognition system, British Journal of Applied Science & Technology, 2014, 4(8), 1238\u20131255","DOI":"10.9734\/BJAST\/2014\/6034"},{"key":"2022042707443482322_j_comp-2019-0004_ref_013_w2aab3b7b3b1b6b1ab1ac13Aa","doi-asserted-by":"crossref","unstructured":"[13] Satori H., Harti M., Chenfour N., Introduction to Arabic speech recognition using CMU Sphinx system, arXiv preprint arXiv:0704.2083, 
2007","DOI":"10.1109\/ISCIII.2007.367358"},{"key":"2022042707443482322_j_comp-2019-0004_ref_014_w2aab3b7b3b1b6b1ab1ac14Aa","doi-asserted-by":"crossref","unstructured":"[14] LeCun Y., Bengio Y., Hinton G., Deep learning, Nature, 2015, 521, 436\u2013444","DOI":"10.1038\/nature14539"},{"key":"2022042707443482322_j_comp-2019-0004_ref_015_w2aab3b7b3b1b6b1ab1ac15Aa","doi-asserted-by":"crossref","unstructured":"[15] Graves A., Mohamed A. R., Hinton G., Speech recognition with deep recurrent neural networks, IEEE International conference on acoustics, speech and signal processing, 2013, 6645\u20136649","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"2022042707443482322_j_comp-2019-0004_ref_016_w2aab3b7b3b1b6b1ab1ac16Aa","doi-asserted-by":"crossref","unstructured":"[16] Dahl G. E., Yu D., Deng L., Acero A., Context-dependent pretrained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on audio, speech, and language processing, 2012, 20(1), 30\u201342","DOI":"10.1109\/TASL.2011.2134090"},{"key":"2022042707443482322_j_comp-2019-0004_ref_017_w2aab3b7b3b1b6b1ab1ac17Aa","doi-asserted-by":"crossref","unstructured":"[17] Hinton G., Deng L., Yu D., Dahl G., Mohamed A. 
R., Jaitly N., et al., Deep neural networks for acoustic modeling in speech recognition, IEEE Signal processing magazine, 2012, 29(6), 82\u201397","DOI":"10.1109\/MSP.2012.2205597"},{"key":"2022042707443482322_j_comp-2019-0004_ref_018_w2aab3b7b3b1b6b1ab1ac18Aa","doi-asserted-by":"crossref","unstructured":"[18] Ali A., Zhang Y., Cardinal P., Dehak N., Vogel S., Glass J., A complete Kaldi recipe for building Arabic speech recognition systems, IEEE spoken language technology workshop, 2014, 525\u2013529","DOI":"10.1109\/SLT.2014.7078629"},{"key":"2022042707443482322_j_comp-2019-0004_ref_019_w2aab3b7b3b1b6b1ab1ac19Aa","doi-asserted-by":"crossref","unstructured":"[19] Ali A., Bell P., Glass J., Messaoui Y., Mubarak H., Renals S., et al., The MGB-2 challenge: Arabic multi-dialect broadcast media recognition, IEEE Spoken Language Technology Workshop, 2016, 279\u2013284","DOI":"10.1109\/SLT.2016.7846277"},{"key":"2022042707443482322_j_comp-2019-0004_ref_020_w2aab3b7b3b1b6b1ab1ac20Aa","doi-asserted-by":"crossref","unstructured":"[20] Ali A., Vogel S., Renals S., Speech recognition challenge in the wild: Arabic MGB-3, IEEE Automatic Speech Recognition and Understanding Workshop, 2017, 316\u2013322","DOI":"10.1109\/ASRU.2017.8268952"},{"key":"2022042707443482322_j_comp-2019-0004_ref_021_w2aab3b7b3b1b6b1ab1ac21Aa","doi-asserted-by":"crossref","unstructured":"[21] Afify M., Nguyen L., Xiang B., Abdou S., Makhoul J., Recent progress in Arabic broadcast news transcription at BBN, Ninth European Conference on Speech Communication and Technology, 2005","DOI":"10.21437\/Interspeech.2005-537"},{"key":"2022042707443482322_j_comp-2019-0004_ref_022_w2aab3b7b3b1b6b1ab1ac22Aa","doi-asserted-by":"crossref","unstructured":"[22] Manohar V., Povey D., Khudanpur S., JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and 
transfer learning, Automatic Speech Recognition and Understanding Workshop, 2017, 346\u2013352","DOI":"10.1109\/ASRU.2017.8268956"},{"key":"2022042707443482322_j_comp-2019-0004_ref_023_w2aab3b7b3b1b6b1ab1ac23Aa","unstructured":"[23] Young S. J., Young S., The HTK hidden Markov model toolkit: Design and philosophy, University of Cambridge, Department of Engineering, 1993"},{"key":"2022042707443482322_j_comp-2019-0004_ref_024_w2aab3b7b3b1b6b1ab1ac24Aa","doi-asserted-by":"crossref","unstructured":"[24] Davis S., Mermelstein P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, Transactions on acoustics, speech, and signal processing, 1980, 28(4), 357\u2013366","DOI":"10.1109\/TASSP.1980.1163420"},{"key":"2022042707443482322_j_comp-2019-0004_ref_025_w2aab3b7b3b1b6b1ab1ac25Aa","doi-asserted-by":"crossref","unstructured":"[25] Wang J. C., Wang J. F., Weng Y. S., Chip design of MFCC extraction for speech recognition, Integration, the VLSI Journal, 2002, 32(1-2), 111\u2013131","DOI":"10.1016\/S0167-9260(02)00045-7"},{"key":"2022042707443482322_j_comp-2019-0004_ref_026_w2aab3b7b3b1b6b1ab1ac26Aa","doi-asserted-by":"crossref","unstructured":"[26] Lalitha S., Geyasruti D., Narayanan R., Shravani M., Emotion detection using MFCC and cepstrum features, Procedia Computer Science, 2015, 70, 29\u201335","DOI":"10.1016\/j.procs.2015.10.020"},{"key":"2022042707443482322_j_comp-2019-0004_ref_027_w2aab3b7b3b1b6b1ab1ac27Aa","doi-asserted-by":"crossref","unstructured":"[27] Ai O. C., Hariharan M., Yaacob S., Chee L. S., Classification of speech dysfluencies with MFCC and LPCC features, Expert Systems with Applications, 2012, 39(2), 2157\u20132165","DOI":"10.1016\/j.eswa.2011.07.065"},{"key":"2022042707443482322_j_comp-2019-0004_ref_028_w2aab3b7b3b1b6b1ab1ac28Aa","unstructured":"[28] Al-Anzi F. 
S., AbuZeina D., The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition, International Journal of Computer and Information Engineering, 2017, 11(10), 1162\u20131166"},{"key":"2022042707443482322_j_comp-2019-0004_ref_029_w2aab3b7b3b1b6b1ab1ac29Aa","unstructured":"[29] Rabiner L. R., Schafer R. W., Theory and applications of digital speech processing, Upper Saddle River, NJ: Pearson, 2011, 64"},{"key":"2022042707443482322_j_comp-2019-0004_ref_030_w2aab3b7b3b1b6b1ab1ac30Aa","unstructured":"[30] Furui S., Speaker-independent isolated word recognition based on emphasized spectral dynamics, International Conference on Acoustics, Speech and Signal Processing, 1986, 1991\u20131994"},{"key":"2022042707443482322_j_comp-2019-0004_ref_031_w2aab3b7b3b1b6b1ab1ac31Aa","doi-asserted-by":"crossref","unstructured":"[31] Kumar K., Kim C., Stern R. M., Delta-spectral cepstral coefficients for robust speech recognition, IEEE international conference on acoustics, speech and signal processing, 2011, 4784\u20134787","DOI":"10.1109\/ICASSP.2011.5947425"},{"key":"2022042707443482322_j_comp-2019-0004_ref_032_w2aab3b7b3b1b6b1ab1ac32Aa","doi-asserted-by":"crossref","unstructured":"[32] San-Segundo R., Montero J. M., Barra-Chicote R., Fern\u00e1ndez F., Pardo J. 
M., Feature extraction from smartphone inertial signals for human activity segmentation, Signal Processing, 2016, 120, 359\u2013372","DOI":"10.1016\/j.sigpro.2015.09.029"},{"key":"2022042707443482322_j_comp-2019-0004_ref_033_w2aab3b7b3b1b6b1ab1ac33Aa","doi-asserted-by":"crossref","unstructured":"[33] Graves A., Fern\u00e1ndez S., Gomez F., Schmidhuber J., Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, The 23rd International Conference on Machine Learning, ACM, 2006, 369\u2013376","DOI":"10.1145\/1143844.1143891"},{"key":"2022042707443482322_j_comp-2019-0004_ref_034_w2aab3b7b3b1b6b1ab1ac34Aa","doi-asserted-by":"crossref","unstructured":"[34] Vukotic V., Raymond C., Gravier G., A step beyond local observations with a dialog aware bidirectional GRU network for Spoken Language Understanding, 17th Annual Conference of the International Speech Communication Association, 2016, 3241\u20133244","DOI":"10.21437\/Interspeech.2016-1301"},{"key":"2022042707443482322_j_comp-2019-0004_ref_035_w2aab3b7b3b1b6b1ab1ac35Aa","unstructured":"[35] Chung J., Gulcehre C., Cho K., Bengio Y., Gated feedback recurrent neural networks, International Conference on Machine Learning, 2015, 2067\u20132075"},{"key":"2022042707443482322_j_comp-2019-0004_ref_036_w2aab3b7b3b1b6b1ab1ac36Aa","unstructured":"[36] Gao Y., Glowacka D., Deep gate recurrent neural network, Asian Conference on Machine Learning, 2016, 350\u2013365"},{"key":"2022042707443482322_j_comp-2019-0004_ref_037_w2aab3b7b3b1b6b1ab1ac37Aa","doi-asserted-by":"crossref","unstructured":"[37] Graves A., Jaitly N., Mohamed A. 
R., Hybrid speech recognition with deep bidirectional LSTM, IEEE workshop on automatic speech recognition and understanding, 2013, 273\u2013278","DOI":"10.1109\/ASRU.2013.6707742"},{"key":"2022042707443482322_j_comp-2019-0004_ref_038_w2aab3b7b3b1b6b1ab1ac38Aa","unstructured":"[38] Huang Z., Xu W., Yu K., Bidirectional LSTM-CRF models for sequence tagging, arXiv preprint arXiv:1508.01991, 2015"},{"key":"2022042707443482322_j_comp-2019-0004_ref_039_w2aab3b7b3b1b6b1ab1ac39Aa","unstructured":"[39] Duda R. O., Hart P. E., Stork D. G., Pattern classification, John Wiley & Sons, 2012"},{"key":"2022042707443482322_j_comp-2019-0004_ref_040_w2aab3b7b3b1b6b1ab1ac40Aa","unstructured":"[40] Haykin S. S., Neural networks and learning machines, Pearson Education, Upper Saddle River, NJ, 2009"},{"key":"2022042707443482322_j_comp-2019-0004_ref_041_w2aab3b7b3b1b6b1ab1ac41Aa","unstructured":"[41] Bishop C. M., Neural networks for pattern recognition, Oxford University Press, 1995, 10.1201\/9781420050646.ptb6"},{"key":"2022042707443482322_j_comp-2019-0004_ref_042_w2aab3b7b3b1b6b1ab1ac42Aa","unstructured":"[42] Lichman M., UCI Machine Learning Repository, University of California, http:\/\/archive.ics.uci.edu\/ml, 2013"},{"key":"2022042707443482322_j_comp-2019-0004_ref_043_w2aab3b7b3b1b6b1ab1ac43Aa","unstructured":"[43] Chollet F., Keras: The Python deep learning library, Astrophysics Source Code Library, 2018"},{"key":"2022042707443482322_j_comp-2019-0004_ref_044_w2aab3b7b3b1b6b1ab1ac44Aa","doi-asserted-by":"crossref","unstructured":"[44] Jiang H., Confidence measures for speech recognition: A survey, Speech communication, 2005, 45(4), 455\u2013470","DOI":"10.1016\/j.specom.2004.12.004"},{"key":"2022042707443482322_j_comp-2019-0004_ref_045_w2aab3b7b3b1b6b1ab1ac45Aa","unstructured":"[45] Bouzgou H., Automatic Analysis of High dimensional Signals: Advanced Wind Speed Forecasting Techniques, Lambert Academic Publishing, 
2012"},{"key":"2022042707443482322_j_comp-2019-0004_ref_046_w2aab3b7b3b1b6b1ab1ac46Aa","doi-asserted-by":"crossref","unstructured":"[46] Zerari N., Abdelhamid S., Bouzgou H., Raymond C., Bidirectional recurrent end-to-end neural network classifier for spoken Arab digit recognition, International Conference on Natural Language and Speech Processing, 2018, 1\u2013610.1109\/ICNLSP.2018.8374374","DOI":"10.1109\/ICNLSP.2018.8374374"},{"key":"2022042707443482322_j_comp-2019-0004_ref_047_w2aab3b7b3b1b6b1ab1ac47Aa","unstructured":"[47] Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R., Dropout: a simpleway to prevent neural networks from over-fitting, Journal of Machine Learning Research, 2014, 15(1), 1929\u20131958"},{"key":"2022042707443482322_j_comp-2019-0004_ref_048_w2aab3b7b3b1b6b1ab1ac48Aa","doi-asserted-by":"crossref","unstructured":"[48] Sahidullah M., Saha, G., Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition, Speech Communication, 2012, 54(4), 543\u201356510.1016\/j.specom.2011.11.004","DOI":"10.1016\/j.specom.2011.11.004"}],"container-title":["Open Computer 
Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/view\/journals\/comp\/9\/1\/article-p92.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2019-0004\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2019-0004\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,27]],"date-time":"2022-04-27T08:20:04Z","timestamp":1651047604000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2019-0004\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,1]]},"references-count":48,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,9,26]]},"published-print":{"date-parts":[[2019,1,1]]}},"alternative-id":["10.1515\/comp-2019-0004"],"URL":"https:\/\/doi.org\/10.1515\/comp-2019-0004","relation":{},"ISSN":["2299-1093"],"issn-type":[{"value":"2299-1093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,1,1]]}}}