{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T05:30:38Z","timestamp":1765517438526,"version":"3.48.0"},"reference-count":32,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T00:00:00Z","timestamp":1765324800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>This work investigates how different language modeling techniques affect the performance of an end-to-end automatic speech recognition (ASR) system for the Amazigh language. A CNN-BiLSTM-CTC model enhanced with an attention mechanism was used as the baseline. During decoding, two external language models were integrated using shallow fusion: a trigram N-gram model built with KenLM and a recurrent neural network language model (RNN-LM) trained on the same Tifdigit corpus. Four decoding methods were compared: greedy decoding; beam search; beam search with an N-gram language model; and beam search with a compact RNN-LM. Experimental results on the Tifdigit dataset reveal a clear trade-off: the N-gram language model outperforms the RNN-LM, achieving a phonetic error rate (PER) of 0.0268, a 4.0% relative improvement over the greedy baseline, which corresponds to an accuracy of 97.32%. This suggests that N-gram models can outperform neural approaches when only limited but reliable data and lexical resources are available. The improved N-gram approach notably outperformed both plain beam search and the RNN-LM. This improvement is attributable to the N-gram model's higher-order context modeling, optimized interpolation weights, and adaptive lexical weighting tailored to the phonotactic structure of the Amazigh language.<\/jats:p>",
"DOI":"10.3390\/make7040164","type":"journal-article","created":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T16:11:18Z","timestamp":1765383078000},"page":"164","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["N-Gram and RNN-LM Language Model Integration for End-to-End Amazigh Speech Recognition"],"prefix":"10.3390","volume":"7",
"author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2940-8248","authenticated-orcid":false,"given":"Meryam","family":"Telmem","sequence":"first","affiliation":[{"name":"The Higher School of Technology in Meknes, Moulay Ismail University, Meknes 50000, Morocco"}]},{"given":"Naouar","family":"Laaidi","sequence":"additional","affiliation":[{"name":"Faculty of Sciences Dhar Mahraz, Sidi Mohamed Ben Abdellah University, Fes 30000, Morocco"}]},{"given":"Youssef","family":"Ghanou","sequence":"additional","affiliation":[{"name":"The Higher School of Technology in Meknes, Moulay Ismail University, Meknes 50000, Morocco"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7393-5726","authenticated-orcid":false,"given":"Hassan","family":"Satori","sequence":"additional","affiliation":[{"name":"Faculty of Sciences Dhar Mahraz, Sidi Mohamed Ben Abdellah University, Fes 30000, Morocco"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,10]]},
"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, J. (2022). Recent advances in end-to-end automatic speech recognition. APSIPA Trans. Signal Inf. Process., 11.","DOI":"10.1561\/116.00000050"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kandji, A.K., Ba, C., and Ndiaye, S. (2023). State-of-the-Art Review on Recent Trends in Automatic Speech Recognition. 
International Conference on Emerging Technologies for Developing Countries, Springer Nature.","DOI":"10.1007\/978-3-031-63999-9_11"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1109\/TASLP.2023.3328283","article-title":"End-to-end speech recognition: A survey","volume":"32","author":"Prabhavalkar","year":"2023","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Slam, W., Li, Y., and Urouvas, N. (2023). Frontier research on low-resource speech recognition technology. Sensors, 23.","DOI":"10.3390\/s23229096"},{"key":"ref_5","first-page":"3513","article-title":"Multilingual speech recognition initiative for African languages","volume":"20","author":"Allak","year":"2024","journal-title":"Int. J. Data Sci. Anal."},{"key":"ref_6","first-page":"3533","article-title":"Amazigh speech recognition based on the Kaldi ASR toolkit","volume":"15","author":"Barkani","year":"2023","journal-title":"Int. J. Inf. Technol."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1007\/s10772-024-10100-0","article-title":"Amazigh CNN speech recognition system based on Mel spectrogram feature extraction method","volume":"27","author":"Boulal","year":"2024","journal-title":"Int. J. Speech Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1007\/s10772-024-10164-y","article-title":"Exploring data augmentation for Amazigh speech recognition with convolutional neural networks","volume":"28","author":"Boulal","year":"2025","journal-title":"Int. J. Speech Technol."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1121","DOI":"10.1007\/s10772-024-10154-0","article-title":"Comparative study of CNN, LSTM and hybrid CNN-LSTM model in Amazigh speech recognition using spectrogram feature extraction and different gender and age dataset","volume":"27","author":"Telmem","year":"2024","journal-title":"Int. J. 
Speech Technol."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Telmem, M., Laaidi, N., and Satori, H. (2025). The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system. Int. J. Speech Technol., 1\u201314.","DOI":"10.1007\/s10772-025-10183-3"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Guan, B., Cao, J., Wang, X., Wang, Z., Sui, M., and Wang, Z. (2024\u20132, January 31). Integrated method of deep learning and large language model in speech recognition. Proceedings of the 2024 IEEE 7th International Conference on Electronic Information and Communication Technology (ICEICT), Xi\u2019an, China.","DOI":"10.1109\/ICEICT61637.2024.10671048"},{"key":"ref_12","first-page":"737","article-title":"Improving speech recognition with prompt-based contextualized asr and llm-based re-predictor","volume":"2024","author":"Anh","year":"2024","journal-title":"Interspeech"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1038\/s44387-025-00011-z","article-title":"Large language models for disease diagnosis: A scoping review","volume":"1","author":"Zhou","year":"2025","journal-title":"Npj Artif. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"515","DOI":"10.12928\/telkomnika.v19i2.16793","article-title":"The convolutional neural networks for Amazigh speech recognition system","volume":"19","author":"Telmem","year":"2021","journal-title":"TELKOMNIKA"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Mukhamadiyev, A., Mukhiddinov, M., Khujayarov, I., Ochilov, M., and Cho, J. (2023). Development of language models for continuous Uzbek speech recognition system. Sensors, 23.","DOI":"10.3390\/s23031145"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liu, Z., Venkateswaran, N., Le Ferrand, \u00c9., and Prud\u2019hommeaux, E. (2024). How important is a language model for low-resource ASR. 
Findings of the Association for Computational Linguistics: ACL 2024, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2024.findings-acl.13"},{"key":"ref_17","unstructured":"Anoop, C.S., and Ramakrishnan, A.G. (2021, January 27\u201330). CTC-based end-to-end ASR for the low resource Sanskrit language with spectrogram augmentation. Proceedings of the 2021 National Conference on Communications (NCC), Kanpur, India."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/s10772-022-09983-8","article-title":"Hybrid end-to-end model for Kazakh speech recognition","volume":"26","author":"Mamyrbayev","year":"2023","journal-title":"Int. J. Speech Technol."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Labied, M., Belangour, A., and Banane, M. (2023, January 16\u201317). Delve deep into End-To-End Automatic Speech Recognition Models. Proceedings of the 2023 International Seminar on Application for Technology of Information and Communication (iSemantic), Semarang, Indonesia.","DOI":"10.1109\/iSemantic59612.2023.10295371"},{"key":"ref_20","unstructured":"Mori, D., Ohta, K., Nishimura, R., Ogawa, A., and Kitaoka, N. (2021, January 14\u201317). Advanced language model fusion method for encoder-decoder model in Japanese speech recognition. 
Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Tokyo, Japan."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"01064","DOI":"10.1051\/e3sconf\/202341201064","article-title":"Comparative Study of Amazigh Speech Recognition Systems Based on Different Toolkits and Approaches","volume":"Volume 412","author":"Atounti","year":"2023","journal-title":"E3S Web of Conferences"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1109\/TASLP.2021.3133216","article-title":"Live streaming speech recognition using deep bidirectional LSTM acoustic models and interpolated language models","volume":"30","author":"Jorge","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"17309","DOI":"10.1007\/s11042-024-19750-3","article-title":"Isolated word recognition based on a hyper-tuned cross-validated CNN-BiLSTM from Mel Frequency Cepstral Coefficients","volume":"84","author":"Paul","year":"2025","journal-title":"Multimed. Tools Appl."},{"key":"ref_24","first-page":"1893","article-title":"Mathematical Modelling of Engineering Problems","volume":"12","author":"Ismael","year":"2025","journal-title":"Int. Inf. Eng. Assoc."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"514","DOI":"10.1016\/j.neucom.2021.09.017","article-title":"Exploring attention mechanisms based on summary information for end-to-end automatic speech recognition","volume":"465","author":"Xue","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Alawdi, A. (2025, January 5\u20136). MultiheadSelfAttention vs, Traditional Encoders: A Benchmark Study on Precision and Recall in Tajweed Recognition. 
Proceedings of the 2025 5th International Conference on Emerging Smart Technologies and Applications (eSmarTA), Ibb, Yemen.","DOI":"10.1109\/eSmarTA66764.2025.11132108"},{"key":"ref_27","first-page":"2261","article-title":"The Hmm Based Amazigh Digits Audiovisual Speech Recognition System","volume":"71","author":"Addarrazi","year":"2022","journal-title":"Math. Stat. Eng. Appl."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ouhnini, A., Aksasse, B., and Ouanan, M. (2023). Towards an automatic speech-to-text transcription system: Amazigh language. Int. J. Adv. Comput. Sci. Appl., 14.","DOI":"10.14569\/IJACSA.2023.0140250"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1007\/s10772-014-9223-y","article-title":"Investigation Amazigh speech recognition using CMU tools","volume":"17","author":"Satori","year":"2014","journal-title":"Int. J. Speech Technol."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Amin, N.A.M. (2023). Low-Resource Automatic Speech Recognition Domain Adaptation: A Case-Study in Aviation Maintenance. [Doctoral Dissertation, Purdue University Graduate School].","DOI":"10.58940\/2329-258X.2052"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1250\/ast.42.252","article-title":"Deep learning based large vocabulary continuous speech recognition of an under-resourced language Bangladeshi Bangla","volume":"42","author":"Samin","year":"2021","journal-title":"Acoust. Sci. Technol."},{"key":"ref_32","first-page":"1692","article-title":"Integration of WFST Language Model in Pre-trained Korean E2E ASR Model","volume":"18","author":"Oh","year":"2024","journal-title":"KSII Trans. Internet Inf. 
Syst."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/164\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T05:26:48Z","timestamp":1765517208000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/164"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,10]]},"references-count":32,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["make7040164"],"URL":"https:\/\/doi.org\/10.3390\/make7040164","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2025,12,10]]}}}