{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T10:59:03Z","timestamp":1778497143011,"version":"3.51.4"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T00:00:00Z","timestamp":1739145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:p>Emotional expressions are a fundamental aspect of human communication, with speech being one of the most natural modes of interaction. Speech Emotion Recognition (SER) is a significant research topic in Natural Language Processing (NLP), aimed at identifying emotions such as satisfaction, frustration, and anger from speech audio using multiple classifiers. This article presents a method for emotion recognition from spontaneous Tunisian Dialect (TD) speech, marking the first work in the SER field to use spontaneous speech for emotion recognition in this dialect. The dataset was created from freely available YouTube videos across multiple domains and labeled with four perceived emotions: anger, satisfaction, frustration, and neutral.<\/jats:p>\n          <jats:p>To address data scarcity, we applied data augmentation, specifically Vocal Tract Length Perturbation (VTLP). Preprocessing of the speech signals involved removing ambient and other unwanted noise from the data. We extracted and selected various spectral features, including Mel-Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs). 
Subsequently, we applied several classification methods: Support Vector Machine (SVM), Bidirectional Long Short-Term Memory (BiLSTM), Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Random Forest.<\/jats:p>\n          <jats:p>Our experiments showed that the Random Forest classifier achieved the highest F-score, 58.75%. The results were discussed, analyzed, and compared across the five models using different feature sets. This study provides insights into SER, particularly for the TD, and suggests future research directions for improving emotion recognition systems.<\/jats:p>","DOI":"10.1145\/3708340","type":"journal-article","created":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T11:39:31Z","timestamp":1734521971000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Emotion Recognition from Spontaneous Tunisian Dialect Speech"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-0677-7458","authenticated-orcid":false,"given":"Latifa","family":"Ibn Nasr","sequence":"first","affiliation":[{"name":"MIRACL Laboratory, University of Sfax, Sfax, Tunisia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5987-8876","authenticated-orcid":false,"given":"Abir","family":"Masmoudi","sequence":"additional","affiliation":[{"name":"MIRACL Laboratory, University of Sfax, Sfax, Tunisia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4868-657X","authenticated-orcid":false,"given":"Lamia","family":"Hadrich Belguith","sequence":"additional","affiliation":[{"name":"MIRACL Laboratory, University of Sfax, Sfax, 
Tunisia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,2,10]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"22","article-title":"Speech emotion recognition of Indonesian movie audio tracks based on MFCC and SVM","author":"Prasetya M. R.","year":"2019","unstructured":"M. R. Prasetya, A. Harjoko, and C. Supriyanto. 2019. Speech emotion recognition of Indonesian movie audio tracks based on MFCC and SVM. In Proceedings of the 2019 International Conference on Contemporary Computing and Informatics (IC3I). IEEE, 22\u201325.","journal-title":"Proceedings of the 2019 International Conference on Contemporary Computing and Informatics (IC3I)"},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"2257","DOI":"10.1109\/WiSPNET.2017.8300161","article-title":"Speech based human emotion recognition using MFCC","author":"Likitha M. S.","year":"2017","unstructured":"M. S. Likitha, S. R. R. Gupta, K. Hasitha, and A. U. Raju. 2017. Speech based human emotion recognition using MFCC. In Proceedings of the 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET). IEEE, 2257\u20132260.","journal-title":"Proceedings of the 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET)"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2010.09.020"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3599234"},{"key":"e_1_3_1_6_2","first-page":"293","article-title":"Challenges and solutions for Arabic natural language processing in social media","author":"AL-Sarayreh S.","year":"2023","unstructured":"S. AL-Sarayreh, A. Mohamed, and K. Shaalan. 2023. Challenges and solutions for Arabic natural language processing in social media. In International Conference on Variability of the Sun and Sun-like Stars: From Asteroseismology to Space Weather. 
Springer, Singapore, 293\u2013302.","journal-title":"International Conference on Variability of the Sun and Sun-like Stars: From Asteroseismology to Space Weather"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"J. A. Fishman. 2020. Bilingualism with and without diglossia; Diglossia with and without bilingualism. In The Bilingualism Reader. Routledge, 47\u201354.","DOI":"10.4324\/9781003060406-8"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2019.12.001"},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","first-page":"6499","DOI":"10.1109\/ICASSP40776.2020.9053039","article-title":"HGFM: A hierarchical grained and feature model for acoustic emotion recognition","author":"Xu Y.","year":"2020","unstructured":"Y. Xu, H. Xu, and J. Zou. 2020. HGFM: A hierarchical grained and feature model for acoustic emotion recognition. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6499\u20136503.","journal-title":"Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)"},{"key":"e_1_3_1_10_2","article-title":"Predicting categorical emotions by jointly learning primary and secondary emotions through multitask learning","author":"Lotfian R.","year":"2018","unstructured":"R. Lotfian and C. Busso. 2018. Predicting categorical emotions by jointly learning primary and secondary emotions through multitask learning. In Proceedings of Interspeech 2018.","journal-title":"Proceedings of Interspeech"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2019-2710"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"6720","DOI":"10.1109\/ICASSP.2019.8683077","article-title":"DNN-based emotion recognition based on bottleneck acoustic features and lexical features","author":"Kim E.","year":"2019","unstructured":"E. Kim and J. W. Shin. 2019. 
DNN-based emotion recognition based on bottleneck acoustic features and lexical features. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6720\u20136724.","journal-title":"Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)"},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","first-page":"6319","DOI":"10.1109\/ICASSP39728.2021.9414635","article-title":"Speech emotion recognition with multiscale area attention and data augmentation","author":"Xu M.","year":"2021","unstructured":"M. Xu, F. Zhang, X. Cui, and W. Zhang. 2021. Speech emotion recognition with multiscale area attention and data augmentation. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6319\u20136323.","journal-title":"Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-023-16849-x"},{"key":"e_1_3_1_15_2","first-page":"1","article-title":"Emotion recognition from speech audio signals using CNN-BiLSTM hybrid model","author":"Islam A.","year":"2024","unstructured":"A. Islam, M. Foysal, and M. I. Ahmed. 2024. Emotion recognition from speech audio signals using CNN-BiLSTM hybrid model. In Proceedings of the 2024 3rd International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE). IEEE, 1\u20136.","journal-title":"Proceedings of the 2024 3rd International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE)"},{"key":"e_1_3_1_16_2","article-title":"AlloSat: A new call center French corpus for satisfaction and frustration analysis","author":"Macary M.","year":"2020","unstructured":"M. Macary, M. Tahon, Y. Est\u00e8ve, and A. Rousseau. 2020. 
AlloSat: A new call center French corpus for satisfaction and frustration analysis. In Proceedings of the Language Resources and Evaluation Conference (LREC \u201920).","journal-title":"Proceedings of the Language Resources and Evaluation Conference (LREC \u201920)"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2019.10.004"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.04.028"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-024-10088-7"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2020.04.005"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3110992"},{"key":"e_1_3_1_22_2","first-page":"1","article-title":"Effective speech emotion recognition using deep learning approaches for Algerian dialect","author":"Cherif R. Y.","year":"2021","unstructured":"R. Y. Cherif, A. Moussaoui, N. Frahta, and M. Berrimi. 2021. Effective speech emotion recognition using deep learning approaches for Algerian dialect. In Proceedings of the 2021 International Conference of Women in Data Science at Taif University (WiDSTaif). IEEE, 1\u20136.","journal-title":"Proceedings of the 2021 International Conference of Women in Data Science at Taif University (WiDSTaif)"},{"key":"e_1_3_1_23_2","first-page":"184","article-title":"Automated extraction of features from Arabic emotional speech corpus","volume":"8","author":"Meddeb M.","year":"2016","unstructured":"M. Meddeb, K. Hichem, and A. Alimi. 2016. Automated extraction of features from Arabic emotional speech corpus. 
International Journal of Computer Information Systems and Industrial Management Applications 8 (2016), 184\u2013194.","journal-title":"International Journal of Computer Information Systems and Industrial Management Applications"},{"key":"e_1_3_1_24_2","first-page":"234","article-title":"TuniSER: Toward a Tunisian speech emotion recognition system","author":"Messaoudi A.","year":"2022","unstructured":"A. Messaoudi, H. Haddad, M. B. Hmida, and M. Graiet. 2022. TuniSER: Toward a Tunisian speech emotion recognition system. In Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP \u201922). 234\u2013241.","journal-title":"Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP \u201922)"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"R. Harris. 2018. Continuity and Change in the Tunisian Sahel. Routledge.","DOI":"10.4324\/9781351161121"},{"key":"e_1_3_1_26_2","first-page":"135","article-title":"Sp\u00e9cificit\u00e9s du dialecte sfaxien","volume":"1","author":"Lajmi D.","year":"2009","unstructured":"D. Lajmi. 2009. Sp\u00e9cificit\u00e9s du dialecte sfaxien. Synergies Tunisie 1 (2009), 135\u2013142.","journal-title":"Synergies Tunisie"},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"N. Habash, A. Soudi, and T. Buckwalter. 2007. On Arabic transliteration. In Arabic Computational Morphology: Knowledge-Based and Empirical Methods. Springer, 15\u201322.","DOI":"10.1007\/978-1-4020-6046-5_2"},{"key":"e_1_3_1_28_2","unstructured":"A. M. Dammak. 2016. Approche hybride pour la reconnaissance automatique de la parole en langue arabe (Doctoral dissertation, Universit\u00e9 du Maine). Available at theses.hal.science."},{"key":"e_1_3_1_29_2","first-page":"21","article-title":"Vocal tract length perturbation (VTLP) improves speech recognition","volume":"117","author":"Jaitly N.","year":"2013","unstructured":"N. Jaitly and G. E. Hinton. 2013. 
Vocal tract length perturbation (VTLP) improves speech recognition. In Proceedings of the ICML Workshop on Deep Learning for Audio, Speech and Language 117, 21.","journal-title":"Proceedings of the ICML Workshop on Deep Learning for Audio, Speech and Language"},{"key":"e_1_3_1_30_2","first-page":"3586","article-title":"Audio augmentation for speech recognition","author":"Ko T.","year":"2015","unstructured":"T. Ko, V. Peddinti, D. Povey, and S. Khudanpur. 2015. Audio augmentation for speech recognition. In Proceedings of Interspeech, 2015, 3586.","journal-title":"Proceedings of Interspeech"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2019.12.001"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.237532"},{"issue":"1","key":"e_1_3_1_33_2","first-page":"186","article-title":"Preprocessing technique in automatic speech recognition for human computer interaction: An overview","volume":"15","author":"Ibrahim Y. A.","year":"2017","unstructured":"Y. A. Ibrahim, J. C. Odiketa, and T. S. Ibiyemi. 2017. Preprocessing technique in automatic speech recognition for human computer interaction: An overview. Annals. Computer Science Series 15, 1 (2017), 186\u2013191.","journal-title":"Annals. Computer Science Series"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.3390\/app12178898"},{"issue":"5","key":"e_1_3_1_35_2","first-page":"87","article-title":"Frame blocking and windowing speech signal","volume":"4","author":"Hamid O. K.","year":"2018","unstructured":"O. K. Hamid. 2018. Frame blocking and windowing speech signal. Journal of Information, Communication, and Intelligence Systems (JICIS) 4, 5 (2018), 87\u201394.","journal-title":"Journal of Information, Communication, and Intelligence Systems (JICIS)"},{"key":"e_1_3_1_36_2","first-page":"73","article-title":"Natural Tunisian speech preprocessing for features extraction","author":"Nasr L. I.","year":"2023","unstructured":"L. I. Nasr, A. Masmoudi, and L. H. 
Belguith. 2023. Natural Tunisian speech preprocessing for features extraction. In Proceedings of the 2023 IEEE\/ACIS 23rd International Conference on Computer and Information Science (ICIS). IEEE, 73\u201378.","journal-title":"Proceedings of the 2023 IEEE\/ACIS 23rd International Conference on Computer and Information Science (ICIS)"},{"key":"e_1_3_1_37_2","first-page":"10","article-title":"Detection of adolescent depression from speech using optimised spectral roll-off parameters","volume":"2","author":"Stolar M. N.","year":"2018","unstructured":"M. N. Stolar, M. Lech, S. J. Stolar, and N. B. Allen. 2018. Detection of adolescent depression from speech using optimised spectral roll-off parameters. Biomedical Journal 2 (2018), 10.","journal-title":"Biomedical Journal"},{"key":"e_1_3_1_38_2","author":"Chen A.","year":"2014","unstructured":"A. Chen. 2014. Automatic Classification of Electronic Music and Speech\/Music Audio Content. Doctoral Dissertation. University of Illinois at Urbana-Champaign.","journal-title":"Automatic Classification of Electronic Music and Speech\/Music Audio Content"},{"key":"e_1_3_1_39_2","first-page":"18","article-title":"Natural Arabic language resources for emotion recognition in Algerian dialect","author":"Dahmani H.","year":"2019","unstructured":"H. Dahmani, H. Hussein, B. Meyer-Sickendiek, and O. Jokisch. 2019. Natural Arabic language resources for emotion recognition in Algerian dialect. In Arabic Language Processing: From Theory to Practice. Proceedings of the 7th International Conference (ICALP \u201919) 7. Springer International Publishing. 18\u201333.","journal-title":"Arabic Language Processing: From Theory to Practice. 
Proceedings of the 7th International Conference (ICALP \u201919) 7"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.apacoust.2023.109279"},{"key":"e_1_3_1_41_2","first-page":"102","article-title":"Emotion recognition system for Arabic speech: Case study Egyptian accent","author":"El Seknedy M.","year":"2022","unstructured":"M. El Seknedy and S. A. Fawzi. 2022. Emotion recognition system for Arabic speech: Case study Egyptian accent. In Proceedings of the International Conference on Model and Data Engineering. Springer, 102\u2013115.","journal-title":"Proceedings of the International Conference on Model and Data Engineering"},{"key":"e_1_3_1_42_2","unstructured":"Y. Bahou, A. Masmoudi, and L. H. Belguith. 2010. Traitement des disfluences dans le cadre de la compr\u00e9hension automatique de l'oral arabe spontan\u00e9. In Actes de la 17e Conf\u00e9rence Sur Le Traitement Automatique des Langues Naturelles. 201\u2013210."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3708340","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3708340","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:45Z","timestamp":1750295385000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3708340"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,10]]},"references-count":41,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,28]]}},"alternative-id":["10.1145\/3708340"],"URL":"https:\/\/doi.org\/10.1145\/3708340","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,10]]},"assertion":[{"value":"2023-10-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-26","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}