{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:02:16Z","timestamp":1760058136689,"version":"build-2065373602"},"reference-count":67,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T00:00:00Z","timestamp":1741910400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"various research projects on multilingual language technologies","award":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"],"award-info":[{"award-number":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"]}]},{"name":"RECOVER","award":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"],"award-info":[{"award-number":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"]}]},{"name":"Projects of Excellence, Andalusian Regional Government (Junta de Andaluc\u00eda)","award":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"],"award-info":[{"award-number":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"]}]},{"name":"DIFARMA","award":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"],"award-info":[{"award-number":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"]}]},{"name":"European Regional Development Fund 
(ERDF)","award":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"],"award-info":[{"award-number":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"]}]},{"name":"D\u00cdGAME","award":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"],"award-info":[{"award-number":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"]}]},{"name":"MCINAEI\/10.13039\/501100011033","award":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"],"award-info":[{"award-number":["PID2020-112818GB-I00\/AEI\/10.13039\/501100011033","ProyExcel_00540","PAIDI 2020","HUM106-G-FEDER","JA.A1.3-06","PRE2021-098899"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>In recent years, the advances in deep neural networks (DNNs) and large language models (LLMs) have led to major breakthroughs and new levels of performance in Natural Language Processing (NLP), including tasks related to speech processing. Based on these new trends, new models such as Whisper and Wav2Vec 2.0 achieve robust performance in speech processing tasks, even in speech-to-text translation and end-to-end speech translation, far exceeding all previous results. Although these models have shown excellent results in real-time speech processing, they still have some accuracy issues for some tasks and high latency problems when working with large amounts of audio data. In addition, many of them need audio to be segmented and labelled for speech synthesis and annotation tasks. 
Speaker diarisation, background noise detection, prosodic boundary detection and accent classification are some of the pre-processing tasks required in these cases. In this study, we will fine-tune a small Wav2Vec 2.0 base model for multi-task classification and audio segmentation. A corpus of spoken American English will be used for the experiments. We intend to explore this new approach and, more specifically, the performance of the model with regard to prosodic boundary detection for audio segmentation, and advanced accent identification.<\/jats:p>","DOI":"10.3390\/computers14030102","type":"journal-article","created":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:02:16Z","timestamp":1741935736000},"page":"102","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Advanced Identification of Prosodic Boundaries, Speakers, and Accents Through Multi-Task Audio Pre-Processing and Speech Language Models"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-2622-3843","authenticated-orcid":false,"given":"Francisco Javier","family":"Lima Florido","sequence":"first","affiliation":[{"name":"Instituto Universitario de Investigaci\u00f3n de Tecnolog\u00edas Ling\u00fc\u00edsticas Multiling\u00fces (IUITLM), University of Malaga, 29010 Malaga, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6688-1531","authenticated-orcid":false,"given":"Gloria","family":"Corpas Pastor","sequence":"additional","affiliation":[{"name":"Instituto Universitario de Investigaci\u00f3n de Tecnolog\u00edas Ling\u00fc\u00edsticas Multiling\u00fces (IUITLM), University of Malaga, 29010 Malaga, Spain"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,14]]},"reference":[{"unstructured":"Radford, A., Kim, J.W., Xu, T., Brockman, G., Mcleavey, C., and Sutskever, I. (2023, January 23\u201329). Robust Speech Recognition via Large-Scale Weak Supervision. 
Proceedings of the 40th International Conference on Machine Learning, PMLR, Honolulu, HI, USA.","key":"ref_1"},{"key":"ref_2","first-page":"12449","article-title":"Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations","volume":"33","author":"Baevski","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"doi-asserted-by":"crossref","unstructured":"Fu, B., Fan, K., Liao, M., Chen, Y., Shi, X., and Huang, Z. (2024, January 11\u201316). Wav2vec-S: Adapting Pre-Trained Speech Models for Streaming. Proceedings of the Findings of the Association for Computational Linguistics ACL 2024, Bangkok, Thailand.","key":"ref_3","DOI":"10.18653\/v1\/2024.findings-acl.681"},{"unstructured":"Arriaga, C., Pozo, A., Conde, J., and Alonso, A. (2024). Evaluation of Real-Time Transcriptions Using End-to-End ASR Models. arXiv.","key":"ref_4"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"103110","DOI":"10.1016\/j.specom.2024.103110","article-title":"Arabic Automatic Speech Recognition: Challenges and Progress","volume":"163","author":"Besdouri","year":"2024","journal-title":"Speech Commun."},{"unstructured":"Hsu, M.-H., Huang, K.P., and Lee, H. (2024). Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages. arXiv.","key":"ref_6"},{"unstructured":"Synnaeve, G., Xu, Q., Kahn, J., Likhomanenko, T., Grave, E., Pratap, V., Sriram, A., Liptchinsky, V., and Collobert, R. (2020, January 17). End-to-End ASR: From Supervised to Semi-Supervised Learning with Modern Architectures. Proceedings of the ICML 2020 Workshop on Self-Supervision in Audio and Speech, Virtual.","key":"ref_7"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"101869","DOI":"10.1016\/j.inffus.2023.101869","article-title":"A Review of Deep Learning Techniques for Speech Processing","volume":"99","author":"Mehrish","year":"2023","journal-title":"Inf. 
Fusion"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1016\/j.specom.2022.02.005","article-title":"Unsupervised Automatic Speech Recognition: A Review","volume":"139","author":"Aldarmaki","year":"2022","journal-title":"Speech Commun."},{"doi-asserted-by":"crossref","unstructured":"Grimm, M., and Kroschel, K. (2007). Voice Activity Detection. Fundamentals and Speech Recognition System Robustness. Robust Speech Recognition and Understanding, InTech.","key":"ref_10","DOI":"10.5772\/35"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.1002\/j.1538-7305.1983.tb03114.x","article-title":"An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition","volume":"62","author":"Levinson","year":"1983","journal-title":"Bell Syst. Tech. J."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"S130","DOI":"10.1121\/1.2017051","article-title":"MITalk-79: The 1979 MIT Text-to-Speech System","volume":"65","author":"Allen","year":"1979","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1006\/csla.1998.0041","article-title":"Assigning Phrase Breaks from Part-of-Speech Sequences","volume":"12","author":"Taylor","year":"1998","journal-title":"Comput. Speech Lang."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1186\/s13636-014-0028-3","article-title":"Linguistically Motivated Parameter Estimation Methods for a Superpositional Intonation Model","volume":"2014","author":"Torres","year":"2014","journal-title":"EURASIP J. Audio Speech Music Process."},{"doi-asserted-by":"crossref","unstructured":"Zen, H., Senior, A., and Schuster, M. (2013, January 26\u201331). Statistical Parametric Speech Synthesis Using Deep Neural Networks. 
Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","key":"ref_15","DOI":"10.1109\/ICASSP.2013.6639215"},{"doi-asserted-by":"crossref","unstructured":"Graves, A., Mohamed, A., and Hinton, G. (2013, January 26\u201331). Speech Recognition with Deep Recurrent Neural Networks. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","key":"ref_16","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/s13636-022-00251-w","article-title":"Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data","volume":"2022","author":"Roger","year":"2022","journal-title":"EURASIP J. Audio Speech Music Process."},{"unstructured":"Ren, Y., Ruan, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., and Liu, T.-Y. (2019). FastSpeech: Fast, Robust and Controllable Text to Speech. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), The MIT Press.","key":"ref_18"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1080\/00401706.1991.10484833","article-title":"Hidden Markov Models for Speech Recognition","volume":"33","author":"Juang","year":"1991","journal-title":"Technometrics"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","article-title":"Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups","volume":"29","author":"Hinton","year":"2012","journal-title":"IEEE Signal Process. Mag."},{"doi-asserted-by":"crossref","unstructured":"Yu, D., and Deng, L. (2015). Automatic Speech Recognition, Springer.","key":"ref_21","DOI":"10.1007\/978-1-4471-5779-3"},{"unstructured":"Ning, H., Liu, M., Tang, H., and Huang, T. (2006, January 17\u201321). A Spectral Clustering Approach to Speaker Diarization. 
Proceedings of the INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006\u2014ICSLP, Pittsburgh, PA, USA.","key":"ref_22"},{"doi-asserted-by":"crossref","unstructured":"Piat, M., Fohr, D., and Illina, I. (2008, January 22\u201326). Foreign Accent Identification Based on Prosodic Parameters. Proceedings of the Interspeech 2008, Brisbane, Australia.","key":"ref_23","DOI":"10.21437\/Interspeech.2008-235"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/j.csl.2018.07.001","article-title":"Prosodic Boundary Detection Using Syntactic and Acoustic Information","volume":"53","author":"Kocharov","year":"2019","journal-title":"Comput. Speech Lang."},{"doi-asserted-by":"crossref","unstructured":"Hogg, A.O.T., Evers, C., and Naylor, P.A. (2019, January 12\u201317). Speaker Change Detection Using Fundamental Frequency with Application to Multi-Talker Segmentation. Proceedings of the ICASSP 2019\u20142019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","key":"ref_25","DOI":"10.1109\/ICASSP.2019.8682924"},{"unstructured":"Chorowski, J., Bahdanau, D., Cho, K., and Bengio, Y. (2014, January 12\u201313). End-to-End Continuous Speech Recognition Using Attention-Based Recurrent Nn: First Results. Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada.","key":"ref_26"},{"doi-asserted-by":"crossref","unstructured":"Sell, G., and Garcia-Romero, D. (2014, January 7\u201310). Speaker Diarization with Plda I-Vector Scoring and Unsupervised Calibration. Proceedings of the 2014 IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, NV, USA.","key":"ref_27","DOI":"10.1109\/SLT.2014.7078610"},{"doi-asserted-by":"crossref","unstructured":"Wan, L., Wang, Q., Papir, A., and Moreno, I.L. (2018, January 15\u201320). Generalized End-to-End Loss for Speaker Verification. 
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","key":"ref_28","DOI":"10.1109\/ICASSP.2018.8462665"},{"doi-asserted-by":"crossref","unstructured":"Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., and Khudanpur, S. (2018, January 15\u201320). X-Vectors: Robust DNN Embeddings for Speaker Recognition. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","key":"ref_29","DOI":"10.1109\/ICASSP.2018.8461375"},{"unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, The MIT Press.","key":"ref_30"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3530811","article-title":"Efficient Transformers: A Survey","volume":"55","author":"Tay","year":"2023","journal-title":"ACM Comput. Surv."},{"doi-asserted-by":"crossref","unstructured":"Roll, N., Graham, C., and Todd, S. (2023). PSST! Prosodic Speech Segmentation with Transformers. arXiv.","key":"ref_32","DOI":"10.18653\/v1\/2023.conll-1.31"},{"doi-asserted-by":"crossref","unstructured":"Taylor, P. (2009). Text-to-Speech Synthesis, Cambridge University Press. [1st ed.].","key":"ref_33","DOI":"10.1017\/CBO9780511816338"},{"doi-asserted-by":"crossref","unstructured":"Karpov, A., and Potapova, R. (2021). Human and Transformer-Based Prosodic Phrasing in Two Speech Genres. Proceedings of the Speech and Computer. 
SPECOM 2021, Springer.","key":"ref_34","DOI":"10.1007\/978-3-030-87802-3"},{"key":"ref_35","first-page":"4392","article-title":"Joint Detection of Sentence Stress and Phrase Boundary for Prosody","volume":"Volume 2020","author":"Lin","year":"2020","journal-title":"Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH"},{"doi-asserted-by":"crossref","unstructured":"Hruz, M., and Zajic, Z. (2017, January 5\u20139). Convolutional Neural Network for Speaker Change Detection in Telephone Speaker Diarization System. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","key":"ref_36","DOI":"10.1109\/ICASSP.2017.7953097"},{"doi-asserted-by":"crossref","unstructured":"Kwon, S., and Narayanan, S.S. (2002, January 16\u201320). Speaker Change Detection Using a New Weighted Distance Measure. Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP 2002), Denver, CO, USA.","key":"ref_37","DOI":"10.21437\/ICSLP.2002-660"},{"doi-asserted-by":"crossref","unstructured":"Aronowitz, H., and Zhu, W. (2020, January 4\u20138). Context and Uncertainty Modeling for Online Speaker Change Detection. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","key":"ref_38","DOI":"10.1109\/ICASSP40776.2020.9053280"},{"doi-asserted-by":"crossref","unstructured":"Snyder, D., Garcia-Romero, D., Sell, G., McCree, A., Povey, D., and Khudanpur, S. (2019, January 12\u201317). Speaker Recognition for Multi-Speaker Conversations Using X-Vectors. Proceedings of the ICASSP 2019\u20142019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","key":"ref_39","DOI":"10.1109\/ICASSP.2019.8683760"},{"doi-asserted-by":"crossref","unstructured":"Fujita, Y., Kanda, N., Horiguchi, S., Xue, Y., Nagamatsu, K., and Watanabe, S. 
(2019, January 14\u201318). End-to-End Neural Speaker Diarization with Self-Attention. Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore.","key":"ref_40","DOI":"10.1109\/ASRU46091.2019.9003959"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"119238","DOI":"10.1016\/j.eswa.2022.119238","article-title":"Speech and Multilingual Natural Language Framework for Speaker Change Detection and Diarization","volume":"213","author":"Anidjar","year":"2023","journal-title":"Expert Syst. Appl."},{"doi-asserted-by":"crossref","unstructured":"Mateju, L., Kynych, F., Cerva, P., Malek, J., and Zdansky, J. (2022, January 18\u201322). Overlapped Speech Detection in Broadcast Streams Using X-Vectors. Proceedings of the Interspeech 2022, Incheon, Republic of Korea.","key":"ref_42","DOI":"10.21437\/Interspeech.2022-81"},{"doi-asserted-by":"crossref","unstructured":"Bullock, L., Bredin, H., and Garcia-Perera, L.P. (2022, January 23\u201327). Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","key":"ref_43","DOI":"10.1109\/ICASSP40776.2020.9053096"},{"doi-asserted-by":"crossref","unstructured":"Du, Z., Zhang, S., Zheng, S., and Yan, Z. (2022). Speaker Overlap-Aware Neural Diarization for Multi-Party Meeting Analysis. arXiv.","key":"ref_44","DOI":"10.18653\/v1\/2022.emnlp-main.505"},{"doi-asserted-by":"crossref","unstructured":"Su, H., Zhao, D., Dang, L., Li, M., Wu, X., Liu, X., and Meng, H. (2022, January 23\u201327). A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition. 
Proceedings of the ICASSP 2022\u20142022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","key":"ref_45","DOI":"10.1109\/ICASSP43922.2022.9746116"},{"doi-asserted-by":"crossref","unstructured":"Berkling, K., Zissman, M.A., Vonwiller, J., and Cleirigh, C. (December, January 30). Improving Accent Identification through Knowledge of English Syllable Structure. Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP 1998), Sydney, Australia. paper 0394.","key":"ref_46","DOI":"10.21437\/ICSLP.1998-202"},{"unstructured":"Chen, T., Huang, C., Chang, E., and Wang, J. (2001, January 9\u201313). Automatic Accent Identification Using Gaussian Mixture Models. Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, Madonna di Campiglio, Italy. ASRU\u201901.","key":"ref_47"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"103167","DOI":"10.1016\/j.specom.2024.103167","article-title":"Spoken Language Identification: An Overview of Past and Present Research Trends","volume":"167","year":"2025","journal-title":"Speech Commun."},{"doi-asserted-by":"crossref","unstructured":"Watanabe, C., and Kameoka, H. (2024, January 3\u20136). GE2E-AC: Generalized End-to-End Loss Training for Accent Classification. Proceedings of the 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Macau, China.","key":"ref_49","DOI":"10.1109\/APSIPAASC63619.2025.10848863"},{"doi-asserted-by":"crossref","unstructured":"Huang, H., Xiang, X., Yang, Y., Ma, R., and Qian, Y. (2021, January 6\u201311). AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge. 
Proceedings of the ICASSP 2021\u20142021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","key":"ref_50","DOI":"10.1109\/ICASSP39728.2021.9414292"},{"doi-asserted-by":"crossref","unstructured":"Lesnichaia, M., Mikhailava, V., Bogach, N., Lezhenin, I., Blake, J., and Pyshkin, E. (2022, January 18\u201322). Classification of Accented English Using CNN Model Trained on Amplitude Mel-Spectrograms. Proceedings of the Interspeech 2022, Incheon, Republic of Korea.","key":"ref_51","DOI":"10.21437\/Interspeech.2022-462"},{"unstructured":"Matos, A., Ara\u00fajo, G., Junior, A.C., and Ponti, M. (2024, January 14\u201315). Accent Classification Is Challenging but Pre-Training Helps: A Case Study with Novel Brazilian Portuguese Datasets. Proceedings of the 16th International Conference on Computational Processing of Portuguese, Galicia, Spain.","key":"ref_52"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"109512","DOI":"10.1016\/j.engappai.2024.109512","article-title":"A Robust Accent Classification System Based on Variational Mode Decomposition","volume":"139","author":"Subhash","year":"2025","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"101676","DOI":"10.1016\/j.csl.2024.101676","article-title":"MPSA-DenseNet: A Novel Deep Learning Model for English Accent Classification","volume":"89","author":"Song","year":"2025","journal-title":"Comput. Speech Lang."},{"doi-asserted-by":"crossref","unstructured":"Viglino, T., Motlicek, P., and Cernak, M. (2019, January 15\u201319). End-to-End Accented Speech Recognition. 
Proceedings of the Interspeech 2019, Graz, Austria.","key":"ref_55","DOI":"10.21437\/Interspeech.2019-2122"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"3848","DOI":"10.1121\/10.0026235","article-title":"Advanced Accent\/Dialect Identification and Accentedness Assessment with Multi-Embedding Models and Automatic Speech Recognition","volume":"155","author":"Ghorbani","year":"2024","journal-title":"J. Acoust. Soc. Am."},{"doi-asserted-by":"crossref","unstructured":"Ravanelli, M., Zhong, J., Pascual, S., Swietojanski, P., Monteiro, J., Trmal, J., and Bengio, Y. (2020, January 4\u20138). Multi-Task Self-Supervised Learning for Robust Speech Recognition. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","key":"ref_57","DOI":"10.1109\/ICASSP40776.2020.9053569"},{"doi-asserted-by":"crossref","unstructured":"Pascual, S., Ravanelli, M., Serr\u00e0, J., Bonafonte, A., and Bengio, Y. (2019). Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks. arXiv.","key":"ref_58","DOI":"10.21437\/Interspeech.2019-2605"},{"doi-asserted-by":"crossref","unstructured":"Kune\u0161ov\u00e1, M., and Zaj\u00edc, Z. (2023, January 4\u201310). Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using Wav2vec 2.0. Proceedings of the ICASSP 2023\u20142023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece.","key":"ref_59","DOI":"10.1109\/ICASSP49357.2023.10094972"},{"doi-asserted-by":"crossref","unstructured":"Zhang, J., Peng, Y., Van Tung, P., Xu, H., Huang, H., and Chng, E.S. (2021). E2E-Based Multi-Task Learning Approach to Joint Speech and Accent Recognition. arXiv.","key":"ref_60","DOI":"10.21437\/Interspeech.2021-1495"},{"doi-asserted-by":"crossref","unstructured":"Yolwas, N., and Meng, W. (2023). 
JSUM: A Multitask Learning Speech Recognition Model for Jointly Supervised and Unsupervised Learning. Appl. Sci., 13.","key":"ref_61","DOI":"10.3390\/app13095239"},{"unstructured":"Wang, R., and Sun, K. (2024). TIMIT Speaker Profiling: A Comparison of Multi-Task Learning and Single-Task Learning Approaches. arXiv.","key":"ref_62"},{"key":"ref_63","first-page":"4980920","article-title":"A Robust Approach for Speaker Identification Using Dialect Information","volume":"2022","author":"Shah","year":"2022","journal-title":"Appl. Comput. Intell. Soft Comput."},{"unstructured":"Du Bois, J.W., Chafe, W.L., Meyer, C., Thompson, S.A., and Martey, N. (2000). Santa Barbara Corpus of Spoken American English. CD-ROM, Linguistic Data Consortium.","key":"ref_64"},{"key":"ref_65","first-page":"377","article-title":"Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0","volume":"Volume 13502","year":"2022","journal-title":"Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"},{"doi-asserted-by":"crossref","unstructured":"Labov, W., Ash, S., and Boberg, C. (2006). 
The Atlas of North American English, Mouton de Gruyter.","key":"ref_66","DOI":"10.1515\/9783110167467"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"e36460","DOI":"10.1016\/j.heliyon.2024.e36460","article-title":"MKELM Based Multi-Classification Model for Foreign Accent Identification","volume":"10","author":"Kashif","year":"2024","journal-title":"Heliyon"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/3\/102\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:53:35Z","timestamp":1760028815000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/3\/102"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,14]]},"references-count":67,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["computers14030102"],"URL":"https:\/\/doi.org\/10.3390\/computers14030102","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2025,3,14]]}}}