{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T18:37:45Z","timestamp":1778611065762,"version":"3.51.4"},"reference-count":21,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T00:00:00Z","timestamp":1753315200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T00:00:00Z","timestamp":1753315200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Technical Research Centre of Finland"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["SIViP"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Voice pathology is very important in the identification of vocal disorders. Traditional methods of diagnosing voice disorders using voice pathology are expensive, time-consuming, and subjective. The study proposed the identification of normal and pathological voices using the Arabic Voice Pathology Database (AVPD). The study evaluated the performance of Support Vector Machine (SVM), hybrid deep learning, and transfer learning approaches for identifying normal and pathological voices. These models were trained using Mel spectrogram features extracted from the voice data from the AVPD. The transfer learning model outperformed with an accuracy of 96.88%, a precision of 0.96 and 0.98, a recall of 0.98 and 0.96 in the identification of normal and pathological voices, respectively. The transfer learning model showed an F1 score of 0.97 for both normal and pathological voices. The hybrid model showed an accuracy of 92.71% and superior performance in classification metrics to identify normal and pathological voices. 
The SVM model achieved an accuracy of 86.46% and showed low performance in classification metrics to identify normal and pathological voices. Deep learning models, particularly the transfer learning model, outperformed across all evaluation metrics. The proposed transfer learning model achieved a 1.53% increase in accuracy over state-of-the-art approaches in identifying voice disorders using voice pathology. The proposed solution has several applications in medical diagnosis, addressing issues associated with traditional approaches for identifying vocal disorders using voice pathology.<\/jats:p>","DOI":"10.1007\/s11760-025-04527-4","type":"journal-article","created":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T18:10:15Z","timestamp":1753380615000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Voice pathology identification using mel spectrogram features and deep learning"],"prefix":"10.1007","volume":"19","author":[{"given":"Rab Nawaz","family":"Bashir","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad Ali","family":"Shahid","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tahir","family":"Rashid","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Faheem","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Taoufik","family":"Saidani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Oumaima","family":"Saidani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Amjad 
Rehman","family":"Khan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,7,24]]},"reference":[{"issue":"2","key":"4527_CR1","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1016\/j.jvoice.2009.10.009","volume":"25","author":"E Van Houtte","year":"2011","unstructured":"Van Houtte, E., Van Lierde, K., Claeys, S.: Pathophysiology and treatment of muscle tension dysphonia: a review of the current knowledge. J. Voice 25(2), 202\u2013207 (2011)","journal-title":"J. Voice"},{"key":"4527_CR2","doi-asserted-by":"crossref","unstructured":"AL-Dhief, F.T., Latiff, N.M.A., Malik, N.N.N.A., Sabri, N., Baki, M.M., Albadr, M.A.A., Abbas, A.F., Hussein, Y.M., Mohammed, M.A.: Voice pathology detection using machine learning technique. In: 2020 IEEE 5th International Symposium on Telecommunication Technologies (ISTT), pp. 99\u2013104 (2020). IEEE","DOI":"10.1109\/ISTT50966.2020.9279346"},{"key":"4527_CR3","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1177\/000348949610500706","volume":"105","author":"MS Courey","year":"1996","unstructured":"Courey, M.S., Scott, M.A., Shohet, J.A., Ossoff, R.H.: Immunohistochemical characterization of benign laryngeal lesions. Annals of Otology, Rhinology & Laryngology 105, 525\u2013531 (1996)","journal-title":"Annals of Otology, Rhinology & Laryngology"},{"key":"4527_CR4","volume-title":"Clinical Voice Pathology: Theory and Management","author":"JC Stemple","year":"2020","unstructured":"Stemple, J.C., Roy, N., Klaben, B.K.: Clinical Voice Pathology: Theory and Management, 6th edn. Plural Publishing Inc, San Diego, CA (2020)","edition":"6"},{"issue":"6","key":"4527_CR5","doi-asserted-by":"publisher","first-page":"7958","DOI":"10.3934\/mbe.2020404","volume":"17","author":"SA Syed","year":"2020","unstructured":"Syed, S.A., Rashid, M., Hussain, S.: Meta-analysis of voice disorders databases and applied machine learning techniques. 
Math. Biosci. Eng. 17(6), 7958\u20137979 (2020)","journal-title":"Math. Biosci. Eng."},{"key":"4527_CR6","doi-asserted-by":"crossref","unstructured":"Shetty, S., Hegde, S., Dodderi, T., et al.: Classification of healthy and pathological voices using mfcc and ann. In: 2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC), pp. 1\u20135 (2018). IEEE","DOI":"10.1109\/ICAECC.2018.8479441"},{"key":"4527_CR7","doi-asserted-by":"publisher","unstructured":"Ur Rehman, M., Shafique, A., Azhar, Q.-U.-A., Jamal, S.S., Gheraibia, Y., Usman, A.B.: Voice disorder detection using machine learning algorithms: An application in speech and language pathology. Engineering Applications of Artificial Intelligence 133, 108047 (2024) https:\/\/doi.org\/10.1016\/j.engappai.2024.108047","DOI":"10.1016\/j.engappai.2024.108047"},{"key":"4527_CR8","doi-asserted-by":"publisher","unstructured":"Maciel, C., Pereira, J., Stewart, D.: Identifying healthy and pathologically affected voice signals. Signal Processing Magazine, IEEE 27, 120\u2013123 (2010) https:\/\/doi.org\/10.1109\/MSP.2009.934925","DOI":"10.1109\/MSP.2009.934925"},{"key":"4527_CR9","doi-asserted-by":"crossref","unstructured":"Pham, M., Lin, J., Zhang, Y.: Diagnosing voice disorder with machine learning. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 5263\u20135266 (2018). IEEE","DOI":"10.1109\/BigData.2018.8622250"},{"key":"4527_CR10","doi-asserted-by":"crossref","unstructured":"Mohamed, A.N., Salama, A.A., Darwish, S.H.: Classification of voice pathology using svm classifier. 
2023 International Telecommunications Conference (ITC-Egypt), 419\u2013422 (2023)","DOI":"10.1109\/ITC-Egypt58155.2023.10206116"},{"key":"4527_CR11","doi-asserted-by":"publisher","first-page":"15747","DOI":"10.1007\/s00521-018-3464-7","volume":"32","author":"P Harar","year":"2020","unstructured":"Harar, P., Galaz, Z., Alonso-Hernandez, J.B., Mekyska, J., Burget, R., Smekal, Z.: Towards robust voice pathology detection: Investigation of supervised deep learning, gradient boosting, and anomaly detection approaches across four databases. Neural Comput. Appl. 32, 15747\u201315757 (2020)","journal-title":"Neural Comput. Appl."},{"key":"4527_CR12","doi-asserted-by":"crossref","unstructured":"Abdul, Z.K., Al-Talabani, A.K.: Mel frequency cepstral coefficient and its applications: A review. IEEE Access 10, 122136\u2013122158 (2022)","DOI":"10.1109\/ACCESS.2022.3223444"},{"key":"4527_CR13","unstructured":"Cesare, M.G.D., Perpetuini, D., Cardone, D., Merla, A.: Assessment of voice disorders using machine learning and vocal analysis of voice samples recorded through smartphones. BioMedInformatics (2024)"},{"key":"4527_CR14","doi-asserted-by":"publisher","unstructured":"Islam, R., Abdel-Raheem, E., Tarique, M.: Voice pathology detection using convolutional neural networks with electroglottographic (EGG) and speech signals. Computer Methods and Programs in Biomedicine Update 2, 100074 (2022) https:\/\/doi.org\/10.1016\/j.cmpbup.2022.100074","DOI":"10.1016\/j.cmpbup.2022.100074"},{"key":"4527_CR15","doi-asserted-by":"crossref","unstructured":"Harar, P., Alonso-Hernandez, J.B., Mekyska, J., Galaz, Z., Burget, R., Smekal, Z.: Voice pathology detection using deep learning: a preliminary study. In: 2017 International Conference and Workshop on Bioinspired Intelligence (IWOBI), pp. 1\u20134 (2017). 
IEEE","DOI":"10.1109\/IWOBI.2017.7985525"},{"key":"4527_CR16","doi-asserted-by":"publisher","unstructured":"Mohammed, M.A., Abdulkareem, K.H., Mostafa, S.A., Khanapi Abd\u00a0Ghani, M., Maashi, M.S., Garcia-Zapirain, B., Oleagordia, I., Alhakami, H., AL-Dhief, F.T.: Voice pathology detection and classification using convolutional neural network model. Applied Sciences 10(11) (2020) https:\/\/doi.org\/10.3390\/app10113723","DOI":"10.3390\/app10113723"},{"issue":"1","key":"4527_CR17","first-page":"8783751","volume":"2017","author":"TA Mesallam","year":"2017","unstructured":"Mesallam, T.A., Farahat, M., Malki, K.H., Alsulaiman, M., Ali, Z., Al-Nasheri, A., Muhammad, G.: Development of the arabic voice pathology database and its evaluation by using speech features and machine learning algorithms. Journal of healthcare engineering 2017(1), 8783751 (2017)","journal-title":"Journal of healthcare engineering"},{"key":"4527_CR18","volume-title":"Principles of Voice Production","author":"IR Titze","year":"2000","unstructured":"Titze, I.R.: Principles of Voice Production, 2nd edn. National Center for Voice and Speech, Iowa City, IA (2000)","edition":"2"},{"issue":"12","key":"4527_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/2000000001","volume":"1","author":"LR Rabiner","year":"2007","unstructured":"Rabiner, L.R., Schafer, R.W.: Introduction to digital speech processing. Foundations and Trends in Signal Processing 1(12), 1\u2013194 (2007). https:\/\/doi.org\/10.1561\/2000000001","journal-title":"Foundations and Trends in Signal Processing"},{"key":"4527_CR20","unstructured":"Astuti, Y., Hidayat, R., Bejo, A.: A mel-weighted spectrogram feature extraction for improved speaker recognition system. 
International Journal of Intelligent Engineering and Systems (2022)"},{"key":"4527_CR21","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-024-20348-y","author":"R Jegan","year":"2024","unstructured":"Jegan, R., Jayagowri, R.: Pathological voice detection using optimized deep residual neural network and explainable artificial intelligence. Multimedia Tools and Applications (2024). https:\/\/doi.org\/10.1007\/s11042-024-20348-y","journal-title":"Multimedia Tools and Applications"}],"container-title":["Signal, Image and Video Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11760-025-04527-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11760-025-04527-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11760-025-04527-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T22:13:56Z","timestamp":1757283236000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11760-025-04527-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,24]]},"references-count":21,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["4527"],"URL":"https:\/\/doi.org\/10.1007\/s11760-025-04527-4","relation":{},"ISSN":["1863-1703","1863-1711"],"issn-type":[{"value":"1863-1703","type":"print"},{"value":"1863-1711","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,24]]},"assertion":[{"value":"12 April 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 July 
2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 July 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"909"}}