{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T04:11:31Z","timestamp":1776744691279,"version":"3.51.2"},"reference-count":37,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2022,8,24]],"date-time":"2022-08-24T00:00:00Z","timestamp":1661299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Despite the lack of findings in laryngeal endoscopy, it is common for patients to undergo vocal problems after thyroid surgery. This study aimed to predict the recovery of the patient\u2019s voice after 3 months from preoperative and postoperative voice spectrograms. We retrospectively collected voice and the GRBAS score from 114 patients undergoing surgery with thyroid cancer. The data for each patient were taken from three points in time: preoperative, and 2 weeks and 3 months postoperative. Using the pretrained model to predict GRBAS as the backbone, the preoperative and 2-weeks-postoperative voice spectrogram were trained for the EfficientNet architecture deep-learning model with long short-term memory (LSTM) to predict the voice at 3 months postoperation. The correlation analysis of the predicted results for the grade, breathiness, and asthenia scores were 0.741, 0.766, and 0.433, respectively. Based on the scaled prediction results, the area under the receiver operating characteristic curve for the binarized grade, breathiness, and asthenia were 0.894, 0.918, and 0.735, respectively. In the follow-up test results for 12 patients after 6 months, the average of the AUC values for the five scores was 0.822. This study showed the feasibility of predicting vocal recovery after 3 months using the spectrogram. 
We expect this model could be used to relieve patients\u2019 psychological anxiety and encourage them to actively participate in speech rehabilitation.<\/jats:p>","DOI":"10.3390\/s22176387","type":"journal-article","created":{"date-parts":[[2022,8,24]],"date-time":"2022-08-24T23:48:58Z","timestamp":1661384938000},"page":"6387","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Predictions for Three-Month Postoperative Vocal Recovery after Thyroid Surgery from Spectrograms with Deep Neural Network"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1789-8270","authenticated-orcid":false,"given":"Jeong Hoon","family":"Lee","sequence":"first","affiliation":[{"name":"Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul 110799, Korea"}]},{"given":"Chang Yoon","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Otolaryngology, Thyroid\/Head & Neck Cancer Center, The Dongnam Institute of Radiological & Medical Sciences (DIRAMS), Busan 46033, Korea"}]},{"given":"Jin Seop","family":"Eom","sequence":"additional","affiliation":[{"name":"Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si 16677, Korea"}]},{"given":"Mingun","family":"Pak","sequence":"additional","affiliation":[{"name":"Microsoft, Redmond, WA 98052, USA"}]},{"given":"Hee Seok","family":"Jeong","sequence":"additional","affiliation":[{"name":"Department of Radiology, Pusan National University Yangsan Hospital, Yangsan 50612, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2845-1423","authenticated-orcid":false,"given":"Hee Young","family":"Son","sequence":"additional","affiliation":[{"name":"Department of Otolaryngology, Thyroid\/Head & Neck Cancer Center, The Dongnam Institute of Radiological & Medical Sciences (DIRAMS), Busan 46033, 
Korea"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,24]]},"reference":[{"key":"ref_1","first-page":"37","article-title":"Voice Changes after Thyroidectomy without Recurrent Laryngeal Nerve Injury","volume":"21","author":"Choi","year":"2010","journal-title":"J. Korean Soc. Laryngol. Phoniatr. Logop."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1174","DOI":"10.1016\/j.surg.2009.09.010","article-title":"Long-Term Outcome of Functional Post-Thyroidectomy Voice and Swallowing Symptoms","volume":"146","author":"Lombardi","year":"2009","journal-title":"Surgery"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1044\/1058-0360(2012\/12-0014)","article-title":"Evidence-Based Clinical Voice Assessment: A Systematic Review","volume":"22","author":"Roy","year":"2013","journal-title":"Am. J. Speech-Lang. Pathol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1089\/thy.2010.1632","article-title":"The Importance of Pre-and Postoperative Laryngeal Examination for Thyroid Surgery","volume":"20","author":"Randolph","year":"2010","journal-title":"Thyroid"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1177\/0194599813487301","article-title":"Clinical Practice Guideline: Improving Voice Outcomes after Thyroid Surgery","volume":"148","author":"Chandrasekhar","year":"2013","journal-title":"Otolaryngol. 
Head Neck Surg."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.surg.2009.11.017","article-title":"Functional Voice Outcomes after Thyroidectomy: An Assessment of the Dsyphonia Severity Index (DSI) after Thyroidectomy","volume":"147","author":"Henry","year":"2010","journal-title":"Surgery"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/S0892-1997(97)80026-4","article-title":"Test-Retest Study of the GRBAS Scale: Influence of Experience and Professional Background on Perceptual Rating of Voice Quality","volume":"11","author":"Wuyts","year":"1997","journal-title":"J. Voice"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1016\/S0892-1997(99)80006-X","article-title":"Is the Reliability of a Visual Analog Scale Higher than an Ordinal Scale? An Experiment with the GRBAS Scale for the Perceptual Evaluation of Dysphonia","volume":"13","author":"Wuyts","year":"1999","journal-title":"J. Voice"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1016\/j.jvoice.2003.12.004","article-title":"Perceptual Evaluation of Voice Quality and Its Correlation with Acoustic Measurements","volume":"18","author":"Bhuta","year":"2004","journal-title":"J. Voice"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"329","DOI":"10.25046\/aj030641","article-title":"Machine Learning Applied to GRBAS Voice Quality Assessment","volume":"3","author":"Xie","year":"2018","journal-title":"Adv. Sci. Technol. Eng. Syst. J."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1109\/LSP.2010.2100380","article-title":"Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions","volume":"18","author":"Dennis","year":"2010","journal-title":"IEEE Signal Process. 
Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.asoc.2016.12.024","article-title":"An Evaluation of Convolutional Neural Networks for Music Classification Using Spectrograms","volume":"52","author":"Costa","year":"2017","journal-title":"Appl. Soft Comput."},{"key":"ref_13","unstructured":"Sakashita, Y., and Aono, M. (2018). Acoustic Scene Classification by Ensemble of Spectrograms Based on Adaptive Temporal Divisions. Technical report, Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge, DCASE Community."},{"key":"ref_14","unstructured":"Wyse, L. (2017). Audio Spectrogram Representations for Processing with Convolutional Neural Networks. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xie, S., Yan, N., Yu, P., Ng, M.L., Wang, L., and Ji, Z. (2016, January 8\u201312). Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale. Proceedings of the Interspeech, San Francisco, CA, USA.","DOI":"10.21437\/Interspeech.2016-986"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"S36","DOI":"10.1016\/j.metabol.2017.01.011","article-title":"Artificial Intelligence in Medicine","volume":"69","author":"Hamet","year":"2017","journal-title":"Metabolism"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1332","DOI":"10.1089\/thy.2018.0082","article-title":"Deep Learning\u2014Based Computer-Aided Diagnosis System for Localization and Diagnosis of Metastatic Lymph Nodes on Ultrasound: A Pilot Study","volume":"28","author":"Lee","year":"2018","journal-title":"Thyroid"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"5452","DOI":"10.1007\/s00330-019-06098-8","article-title":"Application of Deep Learning to the Diagnosis of Cervical Lymph Node Metastasis from Thyroid Cancer with CT","volume":"29","author":"Lee","year":"2019","journal-title":"Eur. 
Radiol."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3066","DOI":"10.1007\/s00330-019-06652-4","article-title":"Application of Deep Learning to the Diagnosis of Cervical Lymph Node Metastasis from Thyroid Cancer with CT: External Validation and Clinical Utility for Resident Training","volume":"30","author":"Lee","year":"2020","journal-title":"Eur. Radiol."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"41034","DOI":"10.1109\/ACCESS.2018.2856238","article-title":"Voice Pathology Detection Using Deep Learning on Mobile Healthcare Framework","volume":"6","author":"Alhussein","year":"2018","journal-title":"IEEE Access"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1590\/S0004-282X2012000900005","article-title":"Idiopathic Parkinson\u2019s Disease: Vocal and Quality of Life Analysis","volume":"70","author":"Gama","year":"2012","journal-title":"Arq. Neuropsiquiatr."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1525","DOI":"10.1002\/lary.28282","article-title":"Predictors of Voice Outcome in Pediatric Non-Selective Laryngeal Reinnervation","volume":"130","author":"Ongkasuwan","year":"2020","journal-title":"Laryngoscope"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/j.jvoice.2016.04.017","article-title":"Analysis of Temporal Change in Voice Quality after Thyroidectomy: Single-Institution Prospective Study","volume":"31","author":"Lee","year":"2017","journal-title":"J. Voice"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"4543","DOI":"10.1007\/s00405-016-4163-6","article-title":"Voice Outcomes after Thyroidectomy without Superior and Recurrent Laryngeal Nerve Injury: VoiSS Questionnaire and GRBAS Tool Assessment","volume":"273","author":"Tedla","year":"2016","journal-title":"Eur. Arch. Oto-Rhino-Laryngol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tulics, M.G., Szasz\u00e1k, G., M\u00e9sz\u00e1ros, K., and Vicsi, K. 
(2020, January 23\u201325). Using ASR Posterior Probability and Acoustic Features for Voice Disorder Classification. Proceedings of the 2020 11th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Mariehamn, Aland.","DOI":"10.1109\/CogInfoCom50765.2020.9237866"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Tulics, M.G., Szasz\u00e1k, G., M\u00e9sz\u00e1ros, K., and Vicsi, K. (2019, January 23\u201325). Artificial Neural Network and SVM Based Voice Disorder Classification. Proceedings of the 2019 10th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Naples, Italy.","DOI":"10.1109\/CogInfoCom47531.2019.9089908"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1016\/j.jvoice.2017.01.006","article-title":"Predicting Voice Disorder Status from Smoothed Measures of Cepstral Peak Prominence Using Praat and Analysis of Dysphonia in Speech and Voice (ADSV)","volume":"31","author":"Sauder","year":"2017","journal-title":"J. Voice"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/S0892-1997(05)80308-X","article-title":"Acoustic Analysis of Functional Dysphonia: Before and after Voice Therapy (Accent Method)","volume":"8","author":"Fex","year":"1994","journal-title":"J. Voice"},{"key":"ref_29","unstructured":"Tan, M., and Le, Q.V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"5929","DOI":"10.1007\/s10462-020-09838-1","article-title":"A Review on the Long Short-Term Memory Model","volume":"53","author":"Mosquera","year":"2020","journal-title":"Artif. Intell. Rev."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mikolov, T., Kombrink, S., Burget, L., \u010cernock\u00fd, J., and Khudanpur, S. (2011, January 22\u201327). Extensions of Recurrent Neural Network Language Model. 
Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic.","DOI":"10.1109\/ICASSP.2011.5947611"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017, January 22\u201329). Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.74"},{"key":"ref_33","first-page":"413","article-title":"Multimodal and Multi-Output Deep Learning Architectures for the Automatic Assessment of Voice Quality Using the GRB Scale","volume":"14","year":"2019","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.jvoice.2020.02.009","article-title":"Classification of Voice Disorders Using a One-Dimensional Convolutional Neural Network","volume":"36","author":"Fujimura","year":"2020","journal-title":"J. Voice"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"16246","DOI":"10.1109\/ACCESS.2018.2816338","article-title":"Voice Disorder Identification by Using Machine Learning Techniques","volume":"6","author":"Verde","year":"2018","journal-title":"IEEE Access"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"947.e11","DOI":"10.1016\/j.jvoice.2018.07.014","article-title":"A Survey on Machine Learning Approaches for Automatic Detection of Voice Disorders","volume":"33","author":"Hegde","year":"2019","journal-title":"J. Voice"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"513.e1","DOI":"10.1016\/j.jvoice.2016.12.003","article-title":"Voice-Vibratory Assessment with Laryngeal Imaging (VALI) Form: Reliability of Rating Stroboscopy and High-Speed Videoendoscopy","volume":"31","author":"Poburka","year":"2017","journal-title":"J. 
Voice"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/17\/6387\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:14:52Z","timestamp":1760141692000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/17\/6387"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,24]]},"references-count":37,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["s22176387"],"URL":"https:\/\/doi.org\/10.3390\/s22176387","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,24]]}}}