{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T02:46:34Z","timestamp":1776393994841,"version":"3.51.2"},"reference-count":68,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T00:00:00Z","timestamp":1749081600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Functional voice disorders are characterized by impaired voice production without primary organic changes, posing challenges for standardized assessment. Current diagnostic methods rely heavily on subjective evaluation, suffering from inter-rater variability. High-speed videoendoscopy (HSV) offers an objective alternative by capturing true intra-cycle vocal fold behavior. Integrating time-synchronized acoustic and HSV recordings could allow for an objective visual and acoustic assessment of vocal function based on a single HSV examination. This study investigates a machine learning-based approach for hoarseness severity assessment using synchronous HSV and acoustic recordings, alongside conventional voice examinations.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>Three databases comprising 457 HSV recordings of the sustained vowel \/i\/, 634 HSV-synchronized acoustic recordings, and clinical parameters from 923 visits were analyzed. Subjects were classified into two hoarseness groups based on auditory-perceptual ratings, with predicted scores serving as continuous hoarseness severity ratings. A videoendoscopic model was developed by selecting a suitable classification algorithm and a minimal-optimal subset of glottal parameters. 
This model was compared against an acoustic model based on HSV-synchronized recordings and a clinical model based on parameters from other examinations. Two ensemble models were constructed by combining the HSV-based models and all models, respectively. Model performance was evaluated on a shared test set based on classification accuracy, correlation with subjective ratings, and correlation between predicted and observed changes in hoarseness severity.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The videoendoscopic, acoustic, and clinical models achieved correlations of 0.464, 0.512, and 0.638, respectively, with subjective hoarseness ratings. Integrating glottal and acoustic parameters into the HSV-based ensemble model improved correlation to 0.603, confirming the complementary nature of time-synchronized HSV and acoustic recordings. The ensemble model incorporating all modalities achieved the highest correlation of 0.752, underscoring the diagnostic value of multimodal objective assessments.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>This study highlights the potential of synchronous HSV and acoustic recordings for objective hoarseness severity assessment, offering a more comprehensive evaluation of vocal function. While practical challenges remain, the integration of these modalities led to notable improvements, supporting their complementary value in enhancing diagnostic accuracy. 
Future advancements could include flexible nasal endoscopy to enable more natural phonation and refinement of glottal parameter extraction to improve model robustness under variable recording conditions.<\/jats:p><\/jats:sec>","DOI":"10.3389\/frai.2025.1601716","type":"journal-article","created":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T05:28:43Z","timestamp":1749101323000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Machine learning based assessment of hoarseness severity: a multi-sensor approach centered on high-speed videoendoscopy"],"prefix":"10.3389","volume":"8","author":[{"given":"Tobias","family":"Schraut","sequence":"first","affiliation":[]},{"given":"Anne","family":"Sch\u00fctzenberger","sequence":"additional","affiliation":[]},{"given":"Tom\u00e1s","family":"Arias-Vergara","sequence":"additional","affiliation":[]},{"given":"Melda","family":"Kunduk","sequence":"additional","affiliation":[]},{"given":"Matthias","family":"Echternach","sequence":"additional","affiliation":[]},{"given":"Stephan","family":"D\u00fcrr","sequence":"additional","affiliation":[]},{"given":"Julia","family":"Werz","sequence":"additional","affiliation":[]},{"given":"Michael","family":"D\u00f6llinger","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,6,5]]},"reference":[{"key":"ref1","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1016\/j.jvoice.2004.03.007","article-title":"Current and emerging concepts in muscle tension dysphonia: a 30-month review","volume":"19","author":"Altman","year":"2005","journal-title":"J. Voice"},{"key":"ref2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvoice.2023.01.014","article-title":"Nyquist plot parametrization for quantitative analysis of vibration of the vocal folds","author":"Arias-Vergara","year":"2023","journal-title":"J. 
Voice"},{"key":"ref3","doi-asserted-by":"publisher","first-page":"6679","DOI":"10.1609\/aaai.v35i8.16826","article-title":"TabNet: attentive interpretable tabular learning","volume":"35","author":"Arik","year":"2021","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref4","doi-asserted-by":"publisher","first-page":"1531","DOI":"10.1002\/ohn.636","article-title":"The use of deep learning software in the detection of voice disorders: a systematic review","volume":"170","author":"Barlow","year":"2024","journal-title":"Otolaryngol. Head Neck Surg."},{"key":"ref5","first-page":"785","article-title":"XGBoost: A scalable tree boosting system","author":"Chen","year":"2016"},{"key":"ref6","doi-asserted-by":"publisher","first-page":"1656","DOI":"10.1121\/1.4789931","article-title":"Development of a glottal area index that integrates glottal gap size and open quotient","volume":"133","author":"Chen","year":"2013","journal-title":"J. Acoust. Soc. Am."},{"key":"ref7","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511801389","volume-title":"An introduction to support vector machines and other kernel-based learning methods","author":"Cristianini","year":"2000"},{"key":"ref8","volume-title":"Methodenvergleich zur Bestimmung der glottalen Mittelachse bei endoskopischen Hochgeschwindigkeitsvideoaufnahmen von organisch basierten pathologischen Stimmgebungsprozessen [comparison of methods for determining the glottal midline in endoscopic high-speed video recordings of organically based pathological phonation processes] (dissertation)","author":"de Jesus Goncalves","year":"2015"},{"key":"ref9","doi-asserted-by":"publisher","first-page":"250","DOI":"10.1044\/jshr.2102.250","article-title":"Some waveform and spectral features of vowel roughness","volume":"21","author":"Deal","year":"1978","journal-title":"J. Speech Lang. Hear. 
Res."},{"key":"ref10","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1007\/s004050000299","article-title":"A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques","volume":"258","author":"Dejonckere","year":"2001","journal-title":"Eur. Arch. Otorrinolaringol."},{"key":"ref11","author":"Deliyski","year":"2016"},{"key":"ref12","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1097\/MOO.0b013e3283395dd4","article-title":"State of the art laryngeal imaging: research and clinical implications","volume":"18","author":"Deliyski","year":"2010","journal-title":"Curr. Opin. Otolaryngol. Head Neck Surg."},{"key":"ref13","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1142\/S0219720005001004","article-title":"Minimum redundancy feature selection from microarray gene expression data","volume":"3","author":"Ding","year":"2005","journal-title":"J. Bioinforma. Comput. Biol."},{"key":"ref14","doi-asserted-by":"publisher","first-page":"726","DOI":"10.1016\/j.jvoice.2012.02.001","article-title":"Analysis of vocal fold function from acoustic data simultaneously recorded with high-speed endoscopy","volume":"26","author":"D\u00f6llinger","year":"2012","journal-title":"J. Voice"},{"key":"ref15","doi-asserted-by":"publisher","first-page":"9791","DOI":"10.3390\/app12199791","article-title":"Re-training of convolutional neural networks for glottis segmentation in endoscopic high-speed videos","volume":"12","author":"D\u00f6llinger","year":"2022","journal-title":"Appl. Sci."},{"key":"ref16","doi-asserted-by":"publisher","first-page":"1282574","DOI":"10.3389\/fphys.2024.1282574","article-title":"Neural network-based estimation of biomechanical vocal fold parameters","volume":"15","author":"Donhauser","year":"2024","journal-title":"Front. 
Physiol."},{"key":"ref17","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1214\/aos\/1016218223","article-title":"Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors)","volume":"28","author":"Friedman","year":"2000","journal-title":"Ann. Stat."},{"key":"ref18","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1038\/s41597-020-0526-3","article-title":"BAGLS, a multihospital benchmark for automatic glottis segmentation","volume":"7","author":"G\u00f3mez","year":"2020","journal-title":"Sci. Data"},{"key":"ref19","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1016\/j.engappai.2019.03.027","article-title":"Emulating the perceptual capabilities of a human evaluator to map the GRB scale for the assessment of voice disorders","volume":"82","author":"G\u00f3mez-Garc\u00eda","year":"2019","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref20","doi-asserted-by":"publisher","first-page":"1157","DOI":"10.1162\/153244303322753616","article-title":"An introduction to variable and feature selection","volume":"1","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref21","doi-asserted-by":"publisher","first-page":"1648","DOI":"10.1121\/1.391611","article-title":"Harmonic-intensity analysis of normal and hoarse voices","volume":"76","author":"Hiraoka","year":"1984","journal-title":"J. Acoust. Soc. Am."},{"key":"ref22","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1016\/S0892-1997(00)80083-1","article-title":"Voice-related quality of life (V-RQOL) following type I thyroplasty for unilateral vocal fold paralysis","volume":"14","author":"Hogikyan","year":"2000","journal-title":"J. 
Voice"},{"key":"ref23","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1121\/1.396829","article-title":"Glottal airflow and transglottal air pressure measurements for male and female speakers in soft, normal, and loud voice","volume":"84","author":"Holmberg","year":"1988","journal-title":"J. Acoust. Soc. Am."},{"key":"ref24","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1044\/jshr.2301.202","article-title":"Vocal shimmer in sustained phonation","volume":"23","author":"Horii","year":"1980","journal-title":"J. Speech Lang. Hear. Res."},{"key":"ref25","article-title":"Applied logistic regression","volume-title":"Wiley series in probability and statistics.","author":"Hosmer","year":"2013"},{"key":"ref26","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1044\/1058-0360.0603.66","article-title":"The voice handicap index (VHI)","volume":"6","author":"Jacobson","year":"1997","journal-title":"Am. J. Speech-Lang. Pathol."},{"key":"ref27","first-page":"1200","article-title":"A review of feature selection methods with applications","author":"Jovic","year":"2015"},{"key":"ref28","first-page":"1973","article-title":"Novel acoustic measurements of jitter and shimmer characteristics from pathological voice","author":"Kasuya","year":"1993"},{"key":"ref29","doi-asserted-by":"publisher","first-page":"1329","DOI":"10.1121\/1.394384","article-title":"Normalized noise energy as an acoustic measure to evaluate pathologic voice","volume":"80","author":"Kasuya","year":"1986","journal-title":"J. Acoust. Soc. 
Am."},{"key":"ref30","first-page":"3149","article-title":"LightGBM: A highly efficient gradient boosting decision tree","volume-title":"Proceedings of the 31st international conference on neural information processing systems, NIPS\u201917","author":"Ke","year":"2017"},{"key":"ref31","doi-asserted-by":"publisher","first-page":"13760","DOI":"10.1038\/s41598-021-93149-0","article-title":"OpenHSV: an open platform for laryngeal high-speed videoendoscopy","volume":"11","author":"Kist","year":"","journal-title":"Sci. Rep."},{"key":"ref32","doi-asserted-by":"publisher","first-page":"1889","DOI":"10.1044\/2021_JSLHR-20-00498","article-title":"A deep learning enhanced novel software tool for laryngeal dynamics analysis","volume":"64","author":"Kist","year":"","journal-title":"J. Speech Lang. Hear. Res."},{"key":"ref33","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/0167-6393(87)90066-5","article-title":"The measurement of the signal-to-noise ratio (SNR) in continuous speech","volume":"6","author":"Klingholz","year":"1987","journal-title":"Speech Comm."},{"key":"ref34","doi-asserted-by":"publisher","first-page":"066138","DOI":"10.1103\/PhysRevE.69.066138","article-title":"Estimating mutual information","volume":"69","author":"Kraskov","year":"2004","journal-title":"Phys. Rev. 
E"},{"key":"ref35","doi-asserted-by":"publisher","first-page":"981","DOI":"10.1002\/lary.20832","article-title":"Assessment of the variability of vocal fold dynamics within and between recordings with high-speed imaging and by phonovibrogram","volume":"120","author":"Kunduk","year":"2010","journal-title":"Laryngoscope"},{"key":"ref36","volume-title":"Entwicklung einer Klassifikationsmethode zur akustischen analyse fortlaufender Sprache unterschiedlicher Stimmg\u00fcte mittels Neuronaler Netze und deren Anwendung [development and application of a classification method for the acoustic analysis of continuous speech with different vocal qualities using neural networks] (dissertation)","author":"Lessing","year":"2007"},{"key":"ref37","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1016\/S0892-1997(98)80076-3","article-title":"Effects of laryngeal endoscopy on the vocal performance of young adult females with normal voices","volume":"12","author":"Lim","year":"1998","journal-title":"J. Voice"},{"key":"ref38","doi-asserted-by":"publisher","first-page":"725","DOI":"10.1016\/j.jvoice.2014.01.018","article-title":"Speech tasks and interrater reliability in perceptual voice evaluation","volume":"28","author":"Lu","year":"2014","journal-title":"J. Voice"},{"key":"ref39","doi-asserted-by":"crossref","DOI":"10.1002\/9780470479216.corpsy0491","article-title":"Kruskal-Wallis Test","volume-title":"The Corsini encyclopedia of psychology","author":"McKight","year":"2010"},{"key":"ref40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1177\/000348941011900101","article-title":"Voice production mechanisms following phonosurgical treatment of early glottic cancer","volume":"119","author":"Mehta","year":"2010","journal-title":"Ann. Otol. Rhinol. 
Laryngol."},{"key":"ref41","doi-asserted-by":"publisher","first-page":"3999","DOI":"10.1121\/1.3658441","article-title":"Investigating acoustic correlates of human vocal fold vibratory phase asymmetry through modeling and laryngeal high-speed videoendoscopy","volume":"130","author":"Mehta","year":"2011","journal-title":"J. Acoust. Soc. Am."},{"key":"ref42","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1037\/0096-1523.11.4.457","article-title":"Characteristics of velocity profiles of speech movements","volume":"11","author":"Munhall","year":"1985","journal-title":"J. Exp. Psychol. Hum. Percept. Perform."},{"key":"ref43","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1159\/000094569","article-title":"Acoustic changes related to laryngeal examination with a rigid telescope","volume":"58","author":"Ng","year":"2006","journal-title":"Folia Phoniatr. Logop."},{"key":"ref44","doi-asserted-by":"publisher","first-page":"887","DOI":"10.1044\/2018_AJSLP-17-0009","article-title":"Recommended protocols for instrumental assessment of voice: American speech-language-hearing association expert panel to develop a protocol for instrumental assessment of vocal function","volume":"27","author":"Patel","year":"2018","journal-title":"Am. J. Speech-Lang. Pathol."},{"key":"ref45","doi-asserted-by":"publisher","first-page":"20480","DOI":"10.1038\/s41598-021-99948-9","article-title":"Comparative analysis of high-speed videolaryngoscopy images and sound data simultaneously acquired from rigid and flexible laryngoscope: a pilot study","volume":"11","author":"Pietruszewska","year":"2021","journal-title":"Sci. Rep."},{"key":"ref46","doi-asserted-by":"publisher","first-page":"723","DOI":"10.1016\/j.jvoice.2006.06.001","article-title":"Correlation of the voice handicap index (VHI) and the voice-related quality of life measure (V-RQOL)","volume":"21","author":"Portone","year":"2007","journal-title":"J. 
Voice"},{"key":"ref47","first-page":"6639","article-title":"CatBoost: Unbiased boosting with categorical features","volume-title":"Proceedings of the 32nd international conference on neural information processing systems, NIPS\u201918","author":"Prokhorenkova","year":"2018"},{"key":"ref48","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1159\/000070724","article-title":"An automatic method to quantify the vibration properties of human vocal folds via Videokymography","volume":"55","author":"Qiu","year":"2003","journal-title":"Folia Phoniatr. Logop."},{"key":"ref49","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1007\/0-387-25465-X_9","article-title":"Decision trees","volume-title":"Data mining and knowledge discovery handbook","author":"Rokach","year":"2005"},{"key":"ref50","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1007\/978-3-642-41136-6_5","article-title":"Explaining AdaBoost","volume-title":"Empirical Inference","author":"Schapire","year":"2013"},{"key":"ref51","volume-title":"Assessment of clinical voice parameters and parameter reduction using supervised learning approaches (dissertation)","author":"Schlegel","year":"2020"},{"key":"ref52","doi-asserted-by":"publisher","first-page":"e0246136","DOI":"10.1371\/journal.pone.0246136","article-title":"Interdependencies between acoustic and high-speed videoendoscopy parameters","volume":"16","author":"Schlegel","year":"2021","journal-title":"PLoS One"},{"key":"ref53","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/JTEHM.2020.2985026","article-title":"Determination of clinical parameters sensitive to functional voice disorders applying boosted decision stumps","volume":"8","author":"Schlegel","year":"","journal-title":"IEEE J. Transl. Eng. 
Health Med."},{"key":"ref54","doi-asserted-by":"publisher","first-page":"10517","DOI":"10.1038\/s41598-020-66405-y","article-title":"Machine learning based identification of relevant parameters for functional voice disorders derived from endoscopic high-speed recordings","volume":"10","author":"Schlegel","year":"","journal-title":"Sci. Rep."},{"key":"ref55","doi-asserted-by":"publisher","first-page":"2666","DOI":"10.3390\/app8122666","article-title":"Influence of analyzed sequence length on parameters in laryngeal high-speed Videoendoscopy","volume":"8","author":"Schlegel","year":"2018","journal-title":"Appl. Sci."},{"key":"ref56","doi-asserted-by":"publisher","first-page":"811.e1","DOI":"10.1016\/j.jvoice.2018.04.011","article-title":"Dependencies and ill-designed parameters within high-speed Videoendoscopy and acoustic signal analysis","volume":"33","author":"Schlegel","year":"2019","journal-title":"J. Voice"},{"key":"ref57","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-7091-1480-3","volume-title":"Stimmdiagnostik: Ein Leitfaden f\u00fcr die Praxis (Voice diagnostics: A guide for practice)","author":"Schneider-Stickler","year":"2013"},{"key":"ref58","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvoice.2024.12.008","article-title":"Machine learning-based estimation of hoarseness severity using acoustic signals recorded during high-speed Videoendoscopy","author":"Schraut","year":"2025","journal-title":"J. Voice"},{"key":"ref59","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1121\/10.0024341","article-title":"Machine learning based estimation of hoarseness severity using sustained vowels","volume":"155","author":"Schraut","year":"2024","journal-title":"J. Acoust. Soc. 
Am."},{"key":"ref60","doi-asserted-by":"publisher","first-page":"591","DOI":"10.2307\/2333709","article-title":"An analysis of variance test for normality (complete samples)","volume":"52","author":"Shapiro","year":"1965","journal-title":"Biometrika"},{"key":"ref61","doi-asserted-by":"publisher","first-page":"144","DOI":"10.3109\/00016489209100796","article-title":"A comparison of vocal fold closure in rigid telescopic and flexible fiberoptic laryngostroboscopy","volume":"112","author":"S\u00f6dersten","year":"1992","journal-title":"Acta Otolaryngol."},{"key":"ref62","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1001\/archotol.1958.00730020005001","article-title":"Laryngeal vibrations: measurements of the Glottic wave: part I. The Normal Vibratory Cycle","volume":"68","author":"Timcke","year":"1958","journal-title":"Arch. Otolaryngol."},{"key":"ref63","first-page":"232","article-title":"Control of fundamental frequency","volume-title":"Principles of voice production","author":"Titze","year":"2000"},{"key":"ref64","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1016\/j.jbi.2018.07.014","article-title":"Relief-based feature selection: introduction and review","volume":"85","author":"Urbanowicz","year":"2018","journal-title":"J. Biomed. Inform."},{"key":"ref65","doi-asserted-by":"publisher","first-page":"3276","DOI":"10.1044\/2023_JSLHR-23-00027","article-title":"Influence of perspective distortion in laryngoscopy","volume":"66","author":"Veltrup","year":"2023","journal-title":"J. Speech Lang. Hear. Res."},{"key":"ref66","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1016\/j.artmed.2010.01.001","article-title":"Classification of functional voice disorders based on phonovibrograms","volume":"49","author":"Voigt","year":"2010","journal-title":"Artif. Intell. 
Med."},{"key":"ref67","doi-asserted-by":"publisher","first-page":"1544","DOI":"10.1121\/1.387808","article-title":"Harmonics-to-noise ratio as an index of the degree of hoarseness","volume":"71","author":"Yumoto","year":"1982","journal-title":"J. Acoust. Soc. Am."},{"key":"ref68","doi-asserted-by":"crossref","DOI":"10.1002\/0470011815.b2a15150","article-title":"Spearman rank correlation","volume-title":"Encyclopedia of biostatistics","author":"Zar","year":"2005"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1601716\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T05:28:49Z","timestamp":1749101329000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1601716\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,5]]},"references-count":68,"alternative-id":["10.3389\/frai.2025.1601716"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1601716","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,5]]},"article-number":"1601716"}}