{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T12:31:13Z","timestamp":1781353873032,"version":"3.54.1"},"reference-count":58,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,2,23]],"date-time":"2022-02-23T00:00:00Z","timestamp":1645574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004281","name":"National Science Center","doi-asserted-by":"publisher","award":["UMO-2016\/21\/N\/ST6\/02612"],"award-info":[{"award-number":["UMO-2016\/21\/N\/ST6\/02612"]}],"id":[{"id":"10.13039\/501100004281","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Laryngeal high-speed videoendoscopy (LHSV) is an imaging technique offering novel visualization quality of the vibratory activity of the vocal folds. However, in most image analysis methods, the interaction of the medical personnel and access to ground truth annotations are required to achieve accurate detection of vocal folds edges. In our fully automatic method, we combine video and acoustic data that are synchronously recorded during the laryngeal endoscopy. We show that the image segmentation algorithm of the glottal area can be optimized by matching the Fourier spectra of the pre-processed video and the spectra of the acoustic recording during the phonation of sustained vowel \/i:\/. We verify our method on a set of LHSV recordings taken from subjects with normophonic voice and patients with voice disorders due to glottal insufficiency. We show that the computed geometric indices of the glottal area make it possible to discriminate between normal and pathologic voices. 
The median Open Quotient and Minimal Relative Glottal Area values were 0.69 and 0.06, respectively, for healthy subjects, and 1 and 0.35, respectively, for dysphonic subjects. We also validate these results with independent phoniatrician experts.<\/jats:p>","DOI":"10.3390\/s22051751","type":"journal-article","created":{"date-parts":[[2022,2,24]],"date-time":"2022-02-24T00:53:26Z","timestamp":1645664006000},"page":"1751","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Segmentation of Glottal Images from High-Speed Videoendoscopy Optimized by Synchronous Acoustic Recordings"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7780-6238","authenticated-orcid":false,"given":"Bartosz","family":"Kopczynski","sequence":"first","affiliation":[{"name":"Institute of Electronics, Lodz University of Technology, 90-924 Lodz, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5389-578X","authenticated-orcid":false,"given":"Ewa","family":"Niebudek-Bogusz","sequence":"additional","affiliation":[{"name":"Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5109-3911","authenticated-orcid":false,"given":"Wioletta","family":"Pietruszewska","sequence":"additional","affiliation":[{"name":"Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2757-9828","authenticated-orcid":false,"given":"Pawel","family":"Strumillo","sequence":"additional","affiliation":[{"name":"Institute of Electronics, Lodz University of Technology, 90-924 Lodz, 
Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1080\/14015430600881901","article-title":"Occupational voice disorders: Is there a firm case for industrial injuries disablement benefit?","volume":"32","author":"Carding","year":"2007","journal-title":"Logop. Phoniatr. Vocol."},{"key":"ref_2","first-page":"25","article-title":"Objective Measures of Stroboscopy and High-Speed Video","volume":"85","author":"Woo","year":"2020","journal-title":"Adv. Otorhinolaryngol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1016\/j.jvoice.2018.02.020","article-title":"The 2016 G. Paul Moore Lecture: Lessons in Voice Rehabilitation: Journal of Voice and Clinical Practice","volume":"33","author":"Behlau","year":"2019","journal-title":"J. Voice"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1159\/000091730","article-title":"Epidemiology of voice problems in Dutch teachers","volume":"58","author":"Kooijman","year":"2006","journal-title":"Folia Phoniatr. Logop."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1007\/s004050000299","article-title":"A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Guideline elaborated by the Committee on Phoniatrics of the European Laryngological Society (ELS)","volume":"258","author":"Dejonckere","year":"2001","journal-title":"Eur. Arch. 
Otorhinolaryngol."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"887","DOI":"10.1044\/2018_AJSLP-17-0009","article-title":"Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function","volume":"27","author":"Patel","year":"2018","journal-title":"Am. J. Speech Lang. Pathol."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Andrade-Miranda, G., Stylianou, Y., Deliyski, D.D., Godino-Llorente, J.I., and Henrich Bernardoni, N. (2020). Laryngeal Image Processing of Vocal Folds Motion. Appl. Sci., 10.","DOI":"10.3390\/app10051556"},{"key":"ref_8","unstructured":"Chang, M.X., and Leonardus Willems, F. (2002). Human Speech Processing Apparatus for Detecting Instants of Glottal Closure. (No. 6,470,308), U.S. Patent."},{"key":"ref_9","unstructured":"Grygiel, J., Strumi\u0142\u0142o, P., and Niebudek-Bogusz, E. (2011, January 29\u201330). Application of Mel Cepstral processing and Support Vector Machines for diagnosing vocal disorders from voice recordings. Proceedings of the Signal Processing Algorithms, Architectures, Arrangements, and Applications, SPA 2011, Poznan, Poland."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1109\/TASLP.2016.2516647","article-title":"Relationships between vocal function measures derived from an acoustic microphone and a subglottal neck-surface accelerometer","volume":"24","author":"Mehta","year":"2016","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"73","DOI":"10.3109\/14015439.2016.1174293","article-title":"Quantitative assessment of videolaryngostroboscopic images in patients with glottic pathologies","volume":"42","author":"Kopczynski","year":"2017","journal-title":"Logop. Phoniatr. 
Vocology"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"182","DOI":"10.3109\/14015439.2012.731083","article-title":"Vocal fold vibration amplitude open quotient speed quotient and their variability along glottal length: Kymographic data from normal subjects","volume":"38","author":"Lohscheller","year":"2013","journal-title":"Logop. Phoniatr. Vocology"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Sujecka, J., \u015awiech, W., Poryza\u0142a, P., and Borowska-Terka, A. (2018). A prototype system for quantitative assessment of voice fatigue: Design for accessibility. Ergonomics for People with Disabilities, De Gruyter.","DOI":"10.2478\/9783110617832-012"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1109\/JSTSP.2019.2959267","article-title":"Improved Subglottal Pressure Estimation from Neck-Surface Vibration in Healthy Speakers Producing Non-Modal Phonation","volume":"14","author":"Lin","year":"2020","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1744","DOI":"10.1109\/TBME.2009.2015772","article-title":"Improving Reliability and Accuracy of Vibration Parameters of Vocal Folds Based on High-Speed Video and Electroglottography","volume":"56","author":"Qin","year":"2009","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1044\/1058-0360(2011\/09-0086)","article-title":"Vocal fold phase asymmetries in patients with voice disorders: A study across visualization techniques","volume":"21","author":"Bonilha","year":"2012","journal-title":"Am. J. Speech-Lang. 
Pathol."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.jvoice.2013.08.002","article-title":"Correlation among the Dysphonia Severity Index (DSI), the RBH voice perceptual evaluation, and minimum glottal area in female patients with vocal fold nodules","volume":"28","author":"Gaber","year":"2014","journal-title":"J. Voice"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1121\/1.2804939","article-title":"Three registers in an untrained female singer analyzed by videokymography, strobolaryngoscopy and sound spectrography","volume":"123","author":"Sundberg","year":"2008","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/S0892-1997(96)80047-6","article-title":"Videokymography: High-speed line scanning of vocal fold vibration","volume":"10","author":"Schutte","year":"1996","journal-title":"J. Voice"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1488","DOI":"10.1044\/2015_JSLHR-S-14-0253","article-title":"Laryngeal High-Speed Videoendoscopy: Rationale and Recommendation for Accurate and Consistent Terminology","volume":"58","author":"Deliyski","year":"2015","journal-title":"J. Speech Lang. Hear. Res."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1016\/j.jvoice.2017.05.002","article-title":"Utility of Laryngeal Highspeed Videoendoscopy in Clinical Voice Assessment","volume":"32","author":"Zacharias","year":"2017","journal-title":"J. Voice"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Hewavitharanage, S., Gubbi, J., Thyagarajan, D., Lau, K., and Palaniswami, M. (2015, August 25\u201329). Estimation of vocal fold plane in 3D CT images for diagnosis of vocal fold abnormalities. 
Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy.","DOI":"10.1109\/EMBC.2015.7319049"},{"key":"ref_23","unstructured":"Titze, I.R. (2021, December 25). The Myoelastic Aerodynamic Theory of Phonation, Iowa City: National Center for Voice and Speech. Available online: https:\/\/www.worldcat.org\/title\/myoelastic-aerodynamic-theory-of-phonation\/oclc\/79872494."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.compmedimag.2007.12.003","article-title":"Segmentation of the Glottal Space from Laryngeal Images using the Watershed Transform","volume":"32","author":"hon","year":"2008","journal-title":"Comput. Med. Imaging Graph."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Skalski, A., Zielinski, T., and Deliyski, D. (2008, January 14\u201317). Analysis of Vocal Folds Movement in High Speed Videoendoscopy Based on Level Set Segmentation and Image Registration. Proceedings of the 2008 International Conference on Signals and Electronic Systems Krakow, Krak\u00f3w, Poland.","DOI":"10.1109\/ICSES.2008.4673399"},{"key":"ref_26","first-page":"818415","article-title":"Automatic Segmentation of High Speed Video Images of Vocal Folds","volume":"2014","year":"2014","journal-title":"J. Appl. Math."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1111\/coa.13247","article-title":"Laryngeal stroboscopy\u2014Normative values for amplitude, open quotient, asymmetry and phase difference in young adults","volume":"44","author":"Sobol","year":"2019","journal-title":"Clin. Otolaryngol."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1109\/JBHI.2014.2374975","article-title":"Laryngeal Tumor Detection and Classification in Endoscopic Video","volume":"20","author":"Barbalata","year":"2016","journal-title":"IEEE J. Biomed. 
Health Inform."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"13760","DOI":"10.1038\/s41598-021-93149-0","article-title":"OpenHSV: An open platform for laryngeal high-speed videoendoscopy","volume":"11","author":"Kist","year":"2021","journal-title":"Sci. Rep."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"D\u00edaz-C\u00e1diz, M.E., Peterson, S.D., Galindo, G.E., Espinoza, V.M., Motie-Shirazi, M., Erath, B.D., and Za\u00f1artu, M. (2019). Estimating Vocal Fold Contact Pressure from Raw Laryngeal High-Speed Videoendoscopy Using a Hertz Contact Model. Appl. Sci., 9.","DOI":"10.3390\/app9112384"},{"key":"ref_31","unstructured":"Andrade-Miranda, G., and Godino-Llorente, J.I. (2014, April 29\u2013May 2). ROI detection in high speed laryngeal images. Proceedings of the IEEE 11th International Symposium on Biomedical Imaging (ISBI), Beijing, China."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1109\/TBME.2014.2364862","article-title":"Fully Automated Glottis Segmentation in Endoscopic Videos Using Local Color and Shape Features of Glottal Regions","volume":"62","author":"Gloger","year":"2015","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"e02715","DOI":"10.1002\/cnm.2715","article-title":"Tracing vocal fold vibrations using level set segmentation method","volume":"31","author":"Shi","year":"2015","journal-title":"Int. J. Numer. Methods Biomed. Eng."},{"key":"ref_34","first-page":"1","article-title":"Automatic high-speed video glottis segmentation using salient regions and 3D geodesic active contours","volume":"2015","author":"Schenk","year":"2015","journal-title":"Ann. BMVA"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1016\/j.jvoice.2013.07.014","article-title":"Graphical Evaluation of Vocal Fold Vibratory Patterns by High-Speed Videolaryngoscopy","volume":"28","author":"Pinheiro","year":"2014","journal-title":"J. 
Voice"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Fehling, M.K., Grosch, F., Elke Schuster, M., Schick, B., and Lohscheller, J. (2020). Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network. PLoS ONE, 15.","DOI":"10.1371\/journal.pone.0227791"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Vojtech, J.M., Cilento, D.D., Luong, A.T., Noordzij, J.P., Diaz-Cadiz, M., Groll, M.D., Buckley, D.P., McKenna, V.S., Noordzij, J.P., and Stepp, C.E. (2021). Acoustic Identification of the Voicing Boundary during Intervocalic Offsets and Onsets Based on Vocal Fold Vibratory Measures. Appl. Sci., 11.","DOI":"10.3390\/app11093816"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"20480","DOI":"10.1038\/s41598-021-99948-9","article-title":"Comparative analysis of high-speed videolaryngoscopy images and sound data simultaneously acquired from rigid and flexible laryngoscope: A pilot study","volume":"11","author":"Pietruszewska","year":"2021","journal-title":"Sci. Rep."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Li, Z., Wilson, A., Sayce, L., Avhad, A., Rousseau, B., and Luo, H. (2021). Numerical and Experimental Investigations on Vocal Fold Approximation in Healthy and Simulated Unilateral Vocal Fold Paralysis. Appl. Sci., 11.","DOI":"10.3390\/app11041817"},{"key":"ref_40","unstructured":"Ismail, M.A., Deshmukh, S., and Singh, R. (2021, June 6\u201311). Detection of COVID-19 Through the Analysis of Vocal Fold Oscillations. Proceedings of the ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Kopczynski, B., Strumillo, P., Just, M., and Niebudek-Bogusz, E. (2018, January 7\u201310). Acoustic Based Method for Automatic Segmentation of Images of Objects in Periodic Motion: Detection of vocal folds edges case study. 
Proceedings of the Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA), Xi\u2019an, China.","DOI":"10.1109\/IPTA.2018.8608152"},{"key":"ref_42","unstructured":"Gonzalez, R.C., and Woods, R.E. (2017). Digital Image Processing, Pearson Education International. [4th ed.]."},{"key":"ref_43","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_44","unstructured":"(2021, July 30). DiagNova Technologies Company. Available online: http:\/\/www.diagnova.pl."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/j.jvoice.2011.02.001","article-title":"Vocal fold vibratory characteristics in normal female speakers from high-speed digital imaging","volume":"26","author":"Ahmad","year":"2012","journal-title":"J. Voice"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1016\/j.jvoice.2011.12.010","article-title":"Evaluation of vocal fold vibration with an assessment form for high-speed digital imaging: Comparative study between healthy young and elderly subjects","volume":"26","author":"Yamauchi","year":"2012","journal-title":"J. Voice"},{"key":"ref_47","first-page":"9","article-title":"Toward a better vocal tract model","volume":"19","author":"Wakita","year":"1978","journal-title":"Speech Transm. Lab. Q. Prog."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Flanagan, J. (1971). Speech Analysis Synthesis and Perception 1965, Springer. [2nd ed.].","DOI":"10.1007\/978-3-662-00849-2"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1016\/j.jvoice.2014.01.016","article-title":"Age- and gender-related difference of vocal fold vibration and glottal configuration in normal speakers: Analysis with glottal area waveform","volume":"28","author":"Yamauchi","year":"2014","journal-title":"J. 
Voice"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jvoice.2014.12.008","article-title":"Vocal Fold Vibration in Vocal Fold Atrophy: Quantitative Analysis with High-Speed Digital Imaging","volume":"29","author":"Yamauchi","year":"2015","journal-title":"J. Voice"},{"key":"ref_51","unstructured":"Rubin, J., Sataloff, R., and Korovin, G. (2014). Occupational Voice. Diagnosis and Treatment of Voice Disorders, Plural Publishing. [4th ed.]."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"811.e1","DOI":"10.1016\/j.jvoice.2018.04.011","article-title":"Dependencies and Ill-designed Parameters within High-speed Videoendoscopy and Acoustic Signal Analysis","volume":"33","author":"Schlegel","year":"2018","journal-title":"J. Voice"},{"key":"ref_53","unstructured":"Koszty\u0142a-Hojna, B., Zdrojkowski, M., and Duchnowska, E. (2020). Application of the HRES 5562 Camera Using the HSDI Technique in the Diagnosis of Glottal Insufficiencies in Teachers. J. Voice."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1044\/2016_AJSLP-15-0050","article-title":"Comparison of videostroboscopy to stroboscopy derived from high-speed videoendoscopy for evaluating patients with vocal fold mass lesions","volume":"25","author":"Powell","year":"2016","journal-title":"Am. J. Speech-Lang. Pathol."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Yamauchi, A., Imagawa, H., Yokonishi, H., Sakakibara, K.-I., and Tayama, N. (2021). Multivariate Analysis of Vocal Fold Vibrations on Various Voice Disorders Using High-Speed Digital Imaging. Appl. Sci., 11.","DOI":"10.3390\/app11146284"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"1417","DOI":"10.1121\/1.1850031","article-title":"Glottal open quotient in singing: Measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency","volume":"117","author":"Henrich","year":"2005","journal-title":"J. Acoust. Soc. 
Am."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1044\/2013_JSLHR-S-12-0202","article-title":"Objective quantification of pre- and postphonosurgery vocal fold vibratory characteristics using high-speed videoendoscopy and a harmonic waveform model","volume":"57","author":"Ikuma","year":"2014","journal-title":"J. Speech Lang. Hear. Res."},{"key":"ref_58","unstructured":"Yousef, A.M., Deliyski, D.D., Zacharias, S.R.C., de Alarcon, A., Orlikoff, R.F., and Naghibolhosseini, M. (2020). Spatial Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech. J. Voice."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/5\/1751\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:26:30Z","timestamp":1760135190000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/5\/1751"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,23]]},"references-count":58,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["s22051751"],"URL":"https:\/\/doi.org\/10.3390\/s22051751","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,23]]}}}