{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T04:10:28Z","timestamp":1768277428968,"version":"3.49.0"},"reference-count":49,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,5,9]],"date-time":"2024-05-09T00:00:00Z","timestamp":1715212800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Perceptual measures, such as intelligibility and speech disorder severity, are widely used in the clinical assessment of speech disorders in patients treated for oral or oropharyngeal cancer. Despite their widespread usage, these measures are known to be subjective and hard to reproduce. Therefore, an M-Health assessment based on an automatic prediction has been seen as a more robust and reliable alternative. Despite recent progress, these automatic approaches still remain somewhat theoretical, and a need to implement them in real clinical practice rises. Hence, in the present work we introduce SAMI, a clinical mobile application used to predict speech intelligibility and disorder severity as well as to monitor patient progress on these measures over time. The first part of this work illustrates the design and development of the systems supported by SAMI. Here, we show how deep neural speaker embeddings are used to automatically regress speech disorder measurements (intelligibility and severity), as well as the training and validation of the system on a French corpus of head and neck cancer. Furthermore, we also test our model on a secondary corpus recorded in real clinical conditions. The second part details the results obtained from the deployment of our system in a real clinical environment, over the course of several weeks. 
In this section, the results obtained with SAMI are compared to an <jats:italic>a posteriori<\/jats:italic> perceptual evaluation, conducted by a set of experts on the new recorded data. The comparison suggests a high correlation and a low error between the perceptual and automatic evaluations, validating the clinical usage of the proposed application.<\/jats:p>","DOI":"10.3389\/frai.2024.1359094","type":"journal-article","created":{"date-parts":[[2024,5,9]],"date-time":"2024-05-09T05:15:34Z","timestamp":1715231734000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["SAMI: an M-Health application to telemonitor intelligibility and speech disorder severity in head and neck cancers"],"prefix":"10.3389","volume":"7","author":[{"given":"Sebasti\u00e3o","family":"Quintas","sequence":"first","affiliation":[]},{"given":"Robin","family":"Vaysse","sequence":"additional","affiliation":[]},{"given":"Mathieu","family":"Balaguer","sequence":"additional","affiliation":[]},{"given":"Vincent","family":"Roger","sequence":"additional","affiliation":[]},{"given":"Julie","family":"Mauclair","sequence":"additional","affiliation":[]},{"given":"J\u00e9r\u00f4me","family":"Farinas","sequence":"additional","affiliation":[]},{"given":"Virginie","family":"Woisard","sequence":"additional","affiliation":[]},{"given":"Julien","family":"Pinquier","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,5,9]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"1925","DOI":"10.1109\/TASLP.2018.2847459","article-title":"Non-intrusive speech intelligibility prediction using convolutional neural networks","volume":"26","author":"Andersen","year":"2018","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. 
Process"},{"key":"B2","author":"Balaguer","year":"2021","journal-title":"Mesure de l'alt\u00e9ration de la communication par analyses automatiques de la parole spontan\u00e9e apr\u00e8s traitement d'un cancer oral ou oropharyng\u00e9"},{"key":"B3","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1016\/j.anorl.2019.05.012","article-title":"Assessment of impairment of intelligibility and of speech signal after oral cavity and oropharynx cancer","volume":"136","author":"Balaguer","year":"2019","journal-title":"Eur. Ann. Otorhinolaryngol. Head Neck Dis"},{"key":"B4","first-page":"3016","article-title":"\u201cAutomatic speech intelligibility scoring of head and neck cancer patients with deep neural networks,\u201d","volume-title":"International Congress of Phonetic Sciences (ICPHs')","author":"Bin","year":"2019"},{"key":"B5","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1016\/S0021-9924(02)00065-5","article-title":"Intelligibility as a linear combination of dimensions in dysarthric speech","volume":"35","author":"Bodt","year":"2002","journal-title":"J. Commun. 
Disord"},{"key":"B6","first-page":"285","article-title":"\u201cA method of estimating the equal error rate for automatic speaker verification,\u201d","volume-title":"Proceedings of ISCSLP","author":"Cheng","year":"2004"},{"key":"B7","doi-asserted-by":"crossref","first-page":"1776","DOI":"10.21437\/Interspeech.2012-484","article-title":"\u201cA comparative study of adaptive, automatic recognition of disordered speech,\u201d","volume-title":"Proceedings of Interspeech","author":"Christensen","year":"2012"},{"key":"B8","first-page":"1086","article-title":"\u201cVoxceleb2: deep speaker recognition,\u201d","volume-title":"Proceedings of Interspeech","author":"Chung","year":"2018"},{"key":"B9","doi-asserted-by":"publisher","first-page":"240","DOI":"10.1109\/JSTSP.2019.2957977","article-title":"Modeling obstructive sleep apnea voices using deep neural network embeddings and domain-adversarial training","volume":"14","author":"Codosero","year":"2019","journal-title":"IEEE J. Sel. Topics Signal Process"},{"key":"B10","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1097\/00005537-200001000-00018","article-title":"Long-term quality of life of patients with head and neck cancer","volume":"98","author":"de Graeff","year":"2000","journal-title":"Laryngoscope"},{"key":"B11","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1016\/S0892-1997(05)80130-4","article-title":"Perceptual evaluation","volume":"6","author":"Fex","year":"1992","journal-title":"IEEE Trans. Acoust. Speech Signal. Process"},{"key":"B12","doi-asserted-by":"publisher","first-page":"2394","DOI":"10.1044\/2017_JSLHR-S-16-0269","article-title":"Automatic speech recognition predicts speech intelligibility and comprehension for listeners with simulated age-related hearing loss","volume":"50","author":"Fontan","year":"2017","journal-title":"J. Speech Lang. Hear. 
Res"},{"key":"B13","doi-asserted-by":"publisher","first-page":"664","DOI":"10.1016\/j.specom.2011.04.002","article-title":"How to manage sound, physiological and clinical data of 2500 dysphonic and dysarthric speakers?","volume":"54","author":"Ghio","year":"2012","journal-title":"Speech Commun"},{"key":"B14","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.neunet.2021.02.008","article-title":"Residual neural network precisely quantifies dysarthria severity-level based on short-duration speech segments","volume":"139","author":"Gupta","year":"2021","journal-title":"Neural Netw"},{"key":"B15","doi-asserted-by":"publisher","first-page":"562","DOI":"10.1044\/1092-4388(2008\/040)","article-title":"The relationship between listener comprehension and intelligibility scores for speakers with dysarthria","volume":"51","author":"Hustad","year":"2008","journal-title":"J. Speech Lang. Hear. Res"},{"key":"B16","doi-asserted-by":"publisher","first-page":"578369","DOI":"10.3389\/fninf.2021.578369","article-title":"X-vectors: new quantitative biomarkers for early parkinson's disease detection from speech","volume":"15","author":"Jeancolas","year":"2021","journal-title":"Front. Neuroinform"},{"key":"B17","doi-asserted-by":"publisher","first-page":"2009","DOI":"10.1109\/TASLP.2016.2585878","article-title":"An algorithm for predicting the intelligibility of speech masked by modulated noise maskers","volume":"24","author":"Jensen","year":"2016","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process"},{"key":"B18","doi-asserted-by":"publisher","first-page":"222","DOI":"10.1044\/1058-0360(2007\/027)","article-title":"Influence of visual information on the intelligibility of dysarthric speech","volume":"16","author":"Keintz","year":"2007","journal-title":"Am. J. Speech-Lang. 
Pathol"},{"key":"B19","doi-asserted-by":"crossref","DOI":"10.1075\/sspcl.1","volume-title":"Intelligibility in Speech Disorders","author":"Kent","year":"1992"},{"key":"B20","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1080\/0269920031000086248","article-title":"Toward an acoustic typology of motor speech disorders","volume":"17","author":"Kent","year":"2003","journal-title":"Clin. Linguist. Phon"},{"key":"B21","doi-asserted-by":"publisher","first-page":"326","DOI":"10.1080\/17549500903003094","article-title":"Interaction between prosody and intelligibility","volume":"11","author":"Klopfenstein","year":"2009","journal-title":"Int. J. Speech Lang. Pathol"},{"key":"B22","doi-asserted-by":"crossref","first-page":"1839","DOI":"10.21437\/Interspeech.2017-416","article-title":"\u201cApkinson \u2014 a mobile monitoring solution for Parkinson's disease,\u201d","volume-title":"Proceedings of Interspeech","author":"Klumpp","year":"2017"},{"key":"B23","doi-asserted-by":"publisher","first-page":"2021108","DOI":"10.21008\/j.0860-6897.2021.1.08","article-title":"Voice pathology assessment using X-vectors approach","volume":"32","author":"Kotarba","year":"2021","journal-title":"Vib. Phys. Syst"},{"key":"B24","first-page":"2943","article-title":"\u201cAutomatic evaluation of speech intelligibility based on i-vectors in the context of head and neck cancers,\u201d","volume-title":"Proceedings of Interspeech","author":"Laaridh","year":"2018"},{"key":"B25","first-page":"2804","article-title":"\u201cVoice \u00c4pp: a mobile app for crowdsourcing Swiss German dialect data,\u201d","volume-title":"Proceedings of Interspeech","author":"Leemann","year":"2015"},{"key":"B26","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1046\/j.1532-5415.5153.x","article-title":"Early diagnosis of Alzheimer's disease: clinical and economic benefits","volume":"51","author":"Leifer","year":"2003","journal-title":"J. Am. Geriatr. 
Soc"},{"key":"B27","doi-asserted-by":"publisher","first-page":"159","DOI":"10.5009\/gnl13401","article-title":"Clinical significance of early detection of esophageal cancer in patients with head and neck cancer","volume":"9","author":"Lim","year":"2015","journal-title":"Gut Liver"},{"key":"B28","doi-asserted-by":"publisher","first-page":"1101","DOI":"10.1089\/tmj.2008.0080","article-title":"Overview of telehealth activities in speech-language pathology","volume":"14","author":"Mashima","year":"2008","journal-title":"Telemed. e-Health"},{"key":"B29","author":"Middag","year":"2012","journal-title":"Automatic analysis of pathological speech"},{"key":"B30","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1111\/1460-6984.12061","article-title":"Measuring up to speech intelligibility","volume":"48","author":"Miller","year":"2013","journal-title":"Int. J. Lang. Commun. Disord"},{"key":"B31","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.21437\/Interspeech.2017-950","article-title":"\u201cVoxceleb: a largescale speaker identification dataset,\u201d","volume-title":"Proceedings of Interspeech","author":"Nagrani","year":"2017"},{"key":"B32","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.21437\/Interspeech.2020-1740","article-title":"\u201cEnd-to-end speech intelligibility prediction using time-domain fully convolutional neural networks,\u201d","volume-title":"Proceedings of Interspeech","author":"Pedersen","year":"2020"},{"key":"B33","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1111\/1460-6984.12672","article-title":"Intelligibility and comprehensibility: a delphi consensus study","volume":"57","author":"Pomm\u00e9e","year":"2021","journal-title":"Int. J. Lang. Commun. 
Disord"},{"key":"B34","first-page":"1","article-title":"\u201cTowards reducing patient effort for the automatic prediction of speech intelligibility in head and neck cancers,\u201d","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Quintas","year":""},{"key":"B35","first-page":"1","article-title":"\u201cCan we use speaker embeddings on spontaneous speech obtained from medical conversations to predict intelligibility?\u201d","volume-title":"IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","author":"Quintas","year":""},{"key":"B36","doi-asserted-by":"publisher","first-page":"4976","DOI":"10.21437\/Interspeech.2020-1431","article-title":"\u201cAutomatic prediction of speech intelligibility based on x-vectors in the context of head and neck cancer,\u201d","author":"Quintas","year":"2020","journal-title":"Proceedings of Interspeech"},{"key":"B37","doi-asserted-by":"crossref","first-page":"3608","DOI":"10.21437\/Interspeech.2022-182","article-title":"\u201cAutomatic assessment of speech intelligibility using consonant similarity for head and neck cancer,\u201d","volume-title":"Proceedings of Interspeech","author":"Quintas","year":"2022"},{"key":"B38","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2106.04624","article-title":"SpeechBrain: a general-purpose speech toolkit","author":"Ravanelli","year":"2021","journal-title":"arXiv"},{"key":"B39","volume-title":"Evaluation de l'intelligibilit\u00e9 apr\u00e8s un cancer ORL : Approche perceptive par d\u00e9codage acoustico-phon\u00e9tique et mesures acoustiques","author":"Rebourg","year":"2022"},{"key":"B40","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU51503.2021.9688278","article-title":"\u201cApplying X-vectors on pathological speech after larynx removal,\u201d","volume-title":"IEEE Automatic Speech Recognition and Understanding Workshop 
(ASRU)","author":"Scheuerer","year":"2021"},{"key":"B41","doi-asserted-by":"publisher","first-page":"1741","DOI":"10.1016\/j.ijporl.2006.05.016","article-title":"Evaluation of speech intelligibility for children with cleft lip and palate by means of automatic speech recognition","volume":"70","author":"Schuster","year":"2006","journal-title":"Int. J. Pediatr. Otorhinolaryngol"},{"key":"B42","first-page":"5329","article-title":"\u201cX-vectors: Robust DNN embeddings for speaker recognition,\u201d","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Snyder","year":"2018"},{"key":"B43","first-page":"4214","article-title":"\u201cA short-time objective intelligibility measure for time-frequency weighted noisy speech,\u201d","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Taal","year":"2010"},{"key":"B44","first-page":"471","article-title":"\u201cData augmentation using healthy speech for dysarthric speech recognition,\u201d","volume-title":"Proceedings of Interspeech","author":"Vachhani","year":"2018"},{"key":"B45","doi-asserted-by":"publisher","first-page":"217","DOI":"10.3390\/diagnostics10040217","article-title":"AK-DL: A shallow neural network model for diagnosing actinic keratosis with better performance than deep neural networks","volume":"10","author":"Wang","year":"2020","journal-title":"Diagnostics"},{"key":"B46","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1159\/000121004","article-title":"Automatic quantification of speech intelligibility of adults with oral squamous cell carcinoma","volume":"60","author":"Windrich","year":"2008","journal-title":"Folia Phoniatr Logop"},{"key":"B47","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1007\/s10579-020-09496-3","article-title":"C2SI corpus: a database of speech disorder productions to assess intelligibility and quality of life in head and neck 
cancers","volume":"55","author":"Woisard","year":"2020","journal-title":"Lang. Resour. Eval"},{"key":"B48","doi-asserted-by":"publisher","first-page":"171","DOI":"10.3109\/1651386X.2010.525375","article-title":"Perception of speech disorders: difference between the degree of intelligibility and the degree of severity","volume":"8","author":"Woisard","year":"2010","journal-title":"Audiol. Med"},{"key":"B49","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1910.00330","article-title":"A multi-modal feature embedding approach to diagnose alzheimer disease from spoken language","author":"Zargarbashi","year":"2019","journal-title":"arXiv"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1359094\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,9]],"date-time":"2024-05-09T05:15:48Z","timestamp":1715231748000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1359094\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,9]]},"references-count":49,"alternative-id":["10.3389\/frai.2024.1359094"],"URL":"https:\/\/doi.org\/10.3389\/frai.2024.1359094","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,9]]},"article-number":"1359094"}}