{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T10:55:25Z","timestamp":1769856925531,"version":"3.49.0"},"reference-count":20,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T00:00:00Z","timestamp":1747699200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"The National Natural Science Foundation of China","award":["62371261"],"award-info":[{"award-number":["62371261"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>With the development of the marine economy and the increase in marine activities, deep saturation diving has gained significant attention. Helium speech communication is indispensable for saturation diving operations and is a critical technology for deep saturation diving, serving as the sole communication method to ensure the smooth execution of such operations. This study introduces deep learning into helium speech recognition and proposes a spectrogram-based dual-model helium speech recognition method. First, we extract the spectrogram features from the helium speech. Then, we combine a deep fully convolutional neural network with connectionist temporal classification (CTC) to form an acoustic model, in which the spectrogram features of helium speech are used as an input to convert speech signals into phonetic sequences. Finally, a maximum entropy hidden Markov model (MEMM) is employed as the language model to convert the phonetic sequences to word outputs, which is regarded as a dynamic programming problem. We use a Viterbi algorithm to find the optimal path to decode the phonetic sequences to word sequences. 
The simulation results show that the method can effectively recognize helium speech with a recognition rate of 97.89% for isolated words and 95.99% for continuous helium speech.<\/jats:p>","DOI":"10.3390\/bdcc9050136","type":"journal-article","created":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T06:10:46Z","timestamp":1747721446000},"page":"136","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Helium Speech Recognition Method Based on Spectrogram with Deep Learning"],"prefix":"10.3390","volume":"9","author":[{"given":"Yonghong","family":"Chen","sequence":"first","affiliation":[{"name":"School of Information Engineering, Jiangsu College of Engineering and Technology, Nantong 226006, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8836-376X","authenticated-orcid":false,"given":"Shibing","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Nantong University, Nantong 226019, China"}]},{"given":"Dongmei","family":"Li","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Nantong University, Nantong 226019, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,5,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1109\/TAU.1968.1161958","article-title":"Problems of diver communication","volume":"16","author":"Hunter","year":"1968","journal-title":"IEEE Trans. Audio Electroacoust."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"68","DOI":"10.23919\/JCC.2020.06.006","article-title":"A survey on helium speech communications in saturation diving","volume":"17","author":"Zhang","year":"2020","journal-title":"China Commun."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1121\/1.1918935","article-title":"Helium speech","volume":"36","author":"Holywell","year":"1964","journal-title":"J. Acoust. Soc. 
Am."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1121\/1.1942790","article-title":"Analysis of speech in a helium-oxygen mixture under pressure","volume":"39","author":"Maclean","year":"1966","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_5","unstructured":"Flower, R.A. (1969). Final Technical Report on Helium Speech Investigations, Singer-General Precision, Inc."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"715","DOI":"10.1121\/1.1912690","article-title":"Speech in deep-submergence atmospheres","volume":"50","author":"Morrow","year":"1971","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_7","first-page":"249","article-title":"Speech intelligibility as a function of ambient pressure and HeO2 atmosphere","volume":"44","author":"Hollien","year":"1973","journal-title":"Aerosp. Med."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"830","DOI":"10.1121\/1.1910655","article-title":"On the role of formant transitions in vowel recognition","volume":"42","author":"Lindblom","year":"1967","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_9","first-page":"1","article-title":"Pressure and gas mixture effects on diver\u2019s speech","volume":"9","author":"Fant","year":"1968","journal-title":"Q. Prog. Status Rep."},{"key":"ref_10","unstructured":"Lunde, P. (1985, January 26\u201329). Acoustic transmission-line analysis of formants in hyperbaric helium speech. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Tampa, FL, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"798","DOI":"10.1121\/1.1910898","article-title":"Spectrographic analysis of divers\u2019 speech during decompression","volume":"43","author":"Brubaker","year":"1968","journal-title":"J. Acoust. Soc. 
Am."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1049\/ree.1982.0032","article-title":"Helium speech effect and electronic techniques for enhancing intelligibility in a helium-oxygen environment","volume":"52","author":"Jack","year":"1982","journal-title":"Radio Electron. Eng."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1109\/TAU.1973.1162509","article-title":"Helium speech unscramblers: A critical review of the state of the art","volume":"21","author":"Thomas","year":"1973","journal-title":"IEEE Trans. Audio Electroacoust."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1121\/1.1910331","article-title":"Technique for correcting helium speech distortion","volume":"41","author":"Stover","year":"1967","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_15","first-page":"61","article-title":"Translation of helium speech by the use of analytic signal","volume":"21","author":"Takasugi","year":"1974","journal-title":"J. Radio Res. Lab."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2640","DOI":"10.1109\/TUFFC.2020.2983099","article-title":"Synthetic elastography using B-mode ultrasound through a deep fully convolutional neural network","volume":"67","author":"Wildeboer","year":"2020","journal-title":"IEEE Trans. Ultrason. Ferroelectr. Freq. Control"},{"key":"ref_17","first-page":"640","article-title":"Fully convolutional networks for semantic segmentation","volume":"39","author":"Long","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"60305","DOI":"10.1109\/ACCESS.2020.2982939","article-title":"GeminiNet: Combining fully convolutional network with structure of receptive fields for object detection","volume":"8","author":"Yao","year":"2020","journal-title":"IEEE Access"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Li, D.M., Zhang, S.B., Guo, L.L., and Chen, Y.H. 
(2020, January 21\u201323). Helium speech correction algorithm based on deep neural networks. Proceedings of the 2020 International Conference on Wireless Communications and Signal Processing, Nanjing, China.","DOI":"10.1109\/WCSP49889.2020.9299782"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, H.J., Chen, Y.X., Ji, H.W., and Zhang, S.B. (2024). A helium speech correction method based on generative adversarial networks. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8110158"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/5\/136\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:35:39Z","timestamp":1760031339000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/5\/136"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,20]]},"references-count":20,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["bdcc9050136"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9050136","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,20]]}}}