{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:08:09Z","timestamp":1760242089593,"version":"build-2065373602"},"reference-count":33,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T00:00:00Z","timestamp":1544745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deanship of Scientific Research, King Khalid University","award":["364"],"award-info":[{"award-number":["364"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>The performance of many speech processing algorithms depends on modeling speech signals using appropriate probability distributions. Various distributions such as the Gamma distribution, Gaussian distribution, Generalized Gaussian distribution, Laplace distribution as well as multivariate Gaussian and Laplace distributions have been proposed in the literature to model different segment lengths of speech, typically below 200 ms in different domains. In this paper, we attempted to fit Laplace and Gaussian distributions to obtain a statistical model of speech short-time Fourier transform coefficients with high spectral resolution (segment length &gt;500 ms) and low spectral resolution (segment length &lt;10 ms). Distribution fitting of Laplace and Gaussian distributions was performed using maximum-likelihood estimation. It was found that speech short-time Fourier transform coefficients with high spectral resolution can be modeled using Laplace distribution. For low spectral resolution, neither the Laplace nor Gaussian distribution provided a good fit. 
Spectral domain modeling of speech with different depths of spectral resolution is useful in understanding the perceptual stability of hearing which is necessary for the design of digital hearing aids.<\/jats:p>","DOI":"10.3390\/sym10120750","type":"journal-article","created":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T04:44:42Z","timestamp":1544762682000},"page":"750","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Probabilistic Modeling of Speech in Spectral Domain using Maximum Likelihood Estimation"],"prefix":"10.3390","volume":"10","author":[{"given":"Mohammed","family":"Usman","sequence":"first","affiliation":[{"name":"Electrical Engineering Department, King Khalid University, Asir-Abha 61421, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammed","family":"Zubair","sequence":"additional","affiliation":[{"name":"Electrical Engineering Department, King Khalid University, Asir-Abha 61421, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammad","family":"Shiblee","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, King Khalid University, Asir-Abha 61421, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paul","family":"Rodrigues","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, King Khalid University, Asir-Abha 61421, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Syed","family":"Jaffar","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, King Khalid University, Asir-Abha 61421, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,12,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Gazor, S., and Zhang, W. (2003). Speech probability distribution. IEEE Signal Process. 
Lett., 10.","DOI":"10.1109\/LSP.2003.813679"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1109\/89.902276","article-title":"An adaptive KLT approach for speech enhancement","volume":"9","author":"Rezayee","year":"2001","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Backstrom, T. (2017, January 20\u201324). Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source. Proceedings of the Interspeech: Annual Conference of the International Speech Communication Association, International Speech Communication Association, Stockholm, Sweden.","DOI":"10.21437\/Interspeech.2017-389"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Backstrom, T. (2017). Speech Coding with Code-Excited Linear Prediction, Springer. [1st ed.].","DOI":"10.1007\/978-3-319-50204-5_14"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1109\/TASL.2011.2125954","article-title":"Speaker diarization: A review of recent research","volume":"20","author":"Xavier","year":"2012","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Shin, J.W., Chang, J.H., and Kim, N.S. (2004, January 4\u20138). Speech probability distribution based on generalized gamma distribution. Proceedings of the 8th International Conference on Spoken Language Processing, Jeju Island, Korea.","DOI":"10.21437\/Interspeech.2004-402"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1109\/LSP.2004.840869","article-title":"Statistical Modeling of speech signals based on generalized gamma distribution","volume":"12","author":"Shin","year":"2005","journal-title":"IEEE Signal Process. 
Lett."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1049\/piee.1964.0149","article-title":"Statistical properties of speech signals","volume":"111","author":"Richards","year":"1964","journal-title":"Proc. Inst. Elect. Eng."},{"key":"ref_9","unstructured":"Gazor, S., and Far, R.R. (2004, January 2\u20135). Probability distribution of speech signal spectral envelope. Proceedings of the Canadian Conference on Electrical and Computer Engineering (CCECE) 2004, (IEEE Cat No. 04CH37513), Niagara Falls, ON, Canada."},{"key":"ref_10","unstructured":"Jensen, J., Batina, I., Hendriks, R.C., and Heusdens, R. (2005, January 19\u201320). A study of the distribution of time-domain speech samples and discrete Fourier coefficients. Proceedings of the 1st BENELUX\/DSP Valley Signal Processing Symposium, Antwerp, Belgium."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Martin, R. (2002, January 13\u201317). Speech enhancement using MMSE short time spectral estimation with gamma distributed speech priors. Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA.","DOI":"10.1109\/ICASSP.2002.1005724"},{"key":"ref_12","unstructured":"Martin, R., and Breithaupt, C. (2003, January 8\u201311). Speech enhancement in the DFT domain using Laplacian speech priors. Proceedings of the International Workshop on Acoustics Echo and Noise Control (IWAENC), Kyoto, Japan."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1109\/RBME.2008.2008250","article-title":"Cochlear implants: system design, integration, and evaluation","volume":"1","author":"Zeng","year":"2008","journal-title":"IEEE Rev. Biomed. Eng."},{"key":"ref_14","unstructured":"(2018, April 15). 
NIST\/SEMATECH e-Handbook of Statistical Methods, Available online: http:\/\/www.itl.nist.gov\/div898\/handbook\/."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1080\/00031305.1984.10483185","article-title":"The Double Exponential Distribution: Using Calculus to Find a Maximum Likelihood Estimator","volume":"38","author":"Norton","year":"1984","journal-title":"Am. Statist."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1049\/el.2012.3829","article-title":"Cram\u00e9r-Rao bound for joint estimation problems","volume":"49","author":"Ijyas","year":"2013","journal-title":"Electron. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1214\/ss\/1009212248","article-title":"On the history of maximum likelihood in relation to inverse probability and least squares","volume":"14","author":"Hald","year":"1999","journal-title":"Statist. Sci."},{"key":"ref_18","first-page":"270","article-title":"Fundamental Frequency Extraction Method using Central Clipping and its Importance for the Classification of Emotional State","volume":"10","author":"Partila","year":"2012","journal-title":"Advan. Electr. Electron. Eng."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"798","DOI":"10.1109\/JSTSP.2010.2057192","article-title":"Low-complexity variable frame rate analysis for speech recognition and voice activity detection","volume":"4","author":"Tan","year":"2010","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"3586","DOI":"10.1121\/1.423941","article-title":"Effects of noise and spectral resolution on vowel and consonant recognition: Acoustic and electric hearing","volume":"104","author":"Fu","year":"1998","journal-title":"J. Acoust. Soc. 
Am."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1121\/1.4939962","article-title":"Pitch and spectral resolution: A systematic comparison of bottom-up cues for top-down repair of degraded speech","volume":"139","author":"Clarke","year":"2016","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yoshizawa, T., Hirobayashi, S., and Misawa, T. (2011). Noise reduction for periodic signals using high-resolution frequency analysis. EURASIP J. Audio Speech Music Process., 1.","DOI":"10.1186\/1687-4722-2011-426794"},{"key":"ref_23","unstructured":"Graf, S., Zaidi, N., Herbig, T., Buck, M., and Schmidt, G. (2017, January 6\u20139). Detection of voiced speech and pitch estimation for application with low spectral resolution. Proceedings of the DAGA 2017, Kiel, Germany."},{"key":"ref_24","unstructured":"Greenberg, S., and Kingsbury, B.E.D. (1997, January 21\u201324). The modulation spectrogram: in pursuit of an invariant representation of speech. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany."},{"key":"ref_25","unstructured":"Bernhardsson, E. (2018, December 07). Language Pitch. Available online: https:\/\/erikbern.com\/2017\/02\/01\/language-pitch.html, 1-Feb-2017."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"3391","DOI":"10.1016\/j.proeng.2012.06.392","article-title":"Identification of language using Mel Frequency Cepstral Coefficients (MFCC)","volume":"38","author":"Kooagudi","year":"2012","journal-title":"Procedia Eng."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Gunawan, T.S., Husain, R., and Kartiwi, M. (2017, January 28\u201330). Development of language identification system using MFCC and vector quantization. 
Proceedings of the IEEE 4th International Conference on Smart Instrumentation, Measurement and Application (ICSIMA), Putrajaya, Malaysia.","DOI":"10.1109\/ICSIMA.2017.8312034"},{"key":"ref_28","unstructured":"Yin, B., Ambikairajah, E., and Chen, F. (2006, January 20\u201324). Combining Cepstral and Prosodic features in language identification. Proceedings of the 18th International Conference on Pattern Recognition (ICPR\u201906), Hong Kong, China."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/TSA.2005.860349","article-title":"Automatic speech recognition with an adaptation model motivated by auditory processing","volume":"14","author":"Holberg","year":"2006","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Alsulaiman, M., Muhammad, G., and Ali, Z. (2011, January 26\u201328). Comparison of voice features for Arabic speech recognition. Proceedings of the Sixth International Conference on Digital Information Management, Melbourne, Australia.","DOI":"10.1109\/ICDIM.2011.6093369"},{"key":"ref_31","unstructured":"Naini, A.S., and Homayounpour, M.M. (, January 16\u201320). Speaker age interval and sex identification based on jitters, shimmers and mean mfcc using supervised and unsupervised discriminative classification methods. Proceedings of the 8th International conference on signal processing, Beijing, China."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Katrenchuk, D. (2017, January 3\u20137). Age group classification with speech and metadata multimodality fusion. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain.","DOI":"10.18653\/v1\/E17-2030"},{"key":"ref_33","unstructured":"Kodrasi, I., and Bourlard, H. (2018, January 10\u201312). Statistical modeling of speech spectral coefficients in patients with Parkinson\u2019s disease. 
Proceedings of the ITG Conference on Speech Communication, Oldenburg, Germany."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/10\/12\/750\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:33:56Z","timestamp":1760196836000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/10\/12\/750"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,14]]},"references-count":33,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2018,12]]}},"alternative-id":["sym10120750"],"URL":"https:\/\/doi.org\/10.3390\/sym10120750","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2018,12,14]]}}}