{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,1,29]],"date-time":"2025-01-29T06:18:18Z","timestamp":1738131498431,"version":"3.33.0"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,4,17]],"date-time":"2024-04-17T00:00:00Z","timestamp":1713312000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,17]],"date-time":"2024-04-17T00:00:00Z","timestamp":1713312000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Manipal Academy of Higher Education, Manipal"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Although considerable progress has been made in speech enhancement, performance still degrades significantly under highly non-stationary noisy conditions. These conditions impair the performance of speech processing applications such as automatic speech recognition, speech encoding, speaker verification, speaker identification, and speaker recognition. Therefore, this work proposes a robust noise estimation technique for speech enhancement under highly non-stationary noisy scenarios. The proposed work introduces an optimal smoothing and minima controlled (OSMC) noise estimation method based on iterative averaging. First, the smoothed power spectrum of the degraded speech is computed, and its minima are tracked continuously using past averaged spectral values. Next, to detect speech activity in each frequency bin, the ratio of the degraded speech spectrum to its local minimum is computed, and a Bayes minimum-cost decision rule is applied. 
Finally, the noise spectrum is estimated using time-frequency-dependent smoothing factors that depend mainly on the estimated speech presence probability. Experiments are conducted on the NOIZEUS and Kannada speech databases. The results demonstrate that the proposed OSMC technique achieves better speech quality and intelligibility than existing algorithms under highly non-stationary noisy conditions.<\/jats:p>","DOI":"10.1007\/s11042-024-19174-z","type":"journal-article","created":{"date-parts":[[2024,4,17]],"date-time":"2024-04-17T05:02:49Z","timestamp":1713330169000},"page":"1861-1875","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["An ensemble of optimal smoothing and minima controlled through iterative averaging for speech enhancement under uncontrolled environment"],"prefix":"10.1007","volume":"84","author":[{"given":"Raghudathesh","family":"G P","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chandrakala","family":"C B","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dinesh Rao","family":"B","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thimmaraja Yadava","family":"G","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,4,17]]},"reference":[{"key":"19174_CR1","volume-title":"Fundamentals of speech recognition","author":"L Rabiner","year":"1993","unstructured":"Rabiner L, Juang BH (1993) Fundamentals of speech recognition. Prentice-Hall Inc, New Jersey"},{"key":"19174_CR2","doi-asserted-by":"publisher","unstructured":"Ramirez J, Gorriz JM, Segura JC (2007) Voice activity detection: Fundamentals and speech recognition system robustness. I-Tech Education and Publishing. 
https:\/\/doi.org\/10.5772\/4740","DOI":"10.5772\/4740"},{"key":"19174_CR3","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1109\/89.928915","volume":"9","author":"Rainer Martin","year":"2001","unstructured":"Martin Rainer (2001) Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans Speech Audio Process 9:504\u2013512. https:\/\/doi.org\/10.1109\/89.928915","journal-title":"IEEE Trans Speech Audio Process"},{"key":"19174_CR4","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1109\/97.988717","volume":"9","author":"Israel Cohen","year":"2002","unstructured":"Cohen Israel (2002) Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Process Lett 9:12\u201315. https:\/\/doi.org\/10.1109\/97.988717","journal-title":"IEEE Signal Process Lett"},{"key":"19174_CR5","doi-asserted-by":"publisher","first-page":"466","DOI":"10.1109\/TSA.2003.811544","volume":"11","author":"Israel Cohen","year":"2003","unstructured":"Cohen Israel (2003) Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging. IEEE Trans Speech Audio Process 11:466\u2013475. https:\/\/doi.org\/10.1109\/TSA.2003.811544","journal-title":"IEEE Trans Speech Audio Process"},{"key":"19174_CR6","doi-asserted-by":"publisher","unstructured":"Doblinger G (1995) Computationally efficient speech enhancement by spectral minima tracking in subbands. In Proc Eurospeech 1995 vol 2 pp 1513\u20131516. https:\/\/doi.org\/10.21437\/Eurospeech.1995-370","DOI":"10.21437\/Eurospeech.1995-370"},{"key":"19174_CR7","doi-asserted-by":"publisher","unstructured":"Hirsch H, Ehrlicher C (1995) Noise estimation techniques for robust speech recognition. In 1995 International conference on acoustics, speech, and signal processing vol 1 pp 153\u2013156. 
https:\/\/doi.org\/10.1109\/ICASSP.1995.479387","DOI":"10.1109\/ICASSP.1995.479387"},{"key":"19174_CR8","doi-asserted-by":"publisher","first-page":"2954","DOI":"10.1155\/ASP.2005.2954","volume":"18","author":"K Sorensen","year":"2005","unstructured":"Sorensen K, Andersen S (2005) Speech enhancement with natural sounding residual noise based on connected time-frequency speech presence regions. EURASIP J Adv Signal Process 18:2954\u20132964. https:\/\/doi.org\/10.1155\/ASP.2005.2954","journal-title":"EURASIP J Adv Signal Process"},{"key":"19174_CR9","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1007\/s10772-018-9506-9","volume":"22","author":"G Thimmaraja Yadava","year":"2018","unstructured":"Thimmaraja Yadava G, Jayanna HS (2018) Speech enhancement by combining spectral subtraction and minimum mean square error-spectrum power estimator based on zero crossing. Int J Speech Technol (IJST) Springer 22:639\u2013648. https:\/\/doi.org\/10.1007\/s10772-018-9506-9","journal-title":"Int J Speech Technol (IJST) Springer"},{"key":"19174_CR10","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1007\/s10772-020-09671-5","volume":"23","author":"G Thimmaraja Yadava","year":"2020","unstructured":"Thimmaraja Yadava G, Jayanna HS (2020) Enhancements in automatic Kannada speech recognition system by background noise elimination and alternate acoustic modelling. Int J Speech Technol (IJST) Springer 23:149\u2013167. https:\/\/doi.org\/10.1007\/s10772-020-09671-5","journal-title":"Int J Speech Technol (IJST) Springer"},{"key":"19174_CR11","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1007\/s10772-017-9428-y","volume":"20","author":"G Thimmaraja Yadava","year":"2017","unstructured":"Thimmaraja Yadava G, Jayanna HS (2017) A spoken query system for the agricultural commodity prices and weather information access in Kannada language. Int J Speech Technol (IJST) Springer 20:635\u2013644. 
https:\/\/doi.org\/10.1007\/s10772-017-9428-y","journal-title":"Int J Speech Technol (IJST) Springer"},{"key":"19174_CR12","doi-asserted-by":"publisher","first-page":"2224","DOI":"10.1121\/1.1862575","volume":"117","author":"James M Kates","year":"2005","unstructured":"Kates James M, Arehart Kathryn H (2005) Coherence and the speech intelligibility index. J Acoust Soc Am 117:2224. https:\/\/doi.org\/10.1121\/1.1862575","journal-title":"J Acoust Soc Am"},{"key":"19174_CR13","doi-asserted-by":"publisher","first-page":"3387","DOI":"10.1121\/1.3097493","volume":"125","author":"J Ma","year":"2009","unstructured":"Ma J, Hu Y, Loizou PC (2009) Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. J Acoust Soc Am 125:3387\u20133405. https:\/\/doi.org\/10.1121\/1.3097493","journal-title":"J Acoust Soc Am"},{"key":"19174_CR14","doi-asserted-by":"publisher","unstructured":"Hansen JHL, Pellom BL (1998) An effective quality evaluation protocol for speech enhancement algorithms. Proceedings 5th international conference on spoken language processing (ICSLP 1998) Sydney, Australia. https:\/\/doi.org\/10.21437\/ICSLP.1998-350","DOI":"10.21437\/ICSLP.1998-350"},{"key":"19174_CR15","doi-asserted-by":"publisher","unstructured":"Stahl V, Fischer A, Bippus R (2000) Quantile based noise estimation for spectral subtraction and Wiener filtering. In 2000 IEEE international conference on acoustics, speech, and signal processing. Proceedings vol 3 pp 1873\u20131875. https:\/\/doi.org\/10.1109\/ICASSP.2000.862122","DOI":"10.1109\/ICASSP.2000.862122"},{"key":"19174_CR16","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1109\/TSA.2003.819949","volume":"12","author":"Hu Yi","year":"2004","unstructured":"Hu Y, Loizou PC (2004) Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans Speech Audio Process 12:59\u201367. 
https:\/\/doi.org\/10.1109\/TSA.2003.819949","journal-title":"IEEE Trans Speech Audio Process"},{"key":"19174_CR17","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1109\/TASL.2007.911054","volume":"16","author":"Y Hu","year":"2008","unstructured":"Hu Y, Loizou PC (2008) Evaluation of objective quality measures for speech enhancement. IEEE Trans Audio Speech Lang Process 16:229\u2013238. https:\/\/doi.org\/10.1109\/TASL.2007.911054","journal-title":"IEEE Trans Audio Speech Lang Process"},{"key":"19174_CR18","doi-asserted-by":"publisher","first-page":"588","DOI":"10.1016\/j.specom.2006.12.006","volume":"49","author":"Y Hu","year":"2007","unstructured":"Hu Y, Loizou PC (2007) Subjective evaluation and comparison of speech enhancement algorithms. Speech Commun 49:588\u2013601. https:\/\/doi.org\/10.1016\/j.specom.2006.12.006","journal-title":"Speech Commun"},{"key":"19174_CR19","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1007\/s10772-020-09786-9","volume":"24","author":"G Thimmaraja Yadava","year":"2021","unstructured":"Thimmaraja Yadava G, Nagaraja BG, Jayanna HS (2021) Speech enhancement and encoding by combining SS-VAD and LPC. Int J Speech Technol Springer 24:165\u2013172. https:\/\/doi.org\/10.1007\/s10772-020-09786-9","journal-title":"Int J Speech Technol Springer"},{"key":"19174_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.csl.2019.06.005","volume":"59","author":"Zheng-Hua Tan","year":"2020","unstructured":"Tan Zheng-Hua, Sarkar Achintya K R, Dehak Najim (2020) rVAD: An unsupervised segment-based robust voice activity detection method. Comput Speech Lang 59:1\u201321. 
https:\/\/doi.org\/10.1016\/j.csl.2019.06.005","journal-title":"Comput Speech Lang"},{"key":"19174_CR21","doi-asserted-by":"publisher","first-page":"745","DOI":"10.1007\/s10772-022-09987-4","volume":"25","author":"RK Jaiswal","year":"2022","unstructured":"Jaiswal RK, Yeduri SR, Cenkeramaddi LR (2022) Single-channel speech enhancement using implicit Wiener filter for high-quality speech communication. Int J Speech Technol Springer 25:745\u2013758. https:\/\/doi.org\/10.1007\/s10772-022-09987-4","journal-title":"Int J Speech Technol Springer"},{"key":"19174_CR22","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s10772-020-09767-y","volume":"24","author":"M Bahrami","year":"2021","unstructured":"Bahrami M, Faraji N (2021) Minimum mean square error estimator for speech enhancement in additive noise assuming Weibull speech priors and speech presence uncertainty. Int J Speech Technol Springer 24:97\u2013108. https:\/\/doi.org\/10.1007\/s10772-020-09767-y","journal-title":"Int J Speech Technol Springer"},{"key":"19174_CR23","doi-asserted-by":"publisher","unstructured":"Roy S, Paliwal KK (2021) A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions. EURASIP J Audio Speech Music Process 32. https:\/\/doi.org\/10.1186\/s13636-021-00220-9","DOI":"10.1186\/s13636-021-00220-9"},{"key":"19174_CR24","doi-asserted-by":"publisher","first-page":"2203","DOI":"10.1007\/s11277-022-10039-y","volume":"128","author":"M Gupta","year":"2023","unstructured":"Gupta M, Singh RK, Singh S (2023) Analysis of optimized spectral subtraction method for single channel speech enhancement. Wireless Pers Commun 128:2203\u20132215. 
https:\/\/doi.org\/10.1007\/s11277-022-10039-y","journal-title":"Wireless Pers Commun"},{"key":"19174_CR25","doi-asserted-by":"publisher","first-page":"4343","DOI":"10.1007\/s00034-023-02324-3","volume":"42","author":"K Ghorpade","year":"2023","unstructured":"Ghorpade K, Khaparde A (2023) Single-channel speech enhancement using single dimension change accelerated particle swarm optimization for subspace partitioning. Circuits Syst Signal Process 42:4343\u20134361. https:\/\/doi.org\/10.1007\/s00034-023-02324-3","journal-title":"Circuits Syst Signal Process"},{"key":"19174_CR26","doi-asserted-by":"publisher","first-page":"3681","DOI":"10.1007\/s11042-020-09849-8","volume":"80","author":"Ruiyu Liang","year":"2021","unstructured":"Liang Ruiyu, Xie Yue, Cheng Jiaming, Tang Guichen, Sun Shinuo (2021) Real-time speech enhancement algorithm for transient noise suppression. Multimed Tools Appl 80:3681\u20133702. https:\/\/doi.org\/10.1007\/s11042-020-09849-8","journal-title":"Multimed Tools Appl"},{"key":"19174_CR27","doi-asserted-by":"publisher","first-page":"23633","DOI":"10.1007\/s11042-022-12152-3","volume":"81","author":"G Thimmaraja Yadava","year":"2022","unstructured":"Thimmaraja Yadava G, Nagaraja BG, Jayanna HS (2022) A spatial procedure to spectral subtraction for speech enhancement. Multimed Tools Appl 81:23633\u20132364. https:\/\/doi.org\/10.1007\/s11042-022-12152-3","journal-title":"Multimed Tools Appl"},{"key":"19174_CR28","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-023-16100-7","author":"G Thimmaraja Yadava","year":"2023","unstructured":"Thimmaraja Yadava G, Nagaraja BG, Jayanna HS (2023) Amalgamation of noise elimination and TDNN acoustic modelling techniques for the advancements in continuous Kannada ASR system. Multimed Tools Appl. 
https:\/\/doi.org\/10.1007\/s11042-023-16100-7","journal-title":"Multimed Tools Appl"},{"key":"19174_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1504\/IJSISE.2020.113552","volume":"12","author":"SJ Jainar","year":"2020","unstructured":"Jainar SJ, Sale PL, Nagaraja BG (2020) VAD, feature extraction and modelling techniques for speaker recognition: a review. Int J Signal Imaging Syst Eng 12:1\u201318. https:\/\/doi.org\/10.1504\/IJSISE.2020.113552","journal-title":"Int J Signal Imaging Syst Eng"},{"key":"19174_CR30","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1504\/IJSISE.2016.075000","volume":"9","author":"BG Nagaraja","year":"2016","unstructured":"Nagaraja BG, Jayanna HS (2016) Feature extraction and modelling techniques for multilingual speaker recognition: a review. Int J Signal Imaging Syst Eng 9:67\u201378. https:\/\/doi.org\/10.1504\/IJSISE.2016.075000","journal-title":"Int J Signal Imaging Syst Eng"},{"key":"19174_CR31","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1515\/jisys-2013-0038","volume":"22","author":"BG Nagaraja","year":"2013","unstructured":"Nagaraja BG, Jayanna HS (2013) Multilingual speaker identification by combining evidence from lpr and multitaper mfcc. J Intell Syst 22:241\u2013251. 
https:\/\/doi.org\/10.1515\/jisys-2013-0038","journal-title":"J Intell Syst"}],"container-title":["Multimedia Tools and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-024-19174-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11042-024-19174-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-024-19174-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,29]],"date-time":"2025-01-29T01:46:45Z","timestamp":1738115205000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11042-024-19174-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,17]]},"references-count":31,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,1]]}},"alternative-id":["19174"],"URL":"https:\/\/doi.org\/10.1007\/s11042-024-19174-z","relation":{},"ISSN":["1573-7721"],"issn-type":[{"type":"electronic","value":"1573-7721"}],"subject":[],"published":{"date-parts":[[2024,4,17]]},"assertion":[{"value":"14 June 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 March 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 April 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 April 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflict of 
interests on the manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}}]}}