{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:42:39Z","timestamp":1760186559653,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,1,1]],"date-time":"2019-01-01T00:00:00Z","timestamp":1546300800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004488","name":"Hrvatska Zaklada za Znanost","doi-asserted-by":"publisher","award":["UIP-2014-09-3875"],"award-info":[{"award-number":["UIP-2014-09-3875"]}],"id":[{"id":"10.13039\/501100004488","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Accurate speech recognition can provide a natural interface for human\u2013computer interaction. Recognition rates of the modern speech recognition systems are highly dependent on background noise levels and a choice of acoustic feature extraction method can have a significant impact on system performance. This paper presents a robust speech recognition system based on a front-end motivated by human cochlear processing of audio signals. In the proposed front-end, cochlear behavior is first emulated by the filtering operations of the gammatone filterbank and subsequently by the Inner Hair cell (IHC) processing stage. 
Experimental results using a continuous density Hidden Markov Model (HMM) recognizer show that recognition rates with the proposed Gammatone Hair Cell (GHC) coefficients are lower under clean speech conditions, but demonstrate a significant improvement in performance under noisy conditions compared to the standard Mel-Frequency Cepstral Coefficients (MFCC) baseline.<\/jats:p>","DOI":"10.3390\/computers8010005","type":"journal-article","created":{"date-parts":[[2019,1,3]],"date-time":"2019-01-03T03:36:30Z","timestamp":1546486590000},"page":"5","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Robust Cochlear-Model-Based Speech Recognition"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9363-6723","authenticated-orcid":false,"given":"Mladen","family":"Russo","sequence":"first","affiliation":[{"name":"Laboratory for Smart Environment Technologies, FESB, University of Split, R. Boskovica 32, 21000 Split, Croatia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7893-6464","authenticated-orcid":false,"given":"Maja","family":"Stella","sequence":"additional","affiliation":[{"name":"Laboratory for Smart Environment Technologies, FESB, University of Split, R. Boskovica 32, 21000 Split, Croatia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5374-8616","authenticated-orcid":false,"given":"Marjan","family":"Sikora","sequence":"additional","affiliation":[{"name":"Laboratory for Smart Environment Technologies, FESB, University of Split, R. Boskovica 32, 21000 Split, Croatia"}]},{"given":"Vesna","family":"Peki\u0107","sequence":"additional","affiliation":[{"name":"Laboratory for Smart Environment Technologies, FESB, University of Split, R. 
Boskovica 32, 21000 Split, Croatia"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1016\/S0016-0032(22)90319-9","article-title":"The nature of speech and its interpretation","volume":"193","author":"Fletcher","year":"1922","journal-title":"J. Franklin Inst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1121\/1.1906946","article-title":"Automatic recognition of spoken digits","volume":"24","author":"Davis","year":"1952","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/0167-6393(94)00059-J","article-title":"Speech recognition in noisy environments\u2014A survey","volume":"16","author":"Gong","year":"1995","journal-title":"Speech Commun."},{"key":"ref_4","first-page":"69","article-title":"Analysis of factors influencing accuracy of speech recognition","volume":"9","author":"Ceidaite","year":"2010","journal-title":"Elektron. Ir Elektrotech."},{"key":"ref_5","unstructured":"Tan, Z.H., and Lindberg, B. (2010). Mobile Multimedia Processing, Springer."},{"key":"ref_6","first-page":"5","article-title":"Robust in-car speech recognition based on nonlinear multiple regressions","volume":"2007","author":"Li","year":"2007","journal-title":"EURASIP J. Adv. Sig. Process."},{"key":"ref_7","unstructured":"Ou, W., Gao, W., Li, Z., Zhang, S., and Wang, Q. (2010, January 13\u201314). Application of keywords speech recognition in agricultural voice information system. Proceedings of the 2010 Second International Conference on Computational Intelligence and Natural Computing, Wuhan, China."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhu, L., Chen, L., Zhao, D., Zhou, J., and Zhang, W. (2017). Emotion recognition from Chinese speech for smart affective services using a combination of SVM and DBN. 
Sensors, 17.","DOI":"10.3390\/s17071694"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Noriega-Linares, J.E., and Navarro Ruiz, J.M. (2016). On the application of the raspberry Pi as an advanced acoustic sensor network for noise monitoring. Electronics, 5.","DOI":"10.3390\/electronics5040074"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1016\/j.jfranklin.2009.02.005","article-title":"A wavelet-and neural network-based voice system for a smart wheelchair control","volume":"348","author":"Assaleh","year":"2011","journal-title":"J. Franklin Inst."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"McLoughlin, I., and Sharifzadeh, H.R. (2008). Speech Recognition, Technologies and Applications, I-Tech Education and Publishing.","DOI":"10.5772\/6363"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1515\/aoa-2016-0049","article-title":"Diagnostics of rotor damages of three-phase induction motors using acoustic signals and SMOFS-20-EXPANDED","volume":"41","author":"Glowacz","year":"2016","journal-title":"Arch. Acoust."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.ymssp.2018.07.044","article-title":"Fault diagnosis of single-phase induction motor based on acoustic signals","volume":"117","author":"Glowacz","year":"2019","journal-title":"Mech. Syst. Signal Process."},{"key":"ref_14","first-page":"235","article-title":"Application of a Phase Resolved Partial Discharge Pattern Analysis for Acoustic Emission Method in High Voltage Insulation Systems Diagnostics","volume":"43","author":"Kunicki","year":"2018","journal-title":"Arch. Acoust."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Mika, D., and J\u00f3zwik, J. (2018). Advanced time-frequency representation in voice signal analysis. Adv. Sci. Technol. Res. J., 12.","DOI":"10.12913\/22998624\/87028"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ono, K. (2018). 
Review on structural health evaluation with acoustic emission. Appl. Sci., 8.","DOI":"10.3390\/app8060958"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zou, L., Guo, Y., Liu, H., Zhang, L., and Zhao, T. (2017). A method of abnormal states detection based on adaptive extraction of transformer vibro-acoustic signals. Energies, 10.","DOI":"10.3390\/en10122076"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yang, H., Wen, G., Hu, Q., Li, Y., and Dai, L. (2018). Experimental investigation on influence factors of acoustic emission activity in coal failure process. Energies, 11.","DOI":"10.3390\/en11061414"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1016\/j.jfranklin.2012.02.016","article-title":"A self-tuning hybrid active noise control system","volume":"349","author":"Mokhtarpour","year":"2012","journal-title":"J. Franklin Inst."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lee, S.C., Wang, J.F., and Chen, M.H. (2018). Threshold-based noise detection and reduction for automatic speech recognition system in human-robot interactions. Sensors, 18.","DOI":"10.3390\/s18072068"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/S0016-0032(00)00007-7","article-title":"Principle and applications of asymmetric crosstalk-resistant adaptive noise canceler","volume":"337","author":"Kuo","year":"2000","journal-title":"J. Franklin Inst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Hung, J.W., Lin, J.S., and Wu, P.J. (2018). Employing Robust Principal Component Analysis for Noise-Robust Speech Feature Extraction in Automatic Speech Recognition with the Structure of a Deep Neural Network. Appl. Syst. 
Innov., 1.","DOI":"10.3390\/asi1030028"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0167-6393(97)00021-6","article-title":"Speech recognition by machines and humans","volume":"22","author":"Lippmann","year":"1997","journal-title":"Speech Commun."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1109\/89.326615","article-title":"How do humans process and recognize speech?","volume":"2","author":"Allen","year":"1994","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.specom.2008.06.002","article-title":"Perceptual features for automatic speech recognition in noisy environments","volume":"51","author":"Haque","year":"2009","journal-title":"Speech Commun."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1738","DOI":"10.1121\/1.399423","article-title":"Perceptual linear predictive (PLP) analysis of speech","volume":"87","author":"Hermansky","year":"1990","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/TSA.2005.860349","article-title":"Automatic speech recognition with an adaptation model motivated by auditory processing","volume":"14","author":"Holmberg","year":"2006","journal-title":"IEEE Trans. Audio Speech Lang Process."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Kim, C., and Stern, R.M. (2012, January 25\u201330). Power-normalized cepstral coefficients (PNCC) for robust speech recognition. Proceedings of the 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan.","DOI":"10.1109\/ICASSP.2012.6288820"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Seltzer, M.L., Yu, D., and Wang, Y. (2013, January 26\u201331). An investigation of deep neural networks for noise robust speech recognition. 
Proceedings of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada.","DOI":"10.1109\/ICASSP.2013.6639100"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Maas, A.L., Le, Q.V., O\u2019Neil, T.M., Vinyals, O., Nguyen, P., and Ng, A.Y. (2012, January 9\u201313). Recurrent neural networks for noise reduction in robust ASR. Proceedings of the 13th Annual Conference of the International Speech Communication Association, Portland, OR, USA.","DOI":"10.21437\/Interspeech.2012-6"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"867","DOI":"10.1109\/JSTSP.2010.2057200","article-title":"Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening","volume":"4","author":"Wollmer","year":"2010","journal-title":"IEEE J. Sel. Top. Sign. Process."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1145\/3178115","article-title":"Deep learning for environmentally robust speech recognition: An overview of recent developments","volume":"9","author":"Zhang","year":"2018","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1109\/89.397093","article-title":"A comparison of signal processing front ends for automatic word recognition","volume":"3","author":"Jankowski","year":"1995","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_34","unstructured":"Seneff, S. (1986, January 7\u201311). A computational model for the peripheral auditory system: Application to speech recognition research. Proceedings of the ICASSP \u201986. 
IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, Japan."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1109\/89.260357","article-title":"Auditory models and human performance in tasks related to speech coding and speech recognition","volume":"2","author":"Ghitza","year":"1994","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_36","unstructured":"Qi, J., Wang, D., Jiang, Y., and Liu, R. (2013, January 19\u201323). Auditory features based on gammatone filters for robust speech recognition. Proceedings of the 2013 IEEE International Symposium on Circuits and Systems, Beijing, China."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1016\/j.specom.2010.04.008","article-title":"Acoustic features for speech recognition based on Gammatone filterbank and instantaneous frequency","volume":"53","author":"Yin","year":"2011","journal-title":"Speech Commun."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Shao, Y., Jin, Z., Wang, D., and Srinivasan, S. (2009, January 19\u201324). An auditory-based feature for robust speech recognition. Proceedings of the 34th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan.","DOI":"10.1109\/ICASSP.2009.4960661"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Menon, A., Kim, C., and Stern, R.M. (2017, January 20\u201324). Robust Speech Recognition Based on Binaural Auditory Processing. Proceedings of the Interspeech 2017, Stockholm, Sweden.","DOI":"10.21437\/Interspeech.2017-1665"},{"key":"ref_40","unstructured":"Marieb, E.N., and Hoehn, K. (2016). Human anatomy & physiology, Benjamin Cummings."},{"key":"ref_41","unstructured":"Purves, D., Augustine, G.J., Fitzpatrick, D., Hall, W.C., LaMantia, A.S., McNamara, J.O., and Williams, S.M. (2004). Neuroscience, Sinauer Associates."},{"key":"ref_42","unstructured":"Johannesma, P.I. (1972, January 22\u201323). 
The pre-response stimulus ensemble of neurons in the cochlear nucleus. Proceedings of the Symposium on Hearing Theory, Eindhoven, The Netherlands."},{"key":"ref_43","unstructured":"Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C., and Allerhand, M. (1991, January 9\u201314). Complex sounds and auditory images. Proceedings of the 9th International Symposium on Hearing, Carcans, France."},{"key":"ref_44","unstructured":"Patterson, R.D. (1986). Frequency Selectivity in Hearing, Academic Press. Auditory Filters and Excitation Patterns as Representations of Frequency Resolution."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1016\/0378-5955(90)90170-T","article-title":"Derivation of auditory filter shapes from notched-noise data","volume":"47","author":"Glasberg","year":"1990","journal-title":"Hear. Res."},{"key":"ref_46","unstructured":"Slaney, M. (2018, December 25). An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank. Available online: https:\/\/engineering.purdue.edu\/~malcolm\/apple\/tr35\/PattersonsEar.pdf."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1121\/1.393460","article-title":"Simulation of mechanical to neural transduction in the auditory receptor","volume":"79","author":"Meddis","year":"1986","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_48","unstructured":"McEwan, A., and Van Schaik, A. (2000, January 12\u201315). A silicon representation of the Meddis inner hair cell model. Proceedings of the International Congress on Intelligent Systems and Applications (ISA\u20192000), Sydney, Australia."},{"key":"ref_49","unstructured":"Wang, D., and Brown, G.J. (2006). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, Wiley-IEEE Press."},{"key":"ref_50","unstructured":"Young, S.J., Evermann, G., Gales, M.J.F., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., and Woodland, P.C. (2006). 
The HTK Book, Cambridge University Press. [Edition 3.4]."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1109\/TASLP.2016.2545928","article-title":"Power-normalized cepstral coefficients (PNCC) for robust speech recognition","volume":"24","author":"Kim","year":"2016","journal-title":"IEEE\/ACM Trans. Audio, Speech Lang. Process."},{"key":"ref_52","unstructured":"Pagano, M., and Gauvreau, K. (2018). Principles of Biostatistics, Chapman and Hall\/CRC."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/8\/1\/5\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:23:04Z","timestamp":1760185384000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/8\/1\/5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,1]]},"references-count":52,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["computers8010005"],"URL":"https:\/\/doi.org\/10.3390\/computers8010005","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2019,1,1]]}}}