{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,4]],"date-time":"2022-04-04T07:55:52Z","timestamp":1649058952228},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2012,9,7]],"date-time":"2012-09-07T00:00:00Z","timestamp":1346976000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Humans exhibit a remarkable ability to reliably classify sound sources in the environment even in presence of high levels of noise. In contrast, most engineering systems suffer a drastic drop in performance when speech signals are corrupted with channel or background distortions. Our brains are equipped with elaborate machinery for speech analysis and feature extraction, which hold great lessons for improving the performance of automatic speech processing systems under adverse conditions. The work presented here explores a biologically-motivated multi-resolution speaker information representation obtained by performing an intricate yet computationally-efficient analysis of the information-rich spectro-temporal attributes of the speech signal. We evaluate the proposed features in a speaker verification task performed on NIST SRE 2010 data. The biomimetic approach yields significant robustness in presence of non-stationary noise and reverberation, offering a new framework for deriving reliable features for speaker recognition and speech processing.<\/jats:p>","DOI":"10.1186\/1687-4722-2012-22","type":"journal-article","created":{"date-parts":[[2012,9,7]],"date-time":"2012-09-07T16:14:23Z","timestamp":1347034463000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Biomimetic multi-resolution analysis for robust speaker recognition"],"prefix":"10.1186","volume":"2012","author":[{"given":"Sridhar Krishna","family":"Nemala","sequence":"first","affiliation":[]},{"given":"Dmitry N","family":"Zotkin","sequence":"additional","affiliation":[]},{"given":"Ramani","family":"Duraiswami","sequence":"additional","affiliation":[]},{"given":"Mounya","family":"Elhilali","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,9,7]]},"reference":[{"key":"59_CR1","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-77592-0","volume-title":"Fundamentals of Speaker Recognition","author":"H Beigi","year":"2011","unstructured":"Beigi H: Fundamentals of Speaker Recognition. Springer, Berlin; 2011."},{"key":"59_CR2","volume-title":"Speech Processing in the Auditory System","author":"S Greenberg","year":"2004","unstructured":"Greenberg S, Popper A, Ainsworth W: Speech Processing in the Auditory System. Springer, Berlin; 2004."},{"key":"59_CR3","first-page":"4","volume":"4","author":"K O\u2019Connor","year":"2010","unstructured":"O\u2019Connor K, Yin P, Petkov C, Sutter M: Complex spectral interactions encoded by auditory cortical neurons: relationship between bandwidth and pattern. Front Syst. Neurosci 2010, 4: 4-145.","journal-title":"Front Syst. Neurosci"},{"key":"59_CR4","doi-asserted-by":"publisher","first-page":"1802","DOI":"10.1109\/TASL.2007.900102","volume":"15","author":"J Woojay","year":"2007","unstructured":"Woojay J, Juang B: Speech analysis in a model of the central auditory system. IEEE Trans. Speech Audio Process 2007, 15: 1802-1817.","journal-title":"IEEE Trans. Speech Audio Process"},{"key":"59_CR5","first-page":"4649","volume-title":"Proc. IEEE Intl. Conf. Acoust. Speech Signal Proc., Taipei, Taiwan","author":"Q Wu","year":"2009","unstructured":"Wu Q, Zhang L, Shi G: Robust speech feature extraction based on Gabor filtering and tensor factorization. Proc. IEEE Intl. Conf. Acoust. Speech Signal Proc., Taipei, Taiwan 2009, 4649-4652."},{"key":"59_CR6","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1016\/S0167-6393(02)00134-6","volume":"41","author":"M Elhilali","year":"2003","unstructured":"Elhilali M, Chi T, Shamma SA: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Commun 2003, 41: 331-348. 10.1016\/S0167-6393(02)00134-6","journal-title":"Speech Commun"},{"key":"59_CR7","doi-asserted-by":"publisher","first-page":"e1000302","DOI":"10.1371\/journal.pcbi.1000302","volume":"5","author":"T Elliott","year":"2009","unstructured":"Elliott T, Theunissen F: The modulation transfer function for speech intelligibility. PLoS Comput. Biol 2009, 5: e1000302. 10.1371\/journal.pcbi.1000302","journal-title":"PLoS Comput. Biol"},{"key":"59_CR8","unstructured":"NIST 2010 speaker recognition evaluation http:\/\/www.nist.gov\/speech\/tests\/sre\/2010"},{"key":"59_CR9","doi-asserted-by":"publisher","first-page":"824","DOI":"10.1109\/18.119739","volume":"38","author":"X Yang","year":"1992","unstructured":"Yang X, Wang K, Shamma SA: Auditory representations of acoustic signals. IEEE Trans. Inf. Theory 1992, 38: 824-839. 10.1109\/18.119739","journal-title":"IEEE Trans. Inf. Theory"},{"key":"59_CR10","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1109\/89.294356","volume":"2","author":"K Wang","year":"1994","unstructured":"Wang K, Shamma SA: Self-normalization noise-robustness in early auditory representations. IEEE Trans. Speech Audio Process 1994, 2: 421-435. 10.1109\/89.294356","journal-title":"IEEE Trans. Speech Audio Process"},{"key":"59_CR11","first-page":"39","volume":"1","author":"C Schreiner","year":"1995","unstructured":"Schreiner C, Calhoun B: Spectral envelope coding in cat primary auditory cortex: properties of ripple transfer functions. J. Aud. Neurosc 1995, 1: 39-61.","journal-title":"J. Aud. Neurosc"},{"key":"59_CR12","first-page":"271","volume":"1","author":"H Versnel","year":"1995","unstructured":"Versnel H, Kowalski N, Shamma SA: Ripple analysis in ferret primary auditory cortex. iii. topographic distribution of ripple response parameters. J. Aud. Neurosc 1995, 1: 271-286.","journal-title":"J. Aud. Neurosc"},{"key":"59_CR13","doi-asserted-by":"crossref","DOI":"10.6028\/NIST.IR.4930","volume-title":"DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus","author":"JS Garofolo","year":"1993","unstructured":"Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL: DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus. vol LDC93S1 Linguistic Data Consortium, Philadelphia; 1993."},{"issue":"1","key":"59_CR14","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1152\/jn.00395.2001","volume":"87","author":"L Miller","year":"2002","unstructured":"Miller L, Escabi M, Read H, Schreiner C: Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. J. Neurophysiol 2002, 87(1):516-527.","journal-title":"J. Neurophysiol"},{"key":"59_CR15","volume-title":"Elements of Information Theory","author":"T Cover","year":"2006","unstructured":"Cover T, Thomas J: Elements of Information Theory. 2nd edition. Wiley-Interscience, New York; 2006.","edition":"2"},{"issue":"4","key":"59_CR16","doi-asserted-by":"publisher","first-page":"382","DOI":"10.1109\/89.326616","volume":"2","author":"H Hermansky","year":"1994","unstructured":"Hermansky H, Morgan N: RASTA processing of speech. IEEE Trans. Speech Audio Process 1994, 2(4):382-395.","journal-title":"IEEE Trans. Speech Audio Process"},{"key":"59_CR17","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.specom.2009.08.009","volume":"52","author":"T Kinnunen","year":"2010","unstructured":"Kinnunen T, Lib H: An overview of text-independent speaker recognition: from features to supervectors. Speech Commun 2010, 52: 12-40. 10.1016\/j.specom.2009.08.009","journal-title":"Speech Commun"},{"key":"59_CR18","first-page":"4229","volume-title":"Proc. IEEE Intl. Conf. Acoust. Speech Signal Proc","author":"D Garcia-Romero","year":"2012","unstructured":"Garcia-Romero D, et al.: The UMD-JHU 2011 speaker recognition system. In Proc. IEEE Intl. Conf. Acoust. Speech Signal Proc. Kyoto, Japan; 2012:4229-4232."},{"key":"59_CR19","doi-asserted-by":"publisher","first-page":"1448","DOI":"10.1109\/TASL.2007.894527","volume":"15","author":"P Kenny","year":"2007","unstructured":"Kenny P, Boulianne G, Ouellet P, Dumouchel P: Speaker and session variability in gmm-based speaker verification. IEEE Trans. Audio Speech Lang. Process 2007, 15: 1448-1460.","journal-title":"IEEE Trans. Audio Speech Lang. Process"},{"key":"59_CR20","first-page":"117","volume-title":"Proc. Odyssey Speaker and Language Recognition Workshop","author":"D Garcia-Romero","year":"2010","unstructured":"Garcia-Romero D, Espy-Wilson C: Joint factor analysis for speaker recognition reinterpreted as signal coding using overcomplete dictionaries. In Proc. Odyssey Speaker and Language Recognition Workshop. Brno, Czech Republic; 2010:117-124."},{"issue":"10","key":"59_CR21","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1006\/dspr.1999.0360","volume":"1","author":"R Auckenthaler","year":"2000","unstructured":"Auckenthaler R, Carey M, Lloyd-Thomas H: Score normalization for text-independent speaker verification system. Digit. Signal Proc 2000, 1(10):42-54.","journal-title":"Digit. Signal Proc"},{"key":"59_CR22","first-page":"29","volume-title":"ISCA ITRW ASR2000","author":"H Hirsch","year":"2000","unstructured":"Hirsch H, Pearce D: The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In ISCA ITRW ASR2000. vol. 4 Beijing, China; 2000:29-32."},{"key":"59_CR23","unstructured":"NIST 2011 speaker recognition evaluation http:\/\/www.nist.gov\/itl\/iad\/mig\/best.cfm"},{"key":"59_CR24","doi-asserted-by":"publisher","first-page":"1350","DOI":"10.1155\/ASP.2005.1350","volume":"2005","author":"D Zotkin","year":"2005","unstructured":"Zotkin D, Chi T, Shamma SA, Duraiswami R: Neuromimetic sound representation for percept detection and manipulation. EURASIP J. App. Sig. Process 2005, 2005: 1350-1364. 10.1155\/ASP.2005.1350","journal-title":"EURASIP J. App. Sig. Process"},{"key":"59_CR25","doi-asserted-by":"publisher","first-page":"318","DOI":"10.1121\/1.384464","volume":"67","author":"H Steeneken","year":"1979","unstructured":"Steeneken H, Houtgast T: A physical method for measuring speech-transmission quality. J. Acoust. Soc. Am 1979, 67: 318-326.","journal-title":"J. Acoust. Soc. Am"},{"key":"59_CR26","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1121\/1.408467","volume":"95","author":"R Drullman","year":"1994","unstructured":"Drullman R, Festen J, Plomp R: Effect of temporal envelope smearing on speech reception. J. Acoust. Soc. Am 1994, 95: 1053-1064. 10.1121\/1.408467","journal-title":"J. Acoust. Soc. Am"},{"key":"59_CR27","doi-asserted-by":"publisher","first-page":"2783","DOI":"10.1121\/1.426895","volume":"105","author":"T Arai","year":"1999","unstructured":"Arai T, Pavel M, Hermansky H, Avendano C: Syllable intelligibility for temporally filtered lpc cepstral trajectories. J. Acoust. Soc. Am 1999, 105: 2783-2791. 10.1121\/1.426895","journal-title":"J. Acoust. Soc. Am"},{"key":"59_CR28","first-page":"171","volume-title":"NATO Science Series: Life and Behavioural Sciences","author":"S Greenberg","year":"2006","unstructured":"Greenberg S, Arai T, Grant K: The Role of Temporal Dynamics in Understanding Spoken Language. In NATO Science Series: Life and Behavioural Sciences. IOS Press, Amsterdam; 2006:171-190."},{"key":"59_CR29","doi-asserted-by":"crossref","DOI":"10.1201\/9781420015836","volume-title":"Speech Enhancement: Theory and Practice","author":"P Loizou","year":"2007","unstructured":"Loizou P: Speech Enhancement: Theory and Practice. CRC Press, Boca Raton; 2007."}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1687-4722-2012-22.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1687-4722-2012-22\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1687-4722-2012-22.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T21:40:30Z","timestamp":1630532430000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/1687-4722-2012-22"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9,7]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["59"],"URL":"https:\/\/doi.org\/10.1186\/1687-4722-2012-22","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,9,7]]},"assertion":[{"value":"26 July 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 August 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 September 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"22"}}