{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,9]],"date-time":"2024-05-09T10:29:17Z","timestamp":1715250557553},"reference-count":57,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2015,6,26]],"date-time":"2015-06-26T00:00:00Z","timestamp":1435276800000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2015,12]]},"DOI":"10.1186\/s13636-015-0058-5","type":"journal-article","created":{"date-parts":[[2015,6,25]],"date-time":"2015-06-25T10:06:23Z","timestamp":1435226783000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Exploiting foreign resources for DNN-based ASR"],"prefix":"10.1186","volume":"2015","author":[{"given":"Petr","family":"Motlicek","sequence":"first","affiliation":[]},{"given":"David","family":"Imseng","sequence":"additional","affiliation":[]},{"given":"Blaise","family":"Potard","sequence":"additional","affiliation":[]},{"given":"Philip N.","family":"Garner","sequence":"additional","affiliation":[]},{"given":"Ivan","family":"Himawan","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2015,6,26]]},"reference":[{"key":"58_CR1","doi-asserted-by":"crossref","unstructured":"J Cohen, in Automatic Speech Recognition Understanding Workshop (ASRU). The gale project: a description and an update, (2007), pp. 237\u2013237. doi: 10.1109\/ASRU.2007.4430115 .","DOI":"10.1109\/ASRU.2007.4430115"},{"key":"58_CR2","unstructured":"NT Vu, F Kraus, T Schultz, in Proc. of the IEEE Workshop on Spoken Language Technology (SLT). 
Multilingual a-stabil: a new confidence score for multilingual unsupervised training, (2010), pp. 183\u2013188."},{"key":"58_CR3","unstructured":"S Thomas, ML Seltzer, K Church, H Hermansky, in Proc. of ICASSP. Deep neural network features and semi-supervised training for low resource speech recognition, (2013), pp. 6704\u20136708."},{"key":"58_CR4","unstructured":"D Imseng, P Motlicek, H Bourlard, PN Garner, in Speech Communication. Using out-of-language data to improve an under-resourced speech recognizer, (2014), pp. 142\u2013151."},{"key":"58_CR5","unstructured":"D Imseng, H Bourlard, H Caesar, PN Garner, G Lecorv\u00e9, A Nanchen, in Proc. of the IEEE Workshop on Spoken Language Technology (SLT). MediaParl: Bilingual mixed language accented speech database, (2012), pp. 263\u2013268."},{"key":"58_CR6","unstructured":"D Imseng, B Potard, P Motlicek, A Nanchen, H Bourlard, in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing. Exploiting un-transcribed foreign data for speech recognition in well-resourced languages, (2014), pp. 2322\u20132326."},{"key":"58_CR7","unstructured":"Y Huang, D Yu, Y Gong, C Liu, in Proc. of Interspeech. Semi-supervised GMM and DNN acoustic model training with multi-system combination and confidence re-calibration, (2013), pp. 2360\u20132364."},{"issue":"2","key":"58_CR8","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/5.18626","volume":"77","author":"LR Rabiner","year":"1989","unstructured":"LR Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE. 77(2), 257\u2013286 (1989).","journal-title":"Proc. IEEE"},{"key":"58_CR9","unstructured":"LR Rabiner, BH Juang, in IEEE ASSP Magazine. An introduction to hidden Markov models, (1986), pp. 5\u201316."},{"key":"58_CR10","unstructured":"F Grezl, M Karafiat, L Burget, in Proc. of Interspeech. Investigation into bottle-neck features for meeting speech recognition, (2009), pp. 
2947\u20132950."},{"key":"58_CR11","unstructured":"F Gr\u00e9zl, M Karafi\u00e1t, M Janda, in Proc. of ASRU. Study of probabilistic and bottle-neck features in multilingual environment, (2011), pp. 359\u2013364."},{"key":"58_CR12","doi-asserted-by":"crossref","unstructured":"S Galliano, E Geoffrois, G Gravier, J-F Bonastre, D Mostefa, K Choukri, in Proc. of the International Conference on Language Resources and Evaluation. Corpus description of the ESTER evaluation campaign for the rich transcription of French broadcast news, (2006).","DOI":"10.21437\/Interspeech.2005-441"},{"key":"58_CR13","unstructured":"LF Lamel, J-L Gauvain, M Eskenazi, in Proceedings of Eurospeech. BREF, a large vocabulary spoken corpus for French, (1991), pp. 505\u2013508."},{"key":"58_CR14","unstructured":"L Dau-Cheng, et al, in Proc. of Interspeech. SEAME: a Mandarin-English code-switching speech corpus in South-East Asia, (2010), pp. 1986\u20131989."},{"key":"58_CR15","doi-asserted-by":"crossref","unstructured":"H Bourlard, N Morgan, Continuous speech recognition: a hybrid approach (Kluwer Academic Publishers, 1994).","DOI":"10.1007\/978-1-4615-3210-1"},{"issue":"1","key":"58_CR16","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1109\/89.260359","volume":"2","author":"S Renals","year":"1994","unstructured":"S Renals, H Bourlard, M Cohen, H Franco, Connectionist probability estimators in HMM speech recognition. IEEE Trans. Audio Speech Lang. Process. 2(1), 161\u2013174 (1994).","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"58_CR17","unstructured":"H Hermansky, DPW Ellis, S Sharma, in Proc. of ICASSP. Tandem connectionist feature extraction for conventional HMM systems, (2000), pp. 1635\u20131638."},{"key":"58_CR18","unstructured":"MM Hochberg, S Renals, A Robinson, DG Cook, in Proc. of ICASSP. Recent improvements to the ABBOT large vocabulary CSR system, (1995), pp. 69\u201372."},{"key":"58_CR19","unstructured":"J-L Gauvain, C-H Lee, in Proc. 
of ICASSP, 2. Speaker adaptation based on MAP estimation of HMM parameters, (1993), pp. 558\u2013561."},{"key":"58_CR20","unstructured":"MJF Gales, Maximum likelihood linear transformation for HMM-based speech recognition. Report CUED\/F-INFENG\/TR291, Cambridge University Engineering Department (1997)."},{"key":"58_CR21","unstructured":"SJ Young, JJ Odell, PC Woodland, in Proceedings of the Workshop on Human Language Technology. Tree-based state tying for high accuracy acoustic modelling, (1994), pp. 307\u2013312."},{"key":"58_CR22","unstructured":"D Povey, PC Woodland, in Proc. of ICASSP. Minimum phone error and I-smoothing for improved discriminative training, (2002), pp. 105\u2013108."},{"key":"58_CR23","unstructured":"D Povey, B Kingsbury, L Mangu, G Saon, H Soltau, G Zweig, in Proc. of ICASSP. FMPE: discriminatively trained features for speech recognition, (2005), pp. 961\u2013964."},{"key":"58_CR24","unstructured":"D Erhan, PA Manzagol, Y Bengio, S Bengio, P Vincent, in Proc. 12th. Int. Conference on Artificial Int. Statist. (AISTATS). The difficulty of training deep architectures and the effect of unsupervised pre-training, (2009), pp. 153\u2013160."},{"key":"58_CR25","unstructured":"G Hinton, in Tech. Rep. UTML TR 2010-003. A practical guide to training restricted Boltzmann machines (University of Toronto, 2010)."},{"key":"58_CR26","unstructured":"A-r Mohamed, GE Dahl, G Hinton, Deep belief networks for phone recognition. NIPS workshop on deep learning for speech recognition (2009)."},{"key":"58_CR27","unstructured":"D Yu, L Deng, GE Dahl, in NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning. Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition, (2010). 
http:\/\/research.microsoft.com\/apps\/pubs\/default.aspx?id=143619 ."},{"issue":"1","key":"58_CR28","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1109\/TASL.2011.2134090","volume":"20","author":"GE Dahl","year":"2012","unstructured":"GE Dahl, D Yu, L Deng, A Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 30\u201342 (2012).","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"58_CR29","unstructured":"S Thomas, H Hermansky, in Proc. of Interspeech. Cross-lingual and multistream posterior features for low resource LVCSR systems, (2010), pp. 877\u2013880."},{"key":"58_CR30","unstructured":"S Thomas, S Ganapathy, H Hermansky, in Proc. of ICASSP. Multilingual MLP features for low-resource LVCSR systems, (2012), pp. 4269\u20134272."},{"key":"58_CR31","unstructured":"K Vesely, M Karafi\u00e1t, F Grezl, M Janda, E Egorova, in IEEE Spoken Language Technology Workshop (SLT). The language-independent bottleneck features, (2012), pp. 336\u2013341."},{"key":"58_CR32","unstructured":"J-T Huang, J Li, D Yu, L Deng, Y Gong, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, (2013), pp. 7304\u20137308."},{"key":"58_CR33","unstructured":"A Ghoshal, P Swietojanski, S Renals, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Multilingual training of deep neural networks, (2013), pp. 7319\u20137323."},{"key":"58_CR34","doi-asserted-by":"crossref","unstructured":"P Swietojanski, S Renals, in Proc. IEEE Workshop on Spoken Language Technology. 
Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models (Lake Tahoe, USA, 2014).","DOI":"10.1109\/SLT.2014.7078569"},{"key":"58_CR35","unstructured":"T Ochiai, S Matsuda, X Lu, C Hori, S Katagiri, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Speaker adaptive training using deep neural networks, (2014), pp. 6349\u20136353."},{"key":"58_CR36","unstructured":"K Vesely, A Ghoshal, L Burget, D Povey, in INTERSPEECH. Sequence-discriminative training of deep neural networks, (2013), pp. 2345\u20132349."},{"key":"58_CR37","doi-asserted-by":"crossref","unstructured":"P Swietojanski, A Ghoshal, S Renals, in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). Hybrid acoustic models for distant and multichannel large vocabulary speech recognition, (2013). doi: 10.1109\/ASRU.2013.6707744 .","DOI":"10.1109\/ASRU.2013.6707744"},{"key":"58_CR38","unstructured":"G Perennou, in Proc. of ICASSP, 11. B.D.L.E.X: a data and cognition base of spoken French, (1986), pp. 325\u2013328."},{"key":"58_CR39","unstructured":"F Schiel, Aussprache-Lexikon PHONOLEX (2013). http:\/\/www.phonetik.uni-muenchen.de\/forschung\/Bas\/BasPHONOLEXeng.html ."},{"key":"58_CR40","unstructured":"JC Wells, SAMPA computer readable phonetic alphabet (2013). http:\/\/www.phon.ucl.ac.uk\/home\/sampa\/ ."},{"key":"58_CR41","unstructured":"J Novak, N Minematsu, K Hirose, C Hori, H Kashioka, P Dixon, in Proc. of Interspeech. Improving WFST-based G2P conversion with alignment constraints and RNNLM N-best rescoring, (2012), pp. 2526\u20132529."},{"key":"58_CR42","first-page":"625","volume":"11","author":"D Erhan","year":"2010","unstructured":"D Erhan, Y Bengio, A Courville, P-A Manzagol, P Vincent, S Bengio, Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625\u2013660 (2010).","journal-title":"J. Mach. Learn. 
Res."},{"key":"58_CR43","first-page":"1527","volume":"18","author":"G Hinton","year":"2006","unstructured":"G Hinton, S Osindero, Y Teh, A fast algorithm for deep belief nets. Neural Nets. 18, 1527\u20131554 (2006).","journal-title":"Neural Nets"},{"key":"58_CR44","unstructured":"P Swietojanski, A Ghoshal, S Renals, in Proc. of the IEEE Workshop on Spoken Language Technology (SLT). Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR, (2012), pp. 246\u2013251."},{"key":"58_CR45","unstructured":"K Vesely, M Hannemann, L Burget, in Proc. of ASRU. Semi-supervised training of deep neural networks, (2013), pp. 267\u2013272."},{"key":"58_CR46","unstructured":"D Imseng, P Motlicek, P Garner, H Bourlard, in Proc. of ASRU. Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition, (2013), pp. 332\u2013337."},{"key":"58_CR47","unstructured":"P Koehn, in Proceedings of the 10th Machine Translation Summit. Europarl: a parallel corpus for statistical machine translation, (2005), pp. 79\u201386."},{"key":"58_CR48","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1016\/j.specom.2004.06.003","volume":"45","author":"A Kusumoto","year":"2005","unstructured":"A Kusumoto, T Arai, K Kinoshita, N Hodoshima, N Vaughan, Modulation enhancement of speech by a pre-processing algorithm for improving intelligibility in reverberant environments. Speech Commun. 45, 101\u2013113 (2005).","journal-title":"Speech Commun."},{"key":"58_CR49","unstructured":"H Hermansky, in Proc. of ASRU. Trap-tandem: data-driven extraction of temporal features from speech, (2003), pp. 255\u2013260."},{"key":"58_CR50","unstructured":"B Kingsbury, N Morgan, in Proc. of ICASSP, 2, (1997), pp. 1259\u20131262."},{"key":"58_CR51","unstructured":"M Athineos, D Ellis, in Proc. of ASRU. Frequency-domain linear prediction for temporal features, (2003), pp. 
261\u2013266."},{"key":"58_CR52","unstructured":"S Ganapathy, S Thomas, P Motlicek, H Hermansky, in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Applications of signal analysis using autoregressive models for amplitude modulation, (2009), pp. 341\u2013344."},{"issue":"1","key":"58_CR53","doi-asserted-by":"publisher","first-page":"EL8","DOI":"10.1121\/1.3040022","volume":"125","author":"S Ganapathy","year":"2009","unstructured":"S Ganapathy, S Thomas, H Hermansky, Modulation frequency features for phoneme recognition in noisy speech. J. Acoust. Soc. Am. 125(1), EL8\u2013EL12 (2009).","journal-title":"J. Acoust. Soc. Am."},{"issue":"2","key":"58_CR54","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1006\/csla.1998.0043","volume":"12","author":"M Gales","year":"1998","unstructured":"M Gales, Maximum likelihood linear transformations for HMM-based speech recognition. Comput. Speech Lang. 12(2), 75\u201398 (1998).","journal-title":"Comput. Speech Lang."},{"key":"58_CR55","unstructured":"S Matsoukas, R Schwartz, H Jin, L Nguyen, in DARPA Speech Recognition Workshop. Practical implementations of speaker-adaptive training, (1997)."},{"key":"58_CR56","unstructured":"D Povey, in Proceedings of Interspeech. Improvements to fMPE for discriminative training of features, (2005), pp. 2977\u20132980."},{"key":"58_CR57","unstructured":"D Povey, D Kanevsky, et al, in Proceedings of IEEE ICASSP. Boosted MMI for model and feature-space discriminative training, (2008), pp. 
4057\u20134060."}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13636-015-0058-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-015-0058-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-015-0058-5","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-015-0058-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,15]],"date-time":"2022-05-15T09:10:28Z","timestamp":1652605828000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-015-0058-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,6,26]]},"references-count":57,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,12]]}},"alternative-id":["58"],"URL":"https:\/\/doi.org\/10.1186\/s13636-015-0058-5","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,6,26]]},"assertion":[{"value":"15 December 2014","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 May 2015","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 June 2015","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"17"}}