{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T15:45:26Z","timestamp":1768319126568,"version":"3.49.0"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2015,1,20]],"date-time":"2015-01-20T00:00:00Z","timestamp":1421712000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2015,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence miserable performance degradation in adverse acoustic conditions such as those with high ambient noises. We propose a noisy training approach to tackle this problem: by injecting moderate noises into the training data intentionally and randomly, more generalizable DNN models can be learned. This \u2018noise injection\u2019 technique, although known to the neural computation community already, has not been studied with DNNs which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.<\/jats:p>","DOI":"10.1186\/s13636-014-0047-0","type":"journal-article","created":{"date-parts":[[2015,1,19]],"date-time":"2015-01-19T10:48:52Z","timestamp":1421664532000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":84,"title":["Noisy training for deep neural networks in speech recognition"],"prefix":"10.1186","volume":"2015","author":[{"given":"Shi","family":"Yin","sequence":"first","affiliation":[]},{"given":"Chao","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Zhiyong","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yiye","family":"Lin","sequence":"additional","affiliation":[]},{"given":"Dong","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Javier","family":"Tejedor","sequence":"additional","affiliation":[]},{"given":"Thomas Fang","family":"Zheng","sequence":"additional","affiliation":[]},{"given":"Yinguo","family":"Li","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2015,1,20]]},"reference":[{"key":"47_CR1","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1561\/2000000039","volume":"7","author":"L Deng","year":"2014","unstructured":"L Deng, D Yu, Deep learning: methods and applications. Foundations Trends Signal Process. 7, 197\u2013387 (2014).","journal-title":"Foundations Trends Signal Process"},{"key":"47_CR2","doi-asserted-by":"crossref","unstructured":"H Bourlard, N Morgan, in Adaptive Processing of Sequences and Data Structures, ser. Lecture Notes in Artificial Intelligence (1387),Hybrid HMM\/ANN systems for speech recognition: overview and new research directions (USA, 1998), pp. 389\u2013417.","DOI":"10.1007\/BFb0054006"},{"key":"47_CR3","doi-asserted-by":"crossref","unstructured":"H Hermansky, DPW Ellis, S Sharma, in Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP),Tandem connectionist feature extraction for conventional HMM systems (Istanbul, Turkey, 9 June 2000), pp. 1635\u20131638.","DOI":"10.1109\/ICASSP.2000.862024"},{"key":"47_CR4","doi-asserted-by":"crossref","unstructured":"GE Dahl, D Yu, L Deng, A Acero, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),Large vocabulary continuous speech recognition with context-dependent DBN-HMMs (Prague, Czech Republic, 22 May 2011), pp. 4688\u20134691.","DOI":"10.1109\/ICASSP.2011.5947401"},{"issue":"6","key":"47_CR5","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","volume":"29","author":"G Hinton","year":"2012","unstructured":"G Hinton, L Deng, D Yu, GE Dahl, A-r Mohamed, N Jaitly, A Senior, V Vanhoucke, P Nguyen, TN Sainath, B Kingsbury, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82\u201397 (2012).","journal-title":"IEEE Signal Process. Mag"},{"key":"47_CR6","unstructured":"A Mohamed, G Dahl, G Hinton, in Proc. of Neural Information Processing Systems (NIPS) Workshop Deep Learning for Speech Recognition and Related Applications,Deep belief networks for phone recognition (Vancouver, BC, Canada, 7 December 2009)."},{"issue":"1","key":"47_CR7","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1109\/TASL.2011.2134090","volume":"20","author":"GE Dahl","year":"2012","unstructured":"GE Dahl, D Yu, L Deng, A Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 30\u201342 (2012).","journal-title":"IEEE Trans. Audio Speech Lang. Process"},{"key":"47_CR8","unstructured":"D Yu, L Deng, G Dahl, in Proc. of NIPS Workshop on Deep Learning and Unsupervised Feature Learning,Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition (Vancouver, BC, Canada, 6 December, 2010)."},{"key":"47_CR9","doi-asserted-by":"crossref","unstructured":"N Jaitly, P Nguyen, AW Senior, V Vanhoucke, in Proc. of Interspeech,Application of pretrained deep neural networks to large vocabulary speech recognition (Portland, Oregon, USA, 9\u201313 September 2012), pp. 2578\u20132581.","DOI":"10.21437\/Interspeech.2012-10"},{"key":"47_CR10","doi-asserted-by":"crossref","unstructured":"TN Sainath, B Kingsbury, B Ramabhadran, P Fousek, P Novak, A-r Mohamed, in Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU),Making deep belief networks effective for large vocabulary continuous speech recognition (Hawaii, USA, 11 December 2011), pp. 30\u201335.","DOI":"10.1109\/ASRU.2011.6163900"},{"issue":"1","key":"47_CR11","doi-asserted-by":"publisher","first-page":"2267","DOI":"10.1109\/TASL.2013.2284378","volume":"21","author":"TN Sainath","year":"2013","unstructured":"TN Sainath, B Kingsbury, H Soltau, B Ramabhadran, Optimization techniques to improve training speed of deep belief networks for large speech tasks. IEEE Trans. Audio Speech Lang. Process. 21(1), 2267\u20132276 (2013).","journal-title":"IEEE Trans. Audio Speech Lang. Process"},{"key":"47_CR12","doi-asserted-by":"crossref","unstructured":"F Seide, G Li, D Yu, in Proc. of Interspeech,Conversational speech transcription using context-dependent deep neural networks (Florence, Italy, 15 August 2011), pp. 437\u2013440.","DOI":"10.21437\/Interspeech.2011-169"},{"key":"47_CR13","doi-asserted-by":"crossref","unstructured":"F Seide, G Li, X Chen, D Yu, in Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU),Feature engineering in context-dependent deep neural networks for conversational speech transcription (Waikoloa, HI, USA, 11 December 2011), pp. 24\u201329.","DOI":"10.1109\/ASRU.2011.6163899"},{"key":"47_CR14","doi-asserted-by":"crossref","unstructured":"O Vinyals, SV Ravuri, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),Comparing multilayer perceptron to deep belief network tandem features for robust ASR (Prague, Czech Republic, 22 May 2011), pp. 4596\u20134599.","DOI":"10.1109\/ICASSP.2011.5947378"},{"key":"47_CR15","doi-asserted-by":"crossref","unstructured":"D Yu, ML Seltzer, in Proc. of Interspeech,Improved bottleneck features using pretrained deep neural networks (Florence, Italy, 15 August 2011), pp. 237\u2013240.","DOI":"10.21437\/Interspeech.2011-91"},{"key":"47_CR16","doi-asserted-by":"crossref","unstructured":"P Bell, P Swietojanski, S Renals, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),Multi-level adaptive networks in tandem and hybrid ASR systems (Vancouver, BC, Canada, 26 May 2013), pp. 6975\u20136979.","DOI":"10.1109\/ICASSP.2013.6639014"},{"key":"47_CR17","doi-asserted-by":"crossref","unstructured":"F Grezl, P Fousek, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),Optimizing bottle-neck features for LVCSR (Las Vegas, USA, 4 April 2008), pp. 4729\u20134732.","DOI":"10.1109\/ICASSP.2008.4518713"},{"issue":"12","key":"47_CR18","doi-asserted-by":"publisher","first-page":"2506","DOI":"10.1109\/TASL.2013.2277932","volume":"21","author":"P Lal","year":"2011","unstructured":"P Lal, S King, Cross-lingual automatic speech recognition using tandem features. IEEE Trans. Audio Speech Lang. Process. 21(12), 2506\u20132515 (2011).","journal-title":"IEEE Trans. Audio Speech Lang. Process"},{"key":"47_CR19","doi-asserted-by":"crossref","unstructured":"C Plahl, R Schl\u00fcter, H Ney, in Proc. of Interspeech,Hierarchical bottle neck features for LVCSR (Makuhari, Japan, 26 September 2010), pp. 1197\u20131200.","DOI":"10.21437\/Interspeech.2010-375"},{"key":"47_CR20","doi-asserted-by":"crossref","unstructured":"TN Sainath, B Kingsbury, B Ramabhadran, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),Auto-encoder bottleneck features using deep belief networks (Kyoto, Japan, 25 March 2012), pp. 4153\u20134156.","DOI":"10.1109\/ICASSP.2012.6288833"},{"key":"47_CR21","doi-asserted-by":"crossref","unstructured":"Z T\u00fcske, R Schl\u00fcter, H Ney, M Sundermeyer, in Proc. of Interspeech,Context-dependent MLPs for LVCSR: tandem, hybrid or both? (Portland, Oregon, USA, 9 September 2012), pp. 18\u201321.","DOI":"10.21437\/Interspeech.2012-5"},{"key":"47_CR22","doi-asserted-by":"crossref","unstructured":"D Imseng, P Motlicek, PN Garner, H Bourlard, in Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU),Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition (Olomouc, Czech Republic, 8 December 2013), pp. 332\u2013337.","DOI":"10.1109\/ASRU.2013.6707752"},{"key":"47_CR23","doi-asserted-by":"crossref","unstructured":"J Qi, D Wang, J Xu, J Tejedor, in Proc. of Interspeech,Bottleneck features based on gammatone frequency cepstral coefficients (Lyon, France, 25 August 2013), pp. 1751\u20131755.","DOI":"10.21437\/Interspeech.2013-435"},{"key":"47_CR24","doi-asserted-by":"crossref","unstructured":"D Yu, ML Seltzer, J Li, J-T Huang, F Seide, in Proc. of International Conference on Learning Representations (ICLR),Feature learning in deep neural networks - a study on speech recognition tasks (Scottsdale, Arizona, USA, 2 May 2013).","DOI":"10.1109\/ICASSP.2013.6639012"},{"key":"47_CR25","doi-asserted-by":"crossref","unstructured":"B Li, KC Sim, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),Noise adaptive front-end normalization based on vector Taylor series for deep neural networks in robust speech recognition (Vancouver, BC, Canada, 6 May 2013), pp. 7408\u20137412.","DOI":"10.1109\/ICASSP.2013.6639102"},{"key":"47_CR26","doi-asserted-by":"crossref","unstructured":"B Li, Y Tsao, KC Sim, in Proc. of Interspeech,An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition (Lyon, France, 25 August 2013), pp. 3002\u20133006.","DOI":"10.21437\/Interspeech.2013-278"},{"key":"47_CR27","doi-asserted-by":"crossref","unstructured":"ML Seltzer, D Yu, Y Wang, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),An investigation of deep neural networks for noise robust speech recognition (Vancouver, BC, Canada, 6 May 2013), pp. 7398\u20137402.","DOI":"10.1109\/ICASSP.2013.6639100"},{"key":"47_CR28","doi-asserted-by":"crossref","unstructured":"P Vincent, H Larochelle, Y Bengio, P-A Manzagol, in Proc. of the 25th International Conference on Machine Learning (ICML),Extracting and composing robust features with denoising autoencoders (Helsinki, Finland, 5 July 2008), pp. 1096\u20131103.","DOI":"10.1145\/1390156.1390294"},{"key":"47_CR29","doi-asserted-by":"crossref","unstructured":"AL Maas, QV Le, O\u2019Neil TM, O Vinyals, P Nguyen, AY Ng, in Proc. of Interspeech,Recurrent neural networks for noise reduction in robust ASR (Portland, Oregon, USA, 9 September 2012), pp. 22\u201325.","DOI":"10.21437\/Interspeech.2012-6"},{"key":"47_CR30","doi-asserted-by":"crossref","unstructured":"X Meng, C Liu, Z Zhang, D Wang, in Proc. of ChinaSIP 2014,Noisy training for deep neural networks (Xi\u2018an, China, 7 July 2014), pp. 16\u201320.","DOI":"10.1109\/ChinaSIP.2014.6889193"},{"issue":"3","key":"47_CR31","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1162\/neco.1996.8.3.643","volume":"8","author":"G An","year":"1996","unstructured":"G An, The effects of adding noise during backpropagation training on a generalization performance. Neural Comput. 8(3), 643\u2013674 (1996).","journal-title":"Neural Comput"},{"issue":"4","key":"47_CR32","doi-asserted-by":"publisher","first-page":"678","DOI":"10.1109\/21.370200","volume":"25","author":"Y Grandvalet","year":"1995","unstructured":"Y Grandvalet, S Canu, Comments on \u2018noise injection into inputs in back propagation learning\u2019. IEEE Trans. Syst. Man Cybernet. 25(4), 678\u2013681 (1995).","journal-title":"IEEE Trans. Syst. Man Cybernet"},{"issue":"1","key":"47_CR33","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1162\/neco.1995.7.1.108","volume":"7","author":"CM Bishop","year":"1995","unstructured":"CM Bishop, Training with noise is equivalent to Tikhonov regularization. Neural Comput. 7(1), 108\u2013116 (1995).","journal-title":"Neural Comput"},{"issue":"5","key":"47_CR34","doi-asserted-by":"publisher","first-page":"1093","DOI":"10.1162\/neco.1997.9.5.1093","volume":"9","author":"Y Grandvalet","year":"1997","unstructured":"Y Grandvalet, S Canu, S Boucheron, Noise injection: theoretical prospects. Neural Comput. 9(5), 1093\u20131108 (1997).","journal-title":"Neural Comput"},{"key":"47_CR35","doi-asserted-by":"crossref","unstructured":"J Sietsma, RJF Dow, in Proc. of IEEE International Conference on Neural Networks,Neural net pruning-why and how (San Diego, California, USA, 24 July 1988), pp. 325\u2013333.","DOI":"10.1109\/ICNN.1988.23864"},{"issue":"3","key":"47_CR36","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1109\/21.155944","volume":"22","author":"K Matsuoka","year":"1992","unstructured":"K Matsuoka, Noise injection into inputs in back-propagation learning. IEEE Trans. Syst. Man Cybernet. 22(3), 436\u2013440 (1992).","journal-title":"IEEE Trans. Syst. Man Cybernet"},{"issue":"3","key":"47_CR37","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1109\/72.377960","volume":"6","author":"R Reed","year":"1995","unstructured":"R Reed, RJ Marks, Seho Oh, Similarities of error regularization, sigmoid gain scaling, target smoothing, and training with jitter. IEEE Trans. Neural Netw. 6(3), 529\u2013538 (1995).","journal-title":"IEEE Trans. Neural Netw"}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-014-0047-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13636-014-0047-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-014-0047-0","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-014-0047-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T01:30:53Z","timestamp":1747445453000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-014-0047-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,20]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,12]]}},"alternative-id":["47"],"URL":"https:\/\/doi.org\/10.1186\/s13636-014-0047-0","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,1,20]]},"assertion":[{"value":"29 October 2014","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 December 2014","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 January 2015","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"2"}}