{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T10:38:59Z","timestamp":1776335939520,"version":"3.51.2"},"reference-count":446,"publisher":"Emerald","issue":"3-4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,6,30]]},"abstract":"<jats:p>This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.<\/jats:p>","DOI":"10.1561\/2000000039","type":"journal-article","created":{"date-parts":[[2014,6,30]],"date-time":"2014-06-30T09:03:26Z","timestamp":1404119006000},"page":"197-387","source":"Crossref","is-referenced-by-count":2799,"title":["Deep Learning: Methods and Applications"],"prefix":"10.1561","volume":"7","author":[{"given":"Li","family":"Deng","sequence":"first","affiliation":[{"name":"Microsoft Research, One Microsoft Way , Redmond, WA 98052 ,","place":["USA"]}]},{"given":"Dong","family":"Yu","sequence":"additional","affiliation":[{"name":"Microsoft Research, One Microsoft Way , Redmond, WA 98052 ,","place":["USA"]}]}],"member":"140","published-online":{"date-parts":[[2014,6,30]]},"reference":[{"key":"2026040313421830200_ref001","volume-title":"Proceedings of 
Interspeech,","author":"Abdel-Hamid","year":"2013"},{"key":"2026040313421830200_ref002","volume-title":"Proceedings of Interspeech.","author":"Abdel-Hamid","year":"2013"},{"key":"2026040313421830200_ref003","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Abdel-Hamid","year":"2012"},{"key":"2026040313421830200_ref004","volume-title":"Proceedings of Interspeech","author":"Acero","year":"2000"},{"key":"2026040313421830200_ref005","volume-title":"Proceedings of International Conference on Learning Representations (ICLR).","author":"Alain","year":"2013"},{"issue":"6","key":"2026040313421830200_ref006","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1145\/2461256.2461262","article-title":"Deep learning comes of age","volume":"56","author":"Anthes","year":"2013","journal-title":"Communications of the Association for Computing Machinery (ACM),"},{"key":"2026040313421830200_ref007","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1109\/MCI.2010.938364","article-title":"Deep machine learning \u2014 a new frontier in artificial intelligence","volume":"5","author":"Arel","year":"2010","journal-title":"IEEE Computational Intelligence Magazine,"},{"key":"2026040313421830200_ref008","volume-title":"Proceedings of the Joint Human Language Technology Conference and the North American Chapter of the Association of Computational Linguistics (HLT-NAACL) Workshop.","author":"Arisoy","year":"2012"},{"key":"2026040313421830200_ref009","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Aslan","year":"2013"},{"key":"2026040313421830200_ref010","volume-title":"Proceedings of Neural Information Processing Systems (NIPS","author":"Ba","year":"2013"},{"issue":"3","key":"2026040313421830200_ref011","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1109\/MSP.2009.932166","article-title":"Research developments and directions in speech recognition and 
understanding","volume":"26","author":"Baker","year":"2009","journal-title":"IEEE Signal Processing Magazine"},{"issue":"4","key":"2026040313421830200_ref012","doi-asserted-by":"crossref","DOI":"10.1109\/MSP.2009.932707","article-title":"Updated MINDS report on speech recognition and understanding","volume":"26","author":"Baker","year":"2009","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref013","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Baldi","year":"2013"},{"key":"2026040313421830200_ref014","unstructured":"E. Battenberg, E. Schmidt, and J. Bello. Deep learning for music, special session at International Conference on Acoustics Speech and Signal Processing (ICASSP) (http:\/\/www.icassp2014.org\/special_sections.html#ss8), 2014."},{"key":"2026040313421830200_ref015","volume-title":"Proceedings of International Symposium on Music Information Retrieval (ISMIR).","author":"Battenberg","year":"2012"},{"key":"2026040313421830200_ref016","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Bell","year":"2013"},{"key":"2026040313421830200_ref017","author":"Bengio","year":"1991"},{"key":"2026040313421830200_ref018","author":"Bengio","year":"2002"},{"key":"2026040313421830200_ref019","volume-title":"Scholarpedia,","author":"Bengio","year":"2008"},{"issue":"1","key":"2026040313421830200_ref020","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000006","article-title":"Learning deep architectures for AI","volume":"2","author":"Bengio","year":"2009","journal-title":"Foundations and Trends in Machine Learning,"},{"key":"2026040313421830200_ref021","first-page":"17","article-title":"Deep learning of representations for unsupervised and transfer learning","volume":"27","author":"Bengio","year":"2012","journal-title":"Journal of Machine Learning Research Workshop and Conference 
Proceedings"},{"key":"2026040313421830200_ref022","first-page":"1","volume-title":"Statistical Language and Speech Processing,","author":"Bengio","year":"2013"},{"key":"2026040313421830200_ref023","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Bengio","year":"2013"},{"key":"2026040313421830200_ref024","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: A review and new perspectives","volume":"38","author":"Bengio","year":"2013","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI),"},{"key":"2026040313421830200_ref025","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1109\/72.125866","article-title":"Global optimization of a neural network-hidden markov model hybrid","volume":"3","author":"Bengio","year":"1992","journal-title":"IEEE Transactions on Neural Networks,"},{"key":"2026040313421830200_ref026","volume-title":"Proceedings of Neural Information Processing Systems (NIPS","author":"Bengio","year":"2000"},{"key":"2026040313421830200_ref027","first-page":"1137","article-title":"A neural probabilistic language model","volume":"3","author":"Bengio","year":"2003","journal-title":"Journal of Machine Learning Research,"},{"key":"2026040313421830200_ref028","volume-title":"Proceedings of Neural Information Processing Systems (NIPS","author":"Bengio","year":"2006"},{"key":"2026040313421830200_ref029","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1109\/72.279181","article-title":"Learning long-term dependencies with gradient descent is difficult","volume":"5","author":"Bengio","year":"1994","journal-title":"IEEE Transactions on Neural Networks,"},{"key":"2026040313421830200_ref030","volume-title":"Proceedings of International Conference on Machine Learning (ICML), 2014.","author":"Bengio","year":"2013"},{"key":"2026040313421830200_ref031","volume-title":"Proceedings of Neural 
Information Processing Systems (NIPS).","author":"Bengio","year":"2013"},{"key":"2026040313421830200_ref032","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"3","author":"Bergstra","year":"2012","journal-title":"Journal on Machine Learning Research,"},{"key":"2026040313421830200_ref033","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1109\/89.902277","article-title":"An application of discriminative feature extraction to filter-bank-based speech recognition","volume":"9","author":"Biem","year":"2001","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"key":"2026040313421830200_ref034","first-page":"29","article-title":"Dynamic graphical models","volume":"33","author":"Bilmes","year":"2010","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref035","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1109\/MSP.2005.1511827","article-title":"Graphical model architectures for speech recognition","volume":"22","author":"Bilmes","year":"2005","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref036","volume-title":"Machine Learning,","author":"Bordes","year":"2013"},{"key":"2026040313421830200_ref037","volume-title":"Proceedings of Association for the Advancement of Artificial Intelligence (AAAI).","author":"Bordes","year":"2011"},{"key":"2026040313421830200_ref038","first-page":"3207","article-title":"From machine learning to machine reasoning: An essay","volume":"14","author":"Bottou","year":"2013","journal-title":"Journal of Machine Learning Research,"},{"key":"2026040313421830200_ref039","volume-title":"Proceedings of Neural Information Processing Systems (NIPS","author":"Bottou","year":"2004"},{"key":"2026040313421830200_ref040","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Boulanger-Lewandowski","year":"2012"},{"key":"2026040313421830200_ref041","volume-title":"Proceedings of 
International Symposium on Music Information Retrieval (ISMIR).","author":"Boulanger-Lewandowski","year":"2013"},{"key":"2026040313421830200_ref042","volume-title":"Connectionist Speech Recognition: A Hybrid Approach.","author":"Bourlard","year":"1993"},{"key":"2026040313421830200_ref043","author":"Bouvrie","year":"2009"},{"key":"2026040313421830200_ref044","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1023\/A:1018046112532","article-title":"Stacked regression","volume":"24","author":"Breiman","year":"1996","journal-title":"Machine Learning,"},{"key":"2026040313421830200_ref045","author":"Bridle","year":"1998"},{"issue":"11","key":"2026040313421830200_ref046","doi-asserted-by":"crossref","first-page":"2290","DOI":"10.1109\/TASL.2013.2271591","article-title":"Large vocabulary speech recognition on parallel architectures","volume":"21","author":"Cardinal","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref047","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1023\/A:1007379606734","article-title":"Multitask learning","volume":"28","author":"Caruana","year":"1997","journal-title":"Machine Learning"},{"key":"2026040313421830200_ref048","volume-title":"Proceedings of International Conference on Learning Representations.","author":"Chen","year":"2014"},{"key":"2026040313421830200_ref049","volume-title":"Proceedings of Interspeech.","author":"Chen","year":"2012"},{"key":"2026040313421830200_ref050","first-page":"243","volume-title":"IEEE Transactions on Speech and Audio Processing","author":"Chengalvarayan","year":"1997"},{"key":"2026040313421830200_ref051","first-page":"232","volume-title":"IEEE Transactions on Speech and Audio Processing,","author":"Chengalvarayan","year":"1997"},{"issue":"6","key":"2026040313421830200_ref052","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1109\/89.725317","article-title":"Speech trajectory discrimination using the minimum 
classification error learning","volume":"6","author":"Chengalvarayan","year":"1998","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"key":"2026040313421830200_ref053","first-page":"342","volume-title":"Proceedings of Neural Information Processing Systems (NIPS","author":"Cho","year":"2009"},{"key":"2026040313421830200_ref054","volume-title":"Proceedings of Neural Information Processing Systems (NIPS","author":"Ciresan","year":"2012"},{"key":"2026040313421830200_ref055","volume-title":"Neural Computation","author":"Ciresan","year":"2010"},{"key":"2026040313421830200_ref056","volume-title":"Proceedings of International Joint Conference on Neural Networks (IJCNN).","author":"Ciresan","year":"2011"},{"key":"2026040313421830200_ref057","volume-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR).","author":"Ciresan","year":"2012"},{"key":"2026040313421830200_ref058","volume-title":"Proceedings of International Joint Conference on Neural Networks (IJCNN).","author":"Ciresan","year":"2012"},{"key":"2026040313421830200_ref059","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Coates","year":"2013"},{"key":"2026040313421830200_ref060","first-page":"671","volume-title":"Proceedings of International Joint Conference on Artificial Intelligence (IJCAI),","author":"Cohen","year":"2005"},{"key":"2026040313421830200_ref061","volume-title":"Proceedings of Artificial Intelligence and Statistics (AISTATS","author":"Collobert","year":"2011"},{"key":"2026040313421830200_ref062","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Collobert","year":"2008"},{"key":"2026040313421830200_ref063","first-page":"2493","article-title":"Natural language processing (almost) from scratch","volume":"12","author":"Collobert","year":"2011","journal-title":"Journal on Machine Learning Research,"},{"key":"2026040313421830200_ref064","first-page":"469","article-title":"Phone 
recognition with the mean-covariance restricted boltzmann machine","volume":"23","author":"Dahl","year":"2010","journal-title":"Proceedings of Neural Information Processing Systems (NIPS"},{"key":"2026040313421830200_ref065","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Dahl","year":"2013"},{"key":"2026040313421830200_ref066","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Dahl","year":"2013"},{"key":"2026040313421830200_ref067","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Dahl","year":"2011"},{"issue":"1","key":"2026040313421830200_ref068","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/TASL.2011.2134090","article-title":"Context-dependent, pre-trained deep neural networks for large vocabulary speech recognition","volume":"20","author":"Dahl","year":"2012","journal-title":"IEEE Transactions on Audio, Speech, & Language Processing,"},{"key":"2026040313421830200_ref069","volume-title":"Proceedings of Neural Information Processing Systems (NIPS","author":"Dean","year":"2012"},{"key":"2026040313421830200_ref070","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Demuynck","year":"2013"},{"issue":"1","key":"2026040313421830200_ref071","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/0165-1684(92)90112-A","article-title":"A generalized hidden Markov model with state-conditioned trend functions of time for the speech signal","volume":"27","author":"Deng","year":"1992","journal-title":"Signal Processing,"},{"key":"2026040313421830200_ref072","first-page":"471","volume-title":"IEEE Transactions on Speech and Audio 
Processing","author":"Deng","year":"1993"},{"issue":"4","key":"2026040313421830200_ref073","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1016\/S0167-6393(98)00023-5","article-title":"A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition","volume":"24","author":"Deng","year":"1998","journal-title":"Speech Communication,"},{"key":"2026040313421830200_ref074","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1007\/978-3-642-60087-6_20","volume-title":"Computational Models of Speech Pattern Processing,","author":"Deng","year":"1999"},{"key":"2026040313421830200_ref075","first-page":"115","volume-title":"Mathematical Foundations of Speech and Language Processing,","author":"Deng","year":"2003"},{"key":"2026040313421830200_ref076","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-02555-6","volume-title":"Dynamic Speech Models","author":"Deng","year":"2006"},{"key":"2026040313421830200_ref077","volume-title":"Proceedings of Asian-Pacific Signal & Information Processing Annual Summit and Conference (APSIPA-ASC).","author":"Deng","year":"2011"},{"issue":"6","key":"2026040313421830200_ref078","article-title":"The MNIST database of handwritten digit images for machine learning research","volume":"29","author":"Deng","year":"2012","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref079","volume-title":"Neural Information Processing Systems (NIPS) Workshop on Learning Output Representations.","author":"Deng","year":"2013"},{"key":"2026040313421830200_ref080","volume-title":"Asian-Pacific Signal & Information Processing Association Transactions on Signal and Information Processing.","author":"Deng","year":"2013"},{"key":"2026040313421830200_ref081","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Deng","year":"2013"},{"key":"2026040313421830200_ref082","volume-title":"Proceedings of 
International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Deng","year":"2001"},{"key":"2026040313421830200_ref083","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1109\/89.593305","article-title":"Speaker-independent phonetic classification using hidden markov models with state-conditioned mixtures of trend functions","volume":"5","author":"Deng","year":"1997","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"issue":"4","key":"2026040313421830200_ref084","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1109\/89.326610","article-title":"Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states","volume":"2","author":"Deng","year":"1994","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"key":"2026040313421830200_ref085","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Deng","year":"2014"},{"issue":"6","key":"2026040313421830200_ref086","doi-asserted-by":"crossref","first-page":"3058","DOI":"10.1121\/1.404202","article-title":"Structural design of a hidden Markov model based speech recognizer using multi-valued phonetic features: Comparison with segmental speech units","volume":"92","author":"Deng","year":"1992","journal-title":"Journal of the Acoustical Society of America,"},{"issue":"2","key":"2026040313421830200_ref087","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1016\/0893-6080(94)90027-2","volume":"7","author":"Deng","year":"1994","journal-title":"Neural Networks,"},{"key":"2026040313421830200_ref088","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Deng","year":"2013"},{"key":"2026040313421830200_ref089","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing 
(ICASSP).","author":"Deng","year":"2013"},{"issue":"1","key":"2026040313421830200_ref090","first-page":"11","article-title":"Challenges in adopting speech recognition","volume":"47","author":"Deng","year":"2004","journal-title":"Communications of the Association for Computing Machinery (ACM),"},{"key":"2026040313421830200_ref091","volume-title":"Proceedings of Interspeech.","author":"Deng","year":"2012"},{"issue":"7","key":"2026040313421830200_ref092","doi-asserted-by":"crossref","first-page":"1677","DOI":"10.1109\/78.134406","article-title":"Phonemic hidden Markov models with continuous mixture output densities for large vocabulary word recognition","volume":"39","author":"Deng","year":"1991","journal-title":"IEEE Transactions on Signal Processing,"},{"issue":"4","key":"2026040313421830200_ref093","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1016\/0885-2308(90)90015-X","article-title":"Large vocabulary word recognition using context\u2013dependent allophonic hidden Markov models","volume":"4","author":"Deng","year":"1990","journal-title":"Computer Speech and Language,"},{"key":"2026040313421830200_ref094","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Deng","year":"2013"},{"key":"2026040313421830200_ref095","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1109\/TASL.2013.2244083","article-title":"Machine learning paradigms in speech recognition: An overview","volume":"21","author":"Deng","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, & Language"},{"key":"2026040313421830200_ref096","doi-asserted-by":"crossref","first-page":"3036","DOI":"10.1121\/1.1315288","article-title":"Spontaneous speech recognition using a statistical coarticulatory model for the vocal tract resonance dynamics","volume":"108","author":"Deng","year":"2000","journal-title":"Journal of the Acoustical Society America"},{"key":"2026040313421830200_ref097","volume-title":"Speech 
Processing \u2014 A Dynamic and Optimization-Oriented Approach","author":"Deng","year":"2003"},{"issue":"2\u20133","key":"2026040313421830200_ref098","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/S0167-6393(97)00018-6","article-title":"Production models as a structural basis for automatic speech recognition","volume":"33","author":"Deng","year":"1997","journal-title":"Speech Communication,"},{"issue":"4","key":"2026040313421830200_ref099","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1109\/89.506934","article-title":"Transitional speech units and their representation by regressive Markov states: Applications to speech recognition","volume":"4","author":"Deng","year":"1996","journal-title":"IEEE Transactions on speech and audio processing,"},{"key":"2026040313421830200_ref100","volume-title":"Proceedings of Interspeech.","author":"Deng","year":"2010"},{"issue":"5","key":"2026040313421830200_ref101","doi-asserted-by":"crossref","first-page":"2702","DOI":"10.1121\/1.409839","article-title":"A statistical approach to automatic speech recognition using the atomic speech units constructed from overlapping articulatory features","volume":"85","author":"Deng","year":"1994","journal-title":"Journal of the Acoustical Society of America,"},{"key":"2026040313421830200_ref102","volume-title":"Proceedings of IEEE Workshop on Spoken Language Technologies","author":"Deng","year":"2012"},{"issue":"8","key":"2026040313421830200_ref103","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1109\/TSA.2002.804538","article-title":"Distributed speech processing in mipad\u2019s multimodal user interface","volume":"10","author":"Deng","year":"2002","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"issue":"3","key":"2026040313421830200_ref104","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1109\/TSA.2005.845814","article-title":"Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a 
parametric model of speech distortion","volume":"13","author":"Deng","year":"2005","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"key":"2026040313421830200_ref105","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Deng","year":"2007"},{"key":"2026040313421830200_ref106","volume-title":"Proceedings of Interspeech.","author":"Deng","year":"2011"},{"issue":"1","key":"2026040313421830200_ref107","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1109\/TSA.2005.854107","article-title":"A bidirectional target filtering model of speech coarticulation: Two\u2013stage implementation for phonetic recognition","volume":"14","author":"Deng","year":"2006","journal-title":"IEEE Transactions on Audio and Speech Processing"},{"issue":"5","key":"2026040313421830200_ref108","doi-asserted-by":"crossref","first-page":"1492","DOI":"10.1109\/TASL.2006.878265","article-title":"Structured speech modeling","volume":"14","author":"Deng","year":"2006","journal-title":"IEEE Transactions on Audio, Speech and Language Processing"},{"key":"2026040313421830200_ref109","volume-title":"Neural Information Processing Systems (NIPS) Workshop","author":"Deng","year":"2009"},{"key":"2026040313421830200_ref110","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Deng","year":"2012"},{"key":"2026040313421830200_ref111","first-page":"233","volume-title":"Proceedings of 4th Workshop on Statistical Machine Translation,","author":"Deselaers","year":"2009"},{"key":"2026040313421830200_ref112","author":"Diez","year":"2013"},{"key":"2026040313421830200_ref113","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Dognin","year":"2013"},{"key":"2026040313421830200_ref114","first-page":"201","volume-title":"Journal on Machine Learning 
Research,","author":"Erhan"},{"key":"2026040313421830200_ref115","first-page":"6885","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP),","author":"Fernandez","year":"2013"},{"key":"2026040313421830200_ref116","first-page":"41","volume-title":"Machine Learning","author":"Fine","year":"1998"},{"key":"2026040313421830200_ref117","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Frome","year":"2013"},{"key":"2026040313421830200_ref118","volume-title":"Proceedings of Interspeech","author":"Fu","year":"2007"},{"key":"2026040313421830200_ref119","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1007\/978-3-642-21317-5_5","volume-title":"Robust Speech Recognition of Uncertain or Missing Data: Theory and Application","author":"Gales","year":"2011"},{"key":"2026040313421830200_ref120","volume-title":"Proceedings of Conference on Information and Knowledge Management (CIKM).","author":"Gao","year":"2010"},{"key":"2026040313421830200_ref121","volume-title":"Proceedings of Neural Information Processing Systems (NIPS) Workshop on Deep Learning","author":"Gao","year":"2013"},{"key":"2026040313421830200_ref122","author":"Gao","year":"2013"},{"key":"2026040313421830200_ref123","volume-title":"Proceedings of Association for Computational Linguistics (ACL).","author":"Gao","year":"2014"},{"key":"2026040313421830200_ref124","volume-title":"Proceedings of Special Interest Group on Information Retrieval (SIGIR).","author":"Gao","year":"2011"},{"key":"2026040313421830200_ref125","volume-title":"Neural Information Processing Systems (NIPS),","author":"Gens","year":"2012"},{"key":"2026040313421830200_ref126","author":"George","year":"2008"},{"issue":"6","key":"2026040313421830200_ref127","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1109\/TASL.2009.2032607","article-title":"Error approximation and minimum phone error acoustic model 
estimation","volume":"18","author":"Gibson","year":"2010","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref128","author":"Girshick","year":"2013"},{"key":"2026040313421830200_ref129","volume-title":"Proceedings of Artificial Intelligence and Statistics (AISTATS).","author":"Glorot","year":"2010"},{"key":"2026040313421830200_ref130","volume-title":"Proceedings of Artificial Intelligence and Statistics (AISTATS)","author":"Glorot","year":"2011"},{"key":"2026040313421830200_ref131","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Goodfellow","year":"2013"},{"key":"2026040313421830200_ref132","author":"Grais","year":"2013"},{"key":"2026040313421830200_ref133","volume-title":"Representation Learning Workshop, International Conference on Machine Learning (ICML),","author":"Graves","year":"2012"},{"key":"2026040313421830200_ref134","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Graves","year":"2006"},{"key":"2026040313421830200_ref135","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Graves","year":"2013"},{"key":"2026040313421830200_ref136","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Graves","year":"2013"},{"key":"2026040313421830200_ref137","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Grezl","year":"2008"},{"key":"2026040313421830200_ref138","doi-asserted-by":"crossref","unstructured":"C. Gulcehre, K. Cho, R. Pascanu, and Y. Bengio. Learned-norm pooling for deep feedforward and recurrent neural networks. 
http:\/\/arxiv.org\/abs\/1311.1780, 2014.","DOI":"10.1007\/978-3-662-44848-9_34"},{"key":"2026040313421830200_ref139","first-page":"307","article-title":"Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics","volume":"13","author":"Gutmann","year":"2012","journal-title":"Journal of Machine Learning Research,"},{"key":"2026040313421830200_ref140","first-page":"486","volume-title":"IEEE Transactions on Audio, Speech, and Language Processing,","author":"Hain","year":"2012"},{"key":"2026040313421830200_ref141","volume-title":"Proceedings of International Symposium on Music Information Retrieval (ISMIR).","author":"Hamel","year":"2010"},{"key":"2026040313421830200_ref142","author":"Hawkins","year":"2010"},{"key":"2026040313421830200_ref143","volume-title":"On Intelligence: How a New Understanding of the Brain will lead to the Creation of Truly Intelligent Machines","author":"Hawkins","year":"2004"},{"key":"2026040313421830200_ref144","volume-title":"IEEE Signal Processing Magazine,","author":"He","year":"2011"},{"key":"2026040313421830200_ref145","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"He","year":"2012"},{"key":"2026040313421830200_ref146","volume-title":"Proceedings of the IEEE.","author":"He","year":"2013"},{"key":"2026040313421830200_ref147","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/MSP.2008.926652","article-title":"Discriminative learning in sequential pattern recognition \u2014 a unifying review for optimization-oriented speech recognition","volume":"25","author":"He","year":"2008","journal-title":"IEEE Signal Processing Magazine,"},{"issue":"5","key":"2026040313421830200_ref148","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.1109\/TASL.2010.2082532","article-title":"Equivalence of generative and log-linear models","volume":"19","author":"Heigold","year":"2011","journal-title":"IEEE Transactions on 
Audio, Speech, and Language Processing,"},{"issue":"12","key":"2026040313421830200_ref149","doi-asserted-by":"crossref","first-page":"2616","DOI":"10.1109\/TASL.2013.2280234","article-title":"Investigations on an EM-style optimization algorithm for discriminative training of HMMs","volume":"21","author":"Heigold","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref150","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Heigold","year":"2013"},{"issue":"8","key":"2026040313421830200_ref151","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1109\/TASL.2009.2022204","article-title":"Discriminative input stream combination for conditional random field phone recognition","volume":"17","author":"Heintz","year":"2009","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref152","volume-title":"Proceedings of Special Interest Group on Discourse and Dialogue (SIGDIAL).","author":"Henderson","year":"2013"},{"key":"2026040313421830200_ref153","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Hermans","year":"2013"},{"key":"2026040313421830200_ref154","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Hermansky","year":"2000"},{"issue":"2","key":"2026040313421830200_ref155","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1109\/TASL.2008.2010286","article-title":"Speech recognition using augmented conditional random fields","volume":"17","author":"Hifny","year":"2009","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref156","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/0004-3702(90)90004-J","article-title":"Mapping part-whole hierarchies into connectionist 
networks","volume":"46","author":"Hinton","year":"1990","journal-title":"Artificial Intelligence,"},{"key":"2026040313421830200_ref157","first-page":"1","volume-title":"Artificial Intelligence,","author":"Hinton","year":"1990"},{"key":"2026040313421830200_ref158","first-page":"10","volume-title":"Canadian Psychology,","author":"Hinton","year":"2003"},{"key":"2026040313421830200_ref159","first-page":"2010","author":"Hinton","year":"2010"},{"issue":"10","key":"2026040313421830200_ref160","article-title":"A better way to learn features","volume":"54","author":"Hinton","year":"2011","journal-title":"Communications of the Association for Computing Machinery (ACM),"},{"issue":"6","key":"2026040313421830200_ref161","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","article-title":"Deep neural networks for acoustic modeling in speech recognition","volume":"29","author":"Hinton","year":"2012","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref162","volume-title":"Proceedings of International Conference on Artificial Neural Networks.","author":"Hinton","year":"2011"},{"key":"2026040313421830200_ref163","first-page":"1527","volume-title":"Neural Computation,","author":"Hinton","year":"2006"},{"issue":"5786","key":"2026040313421830200_ref164","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1126\/science.1127647","article-title":"Reducing the dimensionality of data with neural networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Science,"},{"key":"2026040313421830200_ref165","first-page":"1","volume-title":"Topics in Cognitive Science,","author":"Hinton","year":"2010"},{"key":"2026040313421830200_ref166","author":"Hinton","year":"2012"},{"key":"2026040313421830200_ref167","author":"Hochreiter","year":"1991"},{"key":"2026040313421830200_ref168","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term
memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation,"},{"key":"2026040313421830200_ref169","volume-title":"Proceedings of Association for Computational Linguistics (ACL).","author":"Huang","year":"2012"},{"key":"2026040313421830200_ref170","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Huang","year":"2013"},{"key":"2026040313421830200_ref171","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Huang","year":"2013"},{"key":"2026040313421830200_ref172","volume-title":"Association for Computing Machinery (ACM) International Conference Information and Knowledge Management (CIKM),","author":"Huang","year":"2013"},{"key":"2026040313421830200_ref173","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Huang","year":"2013"},{"issue":"8","key":"2026040313421830200_ref174","doi-asserted-by":"crossref","first-page":"1941","DOI":"10.1109\/TASL.2010.2040782","article-title":"Hierarchical bayesian language models for conversational speech recognition","volume":"18","author":"Huang","year":"2010","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref175","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Huang","year":"2001"},{"key":"2026040313421830200_ref176","first-page":"2360","volume-title":"Proceedings of Interspeech,","author":"Huang","year":"2013"},{"key":"2026040313421830200_ref177","volume-title":"Proceedings of International Conference on Machine Learning and Application (ICMLA).","author":"Humphrey","year":"2012"},{"key":"2026040313421830200_ref178","volume-title":"Proceedings of International Symposium on Music Information Retrieval 
(ISMIR).","author":"Humphrey","year":"2012"},{"key":"2026040313421830200_ref179","volume-title":"Journal of Intelligent Information Systems,","author":"Humphrey","year":"2013"},{"key":"2026040313421830200_ref180","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Hutchinson","year":"2012"},{"key":"2026040313421830200_ref181","first-page":"1944","volume-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence,","author":"Hutchinson","year":"2013"},{"key":"2026040313421830200_ref182","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Imseng","year":"2013"},{"key":"2026040313421830200_ref183","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Jaitly","year":"2011"},{"key":"2026040313421830200_ref184","volume-title":"Proceedings of Interspeech.","author":"Jaitly","year":"2012"},{"key":"2026040313421830200_ref185","first-page":"2146","volume-title":"Proceedings of International Conference on Computer Vision,","author":"Jarrett","year":"2009"},{"issue":"3","key":"2026040313421830200_ref186","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1109\/MSP.2010.936018","article-title":"Parameter estimation of statistical models using convex optimization: An advanced method of discriminative training for speech and language processing","volume":"27","author":"Jiang","year":"2010","journal-title":"IEEE Signal Processing Magazine"},{"key":"2026040313421830200_ref187","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1109\/TIT.1986.1057145","article-title":"Maximum likelihood estimation for multivariate mixture observations of Markov chains","volume":"32","author":"Juang","year":"1986","journal-title":"IEEE Transactions on Information 
Theory,"},{"key":"2026040313421830200_ref188","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/89.568732","article-title":"Minimum classification error rate methods for speech recognition","volume":"5","author":"Juang","year":"1997","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"key":"2026040313421830200_ref189","volume-title":"Proceedings of International Conference on Multimodal Interaction (ICMI).","author":"Kahou","year":"2013"},{"key":"2026040313421830200_ref190","first-page":"8012","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP),","author":"Kang","year":"2013"},{"key":"2026040313421830200_ref191","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Kashiwagi","year":"2013"},{"key":"2026040313421830200_ref192","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Kavukcuoglu","year":"2010"},{"key":"2026040313421830200_ref193","first-page":"1094","volume-title":"IEEE Transactions on Audio, Speech, and Language Processing,","author":"Ketabdar","year":"2010"},{"key":"2026040313421830200_ref194","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Kingsbury","year":"2009"},{"key":"2026040313421830200_ref195","volume-title":"Proceedings of Interspeech.","author":"Kingsbury","year":"2012"},{"key":"2026040313421830200_ref196","volume-title":"Proceedings of Neural Information Processing Systems (NIPS) Deep Learning Workshop.","author":"Kiros","year":"2013"},{"key":"2026040313421830200_ref197","first-page":"1285","volume-title":"IEEE Transactions on Audio, Speech, and Language Processing","author":"Ko","year":"2013"},{"key":"2026040313421830200_ref198","volume-title":"Proceedings of Neural Information Processing Systems
(NIPS).","author":"Krizhevsky","year":"2012"},{"key":"2026040313421830200_ref199","volume-title":"Proceedings of Interspeech.","author":"Kubo","year":"2012"},{"key":"2026040313421830200_ref200","volume-title":"How to Create a Mind.","author":"Kurzweil","year":"2012"},{"key":"2026040313421830200_ref201","first-page":"2506","volume-title":"IEEE Transactions on Audio, Speech, and Language Processing,","author":"Lal","year":"2013"},{"issue":"1","key":"2026040313421830200_ref202","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1016\/0893-6080(90)90044-L","article-title":"A time-delay neural network architecture for isolated word recognition","volume":"3","author":"Lang","year":"1990","journal-title":"Neural Networks,"},{"key":"2026040313421830200_ref203","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Larochelle","year":"2008"},{"key":"2026040313421830200_ref204","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Le","year":"2013"},{"key":"2026040313421830200_ref205","first-page":"778","volume-title":"Proceedings of Empirical Methods in Natural Language Processing (EMNLP)","author":"Le","year":"2010"},{"key":"2026040313421830200_ref206","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Le","year":"2011"},{"key":"2026040313421830200_ref207","first-page":"197","volume-title":"IEEE Transactions on Audio, Speech, and Language Processing","author":"Le","year":"2013"},{"key":"2026040313421830200_ref208","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Le","year":"2011"},{"key":"2026040313421830200_ref209","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Le","year":"2012"},{"key":"2026040313421830200_ref210","volume-title":"Proceedings of European Conference on Computer Vision
(ECCV).","author":"LeCun","year":"2012"},{"key":"2026040313421830200_ref211","first-page":"255","volume-title":"The Handbook of Brain Theory and Neural Networks,","author":"LeCun","year":"1995"},{"key":"2026040313421830200_ref212","first-page":"2278","volume-title":"Proceedings of the IEEE,","author":"LeCun","year":"1998"},{"key":"2026040313421830200_ref213","volume-title":"Proceedings of International Conference on Document Analysis and Recognition (ICDAR).","author":"LeCun","year":"2007"},{"key":"2026040313421830200_ref214","first-page":"109","volume-title":"Proceedings of International Conference on Spoken Language Processing (ICSLP),","author":"Lee","year":"2004"},{"key":"2026040313421830200_ref215","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Lee","year":"2009"},{"key":"2026040313421830200_ref216","first-page":"95","volume-title":"Communications of the Association for Computing Machinery (ACM),","author":"Lee","year":"2011"},{"key":"2026040313421830200_ref217","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Lee","year":"2010"},{"key":"2026040313421830200_ref218","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Lena","year":"2012"},{"key":"2026040313421830200_ref219","author":"Levine"},{"key":"2026040313421830200_ref220","first-page":"1","volume-title":"IEEE\/Association for Computing Machinery (ACM) Transactions on Audio, Speech, and Language Processing","author":"Li","year":"2014"},{"key":"2026040313421830200_ref221","volume-title":"Proceedings of IEEE Spoken Language Technology (SLT)","author":"Li","year":"2012"},{"key":"2026040313421830200_ref222","first-page":"312","volume-title":"Proceedings of the Conference on Affective Computing and Intelligent Interaction (ACII),","author":"Li","year":"2013"},{"key":"2026040313421830200_ref223","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing
(ICASSP).","author":"Liao","year":"2013"},{"key":"2026040313421830200_ref224","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Liao","year":"2013"},{"key":"2026040313421830200_ref225","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Lin","year":"2009"},{"key":"2026040313421830200_ref226","volume-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)","author":"Lin","year":"2011"},{"issue":"10","key":"2026040313421830200_ref227","doi-asserted-by":"crossref","first-page":"2129","DOI":"10.1109\/TASL.2013.2269291","article-title":"Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis","volume":"21","author":"Ling","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"2026040313421830200_ref228","first-page":"7825","article-title":"Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis","author":"Ling","year":"2013","journal-title":"International Conference on Acoustics Speech and Signal Processing (ICASSP),"},{"key":"2026040313421830200_ref229","article-title":"Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression","volume-title":"IEEE Transactions on Audio, Speech, and Language Processing,","author":"Ling","year":"2013"},{"issue":"9","key":"2026040313421830200_ref230","doi-asserted-by":"crossref","first-page":"1791","DOI":"10.1109\/TASL.2013.2248718","article-title":"Joint uncertainty decoding for noise robust subspace Gaussian mixture models","volume":"21","author":"Lu","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref231","volume-title":"Computer Speech and
Language","author":"Ma","year":"2000"},{"issue":"6","key":"2026040313421830200_ref232","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1109\/TSA.2003.818075","article-title":"Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model","volume":"11","author":"Ma","year":"2003","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"issue":"1","key":"2026040313421830200_ref233","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1109\/TSA.2003.818074","article-title":"Target-directed mixture dynamic models for spontaneous speech recognition","volume":"12","author":"Ma","year":"2004","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"key":"2026040313421830200_ref234","article-title":"Rectifier nonlinearities improve neural network acoustic models","volume-title":"International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech, and Language Processing,","author":"Maas","year":"2013"},{"key":"2026040313421830200_ref235","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2012-6","article-title":"Recurrent neural networks for noise reduction in robust ASR","volume-title":"Proceedings of Interspeech.","author":"Maas","year":"2012"},{"key":"2026040313421830200_ref236","volume-title":"Introduction to Information Retrieval.","author":"Manning","year":"2009"},{"key":"2026040313421830200_ref237","article-title":"Scientists see promise in deep-learning programs","volume-title":"New York Times","author":"Markoff","year":"2012"},{"key":"2026040313421830200_ref238","article-title":"Deep learning with Hessian-free optimization","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Martens","year":"2010"},{"key":"2026040313421830200_ref239","article-title":"Learning recurrent neural networks with Hessian-free optimization","volume-title":"Proceedings of International Conference on Machine Learning
(ICML).","author":"Martens","year":"2011"},{"key":"2026040313421830200_ref240","article-title":"A PAC-bayesian tutorial with a dropout bound","author":"McAllester","year":"2013"},{"issue":"2","key":"2026040313421830200_ref241","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1109\/TASL.2012.2226158","article-title":"Learning lexicons from speech using a pronunciation mixture model","volume":"21","author":"McGraw","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref242","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2013-596","article-title":"Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding","volume-title":"Proceedings of Interspeech.","author":"Mesnil","year":"2013"},{"key":"2026040313421830200_ref243","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2013-526","article-title":"Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training","volume-title":"Proceedings of Interspeech.","author":"Miao","year":"2013"},{"key":"2026040313421830200_ref244","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2013.6707763","article-title":"Deep maxout networks for low resource speech recognition","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Miao","year":"2013"},{"key":"2026040313421830200_ref245","author":"Mikolov","year":"2012"},{"key":"2026040313421830200_ref246","article-title":"Efficient estimation of word representations in vector space","volume-title":"Proceedings of International Conference on Learning Representations (ICLR).","author":"Mikolov","year":"2013"},{"key":"2026040313421830200_ref247","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2011.6163930","article-title":"Strategies for training large scale neural network language models","volume-title":"Proceedings of the IEEE Automatic Speech Recognition and Understanding 
Workshop (ASRU).","author":"Mikolov","year":"2011"},{"key":"2026040313421830200_ref248","first-page":"1045","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP),","author":"Mikolov","year":"2010"},{"key":"2026040313421830200_ref249","article-title":"Exploiting similarities among languages for machine translation","author":"Mikolov","year":"2013"},{"key":"2026040313421830200_ref250","article-title":"Distributed representations of words and phrases and their compositionality","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Mikolov","year":"2013"},{"key":"2026040313421830200_ref251","first-page":"957","article-title":"A recognition method with parametric trajectory synthesized using direct relations between static and dynamic feature vector time series","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP),","author":"Minami","year":"2002"},{"key":"2026040313421830200_ref252","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1145\/1273496.1273577","article-title":"Three new graphical models for statistical language modeling","volume-title":"Proceedings of International Conference on Machine Learning (ICML),","author":"Mnih","year":"2007"},{"key":"2026040313421830200_ref253","first-page":"1081","article-title":"A scalable hierarchical distributed language model","volume-title":"Proceedings of Neural Information Processing Systems (NIPS),","author":"Mnih","year":"2008"},{"key":"2026040313421830200_ref254","article-title":"Learning word embeddings efficiently with noise-contrastive estimation","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Mnih","year":"2013"},{"key":"2026040313421830200_ref255","first-page":"1751","article-title":"A fast and simple algorithm for training neural probabilistic language models","volume-title":"Proceedings of International Conference on Machine 
Learning (ICML),","author":"Mnih","year":"2012"},{"key":"2026040313421830200_ref256","article-title":"Playing Atari with deep reinforcement learning","volume-title":"Neural Information Processing Systems (NIPS) Deep Learning Workshop,","author":"Mnih","year":"2013"},{"key":"2026040313421830200_ref257","article-title":"Deep belief networks for phone recognition","volume-title":"Proceedings of Neural Information Processing Systems (NIPS) Workshop Deep Learning for Speech Recognition and Related Applications.","author":"Mohamed","year":"2009"},{"issue":"1","key":"2026040313421830200_ref258","doi-asserted-by":"crossref","DOI":"10.1109\/TASL.2011.2109382","article-title":"Acoustic modeling using deep belief networks","volume":"20","author":"Mohamed","year":"2012","journal-title":"IEEE Transactions on Audio, Speech, & Language Processing,"},{"key":"2026040313421830200_ref259","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2012.6288863","article-title":"Understanding how deep belief networks perform acoustic modelling","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Mohamed","year":"2012"},{"key":"2026040313421830200_ref260","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2010-304","article-title":"Investigation of full-sequence training of deep belief networks for speech recognition","volume-title":"Proceedings of Interspeech.","author":"Mohamed","year":"2010"},{"issue":"1","key":"2026040313421830200_ref261","doi-asserted-by":"crossref","DOI":"10.1109\/TASL.2011.2116010","article-title":"Deep and wide: Multiple layers in automatic speech recognition","volume":"20","author":"Morgan","year":"2012","journal-title":"IEEE Transactions on Audio, Speech, & Language Processing"},{"issue":"5","key":"2026040313421830200_ref262","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1109\/MSP.2005.1511826","article-title":"Pushing the envelope \u2014 aside [speech
recognition]","volume":"22","author":"Morgan","year":"2005","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref263","article-title":"Hierarchical probabilistic neural network language models","volume-title":"Proceedings of Artificial Intelligence and Statistics (AISTATS).","author":"Morin","year":"2005"},{"key":"2026040313421830200_ref264","volume-title":"Machine Learning","author":"Murphy","year":"2012"},{"key":"2026040313421830200_ref265","article-title":"3-d object recognition with deep belief nets","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Nair","year":"2009"},{"key":"2026040313421830200_ref266","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2013-102","article-title":"Voice conversion in high-order eigen space using deep belief nets","volume-title":"Proceedings of Interspeech.","author":"Nakashika","year":"2013"},{"key":"2026040313421830200_ref267","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.1999.758176","article-title":"Speech translation: Coupling of recognition and translation","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Ney","year":"1999"},{"key":"2026040313421830200_ref268","article-title":"Learning deep energy models","volume-title":"Proceedings of International Conference on Machine Learning (ICML)","author":"Ngiam","year":"2011"},{"key":"2026040313421830200_ref269","article-title":"Multimodal deep learning","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Ngiam","year":"2011"},{"key":"2026040313421830200_ref270","article-title":"Zero-shot learning by convex combination of semantic embeddings","author":"Norouzi"},{"key":"2026040313421830200_ref271","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/j.cviu.2004.02.004","article-title":"Layered representations for learning and inferring office activity from multiple sensory
channels","volume":"96","author":"Oliver","year":"2004","journal-title":"Computer Vision and Image Understanding,"},{"key":"2026040313421830200_ref272","article-title":"Can \u2018deep learning\u2019 offer deep insights about visual representation?","volume-title":"Neural Information Processing Systems (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning","author":"Olshausen","year":"2012"},{"key":"2026040313421830200_ref273","article-title":"Moving beyond the \u2018beads-on-a-string\u2019 model of speech","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Ostendorf","year":"1999"},{"issue":"5","key":"2026040313421830200_ref274","doi-asserted-by":"crossref","DOI":"10.1109\/89.536930","article-title":"From HMMs to segment models: A unified view of stochastic modeling for speech recognition","volume":"4","author":"Ostendorf","year":"1996","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"issue":"8","key":"2026040313421830200_ref275","doi-asserted-by":"crossref","first-page":"2249","DOI":"10.1109\/TASL.2010.2098870","article-title":"Probabilistic template-based chord recognition","volume":"19","author":"Oudre","year":"2011","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"2026040313421830200_ref276","article-title":"Learning input and recurrent weight matrices in echo state networks","volume-title":"Neural Information Processing Systems (NIPS) Deep Learning Workshop","author":"Palangi","year":"2013"},{"key":"2026040313421830200_ref277","article-title":"Using deep stacking network to improve structured compressive sensing with multiple measurement vectors","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Palangi","year":"2013"},{"key":"2026040313421830200_ref278","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1109\/TASL.2008.2011515","article-title":"Adaptive 
multimodal fusion by uncertainty compensation with application to audiovisual speech recognition","volume":"17","author":"Papandreou","year":"2009","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref279","article-title":"How to construct deep recurrent neural networks","volume-title":"Proceedings of International Conference on Learning Representations (ICLR).","author":"Pascanu","year":"2014"},{"key":"2026040313421830200_ref280","article-title":"On the difficulty of training recurrent neural networks","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Pascanu","year":"2013"},{"key":"2026040313421830200_ref281","article-title":"Conditional neural fields","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Peng","year":"2009"},{"key":"2026040313421830200_ref282","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.1999.758074","article-title":"Initial evaluation of hidden dynamic models on conversational speech","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Picone","year":"1999"},{"issue":"2","key":"2026040313421830200_ref283","doi-asserted-by":"crossref","DOI":"10.1109\/TASL.2010.2045943","article-title":"Analysis of MLP-based hierarchical phone posterior probability estimators","volume":"19","author":"Pinto","year":"2011","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref284","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2012.6288836","article-title":"Improved pre-training of deep belief networks using sparse encoding symmetric machines","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Plahl","year":"2012"},{"key":"2026040313421830200_ref285","article-title":"Hierarchical bottleneck features for
LVCSR","volume-title":"Proceedings of Interspeech.","author":"Plahl","year":"2010"},{"issue":"3","key":"2026040313421830200_ref286","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1109\/72.377968","article-title":"Holographic reduced representations","volume":"6","author":"Plate","year":"1995","journal-title":"IEEE Transactions on Neural Networks,"},{"key":"2026040313421830200_ref287","first-page":"45","article-title":"How the brain might work: The role of information and learning in understanding and replicating intelligence","author":"Poggio","year":"2007","journal-title":"Information: Science and Technology for the New Century,"},{"key":"2026040313421830200_ref288","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/0004-3702(90)90005-K","article-title":"Recursive distributed representations","volume":"46","author":"Pollack","year":"1990","journal-title":"Artificial Intelligence"},{"key":"2026040313421830200_ref289","doi-asserted-by":"crossref","DOI":"10.1109\/ICCVW.2011.6130310","article-title":"Sum-product networks: A new deep architecture","volume-title":"Proceedings of Uncertainty in Artificial Intelligence.","author":"Poon","year":"2011"},{"key":"2026040313421830200_ref290","article-title":"Minimum phone error and I-smoothing for improved discriminative training","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Povey","year":"2002"},{"key":"2026040313421830200_ref291","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2010.5495222","article-title":"Backpropagation training for multilayer conditional random field based phone recognition","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Prabhavalkar","year":"2010"},{"key":"2026040313421830200_ref292","doi-asserted-by":"crossref","first-page":"1604","DOI":"10.1126\/science.275.5306.1604","article-title":"Optimality: From neural networks to universal 
grammar","volume":"275","author":"Prince","year":"1997","journal-title":"Science,"},{"key":"2026040313421830200_ref293","first-page":"257","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","volume-title":"Proceedings of the IEEE,","author":"Rabiner","year":"1989"},{"key":"2026040313421830200_ref294","article-title":"Sparse feature learning for deep belief networks","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Ranzato","year":"2007"},{"key":"2026040313421830200_ref295","article-title":"Energy-based models in document recognition and computer vision","volume-title":"Proceedings of International Conference on Document Analysis and Recognition (ICDAR).","author":"Ranzato","year":"2007"},{"key":"2026040313421830200_ref296","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2010.5539962","article-title":"Modeling pixel means and covariances using factorized third-order Boltzmann machines","volume-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR).","author":"Ranzato","year":"2010"},{"key":"2026040313421830200_ref297","article-title":"Efficient learning of sparse representations with an energy-based model","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Ranzato","year":"2006"},{"key":"2026040313421830200_ref298","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2011.5995710","article-title":"On deep generative models with applications to recognition","volume-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)","author":"Ranzato","year":"2011"},{"issue":"2","key":"2026040313421830200_ref299","first-page":"149","article-title":"Construction of state-dependent dynamic parameters by maximum likelihood: Applications to speech recognition","volume":"55","author":"Rathinavelu","year":"1997","journal-title":"Signal
Processing"},{"key":"2026040313421830200_ref300","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2012.6288869","article-title":"Factorial hidden restricted Boltzmann machines for noise robust speech recognition","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Rennie","year":"2012"},{"key":"2026040313421830200_ref301","first-page":"66","article-title":"Single-channel multi-talker speech recognition \u2014 graphical modeling approaches","volume":"27","author":"Rennie","year":"2010","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref302","article-title":"A direct adaptive method for faster back-propagation learning: The RPROP algorithm","volume-title":"Proceedings of the IEEE International Conference on Neural Networks.","author":"Riedmiller","year":"1993"},{"key":"2026040313421830200_ref303","first-page":"833","article-title":"Contractive autoencoders: Explicit invariance during feature extraction","volume-title":"Proceedings of International Conference on Machine Learning (ICML),","author":"Rifai","year":"2011"},{"key":"2026040313421830200_ref304","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1109\/72.279192","article-title":"An application of recurrent nets to phone probability estimation","volume":"5","author":"Robinson","year":"1994","journal-title":"IEEE Transactions on Neural Networks,"},{"key":"2026040313421830200_ref305","article-title":"Accelerating Hessian-free optimization for deep neural networks by implicit pre-conditioning and sampling","author":"Sainath","year":"2013"},{"key":"2026040313421830200_ref306","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2013.6707749","article-title":"Improvements to deep convolutional neural networks for LVCSR","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop
(ASRU).","author":"Sainath","year":"2013"},{"key":"2026040313421830200_ref307","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2013.6707746","article-title":"Learning filter banks within a deep neural network framework","volume-title":"Proceedings of The Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Sainath","year":"2013"},{"key":"2026040313421830200_ref308","article-title":"Autoencoder bottleneck features using deep belief networks","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Sainath","year":"2012"},{"key":"2026040313421830200_ref309","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2011.6163900","article-title":"Making deep belief networks effective for large vocabulary continuous speech recognition","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Sainath","year":"2011"},{"key":"2026040313421830200_ref310","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2013.6638949","article-title":"Low-rank matrix factorization for deep neural network training with high-dimensional output targets","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Sainath","year":"2013"},{"issue":"11","key":"2026040313421830200_ref311","doi-asserted-by":"crossref","first-page":"2267","DOI":"10.1109\/TASL.2013.2284378","article-title":"Optimization techniques to improve training speed of deep neural networks for large speech tasks","volume":"21","author":"Sainath","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref312","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2013.6639347","article-title":"Convolutional neural networks for LVCSR","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing 
(ICASSP).","author":"Sainath","year":"2013"},{"key":"2026040313421830200_ref313","doi-asserted-by":"crossref","DOI":"10.1109\/TASL.2011.2155060","article-title":"Exemplar-based sparse representation features: From TIMIT to LVCSR","volume-title":"IEEE Transactions on Speech and Audio Processing,","author":"Sainath","year":"2011"},{"key":"2026040313421830200_ref314","article-title":"Semantic hashing","volume-title":"Proceedings of Special Interest Group on Information Retrieval (SIGIR) Workshop on Information Retrieval and Applications of Graphical Models.","author":"Salakhutdinov","year":"2007"},{"key":"2026040313421830200_ref315","article-title":"Deep boltzmann machines","volume-title":"Proceedings of Artificial Intelligence and Statistics (AISTATS).","author":"Salakhutdinov","year":"2009"},{"key":"2026040313421830200_ref316","article-title":"A better way to pretrain deep boltzmann machines","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Salakhutdinov","year":"2012"},{"key":"2026040313421830200_ref317","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2013.6707705","article-title":"Speaker adaptation of neural network acoustic models using i-vectors","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Saon","year":"2013"},{"key":"2026040313421830200_ref318","first-page":"5680","article-title":"Deep belief nets for natural language call-routing","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP),","author":"Sarikaya","year":"2011"},{"key":"2026040313421830200_ref319","doi-asserted-by":"crossref","DOI":"10.1109\/ASPAA.2011.6082328","article-title":"Learning emotion-based acoustic features with deep belief networks","volume-title":"Proceedings IEEE of Signal Processing to Audio and Acoustics","author":"Schmidt","year":"2011"},{"key":"2026040313421830200_ref320","article-title":"Continuous space translation models 
for phrase-based statistical machine translation","volume-title":"Proceedings of Computational Linguistics","author":"Schwenk","year":"2012"},{"key":"2026040313421830200_ref321","first-page":"11","article-title":"Large, pruned or continuous space language models on a gpu for statistical machine translation","volume-title":"Proceedings of the Joint Human Language Technology Conference and the North American Chapter of the Association of Computational Linguistics (HLT-NAACL) 2012 Workshop on the future of language modeling for Human Language Technology (HLT),","author":"Schwenk"},{"key":"2026040313421830200_ref322","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2014.6853593","article-title":"On parallelizability of stochastic gradient descent for speech DNNs","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Seide","year":"2014"},{"key":"2026040313421830200_ref323","first-page":"24","article-title":"Feature engineering in context-dependent deep neural networks for conversational speech transcription","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU),","author":"Seide","year":"2011"},{"key":"2026040313421830200_ref324","first-page":"437","article-title":"Conversational speech transcription using context-dependent deep neural networks","volume-title":"Proceedings of Interspeech,","author":"Seide","year":"2011"},{"key":"2026040313421830200_ref325","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2013.6639100","article-title":"An investigation of deep neural networks for noise robust speech recognition","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Seltzer","year":"2013"},{"issue":"3","key":"2026040313421830200_ref326","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1109\/TASL.2012.2227740","article-title":"Autoregressive models for statistical parametric speech 
synthesis","volume":"21","author":"Shannon","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref327","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1109\/89.260337","article-title":"Waveform-based speech recognition using hidden filter models: Parameter selection and sensitivity to power normalization","volume":"2","author":"Sheikhzadeh","year":"1994","journal-title":"IEEE Transactions on Speech and Audio Processing,"},{"key":"2026040313421830200_ref328","doi-asserted-by":"crossref","DOI":"10.1145\/2567948.2577348","article-title":"Learning semantic representations using convolutional neural networks for web search","volume-title":"Proceedings World Wide Web.","author":"Shen","year":"2014"},{"key":"2026040313421830200_ref329","article-title":"Deep fisher networks for large-scale image classification","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Simonyan","year":"2013"},{"issue":"10","key":"2026040313421830200_ref330","doi-asserted-by":"crossref","first-page":"2152","DOI":"10.1109\/TASL.2013.2270370","article-title":"Hermitian polynomial for speaker adaptation of connectionist speech recognition systems","volume":"21","author":"Siniscalchi","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref331","doi-asserted-by":"crossref","DOI":"10.1109\/TASL.2012.2234115","article-title":"A bottom-up modular search approach to large vocabulary continuous speech recognition","volume":"21","author":"Siniscalchi","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"2026040313421830200_ref332","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.neucom.2012.11.008","article-title":"Exploiting deep neural networks for detection-based speech 
recognition","volume":"106","author":"Siniscalchi","year":"2013","journal-title":"Neurocomputing"},{"issue":"3","key":"2026040313421830200_ref333","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1109\/LSP.2013.2237901","article-title":"Speech recognition using long-span temporal patterns in a deep network model","volume":"20","author":"Siniscalchi","year":"2013","journal-title":"IEEE Signal Processing Letters,"},{"issue":"1","key":"2026040313421830200_ref334","doi-asserted-by":"crossref","DOI":"10.1109\/TASL.2011.2129510","article-title":"Sparse multilayer perceptrons for phoneme recognition","volume":"20","author":"Sivaram","year":"2012","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref335","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1016\/0004-3702(90)90007-M","article-title":"Tensor product variable binding and the representation of symbolic structures in connectionist systems","volume":"46","author":"Smolensky","year":"1990","journal-title":"Artificial Intelligence,"},{"key":"2026040313421830200_ref336","volume-title":"The Harmonic Mind \u2014 From Neural Computation to Optimality-Theoretic Grammar.","author":"Smolensky","year":"2006"},{"key":"2026040313421830200_ref337","article-title":"Practical bayesian optimization of machine learning algorithms","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Snoek","year":"2012"},{"key":"2026040313421830200_ref338","article-title":"New directions in deep learning: Structured models, tasks, and datasets","volume-title":"Neural Information Processing Systems (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning,","author":"Socher","year":"2012"},{"key":"2026040313421830200_ref339","unstructured":"R. Socher, Y. Bengio, and C. Manning. Deep learning for NLP. 
Tutorial at Association for Computational Linguistics (ACL), 2012, and North American Chapter of the Association of Computational Linguistics (NAACL), 2013. http:\/\/www.socher.org\/index."},{"key":"2026040313421830200_ref340","article-title":"Reasoning with neural tensor networks for knowledge base completion","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Socher","year":"2013"},{"key":"2026040313421830200_ref341","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2010.5540112","article-title":"Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora","volume-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)","author":"Socher","year":"2010"},{"key":"2026040313421830200_ref342","article-title":"Zero-shot learning through cross-modal transfer","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Socher","year":"2013"},{"key":"2026040313421830200_ref343","article-title":"Grounded compositional semantics for finding and describing images with sentences","volume-title":"Neural Information Processing Systems (NIPS) Deep Learning Workshop,","author":"Socher","year":"2013"},{"key":"2026040313421830200_ref344","article-title":"Parsing natural scenes and natural language with recursive neural networks","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Socher","year":"2011"},{"key":"2026040313421830200_ref345","article-title":"Dynamic pooling and unfolding recursive autoencoders for paraphrase detection","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Socher","year":"2011"},{"key":"2026040313421830200_ref346","article-title":"Semisupervised recursive autoencoders for predicting sentiment distributions","volume-title":"Proceedings of Empirical Methods in Natural Language Processing 
(EMNLP).","author":"Socher","year":"2011"},{"key":"2026040313421830200_ref347","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D13-1170","article-title":"Recursive deep models for semantic compositionality over a sentiment treebank","volume-title":"Proceedings of Empirical Methods in Natural Language Processing (EMNLP).","author":"Socher","year":"2013"},{"key":"2026040313421830200_ref348","article-title":"Multimodal learning with deep boltzmann machines","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Srivastava","year":"2012"},{"key":"2026040313421830200_ref349","article-title":"Discriminative transfer learning with tree-based priors","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Srivastava","year":"2013"},{"key":"2026040313421830200_ref350","article-title":"Compete to compute","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"Srivastava","year":"2013"},{"key":"2026040313421830200_ref351","first-page":"109","article-title":"Preliminary investigation of boltzmann machine classifiers for speaker recognition","volume-title":"Proceedings of Odyssey,","author":"Stafylakis","year":"2012"},{"key":"2026040313421830200_ref352","article-title":"Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure","volume-title":"Proceedings of Artificial Intelligence and Statistics (AISTATS).","author":"Stoyanov","year":"2011"},{"key":"2026040313421830200_ref353","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2013.6638951","article-title":"Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Su","year":"2013"},{"key":"2026040313421830200_ref354","volume-title":"Proceedings of IEEE International Conference on Multimedia 
& Expo (ICME).","author":"Subramanya","year":"2005"},{"issue":"2","key":"2026040313421830200_ref355","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1121\/1.1420380","article-title":"An overlapping-feature based phonological model incorporating linguistic constraints: Applications to speech recognition","volume":"111","author":"Sun","year":"2002","journal-title":"Journal of the Acoustical Society of America,"},{"key":"2026040313421830200_ref356","author":"Sutskever","year":"2013"},{"key":"2026040313421830200_ref357","article-title":"Generating text with recurrent neural networks","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Sutskever","year":"2011"},{"key":"2026040313421830200_ref358","article-title":"Deep networks for robust visual recognition","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Tang","year":"2010"},{"key":"2026040313421830200_ref359","volume-title":"Learning Stochastic Feedforward Neural Networks","author":"Tang","year":"2013"},{"key":"2026040313421830200_ref360","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2008.4587633","article-title":"Small codes and large image databases for recognition","volume-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR).","author":"Torralba","year":"2008"},{"key":"2026040313421830200_ref361","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/7503.003.0173","article-title":"Modeling human motion using binary latent variables","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Taylor","year":"2007"},{"key":"2026040313421830200_ref362","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2013.6638959","article-title":"Deep neural network features and semi-supervised training for low resource speech recognition","volume-title":"Proceedings of 
Interspeech.","author":"Thomas","year":"2013"},{"key":"2026040313421830200_ref363","doi-asserted-by":"crossref","DOI":"10.1145\/1390156.1390290","article-title":"Training restricted boltzmann machines using approximations to the likelihood gradient","volume-title":"Proceedings of International Conference on Machine Learning (ICML).","author":"Tieleman","year":"2008"},{"issue":"5","key":"2026040313421830200_ref364","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1109\/JPROC.2013.2251852","article-title":"Speech synthesis based on hidden markov models","volume":"101","author":"Tokuda","year":"2013","journal-title":"Proceedings of the IEEE,"},{"issue":"11","key":"2026040313421830200_ref365","doi-asserted-by":"crossref","first-page":"2439","DOI":"10.1109\/TASL.2013.2280209","article-title":"Acoustic modeling with hierarchical reservoirs","volume":"21","author":"Triefenbach","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref366","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2012.6289054","article-title":"Towards deep understanding: Deep convex networks for semantic utterance classification","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"T\u00fcr","year":"2012"},{"key":"2026040313421830200_ref367","article-title":"Word representations: A simple and general method for semi-supervised learning","volume-title":"Proceedings of Association for Computational Linguistics (ACL).","author":"Turian","year":"2010"},{"key":"2026040313421830200_ref368","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2012-5","article-title":"Context-dependent MLPs for LVCSR: TANDEM, hybrid or both?","volume-title":"Proceedings of Interspeech","author":"Tuske","year":"2012"},{"key":"2026040313421830200_ref369","article-title":"A deep neural network for acoustic-articulatory speech inversion","volume-title":"Neural Information 
Processing Systems (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning","author":"Uria","year":"2011"},{"issue":"4","key":"2026040313421830200_ref370","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1109\/TASL.2010.2061226","article-title":"Extended VTS for noise-robust speech recognition","volume":"19","author":"van Dalen","year":"2011","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref371","article-title":"Deep content-based music recommendation","volume-title":"Proceedings of Neural Information Processing Systems (NIPS).","author":"van den Oord","year":"2013"},{"key":"2026040313421830200_ref372","article-title":"Speaker recognition by means of deep belief networks","volume-title":"Proceedings of Biometric Technologies in Forensic Science.","author":"Vasilakakis","year":"2013"},{"key":"2026040313421830200_ref373","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2013-548","article-title":"Sequence-discriminative training of deep neural networks","volume-title":"Proceedings of Interspeech.","author":"Vesely","year":"2013"},{"key":"2026040313421830200_ref374","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2013.6707741","article-title":"Semi-supervised training of deep neural networks","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Vesely","year":"2013"},{"issue":"7","key":"2026040313421830200_ref375","doi-asserted-by":"crossref","first-page":"1661","DOI":"10.1162\/NECO_a_00142","article-title":"A connection between score matching and denoising autoencoder","volume":"23","author":"Vincent","year":"2011","journal-title":"Neural Computation,"},{"key":"2026040313421830200_ref376","first-page":"3371","article-title":"Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion","volume":"11","author":"Vincent","year":"2010","journal-title":"Journal of Machine 
Learning Research,"},{"key":"2026040313421830200_ref377","article-title":"Learning with recursive perceptual representations","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Vinyals","year":"2012"},{"key":"2026040313421830200_ref378","article-title":"Krylov subspace descent for deep learning","volume-title":"Proceedings of Artificial Intelligence and Statistics (AISTATS)","author":"Vinyals","year":"2012"},{"key":"2026040313421830200_ref379","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2011.5947378","article-title":"Comparing multilayer perceptron to deep belief network tandem features for robust ASR","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Vinyals","year":"2011"},{"key":"2026040313421830200_ref380","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2012.6288816","article-title":"Revisiting recurrent neural networks for robust ASR","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Vinyals","year":"2012"},{"key":"2026040313421830200_ref381","article-title":"Dropout training as adaptive regularization","volume-title":"Proceedings of Neural Information Processing Systems (NIPS)","author":"Wager","year":"2013"},{"key":"2026040313421830200_ref382","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/29.21701","article-title":"Phoneme recognition using time-delay neural networks","volume":"37","author":"Waibel","year":"1989","journal-title":"IEEE Transactions on Acoustics, Speech, and Signal Processing,"},{"key":"2026040313421830200_ref383","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2013.6707753","article-title":"Context-dependent modelling of deep neural network using logistic regression","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop 
(ASRU).","author":"Wang","year":"2013"},{"key":"2026040313421830200_ref384","article-title":"Regression-based context-dependent modeling of deep neural networks for speech recognition","volume-title":"IEEE\/Association for Computing Machinery (ACM) Transactions on Audio, Speech, and Language Processing,","author":"Wang","year":"2014"},{"key":"2026040313421830200_ref385","article-title":"An empirical analysis of dropout in piecewise linear networks","volume-title":"Proceedings of International Conference on Learning Representations (ICLR).","author":"Warde-Farley","year":"2014"},{"key":"2026040313421830200_ref386","article-title":"Exponential family harmoniums with an application to information retrieval","volume-title":"Proceedings of Neural Information Processing Systems (NIPS","author":"Welling","year":"2005"},{"key":"2026040313421830200_ref387","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2014.6854681","article-title":"Single-channel mixed speech recognition using deep neural networks","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Weng","year":"2014"},{"issue":"1","key":"2026040313421830200_ref388","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1007\/s10994-010-5198-3","article-title":"Large scale image annotation: Learning to rank with joint word-image embeddings","volume":"81","author":"Weston","year":"2010","journal-title":"Machine Learning"},{"key":"2026040313421830200_ref389","article-title":"Wsabie: Scaling up to large vocabulary image annotation","volume-title":"Proceedings of International Joint Conference on Artificial Intelligence (IJCAI).","author":"Weston","year":"2011"},{"key":"2026040313421830200_ref390","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2013-734","article-title":"Investigations on hessian-free optimization for cross-entropy training of deep neural networks","volume-title":"Proceedings of 
Interspeech.","author":"Wiesler","year":"2013"},{"issue":"4","key":"2026040313421830200_ref391","doi-asserted-by":"crossref","DOI":"10.1109\/TASL.2010.2064309","article-title":"A probabilistic interaction model for multi-pitch tracking with factorial hidden markov model","volume":"19","author":"Wohlmayr","year":"2011","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"issue":"2","key":"2026040313421830200_ref392","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","article-title":"Stacked generalization","volume":"5","author":"Wolpert","year":"1992","journal-title":"Neural Networks,"},{"issue":"11","key":"2026040313421830200_ref393","doi-asserted-by":"crossref","first-page":"2231","DOI":"10.1109\/TASL.2013.2283777","article-title":"Optimization algorithms and applications for speech and language processing","volume":"21","author":"Wright","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"issue":"6","key":"2026040313421830200_ref394","first-page":"118","article-title":"A geometric perspective of large-margin training of gaussian models","volume":"27","author":"Xiao","year":"2010","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref395","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1162\/089976603762552988","article-title":"Equivalence of backpropagation and contrastive hebbian learning in a layered network","volume":"15","author":"Xie","year":"2003","journal-title":"Neural computation,"},{"issue":"1","key":"2026040313421830200_ref396","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1109\/LSP.2013.2291240","article-title":"An experimental study on speech enhancement based on deep neural networks","volume":"21","author":"Xu","year":"2014","journal-title":"IEEE Signal Processing 
Letters,"},{"key":"2026040313421830200_ref397","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2013-552","article-title":"Restructuring of deep neural network acoustic models with singular value decomposition","volume-title":"Proceedings of Interspeech","author":"Xue","year":"2013"},{"key":"2026040313421830200_ref398","doi-asserted-by":"crossref","first-page":"1207","DOI":"10.1109\/TASL.2008.2001106","article-title":"An integrative and discriminative technique for spoken utterance classification","volume":"16","author":"Yaman","year":"2008","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref399","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2013-47","article-title":"A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR","volume-title":"Proceedings of Interspeech.","author":"Yan","year":"2013"},{"key":"2026040313421830200_ref400","first-page":"275","article-title":"Combining a two-step CRF model and a joint source-channel model for machine transliteration","volume-title":"Proceedings of Association for Computational Linguistics (ACL),","author":"Yang","year":"2010"},{"key":"2026040313421830200_ref401","article-title":"A fast maximum likelihood nonlinear feature transformation method for GMM-HMM speaker adaptation","volume-title":"Neurocomputing,","author":"Yao","year":"2013"},{"key":"2026040313421830200_ref402","doi-asserted-by":"crossref","DOI":"10.1109\/SLT.2012.6424251","article-title":"Adaptation of context-dependent deep neural networks for automatic speech recognition","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Yao","year":"2012"},{"key":"2026040313421830200_ref403","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2013-569","article-title":"Recurrent neural networks for language understanding","volume-title":"Proceedings of 
Interspeech","author":"Yao","year":"2013"},{"issue":"10","key":"2026040313421830200_ref404","doi-asserted-by":"crossref","first-page":"2182","DOI":"10.1109\/TASL.2013.2272513","article-title":"Noise model transfer: Novel approach to robustness against nonstationary noise","volume":"21","author":"Yoshioka","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref405","article-title":"Investigation of unsupervised adaptation of DNN acoustic models with filter bank input","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Yoshioka","year":"2013"},{"issue":"3","key":"2026040313421830200_ref406","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1080\/17442509908834179","article-title":"On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates","volume":"65","author":"Younes","year":"1999","journal-title":"Stochastics and Stochastic Reports"},{"key":"2026040313421830200_ref407","article-title":"Factorized deep neural networks for adaptive speech recognition","volume-title":"International Workshop on Statistical Machine Learning for Speech Processing,","author":"Yu","year":"2012"},{"key":"2026040313421830200_ref408","article-title":"Learning in the deep-structured conditional random fields","volume-title":"Neural Information Processing Systems (NIPS) 2009 Workshop on Deep Learning for Speech Recognition and Related Applications","author":"Yu","year":"2009"},{"issue":"4","key":"2026040313421830200_ref409","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1109\/MSP.2009.932793","article-title":"Solving nonlinear estimation problems using splines","volume":"26","author":"Yu","year":"2009","journal-title":"IEEE Signal Processing Magazine,"},{"key":"2026040313421830200_ref410","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2010-35","article-title":"Deep-structured hidden 
conditional random fields for phonetic recognition","volume-title":"Proceedings of Interspeech.","author":"Yu","year":"2010"},{"key":"2026040313421830200_ref411","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2011-606","article-title":"Accelerated parallelizable neural networks learning algorithms for speech recognition","volume-title":"Proceedings of Interspeech.","author":"Yu","year":"2011"},{"key":"2026040313421830200_ref412","first-page":"145","article-title":"Deep learning and its applications to signal and information processing","volume-title":"IEEE Signal Processing Magazine","author":"Yu","year":"2011"},{"key":"2026040313421830200_ref413","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1016\/j.patrec.2011.12.002","article-title":"Efficient and effective algorithms for training single-hidden-layer neural networks","volume":"33","author":"Yu","year":"2012","journal-title":"Pattern Recognition Letters,"},{"key":"2026040313421830200_ref414","article-title":"Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition","volume-title":"Neural Information Processing Systems (NIPS) 2010 Workshop on Deep Learning and Unsupervised Feature Learning,","author":"Yu","year":"2010"},{"issue":"5","key":"2026040313421830200_ref415","article-title":"Robust speech recognition using cepstral minimum-mean-square-error noise suppressor","volume":"16","author":"Yu","year":"2008","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"issue":"7","key":"2026040313421830200_ref416","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1109\/TASL.2009.2020890","article-title":"A novel framework and training algorithm for variable-parameter hidden markov models","volume":"17","author":"Yu","year":"2009","journal-title":"IEEE Transactions on Audio, Speech and Language 
Processing,"},{"issue":"4","key":"2026040313421830200_ref417","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1016\/j.csl.2008.03.002","article-title":"Large-margin minimum classification error training: A theoretical risk minimization perspective","volume":"22","author":"Yu","year":"2008","journal-title":"Computer Speech and Language"},{"key":"2026040313421830200_ref418","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2007.367275","article-title":"Large-margin minimum classification error training for large-scale speech recognition tasks","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Yu","year":"2007"},{"key":"2026040313421830200_ref419","article-title":"Discriminative pretraining of deep neural networks","volume-title":"U.S. Patent Filing,","author":"Yu","year":"2011"},{"key":"2026040313421830200_ref420","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2009.4960553","article-title":"Cross-lingual speech recognition under runtime resource constraints","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Yu","year":"2009"},{"key":"2026040313421830200_ref421","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2012-2","article-title":"Large vocabulary speech recognition using deep tensor neural networks","volume-title":"Proceedings of Interspeech.","author":"Yu","year":"2012"},{"issue":"2","key":"2026040313421830200_ref422","doi-asserted-by":"crossref","first-page":"388","DOI":"10.1109\/TASL.2012.2227738","article-title":"The deep tensor neural network with applications to large vocabulary speech recognition","volume":"21","author":"Yu","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref423","doi-asserted-by":"crossref","first-page":"2461","DOI":"10.1109\/TASL.2011.2141988","article-title":"Calibration of confidence measures in speech 
recognition","volume":"19","author":"Yu","year":"2010","journal-title":"IEEE Transactions on Audio, Speech and Language"},{"key":"2026040313421830200_ref424","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2012.6288897","article-title":"Exploiting sparseness in deep neural networks for large vocabulary speech recognition","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Yu","year":"2012"},{"key":"2026040313421830200_ref425","article-title":"Improved bottleneck features using pre-trained deep neural networks","volume-title":"Proceedings of Interspeech.","author":"Yu","year":"2011"},{"key":"2026040313421830200_ref426","article-title":"Feature learning in deep neural networks \u2014 studies on speech recognition","volume-title":"Proceedings of International Conference on Learning Representations (ICLR).","author":"Yu","year":"2013"},{"key":"2026040313421830200_ref427","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2012.6288837","article-title":"Boosting attribute and phone estimation accuracies with deep neural networks for detection\u2013based speech recognition","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Yu","year":"2012"},{"key":"2026040313421830200_ref428","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1109\/JSTSP.2010.2075990","article-title":"Sequential labeling using deep-structured conditional random fields","volume":"4","author":"Yu","year":"2010","journal-title":"Journal of Selected Topics in Signal Processing,"},{"key":"2026040313421830200_ref429","first-page":"5030","article-title":"Language recognition using deep-structured conditional random fields","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing 
(ICASSP","author":"Yu","year":"2010"},{"key":"2026040313421830200_ref430","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2013.6639201","article-title":"KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Yu","year":"2013"},{"issue":"4","key":"2026040313421830200_ref431","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1109\/TASL.2008.2011535","article-title":"Unsupervised adaptation with discriminative mapping transforms","volume":"17","author":"Yu","year":"2009","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref432","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2011.5995732","article-title":"Learning image representations from the pixel level via hierarchical sparse coding","volume-title":"Proceedings Computer Vision and Pattern Recognition (CVPR).","author":"Yu","year":"2011"},{"key":"2026040313421830200_ref433","first-page":"144","article-title":"Fast evaluation of connectionist language models","volume-title":"International Conference on Artificial Neural Networks,","author":"Zamora-Martinez","year":"2009"},{"key":"2026040313421830200_ref434","article-title":"Hierarchical convolutional deep learning in computer vision","author":"Zeiler","year":"2014"},{"key":"2026040313421830200_ref435","article-title":"Stochastic pooling for regularization of deep convolutional neural networks","volume-title":"Proceedings of International Conference on Learning Representations (ICLR).","author":"Zeiler","year":"2013"},{"key":"2026040313421830200_ref436","first-page":"1","article-title":"Visualizing and understanding convolutional networks","author":"Zeiler","year":"2013"},{"key":"2026040313421830200_ref437","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV.2011.6126474","article-title":"Adaptive deconvolutional networks for mid 
and high level feature learning","volume-title":"Proceedings of International Conference on Computer Vision (ICCV).","author":"Zeiler","year":"2011"},{"issue":"3","key":"2026040313421830200_ref438","doi-asserted-by":"crossref","first-page":"794","DOI":"10.1109\/TASL.2011.2165280","article-title":"Product of experts for statistical parametric speech synthesis","volume":"20","author":"Zen","year":"2012","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"issue":"2","key":"2026040313421830200_ref439","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1109\/TASL.2010.2049685","article-title":"Continuous stochastic feature mapping based on trajectory HMMs","volume":"19","author":"Zen","year":"2011","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref440","first-page":"7962","article-title":"Statistical parametric speech synthesis using deep neural networks","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP),","author":"Zen","year":"2013"},{"key":"2026040313421830200_ref441","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2014.6853589","article-title":"Improving deep neural network acoustic models using generalized maxout networks","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP).","author":"Zhang","year":"2014"},{"issue":"4","key":"2026040313421830200_ref442","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1109\/TASL.2012.2229986","article-title":"Deep belief networks based voice activity detection","volume":"21","author":"Zhang","year":"2013","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"2026040313421830200_ref443","article-title":"Multisensory microphones for robust speech detection, enhancement and recognition","volume-title":"Proceedings of International Conference on Acoustics Speech and Signal 
Processing (ICASSP).","author":"Zhang","year":"2004"},{"issue":"8","key":"2026040313421830200_ref444","doi-asserted-by":"crossref","first-page":"2191","DOI":"10.1109\/TASL.2012.2199107","article-title":"Nonlinear compensation using the gauss-newton method for noise-robust speech recognition","volume":"20","author":"Zhao","year":"2012","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing,"},{"key":"2026040313421830200_ref445","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D13-1141","article-title":"Bilingual word embeddings for phrase-based machine translation","volume-title":"Proceedings of Empirical Methods in Natural Language Processing (EMNLP).","author":"Zou","year":"2013"},{"key":"2026040313421830200_ref446","doi-asserted-by":"crossref","DOI":"10.1109\/ASRU.2009.5372916","article-title":"A segmental CRF approach to large vocabulary continuous speech recognition","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Zweig","year":"2009"}],"container-title":["Foundations and Trends\u00ae in Signal 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftsig\/article-pdf\/7\/3-4\/197\/11133703\/2000000039en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftsig\/article-pdf\/7\/3-4\/197\/11133703\/2000000039en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T17:43:39Z","timestamp":1775238219000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftsig\/article\/7\/3-4\/197\/1331260\/Deep-Learning-Methods-and-Applications"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6,30]]},"references-count":446,"journal-issue":{"issue":"3-4","published-print":{"date-parts":[[2014,6,30]]}},"URL":"https:\/\/doi.org\/10.1561\/2000000039","relation":{},"ISSN":["1932-8346","1932-8354"],"issn-type":[{"value":"1932-8346","type":"print"},{"value":"1932-8354","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,6,30]]}}}