{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T19:46:12Z","timestamp":1767383172428},"reference-count":53,"publisher":"IGI Global","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,10]]},"abstract":"<jats:p>CNNs play a vital role in automatic speech recognition (ASR). Most CNNs employ a softmax activation layer to minimize cross-entropy loss; this layer generates posterior probabilities in object classification tasks. SVMs have also shown promising results in ASR. In this article, two approaches, CNNs and SVMs, are combined to propose a new hybrid architecture. This model replaces the softmax layer, i.e., the last layer of the CNN, with SVMs to deal effectively with high-dimensional features. The model should be interpreted as a special form of structured SVM and is named the Convolutional Neural SVM (CNSVM). CNSVMs combine the characteristics of both models: CNNs learn features from the speech signal, and SVMs classify these features into the corresponding text. The parameters of the CNNs and SVMs are trained jointly using a sequence-level max-margin and sMBR criterion. 
On the Hindi and Punjabi speech corpora, CNSVM achieves word error rates of 13.43% and 15.86%, respectively, a significant improvement over CNNs.<\/jats:p>","DOI":"10.4018\/ijapuc.2019100101","type":"journal-article","created":{"date-parts":[[2019,9,26]],"date-time":"2019-09-26T19:54:49Z","timestamp":1569527689000},"page":"1-15","source":"Crossref","is-referenced-by-count":1,"title":["Hindi and Punjabi Continuous Speech Recognition Using CNSVM"],"prefix":"10.4018","volume":"11","author":[{"given":"Vishal","family":"Passricha","sequence":"first","affiliation":[{"name":"National Institute of Technology, Kurukshetra, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shubhanshi","family":"Singhal","sequence":"additional","affiliation":[{"name":"Technical Education and Research Integrated Institute, Kurukshetra, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"2432","reference":[{"key":"IJAPUC.2019100101-0","doi-asserted-by":"crossref","unstructured":"Abdel-Hamid, O., Deng, L., & Yu, D. (2013). Exploring convolutional neural network structures and optimization techniques for speech recognition. Paper presented at the Interspeech. Academic Press.","DOI":"10.21437\/Interspeech.2013-744"},{"key":"IJAPUC.2019100101-1","doi-asserted-by":"publisher","DOI":"10.1109\/taslp.2014.2339736"},{"key":"IJAPUC.2019100101-2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2012.6288864"},{"key":"IJAPUC.2019100101-3","doi-asserted-by":"publisher","DOI":"10.1109\/TENCON.2013.6718948"},{"key":"IJAPUC.2019100101-4","unstructured":"Agarap, A. F. (2017). 
An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification."},{"key":"IJAPUC.2019100101-5","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19403-0_45"},{"key":"IJAPUC.2019100101-6","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-012-9133-9"},{"key":"IJAPUC.2019100101-7","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-012-9131-y"},{"key":"IJAPUC.2019100101-8","doi-asserted-by":"publisher","DOI":"10.1007\/s11235-011-9623-0"},{"key":"IJAPUC.2019100101-9","doi-asserted-by":"publisher","DOI":"10.1080\/03772063.2015.1056844"},{"key":"IJAPUC.2019100101-10","doi-asserted-by":"publisher","DOI":"10.1049\/iet-spr.2015.0488"},{"key":"IJAPUC.2019100101-11","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2014.01.008"},{"key":"IJAPUC.2019100101-12","volume":"Vol. 247","author":"H. A.Bourlard","year":"2012","journal-title":"Connectionist speech recognition: a hybrid approach"},{"key":"IJAPUC.2019100101-13","unstructured":"Boyd, S., & Mutapcic, A. (2007). Subgradient methods. Lecture notes of EE364b, Stanford University, Winter Quarter."},{"key":"IJAPUC.2019100101-14","unstructured":"Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., . . . Le, Q. V. (2012). Large scale distributed deep networks. Paper presented at the Advances in neural information processing systems. Academic Press."},{"issue":"4","key":"IJAPUC.2019100101-15","first-page":"359","article-title":"Punjabi automatic speech recognition using HTK.","volume":"9","author":"M.Dua","year":"2012","journal-title":"International Journal of Computer Science Issues"},{"key":"IJAPUC.2019100101-16","doi-asserted-by":"crossref","unstructured":"Dua, M., Aggarwal, R. K., & Biswas, M. (2018a). Performance evaluation of Hindi speech recognition system using optimized filterbanks. 
Engineering Science and Technology, an International Journal, 21(3), 389-398.","DOI":"10.1016\/j.jestch.2018.04.005"},{"key":"IJAPUC.2019100101-17","first-page":"1","article-title":"GFCC based discriminatively trained noise robust continuous ASR system for Hindi language.","author":"M.Dua","year":"2018","journal-title":"Journal of Ambient Intelligence and Humanized Computing"},{"key":"IJAPUC.2019100101-18","doi-asserted-by":"crossref","unstructured":"Gales, M., & Young, S. (2008). The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 1(3), 195-304.","DOI":"10.1561\/2000000004"},{"key":"IJAPUC.2019100101-19","doi-asserted-by":"crossref","unstructured":"Ganapathiraju, A., Hamaker, J., & Picone, J. (1998). Support vector machines for speech recognition. Paper presented at the Fifth International Conference on Spoken Language Processing. Academic Press.","DOI":"10.21437\/ICSLP.1998-176"},{"key":"IJAPUC.2019100101-20","unstructured":"Garg, M., & Aggarwal, N. (2014). Punjabi Speech Recognition: A Survey. Paper presented at the International Conference on Advances In Engineering And Technology -ICAET-2014. Academic Press."},{"key":"IJAPUC.2019100101-21","doi-asserted-by":"crossref","unstructured":"Ghai, W., & Singh, N. (2013). Phone-based acoustic modeling for automatic speech recognition for Punjabi language. Journal of Speech Sciences, 1(3), 69-83.","DOI":"10.20396\/joss.v3i1.15040"},{"key":"IJAPUC.2019100101-22","doi-asserted-by":"crossref","unstructured":"Gibson, M., & Hain, T. (2006). Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition. Paper presented at the Ninth International Conference on Spoken Language Processing. Academic Press.","DOI":"10.21437\/Interspeech.2006-603"},{"key":"IJAPUC.2019100101-23","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638966"},{"key":"IJAPUC.2019100101-24","unstructured":"Jha, G. N. (2010). 
The TDIL Program and the Indian Language Corpora Initiative (ILCI). Paper presented at the LREC. Academic Press."},{"key":"IJAPUC.2019100101-25","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-009-5108-8"},{"key":"IJAPUC.2019100101-26","first-page":"1","article-title":"Refinement of HMM model parameters for Punjabi automatic speech recognition (PASR) System.","author":"V.Kadyan","year":"2017","journal-title":"Journal of the Institution of Electronics and Telecommunication Engineers"},{"key":"IJAPUC.2019100101-27","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-017-9446-9"},{"key":"IJAPUC.2019100101-28","doi-asserted-by":"crossref","unstructured":"Kaiser, J., Horvat, B., & Kacic, Z. (2000). A novel loss function for the overall risk criterion based discriminative training of HMM models. Paper presented at the Sixth International Conference on Spoken Language Processing. Academic Press.","DOI":"10.21437\/ICSLP.2000-412"},{"key":"IJAPUC.2019100101-29","unstructured":"Katyal, A., & Gill, J. (2014). Punjabi Speech Recognition of Isolated Words Using Compound EEMD & Neural Network. International Journal Of Soft Computing And Engineering."},{"key":"IJAPUC.2019100101-30","doi-asserted-by":"crossref","unstructured":"Kingsbury, B., Sainath, T. N., & Soltau, H. (2012). Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization. Paper presented at the Interspeech. Academic Press.","DOI":"10.21437\/Interspeech.2012-3"},{"key":"IJAPUC.2019100101-31","unstructured":"Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Paper presented at the Advances in neural information processing systems. Academic Press."},{"key":"IJAPUC.2019100101-32","unstructured":"LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. In The handbook of brain theory and neural networks. 
Academic Press."},{"key":"IJAPUC.2019100101-33","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA.2012.14"},{"key":"IJAPUC.2019100101-34","doi-asserted-by":"publisher","DOI":"10.1109\/ICRAIE.2016.7939586"},{"key":"IJAPUC.2019100101-35","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-018-09584-4"},{"key":"IJAPUC.2019100101-36","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-019-01325-y"},{"key":"IJAPUC.2019100101-37","doi-asserted-by":"publisher","DOI":"10.1515\/jisys-2018-0372"},{"key":"IJAPUC.2019100101-38","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2007.366914"},{"key":"IJAPUC.2019100101-39","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-16687-7_35"},{"key":"IJAPUC.2019100101-40","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2013.6707749"},{"key":"IJAPUC.2019100101-41","doi-asserted-by":"crossref","unstructured":"Samudravijaya, K., Rao, P., & Agrawal, S. (2000). Hindi speech database. Paper presented at the Sixth International Conference on Spoken Language Processing. Academic Press.","DOI":"10.21437\/ICSLP.2000-847"},{"key":"IJAPUC.2019100101-42","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273598"},{"key":"IJAPUC.2019100101-43","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2012.2227734"},{"key":"IJAPUC.2019100101-44","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2010.2077626"},{"key":"IJAPUC.2019100101-45","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-018-1146-z"},{"key":"IJAPUC.2019100101-46","doi-asserted-by":"publisher","DOI":"10.1109\/72.774254"},{"key":"IJAPUC.2019100101-47","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0"},{"key":"IJAPUC.2019100101-48","doi-asserted-by":"crossref","unstructured":"Vesel\u00fd, K., Ghoshal, A., Burget, L., & Povey, D. (2013). Sequence-discriminative training of deep neural networks. Paper presented at the Interspeech. 
Academic Press.","DOI":"10.21437\/Interspeech.2013-548"},{"key":"IJAPUC.2019100101-49","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(98)00033-8"},{"key":"IJAPUC.2019100101-50","doi-asserted-by":"crossref","unstructured":"Walia, N. K., & Tiwana, S. K. (2016). Research Issues in ASR: A leading edge to Punjabi Language. IOSR Journal of Computer Engineering, 38-43.","DOI":"10.9790\/0661-15010010138-43"},{"key":"IJAPUC.2019100101-51","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178835"},{"key":"IJAPUC.2019100101-52","doi-asserted-by":"crossref","unstructured":"Zhang, S.-X., & Gales, M. J. (2011). Structured support vector machines for noise robust continuous speech recognition. Paper presented at the Twelfth Annual Conference of the International Speech Communication Association. Academic Press.","DOI":"10.21437\/Interspeech.2011-406"}],"container-title":["International Journal of Advanced Pervasive and Ubiquitous Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=238852","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,30]],"date-time":"2022-09-30T00:44:20Z","timestamp":1664498660000},"score":1,"resource":{"primary":{"URL":"http:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/IJAPUC.2019100101"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2019,10]]},"references-count":53,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.4018\/ijapuc.2019100101","relation":{},"ISSN":["1937-965X","1937-9668"],"issn-type":[{"value":"1937-965X","type":"print"},{"value":"1937-9668","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10]]}}}