{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T04:59:30Z","timestamp":1764997170208,"version":"3.46.0"},"reference-count":46,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2018,2,20]],"date-time":"2018-02-20T00:00:00Z","timestamp":1519084800000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,12,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The classical approach to build an automatic speech recognition (ASR) system uses different feature extraction methods at the front end and various parameter classification techniques at the back end. The Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) techniques are the conventional approaches used for many years for feature extraction, and the hidden Markov model (HMM) has been the most obvious selection for feature classification. However, the performance of MFCC-HMM and PLP-HMM-based ASR system degrades in real-time environments. The proposed work discusses the implementation of discriminatively trained Hindi ASR system using noise robust integrated features and refined HMM model. It sequentially combines MFCC with PLP and MFCC with gammatone-frequency cepstral coefficient (GFCC) to obtain MF-PLP and MF-GFCC integrated feature vectors, respectively. The HMM parameters are refined using genetic algorithm (GA) and particle swarm optimization (PSO). Discriminative training of acoustic model using maximum mutual information (MMI) and minimum phone error (MPE) is performed to enhance the accuracy of the proposed system. 
The results show that discriminative training using MPE with MF-GFCC integrated feature vector and PSO-HMM parameter refinement gives significantly better results than the other implemented techniques.<\/jats:p>","DOI":"10.1515\/jisys-2017-0618","type":"journal-article","created":{"date-parts":[[2018,2,27]],"date-time":"2018-02-27T05:46:46Z","timestamp":1519710406000},"page":"327-344","source":"Crossref","is-referenced-by-count":22,"title":["Discriminative Training Using Noise Robust Integrated Features and Refined HMM Modeling"],"prefix":"10.1515","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7071-8323","authenticated-orcid":false,"given":"Mohit","family":"Dua","sequence":"first","affiliation":[{"name":"Department of Computer Engineering , National Institute of Technology , Kurukshetra , India"}]},{"given":"Rajesh Kumar","family":"Aggarwal","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering , National Institute of Technology , Kurukshetra , India"}]},{"given":"Mantosh","family":"Biswas","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering , National Institute of Technology , Kurukshetra , India"}]}],"member":"374","published-online":{"date-parts":[[2018,2,20]]},"reference":[{"key":"2025120523362767091_j_jisys-2017-0618_ref_001","unstructured":"A. Acero, Acoustical and environmental robustness in automatic speech recognition, vol. 201, Springer Science & Business Media, New York, USA, 2012."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_002","doi-asserted-by":"crossref","unstructured":"A. Adiga, M. Magimai and C. S. Seelamantula, Gammatone wavelet cepstral coefficients for robust speech recognition, in: IEEE TENCON 2013-2013 IEEE Region 10 Conference (31194), Xi'an, China, 2013.","DOI":"10.1109\/TENCON.2013.6718948"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_003","doi-asserted-by":"crossref","unstructured":"R. K. Aggarwal and M. 
Dave, Discriminative techniques for Hindi speech recognition system, Inf. Sys. Indian Lang. 139 (2011), 261\u2013266.","DOI":"10.1007\/978-3-642-19403-0_45"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_004","doi-asserted-by":"crossref","unstructured":"R. K. Aggarwal and M. Dave, Acoustic modeling problem for automatic speech recognition system: advances and refinements (Part II), Int. J. Speech Technol. 14.4 (2011), 309\u2013320.","DOI":"10.1007\/s10772-011-9106-4"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_005","doi-asserted-by":"crossref","unstructured":"R. K. Aggarwal and M. Dave, Acoustic modeling problem for automatic speech recognition system: conventional methods (Part I), Int. J. Speech Technol. 14.4 (2011), 297.","DOI":"10.1007\/s10772-011-9108-2"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_006","doi-asserted-by":"crossref","unstructured":"R. K. Aggarwal and M. Dave, Performance evaluation of sequentially combined heterogeneous feature streams for Hindi speech recognition system, Telecommun. Syst. 52 (2013), 1\u201310.","DOI":"10.1007\/s11235-011-9623-0"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_007","doi-asserted-by":"crossref","unstructured":"L. Bahl, P. Brown, P. de Souza and R. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, in: Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP\u201986, Tokyo, Japan, vol. 11, IEEE, 1986.","DOI":"10.1109\/ICASSP.1986.1169179"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_008","doi-asserted-by":"crossref","unstructured":"J. M. Baker, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan and D. O\u2019Shaughnessy, Developments and directions in speech recognition and understanding, Part 1 [DSP Education], IEEE Signal Process. Mag. 26.3 (2009), 75\u201380.","DOI":"10.1109\/MSP.2009.932166"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_009","unstructured":"W. 
Burgos, Gammatone and MFCC Features in Speaker Recognition, Dissertation, 2014."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_010","unstructured":"H. P. Combrinck and E. C. Botha, On the Mel-scaled cepstrum, Department of Electrical and Electronic Engineering, University of Pretoria, Pretoria, South Africa, 1996."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_011","doi-asserted-by":"crossref","unstructured":"S. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoust. Speech Signal Process 28.4 (1980), 357\u2013366.","DOI":"10.1109\/TASSP.1980.1163420"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_012","doi-asserted-by":"crossref","unstructured":"M. Dua, R. K. Aggarwal and M. Biswas, Discriminative training using heterogeneous feature vector for Hindi automatic speech recognition system, in: 2017 International Conference on Computer and Applications (ICCA), Dubai, United Arab Emirates, IEEE, 2017.","DOI":"10.1109\/COMAPP.2017.8079777"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_013","unstructured":"K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, CA, USA, 2013."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_014","doi-asserted-by":"crossref","unstructured":"S. Furui, 40 years of progress in automatic speaker recognition, Advances in Biometrics 5558 (2009), 1050\u20131059.","DOI":"10.1007\/978-3-642-01793-3_106"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_015","doi-asserted-by":"crossref","unstructured":"D. Gillick, S. Wegmann and L. 
Gillick, Discriminative training for speech recognition is compensating for statistical dependence in the HMM framework, in: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, IEEE, 2012.","DOI":"10.1109\/ICASSP.2012.6288979"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_016","doi-asserted-by":"crossref","unstructured":"H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, J. Acoust. Soc. Am. 87.4 (1990), 1738\u20131752.","DOI":"10.1121\/1.399423"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_017","unstructured":"J. H. Holland, Adaptation in natural and artificial systems. 1975, University of Michigan Press, Ann Arbor, MI, 1992."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_018","unstructured":"X. Huang, A. Acero and H.-W. Hon, Spoken Language Processing: a Guide to Theory, Algorithm, and System Development, Prentice Hall PTR, NJ, USA, 2001."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_019","doi-asserted-by":"crossref","unstructured":"N. Jakovljevic, D. Miskovic, M. Janev, M. Secujski and V. Delic, Comparison of linear discriminant analysis approaches in automatic speech recognition, Elektron. Elektrotech. 19.7 (2013), 76\u201379.","DOI":"10.5755\/j01.eee.19.7.5167"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_020","doi-asserted-by":"crossref","unstructured":"V. Kadyan, A. Mantri and R. K. Aggarwal, Refinement of HMM model parameters for Punjabi automatic speech recognition (PASR) System, IETE J. Res. (2017), 1\u201316.","DOI":"10.1080\/03772063.2017.1369370"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_021","doi-asserted-by":"crossref","unstructured":"V. Kadyan, A. Mantri and R. K. Aggarwal, A heterogeneous speech feature vectors generation approach with hybrid hmm classifiers, Int. J. Speech Technol. 20 (2017), 1\u20139.","DOI":"10.1007\/s10772-017-9446-9"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_022","doi-asserted-by":"crossref","unstructured":"J. 
Kennedy and R. Eberhart, Particle swarm optimization, in: IEEE Int. Conf. Neural Networks, Perth, WA, Australia, vol. 4, 1995.","DOI":"10.1109\/ICNN.1995.488968"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_023","unstructured":"J. Koehler, N. Morgan, H. Hermansky, H. G. Hirsch and G. Tong, Integrating RASTA-PLP into Speech Recognition, in: 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, Adelaide, SA, Australia, 1994, ICASSP-94, vol. 1. IEEE, 1994."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_024","doi-asserted-by":"crossref","unstructured":"T.-W. Kuan, A.-C. Tsai, P.-H. Sung, J.-F. Wang and H.-S. Kuo, A robust BFCC feature extraction for ASR system, Artif. Intell. Res. 5.2 (2016), 14.","DOI":"10.5430\/air.v5n2p14"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_025","doi-asserted-by":"crossref","unstructured":"N. Kumar and A. G. Andreou, Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, Speech Commun. 26.4 (1998), 283\u2013297.","DOI":"10.1016\/S0167-6393(98)00061-2"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_026","unstructured":"G. Kunkle and A. Gerald, Sequence scoring experiments using the TIMIT corpus and the HTK recognition framework, Dissertation, Florida Institute of Technology, Florida, USA, 2010."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_027","doi-asserted-by":"crossref","unstructured":"J. Li, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan and D. O\u2019Shaughnessy, An overview of noise-robust automatic speech recognition, IEEE\/ACM Trans. Audio Speech Lang. Process. 22.4 (2014), 745\u2013777.","DOI":"10.1109\/TASLP.2014.2304637"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_028","doi-asserted-by":"crossref","unstructured":"E. McDermott, T. J. Hazen, J. L. Roux, A. Nakamura and S. Katagiri, Discriminative training for large-vocabulary speech recognition using minimum classification error, IEEE Trans. Audio Speech Lang. Process. 
15.1 (2007), 203\u2013223.","DOI":"10.1109\/TASL.2006.876778"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_029","doi-asserted-by":"crossref","unstructured":"M. McLaren, R. Vogt, B. Baker and S. Sridharan, A comparison of session variability compensation techniques for SVM-based speaker recognition, in: Eighth Annual Conference of the International Speech Communication Association Antwerp, Belgium, pp. 790\u2013793, 2007.","DOI":"10.21437\/Interspeech.2007-150"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_030","doi-asserted-by":"crossref","unstructured":"F. Meriem, H. Farid, B. Messaoud and A. Abderrahmene, New front end based on multitaper and gammatone filters for robust speaker verification, in: Recent Advances in Electrical Engineering and Control Applications, Springer International Publishing, Cham(ZG), Switzerland, pp. 344\u2013354, 2017.","DOI":"10.1007\/978-3-319-48929-2_27"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_031","doi-asserted-by":"crossref","unstructured":"T. Mittal and R. K. Sharma, Speech recognition using ANN and predator-influenced civilized swarm optimization algorithm, Turk. J. Electr. Eng. Comput. Sci. 24.6 (2016), 4790\u20134803.","DOI":"10.3906\/elk-1412-193"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_032","unstructured":"J. M. Naik, L. P. Netsch and G. R. Doddington, Speaker verification over long distance telephone lines, in: 1989 International Conference on Acoustics, Speech, and Signal Processing, 1989, ICASSP-89, Glasgow, UK, IEEE, 1989."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_033","unstructured":"D. Povey, Discriminative training for large vocabulary speech recognition, Dissertation, University of Cambridge, Cambridge, United Kingdom, 2005."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_034","doi-asserted-by":"crossref","unstructured":"D. Povey and P. C. 
Woodland, Minimum phone error and I-smoothing for improved discriminative training, in: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA, vol. 1, IEEE, 2002.","DOI":"10.1109\/ICASSP.2002.1005687"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_035","unstructured":"L. R. Rabiner and B. H. Juang, Fundamentals of speech recognition (Vol. 14), PTR Prentice Hall, Englewood Cliffs, 1993."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_036","doi-asserted-by":"crossref","unstructured":"D. A. Reynolds, Experimental evaluation of features for robust speaker identification, IEEE Trans. Speech Audio Process. 2.4 (1994), 639\u2013643.","DOI":"10.1109\/89.326623"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_037","unstructured":"K. Samudravijaya, P. V. S. Rao and S. S. Agrawal, Hindi speech database, in: International Conference on spoken Language Processing, Beijing, China, 2002, pp. 456\u2013464."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_038","doi-asserted-by":"crossref","unstructured":"G. Saon and J.-T. Chien, Large-vocabulary continuous speech recognition systems: a look at some recent advances, IEEE Signal Process. Mag. 29.6 (2012), 18\u201333.","DOI":"10.1109\/MSP.2012.2197156"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_039","doi-asserted-by":"crossref","unstructured":"A. Sharma, M. C. Shrotriya, O. Farooq and Z. A. Abbasi, Hybrid wavelet based LPC features for Hindi speech recognition, Int. J. Inf. Commun. Technol. 1.3\u20134 (2008), 373\u2013381.","DOI":"10.1504\/IJICT.2008.024008"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_040","doi-asserted-by":"crossref","unstructured":"R. Storn and K. Price, Differential evolution \u2013 a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. 
11.4 (1997), 341\u2013359.","DOI":"10.1023\/A:1008202821328"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_041","doi-asserted-by":"crossref","unstructured":"X. Valero and F. Alias, Gammatone cepstral coefficients: biologically inspired features for non-speech audio classification, IEEE Trans. Multimedia 14.6 (2012), 1684\u20131689.","DOI":"10.1109\/TMM.2012.2199972"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_042","unstructured":"K. Vertanen, An Overview of Discriminative Training for Speech Recognition, University of Cambridge, Cambridge, UK, 2004."},{"key":"2025120523362767091_j_jisys-2017-0618_ref_043","doi-asserted-by":"crossref","unstructured":"C. P. Woodland and D. Povey, Large scale discriminative training of hidden Markov models for speech recognition, Comput. Speech Lang. 16.1 (2002), 25\u201347.","DOI":"10.1006\/csla.2001.0182"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_044","doi-asserted-by":"crossref","unstructured":"X. Zhao and D. L. Wang, Analyzing noise robustness of MFCC and GFCC features in speaker identification, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2013.","DOI":"10.1109\/ICASSP.2013.6639061"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_045","doi-asserted-by":"crossref","unstructured":"X. Zhao, Y. Shao and D. L. Wang, CASA-based robust speaker identification, IEEE Transactions on Audio, Speech, and Language Processing 20.5 (2012), 1608\u20131616.","DOI":"10.1109\/TASL.2012.2186803"},{"key":"2025120523362767091_j_jisys-2017-0618_ref_046","unstructured":"H. Zhou, D. Karakos, S. Khudanpur, A. G. Andreou and C. E. 
Priebe, On projections of Gaussian distributions using maximum likelihood criteria, in: Information Theory and Applications Workshop, 2009, IEEE, 2009."}],"container-title":["Journal of Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/view\/journals\/jisys\/29\/1\/article-p327.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2017-0618\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2017-0618\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T23:38:15Z","timestamp":1764977895000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2017-0618\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,2,20]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2018,4,25]]},"published-print":{"date-parts":[[2019,12,18]]}},"alternative-id":["10.1515\/jisys-2017-0618"],"URL":"https:\/\/doi.org\/10.1515\/jisys-2017-0618","relation":{},"ISSN":["2191-026X","0334-1860"],"issn-type":[{"type":"electronic","value":"2191-026X"},{"type":"print","value":"0334-1860"}],"subject":[],"published":{"date-parts":[[2018,2,20]]}}}