{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T23:40:55Z","timestamp":1764978055489,"version":"3.46.0"},"reference-count":35,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2018,10,1]],"date-time":"2018-10-01T00:00:00Z","timestamp":1538352000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,12,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficient (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficient (GFCC) for extracting features in the training phase of the ASR system. Initially, the paper proposes a sequential combination of all three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system using all these combinations was tested. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. Further, these two feature vector integrations are optimized using three different optimization methods, particle swarm optimization (PSO), PSO with crossover, and PSO with quadratic crossover (Q-PSO). 
The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows significant improvement over all other optimized combinations.<\/jats:p>","DOI":"10.1515\/jisys-2018-0057","type":"journal-article","created":{"date-parts":[[2018,9,30]],"date-time":"2018-09-30T05:01:40Z","timestamp":1538283700000},"page":"959-976","source":"Crossref","is-referenced-by-count":10,"title":["Optimizing Integrated Features for Hindi Automatic Speech Recognition System"],"prefix":"10.1515","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7071-8323","authenticated-orcid":false,"given":"Mohit","family":"Dua","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, National Institute of Technology, Kurukshetra 136119, India"}]},{"given":"Rajesh Kumar","family":"Aggarwal","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, National Institute of Technology, Kurukshetra 136119, India"}]},{"given":"Mantosh","family":"Biswas","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, National Institute of Technology, Kurukshetra 136119, India"}]}],"member":"374","published-online":{"date-parts":[[2018,10,1]]},"reference":[{"key":"2025120523362750363_j_jisys-2018-0057_ref_001","unstructured":"M. A. Abd El-Fattah, M. I. Dessouky, S. M. Diab and F. E. Abd El-samie, Adaptive Wiener filtering approach for speech enhancement, Ubiquitous Comput. Commun. J. 3 (2008), 1\u20138."},{"key":"2025120523362750363_j_jisys-2018-0057_ref_002","unstructured":"A. Acero, Acoustical and Environmental Robustness in Automatic Speech Recognition, vol. 201, Springer Science & Business Media, New York, 2012."},{"key":"2025120523362750363_j_jisys-2018-0057_ref_003","doi-asserted-by":"crossref","unstructured":"K. R. Aggarwal and M. Dave, Acoustic modeling problem for automatic speech recognition system: conventional methods (Part I), Int. J. Speech Technol. 
14 (2011), 297\u2013308.","DOI":"10.1007\/s10772-011-9108-2"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_004","doi-asserted-by":"crossref","unstructured":"K. R. Aggarwal and M. Dave, Filterbank optimization for robust ASR using GA and PSO, Int. J. Speech Technol. 15 (2012), 191\u2013201.","DOI":"10.1007\/s10772-012-9133-9"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_005","doi-asserted-by":"crossref","unstructured":"K. R. Aggarwal and M. Dave, Performance evaluation of sequentially combined heterogeneous feature streams for Hindi speech recognition system, Telecommun. Syst. 52 (2013), 1457\u20131466.","DOI":"10.1007\/s11235-011-9623-0"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_006","doi-asserted-by":"crossref","unstructured":"M. J. Baker, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan and D. O\u2019Shaughnessy, Developments and directions in speech recognition and understanding, Part 1 [DSP Education], IEEE Signal Process. Mag. 26 (2009), 75\u201380.","DOI":"10.1109\/MSP.2009.932166"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_007","unstructured":"W. Burgos, Gammatone and MFCC Features in Speaker Recognition, Dissertation, 2014."},{"key":"2025120523362750363_j_jisys-2018-0057_ref_008","unstructured":"P. H. Combrinck and E. C. Botha, On the Mel-Scaled Cepstrum, Department of Electrical and Electronic Engineering, University of Pretoria, Hatfield, South Africa, 1996."},{"key":"2025120523362750363_j_jisys-2018-0057_ref_009","doi-asserted-by":"crossref","unstructured":"S. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoust. Speech Signal Process. 28 (1980), 357\u2013366.","DOI":"10.1109\/TASSP.1980.1163420"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_010","doi-asserted-by":"crossref","unstructured":"M. Dua, R. K. Aggarwal and M. Biswas, Performance evaluation of Hindi speech recognition system using optimized filterbanks, Eng. 
Sci. Technol. 21 (2018), 389\u2013398.","DOI":"10.1016\/j.jestch.2018.04.005"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_011","doi-asserted-by":"crossref","unstructured":"M. Dua, R. K. Aggarwal and M. Biswas, Discriminative training using noise robust integrated features and refined HMM modeling, J. Intell. Syst. 29 (2020), 327\u2013344.","DOI":"10.1515\/jisys-2017-0618"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_012","unstructured":"K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 2013."},{"key":"2025120523362750363_j_jisys-2018-0057_ref_013","doi-asserted-by":"crossref","unstructured":"Z.-F. Hao, Z.-G. Wang and H. Huang, A particle swarm optimization algorithm with crossover operator, in: 2007 International Conference on Machine Learning and Cybernetics, vol. 2, IEEE, HongKong, China, 2007.","DOI":"10.1109\/ICMLC.2007.4370295"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_014","doi-asserted-by":"crossref","unstructured":"H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, J. Acoust. Soc. Am. 87 (1990), 1738\u20131752.","DOI":"10.1121\/1.399423"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_015","doi-asserted-by":"crossref","unstructured":"H. Hermansky and S. Sharma, Temporal patterns (TRAPS) in ASR of noisy speech, in: Proceedings of 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, IEEE, Phoenix, AZ, USA, 1999.","DOI":"10.1109\/ICASSP.1999.758119"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_016","doi-asserted-by":"crossref","unstructured":"K. Kirchhoff, Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments, in: Fifth International Conference on Spoken Language Processing, Sydney, Australia, 1998.","DOI":"10.21437\/ICSLP.1998-313"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_017","doi-asserted-by":"crossref","unstructured":"N. Kumar and A. G. 
Andreou, Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, Speech Commun. 26 (1998), 283\u2013297.","DOI":"10.1016\/S0167-6393(98)00061-2"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_018","doi-asserted-by":"crossref","unstructured":"S. Kwong, C.-W. Chau and W. A. Halang, Genetic algorithm for optimizing the nonlinear time alignment of automatic speech recognition systems, IEEE Trans. Indust. Electron. 43 (1996), 559\u2013566.","DOI":"10.1109\/41.538613"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_019","doi-asserted-by":"crossref","unstructured":"S. Kwong, C. W. Chau, K. F. Man and K. S. Tangb, Optimisation of HMM topology and its model parameters by genetic algorithms, Pattern Recogn. 34 (2001), 509\u2013522.","DOI":"10.1016\/S0031-3203(99)00226-5"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_020","doi-asserted-by":"crossref","unstructured":"J. Li, L. Deng, Y. Gong and R. Haeb-Umbach, An overview of noise-robust automatic speech recognition, IEEE\/ACM Trans. Audio Speech Lang. Process. 22 (2014), 745\u2013777.","DOI":"10.1109\/TASLP.2014.2304637"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_021","doi-asserted-by":"crossref","unstructured":"T. Mittal and R. K. Sharma, Speech recognition using ANN and predator-influenced civilized swarm optimization algorithm, Turk. J. Elect. Eng. Comput. Sci. 24 (2016), 4790\u20134803.","DOI":"10.3906\/elk-1412-193"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_022","doi-asserted-by":"crossref","unstructured":"N. Najkar, F. Razzazi and H. Sameti, A novel approach to HMM-based speech recognition systems using particle swarm optimization, Math. Comput. Modell. 52 (2010), 1910\u20131920.","DOI":"10.1016\/j.mcm.2010.03.041"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_023","doi-asserted-by":"crossref","unstructured":"M. Pant, R. Thangaraj and A. 
Abraham, A new PSO algorithm with crossover operator for global optimization problems, in: Innovations in Hybrid Intelligent Systems, pp. 215\u2013222, Springer, Berlin, 2007.","DOI":"10.1007\/978-3-540-74972-1_29"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_024","unstructured":"R. L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NY, 1993."},{"key":"2025120523362750363_j_jisys-2018-0057_ref_025","doi-asserted-by":"crossref","unstructured":"A. D. Reynolds, Experimental evaluation of features for robust speaker identification, IEEE Trans. Speech Audio Process. 2 (1994), 639\u2013643.","DOI":"10.1109\/89.326623"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_026","unstructured":"K. Samudravijaya, P. V. S. Rao and S. S. Agrawal, Hindi speech database, in: International Conference on Spoken Language Processing, Beijing, China, pp. 456\u2013464, 2002."},{"key":"2025120523362750363_j_jisys-2018-0057_ref_027","doi-asserted-by":"crossref","unstructured":"G. Saon and J.-T. Chien, Large-vocabulary continuous speech recognition systems: a look at some recent advances, IEEE Signal Process. Mag. 29 (2012), 18\u201333.","DOI":"10.1109\/MSP.2012.2197156"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_028","doi-asserted-by":"crossref","unstructured":"R. Schluter and H. Ney, Using phase spectrum information for improved speech recognition performance, in: Proceedings 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP\u201901), vol. 1, IEEE, Salt Lake City, UT, USA, 2001.","DOI":"10.1109\/ICASSP.2001.940785"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_029","doi-asserted-by":"crossref","unstructured":"R. Schluter, I. Bezrukov, H. Wagner and H. Ney, Gammatone features and feature combination for large vocabulary speech recognition, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007 (ICASSP 2007), vol. 
4, IEEE, Honolulu, HI, USA, 2007.","DOI":"10.1109\/ICASSP.2007.366996"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_030","doi-asserted-by":"crossref","unstructured":"A. Sharma, M. C. Shrotriya, O. Farooq and Z. A. Abbasi, Hybrid wavelet based LPC features for Hindi speech recognition, Int. J. Inform. Commun. Technol. 1 (2008), 373\u2013381.","DOI":"10.1504\/IJICT.2008.024008"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_031","doi-asserted-by":"crossref","unstructured":"H. Tolba, S.-A. Selouani and D. O\u2019Shaughnessy, Auditory-based acoustic distinctive features and spectral cues for automatic speech recognition using a multi-stream paradigm, in: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, IEEE, Orlando, FL, USA, 2002.","DOI":"10.21437\/ICSLP.2002-578"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_032","doi-asserted-by":"crossref","unstructured":"A. Varga and H. J. Steeneken, Assessment for automatic speech recognition, II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems, Speech Commun. 12 (1993), 247\u2013251.","DOI":"10.1016\/0167-6393(93)90095-3"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_033","doi-asserted-by":"crossref","unstructured":"F. Yang, C. Zhang and T. Sun, Comparison of particle swarm optimization and genetic algorithm for HMM training, in: 19th International Conference on Pattern Recognition, 2008 (ICPR 2008), IEEE, Tampa, FL, USA, 2008.","DOI":"10.1109\/ICPR.2008.4761282"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_034","doi-asserted-by":"crossref","unstructured":"A. Zolnay, R. Schl\u00fcter and H. Ney, Robust speech recognition using a voiced-unvoiced feature, in: Seventh International Conference on Spoken Language Processing, Denver, Colorado, USA, 2002.","DOI":"10.21437\/ICSLP.2002-38"},{"key":"2025120523362750363_j_jisys-2018-0057_ref_035","doi-asserted-by":"crossref","unstructured":"A. Zolnay, R. 
Schluter and H. Ney, Acoustic feature combination for robust speech recognition, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing 2005 (ICASSP\u201905), vol. 1. IEEE, Philadelphia, PA, USA, 2005.","DOI":"10.1109\/ICASSP.2005.1415149"}],"container-title":["Journal of Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/view\/journals\/jisys\/29\/1\/article-p959.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2018-0057\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2018-0057\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T23:37:23Z","timestamp":1764977843000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2018-0057\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,1]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2018,4,25]]},"published-print":{"date-parts":[[2019,12,18]]}},"alternative-id":["10.1515\/jisys-2018-0057"],"URL":"https:\/\/doi.org\/10.1515\/jisys-2018-0057","relation":{},"ISSN":["2191-026X","0334-1860"],"issn-type":[{"type":"electronic","value":"2191-026X"},{"type":"print","value":"0334-1860"}],"subject":[],"published":{"date-parts":[[2018,10,1]]}}}