{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T05:36:37Z","timestamp":1774935397217,"version":"3.50.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T00:00:00Z","timestamp":1654128000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T00:00:00Z","timestamp":1654128000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Developing a robust native-language ASR framework is challenging and remains an active area of research. Effective front-end and back-end approaches are required to tackle environmental differences, large training complexity, and inter-speaker variability in building a successful recognition system. In this paper, four front-end approaches: mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), relative spectral-perceptual linear prediction (RASTA-PLP) and power-normalized cepstral coefficients (PNCC) have been investigated to generate unique and robust feature vectors at different SNR values. Furthermore, to handle the large training data complexity, parameter optimization has been performed with sequence-discriminative training techniques: maximum mutual information (MMI), minimum phone error (MPE), boosted-MMI (bMMI), and state-level minimum Bayes risk (sMBR). 
Optimal parameter values have been selected through lattice generation and learning-rate adjustments. In the proposed framework, four different systems have been tested by analyzing various feature extraction approaches (with or without speaker normalization of the test set through the Vocal Tract Length Normalization (VTLN) approach) and classification strategies with or without artificial extension of the training dataset. To compare system performance, matched (adult train and test\u2014S1, child train and test\u2014S2) and mismatched (adult train and child test\u2014S3, adult\u2009+\u2009child train and child test\u2014S4) systems have been demonstrated on a large adult and a very small Punjabi clean speech corpus. Consequently, gender-based in-domain data augmentation is used to moderate acoustic and phonetic variations across adult and children\u2019s speech under mismatched conditions. The experimental results show that an effective framework developed with the PNCC\u2009+\u2009VTLN front-end approach using a TDNN-sMBR-based model with parameter optimization yields a relative improvement (RI) of 40.18%, 47.51%, and 49.87% in the matched, mismatched, and gender-based in-domain augmented systems, respectively, under typical clean and noisy conditions.<\/jats:p>","DOI":"10.1007\/s40747-022-00651-7","type":"journal-article","created":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T09:02:54Z","timestamp":1654160574000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched 
conditions"],"prefix":"10.1007","volume":"9","author":[{"given":"Puneet","family":"Bawa","sequence":"first","affiliation":[]},{"given":"Virender","family":"Kadyan","sequence":"additional","affiliation":[]},{"given":"Abinash","family":"Tripathy","sequence":"additional","affiliation":[]},{"given":"Thipendra P.","family":"Singh","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,6,2]]},"reference":[{"key":"651_CR1","doi-asserted-by":"publisher","first-page":"107810","DOI":"10.1016\/j.apacoust.2020.107810","volume":"175","author":"P Bawa","year":"2021","unstructured":"Bawa P, Kadyan V (2021) Noise robust in-domain children speech enhancement for automatic Punjabi recognition system under mismatched conditions. Appl Acoust 175:107810","journal-title":"Appl Acoust"},{"key":"651_CR2","first-page":"241","volume-title":"International conference on applied human factors and ergonomics","author":"G L\u00f3pez","year":"2017","unstructured":"L\u00f3pez G, Quesada L, Guerrero LA (2017) Alexa vs. Siri vs. Cortana vs. Google Assistant: a comparison of speech-based natural user interfaces. International conference on applied human factors and ergonomics. Springer, Cham, pp 241\u2013250"},{"issue":"1","key":"651_CR3","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1080\/02763869.2018.1404391","volume":"37","author":"MB Hoy","year":"2018","unstructured":"Hoy MB (2018) Alexa, Siri, Cortana, and more: an introduction to voice assistants. Med Ref Serv Q 37(1):81\u201388","journal-title":"Med Ref Serv Q"},{"key":"651_CR4","doi-asserted-by":"publisher","DOI":"10.1007\/s40860-021-00140-7","author":"A Kumar","year":"2021","unstructured":"Kumar A, Aggarwal RK (2021) An exploration of semi-supervised and language-adversarial transfer learning using hybrid acoustic model for Hindi speech recognition. J Reliable Intell Environ. 
https:\/\/doi.org\/10.1007\/s40860-021-00140-7","journal-title":"J Reliable Intell Environ"},{"key":"651_CR5","doi-asserted-by":"publisher","first-page":"101077","DOI":"10.1016\/j.csl.2020.101077","volume":"63","author":"PG Shivakumar","year":"2020","unstructured":"Shivakumar PG, Georgiou P (2020) Transfer learning from adult to children for speech recognition: evaluation, analysis and recommendations. Comput Speech Lang 63:101077","journal-title":"Comput Speech Lang"},{"key":"651_CR6","doi-asserted-by":"publisher","first-page":"101101","DOI":"10.1016\/j.csl.2020.101101","volume":"63","author":"M Kumar","year":"2020","unstructured":"Kumar M, Kim SH, Lord C, Lyon TD, Narayanan S (2020) Leveraging linguistic context in dyadic interactions to improve automatic speech recognition for children. Comput Speech Lang 63:101101","journal-title":"Comput Speech Lang"},{"key":"651_CR7","doi-asserted-by":"publisher","first-page":"1981","DOI":"10.3389\/fpsyg.2019.01981","volume":"10","author":"LJ Leibold","year":"2019","unstructured":"Leibold LJ, Buss E (2019) Masked speech recognition in school-age children. Front Psychol 10:1981","journal-title":"Front Psychol"},{"issue":"S 02","key":"651_CR8","first-page":"10859","volume":"98","author":"T M\u00fcller","year":"2019","unstructured":"M\u00fcller T, Speck I, Wesarg T, Wiebe K, Hassepa\u00df F, Jakob T, Arndt S (2019) Speech recognition in noise in single-sided deaf cochlear implant children using digital wireless adaptive microphone technology. Laryngorhinootologie 98(S 02):10859","journal-title":"Laryngorhinootologie"},{"key":"651_CR9","doi-asserted-by":"crossref","unstructured":"Shahnawazuddin S, Bandarupalli TS, Chakravarthy R (2020) Improving automatic speech recognition by classifying adult and child speakers into separate groups using speech rate rhythmicity parameter. In: 2020 International Conference on Signal Processing and Communications (SPCOM). IEEE, pp. 
1\u20135","DOI":"10.1109\/SPCOM50965.2020.9179497"},{"key":"651_CR10","doi-asserted-by":"publisher","first-page":"409","DOI":"10.1007\/978-981-33-6881-1_33","volume-title":"Advances in speech and music technology","author":"A Kumar","year":"2021","unstructured":"Kumar A, Aggarwal RK (2021) Bi-lingual TDNN-LSTM acoustic modeling for limited resource hindi and marathi language ASR. Advances in speech and music technology. Springer, Singapore, pp 409\u2013423"},{"key":"651_CR11","doi-asserted-by":"crossref","unstructured":"Shahnawazuddin S, Sinha R (2015) Low-memory fast on-line adaptation for acoustically mismatched children's speech recognition. In: Sixteenth annual conference of the international speech communication association","DOI":"10.21437\/Interspeech.2015-377"},{"key":"651_CR12","doi-asserted-by":"crossref","unstructured":"Koehler J, Morgan N, Hermansky H, Hirsch HG, Tong G (1994) Integrating RASTA-PLP into speech recognition. In: Proceedings of ICASSP'94. In: IEEE international conference on acoustics, speech and signal processing, vol 1. IEEE, pp. I-421","DOI":"10.1109\/ICASSP.1994.389266"},{"key":"651_CR13","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-021-03468-3","author":"V Kadyan","year":"2021","unstructured":"Kadyan V, Bawa P, Hasija T (2021) In domain training data augmentation on noise robust Punjabi Children speech recognition. J Ambient Intell Humaniz Comput. https:\/\/doi.org\/10.1007\/s12652-021-03468-3","journal-title":"J Ambient Intell Humaniz Comput"},{"key":"651_CR14","doi-asserted-by":"crossref","unstructured":"Zhao X, Wang D (2013) Analyzing noise robustness of MFCC and GFCC features in speaker identification. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, pp. 
7204\u20137208","DOI":"10.1109\/ICASSP.2013.6639061"},{"issue":"7","key":"651_CR15","doi-asserted-by":"publisher","first-page":"1315","DOI":"10.1109\/TASLP.2016.2545928","volume":"24","author":"C Kim","year":"2016","unstructured":"Kim C, Stern RM (2016) Power-normalized cepstral coefficients (PNCC) for robust speech recognition. IEEE\/ACM Trans Audio Speech Lang Process 24(7):1315\u20131329","journal-title":"IEEE\/ACM Trans Audio Speech Lang Process"},{"issue":"6","key":"651_CR16","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1109\/89.799691","volume":"7","author":"JT Chien","year":"1999","unstructured":"Chien JT (1999) Online hierarchical transformation of hidden Markov models for speech recognition. IEEE Trans Speech Audio Process 7(6):656\u2013667","journal-title":"IEEE Trans Speech Audio Process"},{"key":"651_CR17","doi-asserted-by":"crossref","unstructured":"Bahl L, Brown P, De Souza P, Mercer R (1986) Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In: ICASSP'86. IEEE international conference on acoustics, speech, and signal processing, vol 11. IEEE, pp. 49\u201352","DOI":"10.1109\/ICASSP.1986.1169179"},{"key":"651_CR18","doi-asserted-by":"crossref","unstructured":"Povey D, Woodland PC (2002) Minimum phone error and I-smoothing for improved discriminative training. In: 2002 IEEE international conference on acoustics, speech, and signal processing, vol 1. IEEE, pp. I-105","DOI":"10.1109\/ICASSP.2002.1005687"},{"key":"651_CR19","doi-asserted-by":"crossref","unstructured":"Povey D, Kanevsky D, Kingsbury B, Ramabhadran B, Saon G, Visweswariah K (2008) Boosted MMI for model and feature-space discriminative training. In: 2008 IEEE international conference on acoustics, speech and signal processing. IEEE, pp. 
4057\u20134060","DOI":"10.1109\/ICASSP.2008.4518545"},{"key":"651_CR20","doi-asserted-by":"crossref","unstructured":"Vesel\u00fd K, Hannemann M, Burget L (2013) Semi-supervised training of deep neural networks. In: 2013 IEEE workshop on automatic speech recognition and understanding. IEEE, pp. 267\u2013272","DOI":"10.1109\/ASRU.2013.6707741"},{"issue":"1","key":"651_CR21","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1007\/s10772-018-09577-3","volume":"22","author":"V Kadyan","year":"2019","unstructured":"Kadyan V, Mantri A, Aggarwal RK, Singh A (2019) A comparative study of deep neural network based Punjabi-ASR system. Int J Speech Technol 22(1):111\u2013119","journal-title":"Int J Speech Technol"},{"key":"651_CR22","first-page":"2345","volume":"2013","author":"K Vesel\u00fd","year":"2013","unstructured":"Vesel\u00fd K, Ghoshal A, Burget L, Povey D (2013) Sequence-discriminative training of deep neural networks. Interspeech 2013:2345\u20132349","journal-title":"Interspeech"},{"key":"651_CR23","doi-asserted-by":"crossref","unstructured":"Zhang S, Lei M, Liu Y, Li W (2019) Investigation of modeling units for mandarin speech recognition using dfsmn-ctc-smbr. In: ICASSP 2019\u20132019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp. 7085\u20137089","DOI":"10.1109\/ICASSP.2019.8683859"},{"key":"651_CR24","doi-asserted-by":"crossref","unstructured":"Rao K, Senior A, Sak H (2016) Flat start training of CD-CTC-SMBR LSTM RNN acoustic models. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp. 5405\u20135409","DOI":"10.1109\/ICASSP.2016.7472710"},{"key":"651_CR25","doi-asserted-by":"crossref","unstructured":"Fainberg J, Bell P, Lincoln M, Renals S (2016) Improving Children's speech recognition through out-of-domain data augmentation. In: Interspeech, pp. 
1598\u20131602","DOI":"10.21437\/Interspeech.2016-1348"},{"key":"651_CR26","doi-asserted-by":"crossref","unstructured":"Serizel R, Giuliani D (2014) Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition. In: 2014 IEEE spoken language technology workshop (SLT). IEEE\u00b8 pp. 135\u2013140","DOI":"10.1109\/SLT.2014.7078563"},{"key":"651_CR27","doi-asserted-by":"crossref","unstructured":"Poorjam AH, Jensen JR, Little MA, Christensen MG (2017) Dominant distortion classification for pre-processing of vowels in remote biomedical voice analysis","DOI":"10.21437\/Interspeech.2017-378"},{"issue":"3","key":"651_CR28","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1016\/0167-6393(94)00059-J","volume":"16","author":"Y Gong","year":"1995","unstructured":"Gong Y (1995) Speech recognition in noisy environments: a survey. Speech Commun 16(3):261\u2013291","journal-title":"Speech Commun"},{"key":"651_CR29","doi-asserted-by":"publisher","DOI":"10.1007\/1-4020-7769-6_4","volume-title":"Audio signal processing for next-generation multimedia communication systems","author":"EJ Diethorn","year":"2004","unstructured":"Diethorn EJ (2004) Subband noise reduction methods for speech enhancement. In: Huang Y, Benesty J (eds) Audio signal processing for next-generation multimedia communication systems. Springer, Boston. https:\/\/doi.org\/10.1007\/1-4020-7769-6_4"},{"key":"651_CR30","doi-asserted-by":"publisher","first-page":"466","DOI":"10.1007\/11848035_62","volume-title":"International workshop on multimedia content representation, classification and security","author":"G Farahani","year":"2006","unstructured":"Farahani G, Ahadi SM, Homayounpour MM (2006) Robust feature extraction of speech via noise reduction in autocorrelation domain. International workshop on multimedia content representation, classification and security. 
Springer, Berlin, pp 466\u2013473"},{"issue":"1","key":"651_CR31","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1186\/s13636-014-0032-7","volume":"2014","author":"Y Ma","year":"2014","unstructured":"Ma Y, Nishihara A (2014) A modified Wiener filtering method combined with wavelet thresholding multitaper spectrum for speech enhancement. EURASIP J Audio Speech Music Process 2014(1):32","journal-title":"EURASIP J Audio Speech Music Process"},{"issue":"1","key":"651_CR32","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1007\/s10772-019-09654-1","volume":"23","author":"V Kadyan","year":"2020","unstructured":"Kadyan V, Mantri A, Aggarwal RK (2020) Improved filter bank on multitaper framework for robust Punjabi-ASR system. Int J Speech Technol 23(1):87\u2013100","journal-title":"Int J Speech Technol"},{"key":"651_CR33","doi-asserted-by":"crossref","unstructured":"Zhang Y, Xu K, Wan J (2018) Rubost feature for underwater targets recognition using power-normalized cepstral coefficients. In: 2018 14th IEEE international conference on signal processing (ICSP). IEEE, pp. 90\u201393","DOI":"10.1109\/ICSP.2018.8652434"},{"issue":"6","key":"651_CR34","doi-asserted-by":"publisher","first-page":"2301","DOI":"10.1007\/s12652-018-0828-x","volume":"10","author":"M Dua","year":"2019","unstructured":"Dua M, Aggarwal RK, Biswas M (2019) GFCC based discriminatively trained noise robust continuous ASR system for Hindi language. J Ambient Intell Humaniz Comput 10(6):2301\u20132314","journal-title":"J Ambient Intell Humaniz Comput"},{"key":"651_CR35","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-021-03235-4","author":"K Goyal","year":"2021","unstructured":"Goyal K, Singh A, Kadyan V (2021) A comparison of Laryngeal effect in the dialects of Punjabi language. J Ambient Intell Humaniz Comput. 
https:\/\/doi.org\/10.1007\/s12652-021-03235-4","journal-title":"J Ambient Intell Humaniz Comput"},{"issue":"9","key":"651_CR36","doi-asserted-by":"publisher","first-page":"1432","DOI":"10.1109\/29.90371","volume":"36","author":"A N\u00e1das","year":"1988","unstructured":"N\u00e1das A, Nahamoo D, Picheny MA (1988) On a model-robust training method for speech recognition. IEEE Trans Acoust Speech Signal Process 36(9):1432\u20131436","journal-title":"IEEE Trans Acoust Speech Signal Process"},{"key":"651_CR37","doi-asserted-by":"crossref","unstructured":"Povey D, Woodland P (2001) Improved discriminative training techniques for large vocabulary continuous speech recognition. In: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221), vol 1. IEEE, pp. 45\u201348","DOI":"10.1109\/ICASSP.2001.940763"},{"issue":"9\/10","key":"651_CR38","first-page":"341","volume":"5","author":"P Boersma","year":"2001","unstructured":"Boersma P, Van Heuven V (2001) Speak and unSpeak with PRAAT. Glot Int 5(9\/10):341\u2013347","journal-title":"Glot Int"},{"issue":"3","key":"651_CR39","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1016\/0167-6393(93)90095-3","volume":"12","author":"A Varga","year":"1993","unstructured":"Varga A, Steeneken HJ (1993) Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun 12(3):247\u2013251","journal-title":"Speech Commun"},{"key":"651_CR40","unstructured":"Bittner R, Humphrey E, Bello J (2016) Pysox: leveraging the audio signal processing power of sox in python. In: Proceedings of the international society for music information retrieval conference late breaking and demo papers"},{"key":"651_CR41","unstructured":"Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N et al (2011) The Kaldi speech recognition toolkit. 
In: IEEE 2011 workshop on automatic speech recognition and understanding (No. CONF). IEEE signal processing society"},{"issue":"2","key":"651_CR42","doi-asserted-by":"publisher","first-page":"1617","DOI":"10.1007\/s00500-020-05248-1","volume":"25","author":"Y Kumar","year":"2021","unstructured":"Kumar Y, Singh N, Kumar M, Singh A (2021) AutoSSR: an efficient approach for automatic spontaneous speech recognition model for the Punjabi Language. Soft Comput 25(2):1617\u20131630","journal-title":"Soft Comput"},{"key":"651_CR43","unstructured":"Gretter R, Matassoni M, Bann\u00f2 S, Falavigna D (2020) TLT-school: a corpus of non native children speech. arXiv preprint arXiv:2001.08051"},{"key":"651_CR44","doi-asserted-by":"publisher","first-page":"108002","DOI":"10.1016\/j.apacoust.2021.108002","volume":"178","author":"V Kadyan","year":"2021","unstructured":"Kadyan V, Shanawazuddin S, Singh A (2021) Developing children\u2019s speech recognition system for low resource Punjabi language. Appl Acoust 178:108002","journal-title":"Appl Acoust"},{"issue":"1","key":"651_CR45","first-page":"327","volume":"29","author":"M Dua","year":"2020","unstructured":"Dua M, Aggarwal RK, Biswas M (2020) Discriminative training using noise robust integrated features and refined HMM modeling. J Intell Syst 29(1):327\u2013344","journal-title":"J Intell Syst"},{"issue":"2","key":"651_CR46","doi-asserted-by":"publisher","first-page":"473","DOI":"10.1007\/s10772-021-09797-0","volume":"24","author":"V Kadyan","year":"2021","unstructured":"Kadyan V, Bala S, Bawa P (2021) Training augmentation with TANDEM acoustic modelling in Punjabi adult speech recognition system. 
Int J Speech Technol 24(2):473\u2013481","journal-title":"Int J Speech Technol"},{"issue":"10","key":"651_CR47","doi-asserted-by":"publisher","first-page":"6747","DOI":"10.1007\/s00521-018-3499-9","volume":"31","author":"M Dua","year":"2019","unstructured":"Dua M, Aggarwal RK, Biswas M (2019) Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling. Neural Comput Appl 31(10):6747\u20136755","journal-title":"Neural Comput Appl"},{"issue":"1","key":"651_CR48","first-page":"165","volume":"30","author":"A Kumar","year":"2021","unstructured":"Kumar A, Aggarwal RK (2021) Discriminatively trained continuous Hindi speech recognition using integrated acoustic features and recurrent neural network language modeling. J Intell Syst 30(1):165\u2013179","journal-title":"J Intell Syst"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00651-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00651-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00651-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T18:45:22Z","timestamp":1677091522000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00651-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,2]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["651"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00651-7","relation":{},"ISSN":["2199-4536","2198-6053"],"
issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,2]]},"assertion":[{"value":"14 March 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 January 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 June 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}