{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T23:41:08Z","timestamp":1764978068659,"version":"3.46.0"},"reference-count":36,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2018,6,20]],"date-time":"2018-06-20T00:00:00Z","timestamp":1529452800000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,12,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In this paper, the improvements in the recently developed end-to-end spoken query system for accessing agricultural commodity prices and weather information in the Kannada language and its dialects are demonstrated. The spoken query system consists of an interactive voice response system (IVRS) call flow, automatic speech recognition (ASR) models, and databases of agricultural commodity prices and weather information. The task-specific speech data used in the earlier spoken query system had a high level of background and other types of noise, as it was collected from farmers of Karnataka state (a state in India where the Kannada language is spoken) under an uncontrolled environment. The different types of noise present in the collected speech data had an adverse effect on the on-line and off-line recognition performance. To improve the recognition accuracy of the spoken query system, a noise elimination algorithm is proposed in this work, which is a combination of spectral subtraction with voice activity detection (SS-VAD) and a minimum mean square error spectrum power estimator based on zero crossing (MMSE-SPZC). The noise elimination algorithm is added to the system before the feature extraction stage. 
In addition, alternate acoustic models are developed using subspace Gaussian mixture models (SGMM) and deep neural networks (DNN). The experimental results show that these modeling techniques are more powerful than the conventional Gaussian mixture model (GMM) \u2013 hidden Markov model (HMM) approach, which was used as the modeling technique for the ASR models in the earlier spoken query system. The fusion of the noise elimination technique and SGMM\/DNN-based modeling gives a relative improvement of 7% in accuracy over the earlier GMM-HMM-based ASR system. The acoustic models with the lowest word error rate (WER) are used in the spoken query system. The on-line speech recognition accuracy testing of the developed spoken query system (with the help of Karnataka farmers) is also presented in this work.<\/jats:p>","DOI":"10.1515\/jisys-2018-0120","type":"journal-article","created":{"date-parts":[[2018,6,20]],"date-time":"2018-06-20T18:16:11Z","timestamp":1529518571000},"page":"664-687","source":"Crossref","is-referenced-by-count":4,"title":["Improvements in Spoken Query System to Access the Agricultural Commodity Prices and Weather Information in Kannada Language\/Dialects"],"prefix":"10.1515","volume":"29","author":[{"given":"Thimmaraja G.","family":"Yadava","sequence":"first","affiliation":[{"name":"Research Scholar, Panini Research Center, 3rd Floor, Department of ECE, Siddaganga Institute of Technology, Tumkur, Karnataka 572103, India"}]},{"given":"H.S.","family":"Jayanna","sequence":"additional","affiliation":[{"name":"Department of ISE, Siddaganga Institute of Technology, Tumkur, Karnataka, India"}]}],"member":"374","published-online":{"date-parts":[[2018,6,20]]},"reference":[{"key":"2025120523362771342_j_jisys-2018-0120_ref_001","doi-asserted-by":"crossref","unstructured":"J. Beh and H. Ko, A novel spectral subtraction scheme for robust speech recognition: spectral subtraction using spectral harmonics of speech, in: IEEE Int. Conf. 
on Multimedia and Expo, vol. 3, pp. I-648\u2013I-651, April 2003.","DOI":"10.1007\/3-540-44864-0_115"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_002","doi-asserted-by":"crossref","unstructured":"S. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust. Speech Signal Process. ASSP-27 (1979), 113\u2013120.","DOI":"10.1109\/TASSP.1979.1163209"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_003","doi-asserted-by":"crossref","unstructured":"I. Cohen and B. Berdugo, Noise estimation by minima controlled recursive averaging for robust speech enhancement, IEEE Signal Process. Lett. 9 (2002), 12\u201315.","DOI":"10.1109\/97.988717"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_004","doi-asserted-by":"crossref","unstructured":"C. Cole, M. Karam and H. Aglan, Spectral subtraction of noise in speech processing applications, in: 40th Southeastern Symposium on System Theory, SSST-2008, pp. 50\u201353, New Orleans, LA, USA, 16\u201318 March 2008.","DOI":"10.1109\/SSST.2008.4480188"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_005","doi-asserted-by":"crossref","unstructured":"G. Dahl, D. Yu, L. Deng and A. Acero, Context-dependent pre-trained deep neural networks for large vocabulary speech recognition, IEEE Trans. Audio Speech Lang. Process. 20 (2012), 30\u201342.","DOI":"10.1109\/TASL.2011.2134090"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_006","doi-asserted-by":"crossref","unstructured":"D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika 81 (1994), 425\u2013455.","DOI":"10.1093\/biomet\/81.3.425"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_007","doi-asserted-by":"crossref","unstructured":"D. L. Donoho and I. M. Johnstone, Adapting to unknown smoothness via wavelet shrinkage, J. Am. Stat. Assoc. 
90 (1995), 1200\u20131224.","DOI":"10.1080\/01621459.1995.10476626"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_008","doi-asserted-by":"crossref","unstructured":"Y. Ephraim and D. Malah, Speech enhancement using a minimum mean square error short-time spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (1984), 1109\u20131121.","DOI":"10.1109\/TASSP.1984.1164453"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_009","doi-asserted-by":"crossref","unstructured":"Y. Ephraim and D. Malah, Speech enhancement using a minimum mean square error log-spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process. ASSP-33 (1985), 443\u2013445.","DOI":"10.1109\/TASSP.1985.1164550"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_010","unstructured":"J. R. Glass, Challenges for spoken dialogue systems, in: Proc. IEEE ASRU Workshop, Piscataway, NJ, USA, 1999."},{"key":"2025120523362771342_j_jisys-2018-0120_ref_011","doi-asserted-by":"crossref","unstructured":"H. M. Goodarzi and S. Seyedtabaii, Speech enhancement using spectral subtraction based on a modified noise minimum statistics estimation, in: Fifth Joint Int. Conf., pp. 1339\u20131343, 25\u201327 Aug. 2009.","DOI":"10.1109\/NCM.2009.272"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_012","doi-asserted-by":"crossref","unstructured":"G. E. Hinton, S. Osindero and Y. W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (2006), 1527\u20131554.","DOI":"10.1162\/neco.2006.18.7.1527"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_013","doi-asserted-by":"crossref","unstructured":"G. E. Hinton, L. Deng, D. Yu, G. Dahl, A. R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath and B. Kingsbury, Deep neural networks for acoustic modeling in speech recognition, IEEE Signal Process. Mag. 29 (2012), 82\u201397.","DOI":"10.1109\/MSP.2012.2205597"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_014","doi-asserted-by":"crossref","unstructured":"Y. 
Hu and P. Loizou, Subjective comparison and evaluation of speech enhancement algorithms, Speech Commun. 49 (2007), 588\u2013601.","DOI":"10.1016\/j.specom.2006.12.006"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_015","doi-asserted-by":"crossref","unstructured":"Y. Hu and P. C. Loizou, Evaluation of objective quality measures for speech enhancement, IEEE Trans. Audio Speech Lang. Process. 16 (2008), 229\u2013238.","DOI":"10.1109\/TASL.2007.911054"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_016","doi-asserted-by":"crossref","unstructured":"M. Jansen, Noise reduction by wavelet thresholding, in: Ser. Lecture Notes in Statistics, vol. 161, Springer-Verlag, Berlin, Germany, 2001.","DOI":"10.1007\/978-1-4613-0145-5_7"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_017","doi-asserted-by":"crossref","unstructured":"S. Kamath and P. Loizou, A multi-band spectral subtraction method for enhancing speech corrupted by colored noise, in: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Orlando, USA, May 2002.","DOI":"10.1109\/ICASSP.2002.5745591"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_018","doi-asserted-by":"crossref","unstructured":"H. Liu, X. Yu, W. Wan and R. Swaminathan, An improved spectral subtraction method, in: Int. Conf. on Audio, Language and Image Processing (ICALIP), Shanghai, pp. 790\u2013793, July 2012.","DOI":"10.1109\/ICALIP.2012.6376721"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_019","doi-asserted-by":"crossref","unstructured":"P. C. Loizou, Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum, IEEE Trans. Speech Audio Process. 13 (2005), 857\u2013869.","DOI":"10.1109\/TSA.2005.851929"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_020","unstructured":"P. Loizou, Speech enhancement: theory and practice, 1st ed., CRC Taylor & Francis, Boca Raton, FL, 2007."},{"key":"2025120523362771342_j_jisys-2018-0120_ref_021","doi-asserted-by":"crossref","unstructured":"T. 
Lotter and P. Vary, Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model, EURASIP J. Appl. Signal Process. 5 (2005), 1110\u20131126.","DOI":"10.1155\/ASP.2005.1110"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_022","doi-asserted-by":"crossref","unstructured":"Y. Lu and P. C. Loizou, Estimators of the magnitude-squared spectrum and methods for incorporating SNR uncertainty, IEEE Trans. Audio Speech Lang. Process. 19 (2011), 1123\u20131137.","DOI":"10.1109\/TASL.2010.2082531"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_023","doi-asserted-by":"crossref","unstructured":"S. Mallat, A wavelet tour of signal processing, Academic Press, San Diego, CA, 1999.","DOI":"10.1016\/B978-012466606-1\/50008-8"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_024","doi-asserted-by":"crossref","unstructured":"R. Martin, Speech enhancement based on minimum mean-square error estimation and supergaussian priors, IEEE Trans. Speech Audio Process. 13 (2005), 845\u2013856.","DOI":"10.1109\/TSA.2005.851927"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_025","unstructured":"ITU-T, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs, ITU-T Rec. P.862, ITU-T, Geneva, Switzerland, 2000."},{"key":"2025120523362771342_j_jisys-2018-0120_ref_026","doi-asserted-by":"crossref","unstructured":"D. Povey, L. Burget, M. Agarwal, P. Akyazi, F. Kai, A. Ghoshal, O. Glembek, N. Goel, M. Karafi\u00e1t, A. Rastrow, R. C. Rose, P. Schwarz and S. Thomas, The subspace Gaussian mixture model \u2013 a structured model for speech recognition, in: Computer Speech and Language, pp. 404\u2013439, Elsevier, Amsterdam, The Netherlands, 2011.","DOI":"10.1016\/j.csl.2010.06.003"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_027","doi-asserted-by":"crossref","unstructured":"L. R. Rabiner, Applications of voice processing to telecommunications, Proc. 
IEEE 82 (1994), 199\u2013228.","DOI":"10.1109\/5.265347"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_028","unstructured":"L. Rabiner and B. H. Juang, Fundamentals of speech recognition, Prentice Hall, Inc, Upper Saddle River, NJ, USA, 1993."},{"key":"2025120523362771342_j_jisys-2018-0120_ref_029","doi-asserted-by":"crossref","unstructured":"J. Ramirez, J. M. Gorriz and J. C. Segura, Voice activity detection. Fundamentals and speech recognition system robustness, in: Robust Speech Recognition and Understanding, M. Grimm, K. Kroschel, eds., ISBN 987-3-90213-08-0, pp. 460, I-Tech, Vienna, Austria, 2007.","DOI":"10.5772\/4740"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_030","doi-asserted-by":"crossref","unstructured":"A. Rix, J. Beerends, M. Hollier and A. Hekstra, Perceptual evaluation of speech quality (PESQ)\u2013a new method for speech quality assessment of telephone networks and codecs, in: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. vol. 2, pp. 749\u2013752, 2001.","DOI":"10.1109\/ICASSP.2001.941023"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_031","doi-asserted-by":"crossref","unstructured":"R. C. Rose, S. C. Yin and Y. Tang, An investigation of subspace modeling for phonetic and speaker variability in automatic speech recognition, in: Proc. ICASSP, pp. 4508\u20134511, Prague, Czech Republic, 2011.","DOI":"10.1109\/ICASSP.2011.5947356"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_032","doi-asserted-by":"crossref","unstructured":"G. Y. Thimmaraja and H. S. Jayanna, A spoken query system for the agricultural commodity prices and weather information access in Kannada language, Int. J. Speech Technol. Springer 20 (2017), 635\u2013644.","DOI":"10.1007\/s10772-017-9428-y"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_033","doi-asserted-by":"crossref","unstructured":"A. Trihandoyo, A. Belloum and K. M. Hou, A real-time speech recognition architecture for a multi-channel interactive voice response system, Proc. 
ICASSP 4 (1995), 2687\u20132690.","DOI":"10.1109\/ICASSP.1995.480115"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_034","unstructured":"D. Wang and G. Brown, Eds., Computational auditory scene analysis (CASA): principles, algorithms, and applications, Wiley\/IEEE Press, Piscataway, NJ, 2006."},{"key":"2025120523362771342_j_jisys-2018-0120_ref_035","doi-asserted-by":"crossref","unstructured":"P. J. Wolfe and S. J. Godsill, Simple alternatives to the Ephraim and Malah suppression rule for speech enhancement, in: Proc. 11th IEEE Signal Process. Workshop Statist. Signal Process., pp. 496\u2013499, Singapore, Aug. 2001.","DOI":"10.1109\/SSP.2001.955331"},{"key":"2025120523362771342_j_jisys-2018-0120_ref_036","doi-asserted-by":"crossref","unstructured":"B.-Y. Xia, Y. Liang and C.-C. Bao, A modified spectral subtraction method for speech enhancement based on masking property of human auditory system, in: Int. Conf. on Wireless Communications Signal Processing, WCSP, pp. 1\u20135, Nanjing, China, Nov. 
2009.","DOI":"10.1109\/WCSP.2009.5371466"}],"container-title":["Journal of Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/view\/journals\/jisys\/29\/1\/article-p664.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2018-0120\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2018-0120\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T23:38:21Z","timestamp":1764977901000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jisys-2018-0120\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,20]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2018,4,25]]},"published-print":{"date-parts":[[2019,12,18]]}},"alternative-id":["10.1515\/jisys-2018-0120"],"URL":"https:\/\/doi.org\/10.1515\/jisys-2018-0120","relation":{},"ISSN":["2191-026X","0334-1860"],"issn-type":[{"type":"electronic","value":"2191-026X"},{"type":"print","value":"0334-1860"}],"subject":[],"published":{"date-parts":[[2018,6,20]]}}}