{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T08:10:05Z","timestamp":1774512605163,"version":"3.50.1"},"reference-count":48,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. &amp; Syst."],"published-print":{"date-parts":[[2018,12,1]]},"DOI":"10.1587\/transinf.2018edk0001","type":"journal-article","created":{"date-parts":[[2018,11,30]],"date-time":"2018-11-30T22:27:20Z","timestamp":1543616840000},"page":"3123-3137","source":"Crossref","is-referenced-by-count":3,"title":["In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer"],"prefix":"10.1587","volume":"E101.D","author":[{"given":"Takeshi","family":"HOMMA","sequence":"first","affiliation":[{"name":"R&D Group, Hitachi, Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yasunari","family":"OBUCHI","sequence":"additional","affiliation":[{"name":"Tokyo University of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kazuaki","family":"SHIMA","sequence":"additional","affiliation":[{"name":"Clarion Co., Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rintaro","family":"IKESHITA","sequence":"additional","affiliation":[{"name":"R&D Group, Hitachi, Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hiroaki","family":"KOKUBO","sequence":"additional","affiliation":[{"name":"R&D Group, Hitachi, Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Takuya","family":"MATSUMOTO","sequence":"additional","affiliation":[{"name":"Hitachi Automotive Systems Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"[1] J. Schalkwyk, D. 
Beeferman, F. Beaufays, B. Byrne, C. Chelba, M. Cohen, M. Kamvar, and B. Strope, \u201cYour word is my command: Google search by voice: A case study,\u201d Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, ed. A. Neustein, pp.61-90, Springer, New York, 2010. 10.1007\/978-1-4419-5951-5_4","DOI":"10.1007\/978-1-4419-5951-5_4"},{"key":"2","doi-asserted-by":"crossref","unstructured":"[2] Y. Fujita, R. Takashima, T. Homma, and M. Togami, \u201cData augmentation using multi-input multi-output source separation for deep neural network based acoustic modeling,\u201d Proc. Interspeech, San Francisco, USA, pp.3818-3822, Sept. 2016. 10.21437\/interspeech.2016-733","DOI":"10.21437\/Interspeech.2016-733"},{"key":"3","unstructured":"[3] H. Kokubo, A. Amano, and N. Hataoka, \u201cRobust speech recognition for car environment noise,\u201d IEICE Trans. Inf. & Syst. (Japanese Edition), vol.J83-DII, no.11, pp.2190-2197, Nov. 2000."},{"key":"4","doi-asserted-by":"publisher","unstructured":"[4] J.R. Bellegarda, \u201cStatistical language model adaptation: Review and perspectives,\u201d Speech Commun., vol.42, pp.93-108, Jan. 2004. 10.1016\/j.specom.2003.08.002","DOI":"10.1016\/j.specom.2003.08.002"},{"key":"5","unstructured":"[5] N. Kamado, S. Fujimura, Y. Iwase, Y. Aono, H. Masataki, T. Yamada, and R. Otsuya, \u201cIntroduction of noise-robust ASR platform based on HTML5,\u201d IPSJ SIG Technical Reports, 2015-SLP-108 (3), pp.1-6, Oct. 2015. (in Japanese)"},{"key":"6","unstructured":"[6] J.R. Bellegarda, \u201cLarge-scale personal assistant technology deployment: the Siri experience,\u201d Proc. Interspeech, Lyon, France, pp.2029-2033, Aug. 2013."},{"key":"7","doi-asserted-by":"publisher","unstructured":"[7] R. Sarikaya, \u201cThe technology behind personal digital assistants: An overview of the system architecture and key components,\u201d IEEE Signal Process. Mag., vol.34, no.1, pp.67-81, Jan. 2017. 
10.1109\/msp.2016.2617341","DOI":"10.1109\/MSP.2016.2617341"},{"key":"8","unstructured":"[8] J. Twiefel, T. Baumann, S. Heinrich, and S. Wermter, \u201cImproving domain-independent cloud-based speech recognition with domain-dependent phonetic post-processing,\u201d Proc. AAAI, Qu\u00e9bec, Canada, pp.1529-1535, July 2014."},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] T. Homma, K. Shima, and T. Matsumoto, \u201cRobust utterance classification using multiple classifiers in the presence of speech recognition errors,\u201d Proc. IEEE SLT, San Diego, USA, pp.369-375, Dec. 2016. 10.1109\/slt.2016.7846291","DOI":"10.1109\/SLT.2016.7846291"},{"key":"10","doi-asserted-by":"crossref","unstructured":"[10] J.G. Fiscus, \u201cA post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER),\u201d Proc. IEEE ASRU, Santa Barbara, USA, pp.347-354, Dec. 1997. 10.1109\/asru.1997.659110","DOI":"10.1109\/ASRU.1997.659110"},{"key":"11","doi-asserted-by":"publisher","unstructured":"[11] T. Utsuro, Y. Kodama, T. Watanabe, H. Nishizaki, and S. Nakagawa, \u201cCombining outputs of multiple LVCSR models by machine learning,\u201d Syst. Comput. Jpn., vol.36, no.10, pp.9-15, Sept. 2005. 10.1002\/scj.20340","DOI":"10.1002\/scj.20340"},{"key":"12","doi-asserted-by":"crossref","unstructured":"[12] D. Hillard, B. Hoffmeister, M. Ostendorf, R. Schl\u00fcter, and H. Ney, \u201ciROVER: Improving system combination with classification,\u201d Proc. NAACL-HLT, Rochester, USA, pp.65-68, April 2007. 10.3115\/1614108.1614125","DOI":"10.3115\/1614108.1614125"},{"key":"13","doi-asserted-by":"publisher","unstructured":"[13] S. Li, Y. Akita, and T. Kawahara, \u201cSemi-supervised acoustic model training by discriminative data selection from multiple ASR systems' hypotheses,\u201d IEEE\/ACM Trans. Audio Speech Lang. Process., vol.24, no.9, pp.1524-1534, Sept. 2016. 
10.1109\/taslp.2016.2562505","DOI":"10.1109\/TASLP.2016.2562505"},{"key":"14","doi-asserted-by":"crossref","unstructured":"[14] V. Soto, O. Siohan, M. Elfeky, and P.J. Moreno, \u201cSelection and combination of hypotheses for dialectal speech recognition,\u201d Proc. ICASSP, Shanghai, China, pp.5845-5849, March 2016. 10.1109\/icassp.2016.7472798","DOI":"10.1109\/ICASSP.2016.7472798"},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] Y. Fujita, R. Takashima, T. Homma, R. Ikeshita, Y. Kawaguchi, T. Sumiyoshi, T. Endo, and M. Togami, \u201cUnified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection,\u201d Proc. ASRU, Scottsdale, USA, pp.416-422, Dec. 2015. 10.1109\/asru.2015.7404825","DOI":"10.1109\/ASRU.2015.7404825"},{"key":"16","doi-asserted-by":"crossref","unstructured":"[16] N. Sawada and H. Nishizaki, \u201cRecurrent neural network-based phoneme sequence estimation using multiple ASR systems' outputs for spoken term detection,\u201d Proc. Interspeech, San Francisco, USA, pp.3688-3692, Sept. 2016. 10.21437\/interspeech.2016-337","DOI":"10.21437\/Interspeech.2016-337"},{"key":"17","unstructured":"[17] M. Katsumaru, M. Nakano, K. Komatani, K. Funakoshi, T. Ogata, and H.G. Okuno, \u201cImproving speech understanding accuracy with limited training data using multiple language models and multiple understanding models,\u201d Proc. Interspeech, pp.2735-2738, Brighton, United Kingdom, Sept. 2009."},{"key":"18","unstructured":"[18] Y. Obuchi, R. Takeda, and N. Kanda, \u201cVoice activity detection based on augmented statistical noise suppression,\u201d Proc. APSIPA ASC, Hollywood, USA, Dec. 2012."},{"key":"19","unstructured":"[19] Y. Obuchi, \u201cSpeech processing for car navigation systems,\u201d Technical Report of IEICE, EA 114(274), pp.3-8, Oct. 2014. (in Japanese)"},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] W. Zhu and D. 
O'Shaughnessy, \u201cUsing noise reduction and spectral emphasis techniques to improve ASR performance in noisy conditions,\u201d Proc. IEEE ASRU, St. Thomas, USA, pp.357-362, Nov.-Dec. 2003. 10.1109\/asru.2003.1318467","DOI":"10.1109\/ASRU.2003.1318467"},{"key":"21","doi-asserted-by":"publisher","unstructured":"[21] X. Cui and A. Alwan, \u201cNoise robust speech recognition using feature compensation based on polynomial regression of utterance SNR,\u201d IEEE Trans. Speech Audio Process., vol.13, no.6, pp.1161-1172, Nov. 2005. 10.1109\/tsa.2005.853002","DOI":"10.1109\/TSA.2005.853002"},{"key":"22","doi-asserted-by":"crossref","unstructured":"[22] L. Deng, A. Acero, M. Plumpe, and X. Huang, \u201cLarge-vocabulary speech recognition under adverse acoustic environments,\u201d Proc. ICSLP, Beijing, China, pp.806-809, Oct. 2000.","DOI":"10.21437\/ICSLP.2000-657"},{"key":"23","doi-asserted-by":"publisher","unstructured":"[23] Y. Obuchi, R. Takeda, and M. Togami, \u201cNoise suppression method for preprocessor of time-lag speech recognition system based on bidirectional optimally modified log spectral amplitude estimation,\u201d Acoust. Sci. & Tech., vol.34, no.2, pp.133-141, March 2013. 10.1250\/ast.34.133","DOI":"10.1250\/ast.34.133"},{"key":"24","doi-asserted-by":"publisher","unstructured":"[24] I. Cohen and B. Berdugo, \u201cSpeech enhancement for non-stationary noise environments,\u201d Signal Process., vol.81, no.11, pp.2403-2418, Nov. 2001. 10.1016\/s0165-1684(01)00128-1","DOI":"10.1016\/S0165-1684(01)00128-1"},{"key":"25","doi-asserted-by":"publisher","unstructured":"[25] Y. Ephraim and D. Malah, \u201cSpeech enhancement using a minimum mean-square error short-time spectral amplitude estimator,\u201d IEEE Trans. Acoust. Speech Signal Process., vol.32, no.6, pp.1109-1121, Dec. 1984. 10.1109\/tassp.1984.1164453","DOI":"10.1109\/TASSP.1984.1164453"},{"key":"26","doi-asserted-by":"crossref","unstructured":"[26] C. Chelba, M. Mahajan, and A. 
Acero, \u201cSpeech utterance classification,\u201d Proc. ICASSP, Hong Kong, China, pp.I-280-I-283, April 2003. 10.1109\/icassp.2003.1198772","DOI":"10.1109\/ICASSP.2003.1198772"},{"key":"27","doi-asserted-by":"crossref","unstructured":"[27] C.T. Hemphill, J.J. Godfrey, and G.R. Doddington, \u201cThe ATIS spoken language systems pilot corpus,\u201d Proc. 3rd DARPA Speech and Natural Language Workshop, Hidden Valley, USA, pp.96-101, June 1990. 10.3115\/116580.116613","DOI":"10.3115\/116580.116613"},{"key":"28","doi-asserted-by":"crossref","unstructured":"[28] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, \u201cA convolutional neural network for modelling sentences,\u201d Proc. ACL, Baltimore, USA, pp.655-665, June 2014. 10.3115\/v1\/p14-1062","DOI":"10.3115\/v1\/P14-1062"},{"key":"29","doi-asserted-by":"crossref","unstructured":"[29] M. Henderson, M. Ga\u0161i\u0107, B. Thomson, P. Tsiakoulis, K. Yu, and S. Young, \u201cDiscriminative spoken language understanding using word confusion networks,\u201d Proc. IEEE SLT, Miami, USA, pp.176-181, Dec. 2012. 10.1109\/slt.2012.6424218","DOI":"10.1109\/SLT.2012.6424218"},{"key":"30","doi-asserted-by":"publisher","unstructured":"[30] D. Hakkani-T\u00fcr, F. B\u00e9chet, G. Riccardi, and G. Tur, \u201cBeyond ASR 1-best: Using word confusion networks in spoken language understanding,\u201d Comput. Speech Lang., vol.20, no.4, pp.495-514, Oct. 2006. 10.1016\/j.csl.2005.07.005","DOI":"10.1016\/j.csl.2005.07.005"},{"key":"31","unstructured":"[31] IBM, \u201cSpeech to text: API reference,\u201d https:\/\/www.ibm.com\/watson\/developercloud\/speech-to-text\/api\/v1, accessed Jan. 21. 2018."},{"key":"32","unstructured":"[32] Google, \u201cCloud speech API basics,\u201d https:\/\/cloud.google.com\/speech\/docs\/basics, accessed Jan. 21. 2018."},{"key":"33","unstructured":"[33] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, \u201cLIBLINEAR: A library for large linear classification,\u201d J. Mach. Learn. 
Res., vol.9, pp.1871-1874, Aug. 2008."},{"key":"34","unstructured":"[34] T. Kudo, K. Yamamoto, and Y. Matsumoto, \u201cApplying conditional random fields to Japanese morphological analysis,\u201d Proc. EMNLP, Barcelona, Spain, pp.230-237, July 2004."},{"key":"35","unstructured":"[35] \u201cNAIST Japanese dictionary,\u201d https:\/\/osdn.net\/projects\/naist-jdic, accessed June 11. 2018."},{"key":"36","unstructured":"[36] K. Shima, T. Homma, R. Ikeshita, H. Kokubo, Y. Obuchi, and J. She, \u201cInterview-style-based method of collecting spontaneous speech corpus for car navigation systems,\u201d IEICE Trans. Inf. & Syst. (Japanese Edition), vol.J101-D, no.2, pp.446-455, Feb. 2018."},{"key":"37","unstructured":"[37] Clarion Co., Ltd., \u201cClarion Intelligent VOICE,\u201d http:\/\/www.clarion.com\/us\/en\/products-personal\/service\/IntelligentVoice\/, accessed Oct. 6. 2017."},{"key":"38","unstructured":"[38] A. Yano, T. Honda, A. Hayashi, H. Miyazawa, and H. Sawajiri, \u201cCar information system for added value in connected cars,\u201d Hitachi Review, vol.95, no.11, pp.68-71, Nov. 2013. (in Japanese)"},{"key":"39","unstructured":"[39] C. Kim and R.M. Stern, \u201cRobust signal-to-noise ratio estimation based on waveform amplitude distribution analysis,\u201d Proc. Interspeech, Brisbane, Australia, pp.2598-2601, Sept. 2008."},{"key":"40","unstructured":"[40] S. Nakagawa and H. Takagi, \u201cStatistical methods for comparing pattern recognition algorithms and comments on evaluating speech recognition performance,\u201d J. Acoust. Soc. Jpn. (Japanese Edition), vol.50, no.10, pp.849-854, Oct. 1994."},{"key":"41","doi-asserted-by":"crossref","unstructured":"[41] K. Ono, R. Takeda, E. Nichols, M. Nakano, and K. Komatani, \u201cToward lexical acquisition during dialogues through implicit confirmation for closed-domain chatbots,\u201d Proc. Workshop on Chatbots and Conversational Agent Technologies (WOCHAT), Sept. 
2016.","DOI":"10.18653\/v1\/W17-5507"},{"key":"42","unstructured":"[42] H. He, A. Balakrishnan, M. Eric, and P. Liang, \u201cLearning symmetric collaborative dialogue agents with dynamic knowledge graph embeddings,\u201d arXiv:1704.07130v1, April 2017."},{"key":"43","doi-asserted-by":"publisher","unstructured":"[43] R. Sarikaya, G.E. Hinton, and A. Deoras, \u201cApplication of deep belief networks for natural language understanding,\u201d IEEE\/ACM Trans. Audio Speech Lang. Process., vol.22, no.4, pp.778-784, April 2014. 10.1109\/taslp.2014.2303296","DOI":"10.1109\/TASLP.2014.2303296"},{"key":"44","doi-asserted-by":"crossref","unstructured":"[44] B. Liu and I. Lane, \u201cJoint online spoken language understanding and language modeling with recurrent neural networks,\u201d Proc. SIGDIAL, Los Angeles, USA, pp.22-30, Sept. 2017.","DOI":"10.18653\/v1\/W16-3603"},{"key":"45","doi-asserted-by":"crossref","unstructured":"[45] Y.-Y. Wang, A. Acero, C. Chelba, B. Frey, and L. Wong, \u201cCombination of statistical and rule-based approaches for spoken language understanding,\u201d Proc. ICSLP, Denver, USA, pp.609-612, Sept. 2002.","DOI":"10.21437\/ICSLP.2002-204"},{"key":"46","doi-asserted-by":"crossref","unstructured":"[46] T. Homma, A.S. Arantes, T. Gonzalez, and M. Togami, \u201cMaximizing SLU performance with minimal training data using hybrid RNN plus rule-based approach,\u201d Proc. SIGDIAL, Melbourne, Australia, pp.366-370, July 2018.","DOI":"10.18653\/v1\/W18-5043"},{"key":"47","doi-asserted-by":"crossref","unstructured":"[47] R.B. Miller, \u201cResponse time in man-computer conversational transactions,\u201d Proc. AFIPS '68 Fall Joint Computer Conference (Part I), pp.267-277, San Francisco, USA, Dec. 1968. 10.1145\/1476589.1476628","DOI":"10.1145\/1476589.1476628"},{"key":"48","unstructured":"[48] T. Homma, R. Zhang, T. Matsumoto, and H. 
Kokubo, \u201cSpeech recognition apparatus and speech recognition system,\u201d Japan Patent, JP 2018-81185 A, 2018."}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E101.D\/12\/E101.D_2018EDK0001\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T07:57:47Z","timestamp":1662537467000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E101.D\/12\/E101.D_2018EDK0001\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,1]]},"references-count":48,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2018]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2018edk0001","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,1]]}}}