{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T10:19:42Z","timestamp":1760955582425},"reference-count":51,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2015]]},"DOI":"10.1587\/transinf.2015edp7138","type":"journal-article","created":{"date-parts":[[2015,9,30]],"date-time":"2015-09-30T18:07:53Z","timestamp":1443636473000},"page":"1808-1817","source":"Crossref","is-referenced-by-count":19,"title":["Robust Voice Activity Detection Algorithm Based on Feature of Frequency Modulation of Harmonics and Its DSP Implementation"],"prefix":"10.1587","volume":"E98.D","author":[{"given":"Chung-Chien","family":"HSU","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, National Chiao Tung University"}]},{"given":"Kah-Meng","family":"CHEONG","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, National Chiao Tung University"}]},{"given":"Tai-Shih","family":"CHI","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, National Chiao Tung University"}]},{"given":"Yu","family":"TSAO","sequence":"additional","affiliation":[{"name":"Research Center for Information Technology Innovation, Academia Sinica"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"[1] A. Benyassine, E. Shlomot, H.Y. Su, D. Massaloux, C. Lamblin, and J.P. Petit, \u201cITU-T recommendation G.729 annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications,\u201d IEEE Commun. 
Mag., vol.35, no.9, pp.64-73, 1997.","DOI":"10.1109\/35.620527"},{"key":"2","unstructured":"[2] ETSI, \u201cVoice activity detector (VAD) for adaptive multi-rate (AMR) speech traffic channels,\u201d ETSI EN 301 708 Recommendation, 1999."},{"key":"3","doi-asserted-by":"crossref","unstructured":"[3] T. Fukuda, O. Ichikawa, and M. Nishimura, \u201cLong-term spectro-temporal and static harmonic features for voice activity detection,\u201d IEEE J. Sel. Topics Signal Process., vol.4, no.5, pp.834-844, 2010.","DOI":"10.1109\/JSTSP.2010.2069750"},{"key":"4","doi-asserted-by":"crossref","unstructured":"[4] M.W. Mak and H.B. Yu, \u201cA study of voice activity detection techniques for NIST speaker recognition evaluations,\u201d Comput. Speech Lang., vol.28, no.1, pp.295-313, 2014.","DOI":"10.1016\/j.csl.2013.07.003"},{"key":"5","doi-asserted-by":"crossref","unstructured":"[5] I. McCowan, D. Dean, M. McLaren, R. Vogt, and S. Sridharan, \u201cThe delta-phase spectrum with application to voice activity detection and speaker recognition,\u201d IEEE Trans. Audio, Speech, Language Process., vol.19, no.7, pp.2026-2038, 2011.","DOI":"10.1109\/TASL.2011.2109379"},{"key":"6","doi-asserted-by":"crossref","unstructured":"[6] J. Sohn, N.S. Kim, and W. Sung, \u201cA statistical model-based voice activity detection,\u201d IEEE Signal Process. Lett., vol.6, no.1, pp.1-3, Jan. 1999.","DOI":"10.1109\/97.736233"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] B. Lee and M. Hasegawa-Johnson, \u201cMinimum mean squared error a posteriori estimation of high variance vehicular noise,\u201d Proc. Biennial on DSP for In-Vehicle and Mobile Systems, 2007.","DOI":"10.1007\/978-0-387-79582-9_18"},{"key":"8","doi-asserted-by":"crossref","unstructured":"[8] J. Ramirez, J.C. Segura, C. Benitez, A. de la Torre, and A. 
Rubio, \u201cEfficient voice activity detection algorithms using long-term speech information,\u201d Speech Commun., vol.42, no.3-4, pp.271-287, 2004.","DOI":"10.1016\/j.specom.2003.10.002"},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] P.K. Ghosh, A. Tsiartas, and S. Narayanan, \u201cRobust voice activity detection using long-term signal variability,\u201d IEEE Trans. Audio, Speech, Language Process., vol.19, no.3, pp.600-613, 2011.","DOI":"10.1109\/TASL.2010.2052803"},{"key":"10","doi-asserted-by":"crossref","unstructured":"[10] A.S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound, MIT Press, 1990.","DOI":"10.7551\/mitpress\/1486.001.0001"},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] T.M. Elliott and F.E. Theunissen, \u201cThe modulation transfer function for speech intelligibility,\u201d PLoS Comput. Biol., vol.5, no.3, p.e1000302, 2009.","DOI":"10.1371\/journal.pcbi.1000302"},{"key":"12","doi-asserted-by":"crossref","unstructured":"[12] M. ter Keurs, J.M. Festen, and R. Plomp, \u201cEffect of spectral envelope smearing on speech reception. I,\u201d J. Acoust. Soc. Am., vol.91, no.5, pp.2872-2880, 1992.","DOI":"10.1121\/1.402950"},{"key":"13","doi-asserted-by":"crossref","unstructured":"[13] R. Drullman, J.M. Festen, and R. Plomp, \u201cEffect of temporal envelope smearing on speech reception,\u201d J. Acoust. Soc. Am., vol.95, no.2, pp.1053-1064, 1994.","DOI":"10.1121\/1.408467"},{"key":"14","doi-asserted-by":"crossref","unstructured":"[14] R.V. Shannon, F.G. Zeng, V. Kamath, J. Wygonski, and M. Ekelid, \u201cSpeech recognition with primarily temporal cues,\u201d Science, vol.13, pp.303-304, 1995.","DOI":"10.1126\/science.270.5234.303"},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] G. Evangelopoulos and P. Maragos, \u201cMultiband modulation energy tracking for noisy speech detection,\u201d IEEE Trans. 
Audio, Speech, Language Process., vol.14, no.6, pp.2024-2038, 2006.","DOI":"10.1109\/TASL.2006.872625"},{"key":"16","doi-asserted-by":"crossref","unstructured":"[16] J. Bach, B. Kollmeier, and J. Anem\u00fcller, \u201cModulation-based detection of speech in real background noise: Generalization to novel background classes,\u201d Proc. IEEE ICASSP, pp.41-44, 2010.","DOI":"10.1109\/ICASSP.2010.5496244"},{"key":"17","unstructured":"[17] M. Unoki, X. Lu, R. Petrick, S. Morita, M. Akagi, and R. Hoffmann, \u201cVoice activity detection in MTF-based power envelope restoration,\u201d Proc. INTERSPEECH, pp.2609-2612, 2011."},{"key":"18","doi-asserted-by":"crossref","unstructured":"[18] S. Morita, M. Unoki, X. Lu, and M. Akagi, \u201cRobust voice activity detection based on concept of modulation transfer function in noisy reverberant environments,\u201d Proc. IEEE ISCSLP, pp.108-112, 2014.","DOI":"10.1109\/ISCSLP.2014.6936716"},{"key":"19","doi-asserted-by":"crossref","unstructured":"[19] S. Norman-Haignere, N. Kanwisher, and J.H. McDermott, \u201cCortical pitch regions in humans respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior auditory cortex,\u201d J. Neurosci., vol.33, no.50, pp.19451-19469, 2013.","DOI":"10.1523\/JNEUROSCI.2880-13.2013"},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] S. Shamma and D. Klein, \u201cThe case of the missing pitch templates: How harmonic templates emerge in the early auditory system,\u201d J. Acoust. Soc. Am., vol.107, no.5, pp.2631-2644, 2000.","DOI":"10.1121\/1.428649"},{"key":"21","doi-asserted-by":"crossref","unstructured":"[21] L.N. Tan, B. Borgstrom, and A. Alwan, \u201cVoice activity detection using harmonic frequency components in likelihood ratio test,\u201d Proc. IEEE ICASSP, pp.4466-4469, 2010.","DOI":"10.1109\/ICASSP.2010.5495611"},{"key":"22","unstructured":"[22] E. Chuangsuwanich and J.R. 
Glass, \u201cRobust voice activity detector for real world applications using harmonicity and modulation frequency,\u201d Proc. INTERSPEECH, pp.2645-2648, 2011."},{"key":"23","unstructured":"[23] D.A. Depireux, J.Z. Simon, D.J. Klein, and S.A. Shamma, \u201cSpectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex,\u201d J. Neurophysiol., vol.85, no.3, pp.1220-1234, 2001."},{"key":"24","doi-asserted-by":"crossref","unstructured":"[24] N. Mesgarani and E.F. Chang, \u201cSelective cortical representation of attended speaker in multi-talker speech perception,\u201d Nature, vol.485, pp.233-236, 2012.","DOI":"10.1038\/nature11020"},{"key":"25","doi-asserted-by":"crossref","unstructured":"[25] T. Chi, P. Ru, and S.A. Shamma, \u201cMultiresolution spectrotemporal analysis of complex sounds,\u201d J. Acoust. Soc. Am., vol.118, no.2, pp.887-906, 2005.","DOI":"10.1121\/1.1945807"},{"key":"26","doi-asserted-by":"crossref","unstructured":"[26] N. Mesgarani, S.V. David, J.B. Fritz, and S.A. Shamma, \u201cMechanisms of noise robust representation of speech in primary auditory cortex,\u201d Proc. Natl. Acad. Sci. U.S.A., vol.111, no.18, pp.6792-6797, 2014.","DOI":"10.1073\/pnas.1318017111"},{"key":"27","doi-asserted-by":"crossref","unstructured":"[27] T. Chi, Y. Gao, M.C. Guyton, P. Ru, and S. Shamma, \u201cSpectro-temporal modulation transfer functions and speech intelligibility,\u201d J. Acoust. Soc. Am., vol.106, no.5, pp.2719-2732, 1999.","DOI":"10.1121\/1.428100"},{"key":"28","doi-asserted-by":"crossref","unstructured":"[28] M. Elhilali, T. Chi, and S.A. Shamma, \u201cA spectro-temporal modulation index (STMI) for assessment of speech intelligibility,\u201d Speech Commun., vol.41, no.2-3, pp.331-348, 2003.","DOI":"10.1016\/S0167-6393(02)00134-6"},{"key":"29","doi-asserted-by":"crossref","unstructured":"[29] K. Patil, D. Pressnitzer, S. Shamma, and M. 
Elhilali, \u201cMusic in our ears: The biological bases of musical timbre perception,\u201d PLoS Comput. Biol., vol.8, no.10, p.e1002759, 2012.","DOI":"10.1371\/journal.pcbi.1002759"},{"key":"30","doi-asserted-by":"crossref","unstructured":"[30] R. Stern and N. Morgan, \u201cHearing is believing: Biologically inspired methods for robust automatic speech recognition,\u201d IEEE Signal Process. Mag., vol.29, no.6, pp.34-43, Nov. 2012.","DOI":"10.1109\/MSP.2012.2207989"},{"key":"31","unstructured":"[31] S. Ganapathy, S. Mallidi, and H. Hermansky, \u201cRobust feature extraction using modulation filtering of autoregressive models,\u201d IEEE\/ACM Trans. Audio, Speech, Language Process., vol.22, no.8, pp.1285-1295, 2014."},{"key":"32","doi-asserted-by":"crossref","unstructured":"[32] H. Lei, B. Meyer, and N. Mirghafori, \u201cSpectro-temporal gabor features for speaker recognition,\u201d Proc. IEEE ICASSP, pp.4241-4244, 2012.","DOI":"10.1109\/ICASSP.2012.6288855"},{"key":"33","doi-asserted-by":"crossref","unstructured":"[33] C.C. Hsu, T.E. Lin, J.H. Chen, and T.S. Chi, \u201cSpectro-temporal subband wiener filter for speech enhancement,\u201d Proc. IEEE ICASSP, pp.4001-4004, 2012.","DOI":"10.1109\/ICASSP.2012.6288795"},{"key":"34","doi-asserted-by":"crossref","unstructured":"[34] T.S. Chi and C.C. Hsu, \u201cMultiband analysis and synthesis of spectro-temporal modulations of fourier spectrogram,\u201d J. Acoust. Soc. Am., vol.129, no.5, pp.EL190-EL196, 2011.","DOI":"10.1121\/1.3565471"},{"key":"35","doi-asserted-by":"crossref","unstructured":"[35] F.G. Zeng, K. Nie, G.S. Stickney, Y.Y. Kong, M. Vongphoe, A. Bhargave, C. Wei, and K. Cao, \u201cSpeech recognition with amplitude and frequency modulations,\u201d Proc. Natl. Acad. Sci. U.S.A., vol.102, no.7, pp.2293-2298, 2005.","DOI":"10.1073\/pnas.0406460102"},{"key":"36","doi-asserted-by":"crossref","unstructured":"[36] H. Chen and F.G. Zeng, \u201cFrequency modulation detection in cochlear implant subjects,\u201d J. 
Acoust. Soc. Am., vol.116, no.4, pp.2269-2277, 2004.","DOI":"10.1121\/1.1785833"},{"key":"37","doi-asserted-by":"crossref","unstructured":"[37] K. Nie, G. Stickney, and F.G. Zeng, \u201cEncoding frequency modulation to improve cochlear implant performance in noise,\u201d IEEE Trans. Biomed. Eng., vol.52, no.1, pp.64-73, 2005.","DOI":"10.1109\/TBME.2004.839799"},{"key":"38","doi-asserted-by":"crossref","unstructured":"[38] M. Hamouda, F. Fnaiech, and K. Al-Haddad, \u201cA DSP based real-time simulation of dual-bridge matrix converters,\u201d Proc. IEEE ISIE, pp.594-599, 2007.","DOI":"10.1109\/ISIE.2007.4374663"},{"key":"39","doi-asserted-by":"crossref","unstructured":"[39] S. Muller, U. Ammann, and S. Rees, \u201cNew time-discrete modulation scheme for matrix converters,\u201d IEEE Trans. Ind. Electron., vol.52, no.6, pp.1607-1615, 2005.","DOI":"10.1109\/TIE.2005.858713"},{"key":"40","doi-asserted-by":"crossref","unstructured":"[40] E. Cornu, H. Sheikhzadeh, R. Brennan, H. Abutalebi, E. Tam, P. Iles, and K. Wong, \u201cETSI AMR-2 VAD: evaluation and ultra low-resource implementation,\u201d Proc. IEEE ICASSP, pp.II-585-8, 2003.","DOI":"10.1109\/ICME.2003.1221748"},{"key":"41","unstructured":"[41] T.F. Quatieri, Discrete-Time Speech Signal Processing Principles and Practice, Pearson Education, 2002."},{"key":"42","unstructured":"[42] V. Podlozhnyuk, \u201cFFT-based 2D convolution,\u201d NVIDIA White Paper, 2007."},{"key":"43","doi-asserted-by":"crossref","unstructured":"[43] M. Morrone and R. Owens, \u201cFeature detection from local energy,\u201d Pattern Recogn. Lett., vol.6, no.5, pp.303-313, 1987.","DOI":"10.1016\/0167-8655(87)90013-4"},{"key":"44","doi-asserted-by":"crossref","unstructured":"[44] B. Robbins and R. Owens, \u201c2D feature detection via local energy,\u201d Image Vision Comput., vol.15, no.5, pp.353-368, 1997.","DOI":"10.1016\/S0262-8856(96)01137-7"},{"key":"45","doi-asserted-by":"crossref","unstructured":"[45] M. Felsberg and G. 
Sommer, \u201cThe monogenic signal,\u201d IEEE Trans. Signal Process., vol.49, no.12, pp.3136-3144, 2001.","DOI":"10.1109\/78.969520"},{"key":"46","doi-asserted-by":"crossref","unstructured":"[46] V. Zue, S. Seneff, and J. Glass, \u201cSpeech database development at MIT: TIMIT and beyond,\u201d Speech Commun., vol.9, no.4, pp.351-356, 1990.","DOI":"10.1016\/0167-6393(90)90010-7"},{"key":"47","doi-asserted-by":"crossref","unstructured":"[47] A. Varga and H.J. Steeneken, \u201cAssessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems,\u201d Speech Commun., vol.12, no.3, pp.247-251, 1993.","DOI":"10.1016\/0167-6393(93)90095-3"},{"key":"48","doi-asserted-by":"crossref","unstructured":"[48] M. Aten, G. Towers, C. Whitley, P. Wheeler, J. Clare, and K. Bradley, \u201cReliability comparison of matrix and other converter topologies,\u201d IEEE Trans. Aerosp. Electron. Syst., vol.42, no.3, pp.867-875, 2006.","DOI":"10.1109\/TAES.2006.248190"},{"key":"49","unstructured":"[49] ITU-T, \u201cObjective measurement of active speech level. ITU-T Recommendation P.56,\u201d ITU-T Recommendation P.56."},{"key":"50","doi-asserted-by":"crossref","unstructured":"[50] F. Beritelli, S. Casale, and G. Ruggeri, \u201cPerformance evaluation and comparison of ITU-T\/ETSI voice activity detectors,\u201d Proc. IEEE ICASSP, pp.1425-1428, 2001.","DOI":"10.1109\/ICASSP.2001.941197"},{"key":"51","unstructured":"[51] X. Lu, M. Unoki, R. Isotani, H. Kawai, and S. Nakamura, \u201cAdaptive regularization framework for robust voice activity detection,\u201d Proc. 
INTERSPEECH, pp.2653-2656, 2011."}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E98.D\/10\/E98.D_2015EDP7138\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,31]],"date-time":"2019-08-31T00:24:10Z","timestamp":1567211050000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E98.D\/10\/E98.D_2015EDP7138\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015]]},"references-count":51,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2015]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2015edp7138","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015]]}}}