{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,12]],"date-time":"2024-09-12T00:15:18Z","timestamp":1726100118419},"reference-count":25,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. &amp; Syst."],"published-print":{"date-parts":[[2021,11,1]]},"DOI":"10.1587\/transinf.2021edp7041","type":"journal-article","created":{"date-parts":[[2021,10,31]],"date-time":"2021-10-31T22:14:05Z","timestamp":1635718445000},"page":"1971-1980","source":"Crossref","is-referenced-by-count":1,"title":["DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching"],"prefix":"10.1587","volume":"E104.D","author":[{"given":"Satoshi","family":"MIZOGUCHI","sequence":"first","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuki","family":"SAITO","sequence":"additional","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shinnosuke","family":"TAKAMICHI","sequence":"additional","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hiroshi","family":"SARUWATARI","sequence":"additional","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"publisher","unstructured":"[1] K. Kobayashi, Y. Haneda, K. Furuya, and A. Kataoka, \u201cA hands-free unit with noise reduction by using adaptive beamformer,\u201d IEEE Trans. Consum. Electron., vol.54, no.1, pp.116-122, Feb. 2008. 10.1109\/TCE.2008.4470033","DOI":"10.1109\/TCE.2008.4470033"},{"key":"2","doi-asserted-by":"crossref","unstructured":"[2] Y. Hioka, K. Furuya, K. Kobayashi, S. Sakauchi, and Y. 
Haneda, \u201cAngular region-wise speech enhancement for hands-free speakerphone,\u201d IEEE Trans. Consum. Electron., vol.58, no.4, pp.1403-1410, Nov. 2012. 10.1109\/GCCE.2012.6379560","DOI":"10.1109\/TCE.2012.6415013"},{"key":"3","doi-asserted-by":"publisher","unstructured":"[3] T. Traphagan, J.V. Kucsera, and K. Kishi, \u201cImpact of class lecture webcasting on attendance and learning,\u201d Educational Technology Research and Development, vol.58, no.1, pp.19-37, Feb. 2010. 10.1007\/s11423-009-9128-7","DOI":"10.1007\/s11423-009-9128-7"},{"key":"4","doi-asserted-by":"publisher","unstructured":"[4] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, \u201cA regression approach to speech enhancement based on deep neural networks,\u201d IEEE\/ACM Trans. Audio, Speech and Language Processing, vol.23, no.1, pp.7-19, Jan. 2015. 10.1109\/TASLP.2014.2364452","DOI":"10.1109\/TASLP.2014.2364452"},{"key":"5","unstructured":"[5] S.-W. Fu, Y. Tsao, and X. Lu, \u201cSNR-aware convolutional neural network modeling for speech enhancement,\u201d Proc. INTERSPEECH, pp.3768-3772, San Francisco, U.S.A., Sept. 2016. 10.21437\/Interspeech.2016-211"},{"key":"6","doi-asserted-by":"crossref","unstructured":"[6] F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Roux, J.R. Hershey, and B. Schuller, \u201cSpeech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR,\u201d Proc. 12th Int. Conf. Latent Variable Analysis and Signal Separation, vol.9237, pp.91-99, Liberec, Czech Republic, Aug. 2015. 10.1007\/978-3-319-22482-4_11","DOI":"10.1007\/978-3-319-22482-4_11"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] X. Lu, Y. Tsao, S. Matsuda, and C. Hori, \u201cSpeech enhancement based on deep denoising autoencoder,\u201d Proc. INTERSPEECH, pp.436-440, Lyon, France, Aug. 2013.","DOI":"10.21437\/Interspeech.2013-130"},{"key":"8","doi-asserted-by":"crossref","unstructured":"[8] S. Leglaive, U. Simsekli, A. Liutkus, L. Girin, and R. 
Horaud, \u201cSpeech enhancement with variational autoencoders and alpha-stable distributions,\u201d IEEE Int. Conf. Acoust., Speech, Signal Process., pp.541-545, Brighton, United Kingdom, May 2019. 10.1109\/ICASSP.2019.8682546","DOI":"10.1109\/ICASSP.2019.8682546"},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] Y. Koizumi, K. Niwa, Y. Hioka, K. Kobayashi, and Y. Haneda, \u201cDNN-based source enhancement self-optimized by reinforcement learning using sound quality measurements,\u201d Proc. Int. Conf. Acoust., Speech, Signal Process., pp.81-85, New Orleans, LA, U.S.A., March 2017. 10.1109\/ICASSP.2017.7952122","DOI":"10.1109\/ICASSP.2017.7952122"},{"key":"10","doi-asserted-by":"publisher","unstructured":"[10] A.H. Moore, P.P. Parada, and P.A. Naylor, \u201cSpeech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures,\u201d Computer Speech and Language, vol.46, pp.574-584, Nov. 2017. 10.1016\/j.csl.2016.11.003","DOI":"10.1016\/j.csl.2016.11.003"},{"key":"11","doi-asserted-by":"publisher","unstructured":"[11] T.H. Dat, K. Takeda, and F. Itakura, \u201cMultichannel speech enhancement based on generalized gamma prior distribution with its online adaptive estimation,\u201d IEICE Trans. Inf. &amp; Syst., vol.E91-D, no.3, pp.439-447, March 2008. 10.1093\/ietisy\/e91-d.3.439","DOI":"10.1093\/ietisy\/e91-d.3.439"},{"key":"12","doi-asserted-by":"publisher","unstructured":"[12] O. Capp\u00e9, \u201cElimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor,\u201d IEEE Trans. Speech Audio Process., vol.2, no.2, pp.345-349, April 1994. 10.1109\/89.279283","DOI":"10.1109\/89.279283"},{"key":"13","doi-asserted-by":"publisher","unstructured":"[13] Z. Goh, K.-C. Tan, and B. Tan, \u201cPostprocessing method for suppressing musical noise generated by spectral subtraction,\u201d IEEE Trans. Speech Audio Process., vol.6, no.3, pp.287-292, May 1998. 
10.1109\/89.668822","DOI":"10.1109\/89.668822"},{"key":"14","unstructured":"[14] Y. Uemura, Y. Takahashi, H. Saruwatari, K. Shikano, and K. Kondo, \u201cAutomatic optimization scheme of spectral subtraction based on musical noise assessment via higher-order statistics,\u201d Proc. International Workshop for Acoustic Echo and Noise Control, Seattle, W.A., U.S.A., Sept. 2008."},{"key":"15","unstructured":"[15] Y. Li, K. Swersky, and R. Zemel, \u201cGenerative moment matching networks,\u201d Proc. 32nd Int. Conf. Machine Learning, vol.37, pp.1718-1727, Lille, France, July 2015."},{"key":"16","unstructured":"[16] J.F. Kenney and E.S. Keeping, \u201cMoments in standard units,\u201d Mathematics of Statistics, Pt. 1, 3rd ed. pp.98-99, Princeton, NJ: Van Nostrand, 1962."},{"key":"17","unstructured":"[17] \u201cJNAS,\u201d http:\/\/research.nii.ac.jp\/src\/JNAS.html, accessed: 2018-12-05."},{"key":"18","unstructured":"[18] R. Sonobe, S. Takamichi, and H. Saruwatari, \u201cJSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis,\u201d arXiv preprint, 1711.00354, Oct. 2017."},{"key":"19","doi-asserted-by":"publisher","unstructured":"[19] R. Miyazaki, H. Saruwatari, T. Inoue, Y. Takahashi, K. Shikano, and K. Kondo, \u201cMusical-noise-free speech enhancement based on optimized iterative spectral subtraction,\u201d IEEE Trans. Audio, Speech, Language Process., vol.20, no.7, pp.2080-2094, Sept. 2012. 10.1109\/TASL.2012.2196513","DOI":"10.1109\/TASL.2012.2196513"},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] O. Ronneberger, P. Fischer, and T. Brox, \u201cU-Net: Convolutional networks for biomedical image segmentation,\u201d Proc. 18th Int. Conf. Medical Image Computing and Computer Assisted Intervention, pp.234-241, Munich, Germany, Oct. 2015. 10.1007\/978-3-319-24574-4_28","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"21","unstructured":"[21] A.L. Maas, A.Y. Hannun, and A.Y. 
Ng, \u201cRectifier nonlinearities improve neural network acoustic models,\u201d Proc. 30th Int. Conf. Machine Learning, vol.30, Atlanta, Georgia, U.S.A., June 2013."},{"key":"22","unstructured":"[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, \u201cDropout: A simple way to prevent neural networks from overfitting,\u201d J. Mach. Learn. Res., vol.15, pp.1929-1958, June 2014."},{"key":"23","unstructured":"[23] D. Kingma and J. Ba, \u201cAdam: A method for stochastic optimization,\u201d Proc. Int. Conf. Learning Representations, Banff, Canada, Dec. 2014."},{"key":"24","doi-asserted-by":"crossref","unstructured":"[24] A.W. Rix, J.G. Beerends, M.P. Hollier, and A.P. Hekstra, \u201cPerceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs,\u201d Proc. ICASSP, pp.749-752, Salt Lake City, U.S.A., April 2001. 10.1109\/ICASSP.2001.941023","DOI":"10.1109\/ICASSP.2001.941023"},{"key":"25","unstructured":"[25] \u201cLancers,\u201d https:\/\/www.lancers.jp\/."}],"container-title":["IEICE Transactions on Information and 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E104.D\/11\/E104.D_2021EDP7041\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T04:48:07Z","timestamp":1726030087000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E104.D\/11\/E104.D_2021EDP7041\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,1]]},"references-count":25,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2021]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2021edp7041","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"type":"print","value":"0916-8532"},{"type":"electronic","value":"1745-1361"}],"subject":[],"published":{"date-parts":[[2021,11,1]]},"article-number":"2021EDP7041"}}