{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T22:40:02Z","timestamp":1747348802218,"version":"3.40.5"},"reference-count":28,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2015]]},"DOI":"10.1587\/transinf.2014edp7183","type":"journal-article","created":{"date-parts":[[2015,1,5]],"date-time":"2015-01-05T07:24:50Z","timestamp":1420442690000},"page":"157-165","source":"Crossref","is-referenced-by-count":2,"title":["Similar Speaker Selection Technique Based on Distance Metric Learning Using Highly Correlated Acoustic Features with Perceptual Voice Quality Similarity"],"prefix":"10.1587","volume":"E98.D","author":[{"given":"Yusuke","family":"IJIMA","sequence":"first","affiliation":[{"name":"NTT Media Intelligence Laboratories, NTT Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hideyuki","family":"MIZUNO","sequence":"additional","affiliation":[{"name":"NTT Media Intelligence Laboratories, NTT Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"532","reference":[{"key":"1","unstructured":"[1] Y. Ijima, M. Isogai, and H. Mizuno, \u201cCorrelation analysis of acoustic features with perceptual voice quality similarity for similar speaker selection,\u201d INTERSPEECH 2011, pp.2237-2240, 2011."},{"key":"2","doi-asserted-by":"crossref","unstructured":"[2] Y. Ijima, M. Isogai, and H. Mizuno, \u201cSimilar speaker selection technique based on distance metric learning with perceptual voice quality similarity,\u201d INTERSPEECH 2012, 2012.","DOI":"10.21437\/Interspeech.2012-534"},{"key":"3","doi-asserted-by":"crossref","unstructured":"[3] J. Yamagishi and T. 
Kobayashi, \u201cAverage-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training,\u201d IEICE Trans. Inf. & Syst., vol.E90-D, no.2, pp.533-543, Feb. 2007.","DOI":"10.1093\/ietisy\/e90-d.2.533"},{"key":"4","doi-asserted-by":"crossref","unstructured":"[4] H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, \u201cA hidden semi-Markov model-based speech synthesis system,\u201d IEICE Trans. Inf. & Syst., vol.E90-D, no.5, pp.825-834, May 2007.","DOI":"10.1093\/ietisy\/e90-d.5.825"},{"key":"5","doi-asserted-by":"crossref","unstructured":"[5] J. Yamagishi, O. Watts, S. King, and B. Usabaev, \u201cRoles of the average voice in speaker-adaptive HMM-based speech synthesis,\u201d INTERSPEECH 2010, pp.418-421, Sept. 2010.","DOI":"10.21437\/Interspeech.2010-174"},{"key":"6","doi-asserted-by":"crossref","unstructured":"[6] R. Dall, M. Veaux, J. Yamagishi, and S. King, \u201cAnalysis of speaker clustering strategies for HMM-based speech synthesis,\u201d INTERSPEECH 2012, 2012.","DOI":"10.21437\/Interspeech.2012-295"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] Y.J. Wu, S. King, and K. Tokuda, \u201cCross-lingual speaker adaptation for HMM-based speech synthesis,\u201d ISCSLP2008, pp.1-4, 2008.","DOI":"10.1109\/CHINSL.2008.ECP.14"},{"key":"8","doi-asserted-by":"crossref","unstructured":"[8] S. Yoshizawa, A. Baba, K. Matsunami, Y. Mera, M. Yamada, and K. Shikano, \u201cUnsupervised speaker adaptation based on sufficient HMM statistics of selected speakers,\u201d ICASSP 2001, pp.341-344, May 2001.","DOI":"10.21437\/Eurospeech.2001-317"},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] C. Huang, T. Chen, and E. Chang, \u201cSpeaker selection training for large vocabulary continuous speech recognition,\u201d ICASSP 2002, pp.609-612, May 2002.","DOI":"10.1109\/ICASSP.2002.5743791"},{"key":"10","doi-asserted-by":"crossref","unstructured":"[10] D.A. 
Reynolds, \u201cSpeaker identification and verification using Gaussian mixture speaker models,\u201d Speech Communication, vol.17, no.1-2, pp.91-108, Aug. 1995.","DOI":"10.1016\/0167-6393(95)00009-D"},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] N. Higuchi and M. Hashimoto, \u201cAnalysis of acoustic features affecting speaker identification,\u201d Eurospeech-95, pp.435-438, 1995.","DOI":"10.21437\/Eurospeech.1995-118"},{"key":"12","unstructured":"[12] K. Amino, T. Sugawara, and T. Arai, \u201cSpeaker similarity in human perception and their spectral properties,\u201d WESPAC IX, 2006."},{"key":"13","unstructured":"[13] Y. Adachi, S. Kawamoto, S. Morishima, and S. Nakamura, \u201cPerceptual similarity measurement of speech by combination of acoustic features,\u201d ICASSP 2008, pp.4861-4864, 2008."},{"key":"14","unstructured":"[14] L. Yang, \u201cAn overview of distance metric learning,\u201d http:\/\/www.cs.cmu.edu\/~liuy\/dist_overview.pdf, 2007."},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] H. Chang and D.Y. Yeung, \u201cKernel-based distance metric learning for content-based image retrieval,\u201d Image Vision Comput., vol.25, no.5, pp.695-703, May 2007.","DOI":"10.1016\/j.imavis.2006.05.013"},{"key":"16","unstructured":"[16] M. Slaney, K. Weinberger, and W. White, \u201cLearning a metric for music similarity,\u201d ISMIR 2008, pp.313-316, Sept. 2008."},{"key":"17","doi-asserted-by":"crossref","unstructured":"[17] D. Mochihashi, G. Kikui, and K. Kita, \u201cLearning an optimal distance metric in a linguistic vector space,\u201d Systems and Computers in Japan, pp.12-21, 2006.","DOI":"10.1002\/scj.20533"},{"key":"18","unstructured":"[18] NTT-AT, \u201cJapanese speech database (in Japanese).\u201d http:\/\/www.ntt-at.co.jp\/product\/denwa_j"},{"key":"19","doi-asserted-by":"crossref","unstructured":"[19] H. Kawahara, I. Masuda-Katsuse, and A. 
Cheveigne, \u201cRestructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds,\u201d Speech Communication, vol.27, pp.187-207, 1999.","DOI":"10.1016\/S0167-6393(98)00085-5"},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] H. Hermansky, \u201cPerceptual linear predictive (PLP) analysis of speech,\u201d The Journal of the Acoustical Society of America, vol.87, pp.1738-1752, 1990.","DOI":"10.1121\/1.399423"},{"key":"21","unstructured":"[21] N. Minematsu, K. Tsuda, and K. Hirose, \u201cQuantitative analysis of f0-induced variations of cepstrum coefficients,\u201d ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding, pp.113-117, 2001."},{"key":"22","doi-asserted-by":"crossref","unstructured":"[22] T. Cover and P. Hart, \u201cNearest neighbor pattern classification,\u201d IEEE Trans. Inf. Theory, vol.IT-13, no.1, pp.21-27, 1967.","DOI":"10.1109\/TIT.1967.1053964"},{"key":"23","unstructured":"[23] N.S. A. Bar-Hillel, T. Hertz, and D. Weinshall, \u201cLearning distance functions using equivalence relations,\u201d ICML 2003, pp.11-18, 2003."},{"key":"24","unstructured":"[24] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov, \u201cNeighbourhood components analysis,\u201d NIPS, pp.513-520, 2005."},{"key":"25","unstructured":"[25] K. Weinberger, J. Blitzer, and L. Saul, \u201cDistance metric learning for large margin nearest neighbor classification,\u201d NIPS, pp.1473-1480, 2006."},{"key":"26","doi-asserted-by":"crossref","unstructured":"[26] W.M. Campbell, D.E. Sturim, and D.A. Reynolds, \u201cSupport vector machines using GMM supervectors for speaker verification,\u201d IEEE Signal Process. Lett., vol.13, no.5, pp.308-311, May 2006.","DOI":"10.1109\/LSP.2006.870086"},{"key":"27","unstructured":"[27] N. Dehak, R. Dehak, P. Kenny, N. Brummer, P. Ouellet, and P. 
Dumouchel, \u201cSupport vector machines versus fast scoring in the low-dimensional total variability space for speaker verification,\u201d INTERSPEECH 2009, pp.1559-1562, 2009."},{"key":"28","doi-asserted-by":"crossref","unstructured":"[28] H. Liu and H. Motoda, Feature selection for knowledge discovery and data mining, Springer, 1998.","DOI":"10.1007\/978-1-4615-5689-3"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E98.D\/1\/E98.D_2014EDP7183\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T22:18:06Z","timestamp":1747347486000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E98.D\/1\/E98.D_2014EDP7183\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2014edp7183","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"type":"print","value":"0916-8532"},{"type":"electronic","value":"1745-1361"}],"subject":[],"published":{"date-parts":[[2015]]}}}