{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:41:58Z","timestamp":1760132518072,"version":"3.41.0"},"reference-count":27,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2018]]},"DOI":"10.1587\/transinf.2017edp7175","type":"journal-article","created":{"date-parts":[[2017,12,31]],"date-time":"2017-12-31T22:30:03Z","timestamp":1514759403000},"page":"205-214","source":"Crossref","is-referenced-by-count":6,"title":["Learning Supervised Feature Transformations on Zero Resources for Improved Acoustic Unit Discovery"],"prefix":"10.1587","volume":"E101.D","author":[{"given":"Michael","family":"HECK","sequence":"first","affiliation":[{"name":"Augmented Human Communication Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sakriani","family":"SAKTI","sequence":"additional","affiliation":[{"name":"Augmented Human Communication Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Satoshi","family":"NAKAMURA","sequence":"additional","affiliation":[{"name":"Augmented Human Communication Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"[1] A. Park and J.R. Glass, \u201cTowards unsupervised pattern discovery in speech,\u201d Automatic Speech Recognition and Understanding, IEEE Workshop on, pp.53-58, IEEE, 2005. 
10.1109\/asru.2005.1566529","DOI":"10.1109\/ASRU.2005.1566529"},{"key":"2","doi-asserted-by":"publisher","unstructured":"[2] A.S. Park and J.R. Glass, \u201cUnsupervised pattern discovery in speech,\u201d Audio, Speech, and Language Processing, IEEE Transactions on, vol.16, no.1, pp.186-197, 2008. 10.1109\/tasl.2007.909282","DOI":"10.1109\/TASL.2007.909282"},{"key":"3","doi-asserted-by":"crossref","unstructured":"[3] B. Varadarajan, S. Khudanpur, and E. Dupoux, \u201cUnsupervised learning of acoustic sub-word units,\u201d Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pp.165-168, 2008. 10.3115\/1557690.1557736","DOI":"10.3115\/1557690.1557736"},{"key":"4","doi-asserted-by":"crossref","unstructured":"[4] Y. Zhang and J.R. Glass, \u201cUnsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams,\u201d Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on, pp.398-403, IEEE, 2009. 10.1109\/asru.2009.5372931","DOI":"10.1109\/ASRU.2009.5372931"},{"key":"5","unstructured":"[5] I. Malioutov, A. Park, R. Barzilay, and J. Glass, \u201cMaking sense of sound: Unsupervised topic segmentation over acoustic input,\u201d Association for Computational Linguistics Annual Meeting, pp.504-511, 2007."},{"key":"6","unstructured":"[6] M. Dredze, A. Jansen, G. Coppersmith, and K. Church, \u201cNLP on spoken documents without ASR,\u201d Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp.460-470, Association for Computational Linguistics, 2010."},{"key":"7","unstructured":"[7] M. Versteegh, R. Thiolliere, T. Schatz, X.N. Cao, X. Anguera, A. Jansen, and E. Dupoux, \u201cThe zero resource speech challenge 2015,\u201d Proceedings of Interspeech, pp.3169-3173, 2015."},{"key":"8","unstructured":"[8] D. Renshaw, H. Kamper, A. Jansen, and S. 
Goldwater, \u201cA comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge,\u201d Proceedings of Interspeech, pp.3199-3203, 2015."},{"key":"9","unstructured":"[9] L. Badino, A. Mereta, and L. Rosasco, \u201cDiscovering discrete subword units with binarized autoencoders and hidden-Markov-model encoders,\u201d Proceedings of Interspeech, pp.3174-3178, 2015."},{"key":"10","unstructured":"[10] R. Thiolliere, E. Dunbar, G. Synnaeve, M. Versteegh, and E. Dupoux, \u201cA hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling,\u201d Proceedings of Interspeech, pp.3179-3183, 2015."},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] H. Kamper, A. Jansen, S. King, and S. Goldwater, \u201cUnsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings,\u201d Spoken Language Technology Workshop (SLT), 2014 IEEE, pp.100-105, IEEE, 2014. 10.1109\/slt.2014.7078557","DOI":"10.1109\/SLT.2014.7078557"},{"key":"12","unstructured":"[12] H. Chen, C.C. Leung, L. Xie, B. Ma, and H. Li, \u201cParallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study,\u201d Proceedings of Interspeech, pp.3189-3193, 2015."},{"key":"13","doi-asserted-by":"publisher","unstructured":"[13] R.A. Fisher, \u201cThe use of multiple measurements in taxonomic problems,\u201d Annals of eugenics, vol.7, no.2, pp.179-188, 1936. 10.1111\/j.1469-1809.1936.tb02137.x","DOI":"10.1111\/j.1469-1809.1936.tb02137.x"},{"key":"14","doi-asserted-by":"crossref","unstructured":"[14] R.A. Gopinath, \u201cMaximum likelihood modeling with Gaussian distributions for classification,\u201d Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, pp.661-664, IEEE, 1998. 
10.1109\/icassp.1998.675351","DOI":"10.1109\/ICASSP.1998.675351"},{"key":"15","doi-asserted-by":"publisher","unstructured":"[15] M.J.F. Gales, \u201cSemi-tied covariance matrices for hidden Markov models,\u201d Speech and Audio Processing, IEEE Transactions on, vol.7, no.3, pp.272-281, 1999. 10.1109\/89.759034","DOI":"10.1109\/89.759034"},{"key":"16","doi-asserted-by":"crossref","unstructured":"[16] T. Anastasakos, J. McDonough, R. Schwartz, and J. Makhoul, \u201cA compact model for speaker-adaptive training,\u201d Spoken Language, 1996. ICSLP 96. Proceedings, Fourth International Conference on, pp.1137-1140, IEEE, 1996. 10.1109\/icslp.1996.607807","DOI":"10.21437\/ICSLP.1996-253"},{"key":"17","doi-asserted-by":"publisher","unstructured":"[17] M.J. Gales, \u201cMaximum likelihood linear transformations for HMM-based speech recognition,\u201d Computer speech & language, vol.12, no.2, pp.75-98, 1998. 10.1006\/csla.1998.0043","DOI":"10.1006\/csla.1998.0043"},{"key":"18","doi-asserted-by":"crossref","unstructured":"[18] C. Ding and T. Li, \u201cAdaptive dimension reduction using discriminant analysis and k-means clustering,\u201d Proceedings of the 24th international conference on Machine learning, pp.521-528, ACM, 2007. 10.1145\/1273496.1273562","DOI":"10.1145\/1273496.1273562"},{"key":"19","doi-asserted-by":"crossref","unstructured":"[19] J. Tang, X. Hu, H. Gao, and H. Liu, \u201cDiscriminant analysis for unsupervised feature selection,\u201d SDM, pp.938-946, SIAM, 2014. 10.1137\/1.9781611973440.107","DOI":"10.1137\/1.9781611973440.107"},{"key":"20","unstructured":"[20] J. Chang and J.W. Fisher III, \u201cParallel sampling of DP mixture models using sub-cluster splits,\u201d Advances in Neural Information Processing Systems, pp.620-628, 2013."},{"key":"21","doi-asserted-by":"publisher","unstructured":"[21] M.A. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. 
Raymond, \u201cThe Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability,\u201d Speech Communication, vol.45, no.1, pp.89-95, 2005. 10.1016\/j.specom.2004.09.001","DOI":"10.1016\/j.specom.2004.09.001"},{"key":"22","doi-asserted-by":"publisher","unstructured":"[22] N.J. De Vries, M.H. Davel, J. Badenhorst, W.D. Basson, F. De Wet, E. Barnard, and A. De Waal, \u201cA smartphone-based ASR data collection tool for under-resourced languages,\u201d Speech communication, vol.56, pp.119-131, 2014. 10.1016\/j.specom.2013.07.001","DOI":"10.1016\/j.specom.2013.07.001"},{"key":"23","unstructured":"[23] T. Schatz, V. Peddinti, F. Bach, A. Jansen, H. Hermansky, and E. Dupoux, \u201cEvaluating speech features with the minimal-pair ABX task: Analysis of the classical MFC\/PLP pipeline,\u201d Proceedings of Interspeech, pp.1781-1785, 2013."},{"key":"24","unstructured":"[24] D. Povey, A. Ghoshal, G. Boulianne, N. Goel, M. Hannemann, Y. Qian, P. Schwarz, and G. Stemmer, \u201cThe Kaldi speech recognition toolkit,\u201d Proceedings of IEEE, 2011."},{"key":"25","doi-asserted-by":"publisher","unstructured":"[25] K. Pearson, \u201cOn lines and planes of closest fit to systems of points in space,\u201d The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol.2, no.11, pp.559-572, 1901. 10.1080\/14786440109462720","DOI":"10.1080\/14786440109462720"},{"key":"26","doi-asserted-by":"publisher","unstructured":"[26] H. Hotelling, \u201cAnalysis of a complex of statistical variables into principal components,\u201d Journal of educational psychology, vol.24, no.6, pp.417-441, 1933. 10.1037\/h0071325","DOI":"10.1037\/h0071325"},{"key":"27","doi-asserted-by":"crossref","unstructured":"[27] E. Dunbar, X.N. Cao, J. Benjumea, J. Karadayi, M. Bernard, L. Besacier, X. Anguera, and E. 
Dupoux, \u201cThe zero resource speech challenge 2017,\u201d Proceedings of ASRU, 2017 (in press).","DOI":"10.1109\/ASRU.2017.8268953"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E101.D\/1\/E101.D_2017EDP7175\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,29]],"date-time":"2025-06-29T09:44:27Z","timestamp":1751190267000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E101.D\/1\/E101.D_2017EDP7175\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2017edp7175","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"type":"print","value":"0916-8532"},{"type":"electronic","value":"1745-1361"}],"subject":[],"published":{"date-parts":[[2018]]}}}