{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:25:54Z","timestamp":1750307154133,"version":"3.41.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2011,11,1]],"date-time":"2011-11-01T00:00:00Z","timestamp":1320105600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2011,11]]},"abstract":"<jats:p>\n            Sung language recognition relies on both effective feature extraction and acoustic modeling. In this paper, we study rhythm based music segmentation with the frame size being the duration of the smallest note in the music, as opposed to fixed length segmentation in spoken language recognition. It is found that acoustic features extracted from the rhythm based segmentation scheme outperform those from fixed length segmentation. We also study the effectiveness of a musically motivated acoustic feature.\n            <jats:italic>Octave scale cepstral coefficients<\/jats:italic>\n            (OSCCs) by comparing with the other acoustic features: Log frequency cepstral coefficients,\n            <jats:italic>Linear prediction coefficients<\/jats:italic>\n            (LPC) and LPC-derived cepstral coefficients. Finally, we examine the modeling capabilities of Gaussian mixture models and support vector machines in sung language recognition experiments. Experiments conducted on a corpus of 400 popular songs sung in English, Chinese, German, and Indonesian, showed that the OSCC feature outperforms other features. 
A sung language recognition accuracy of 64.9% was achieved when Gaussian mixture models were trained on shifted-delta-OSCC acoustic features, extracted via rhythm based music segmentation.\n          <\/jats:p>","DOI":"10.1145\/2043612.2043615","type":"journal-article","created":{"date-parts":[[2011,12,6]],"date-time":"2011-12-06T19:05:23Z","timestamp":1323198323000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Beat space segmentation and octave scale cepstral feature for sung language recognition in pop music"],"prefix":"10.1145","volume":"7","author":[{"given":"Namunu C.","family":"Maddage","sequence":"first","affiliation":[{"name":"Royal Melbourne Institute of Technology University (RMIT), Melbourne, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haizhou","family":"Li","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,12,2]]},"reference":[{"volume-title":"Proceedings of the International Congress of Phonetic Sciences (ICPhS).","author":"Adda-Decker M.","key":"e_1_2_1_1_1","unstructured":"Adda-Decker, M., Antoine, F., Boula de Mareuil, P., Vasilescu, I., Lamel, L., Vaissiere, J., Geoffrois, E., and Lienard, J. S. 2003. Phonetic knowledge, phonotactics and perceptual validation for automatic language identification. In Proceedings of the International Congress of Phonetic Sciences (ICPhS)."},{"volume-title":"Proceedings of 8th European Conference on Speech and Communication and Technology (Eurospeech).","author":"Adami A. G.","key":"e_1_2_1_2_1","unstructured":"Adami, A. G. and Hermansky, H. 2003. Segmentation of speech for speaker and language recognition. 
In Proceedings of 8th European Conference on Speech and Communication and Technology (Eurospeech)."},{"key":"e_1_2_1_3_1","first-page":"1933","article-title":"An efficient algorithm for the calculation of a constant Q transform","volume":"92","author":"Brown J. C.","year":"1991","unstructured":"Brown, J. C. and Puckette, M. S. 1991. An efficient algorithm for the calculation of a constant Q transform. J. Acoust. Soc. Amer. 92, 5, 1933--1941.","journal-title":"J. Acoust. Soc. Amer."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2005.06.003"},{"volume-title":"Proceedings of the Multimedia Information Retrieval Workshop.","author":"Dai P.","key":"e_1_2_1_5_1","unstructured":"Dai, P., Iurgel, U., and Rigoll, G. 2003. A novel feature combination approach for spoken document classification with support vector machines. In Proceedings of the Multimedia Information Retrieval Workshop."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.394387"},{"volume-title":"Proceedings of the International Conference on Digital Audio Effects (DAFx).","author":"Duxburg C.","key":"e_1_2_1_7_1","unstructured":"Duxburg, C., Sandler, M., and Davies, M. 2002. A hybrid approach to musical note onset detection. In Proceedings of the International Conference on Digital Audio Effects (DAFx)."},{"volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","author":"Ellis D. P. W.","key":"e_1_2_1_8_1","unstructured":"Ellis, D. P. W. and Poliner, G. E. 2006. Identifying \u2018cover songs\u2019 with chroma features and dynamic programming beat tracking. 
In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1915565"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.413218"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1909011"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.880079"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1076\/jnmr.30.2.159.7114"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.381582"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1180639.1180777"},{"volume-title":"Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'02)","author":"Jiang D. N.","key":"e_1_2_1_17_1","unstructured":"Jiang, D. N., Lu, L., Zhang, H. J., Tao, J. H., and Cai, L. H. 2002. Music type classification by spectral contrast feature. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'02). 113--116."},{"key":"e_1_2_1_18_1","unstructured":"John R. D. John H. L. and John G. P. 1999. Discrete-Time Processing of Speech Signals. IEEE Press."},{"volume-title":"The Brain, and Ecstasy: How Music Captures Our Imagination","author":"Jourdain R.","key":"e_1_2_1_19_1","unstructured":"Jourdain, R. 1997. Music, The Brain, and Ecstasy: How Music Captures Our Imagination. Harper Collins."},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","author":"Kirchhoff K.","key":"e_1_2_1_21_1","unstructured":"Kirchhoff, K., Parandekar, S., and Bilmes, J. 2002. Mixed memory markov models for automatic language identification. 
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219904"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.876860"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.860344"},{"volume-title":"Proceedings of the International Conference on Spoken Language Processing (ICSLP).","author":"Ma B.","key":"e_1_2_1_25_1","unstructured":"Ma, B., Guan, C., Li, H., and Lee, C. H. 2002. Multilingual speech recognition with language identification. In Proceedings of the International Conference on Spoken Language Processing (ICSLP)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148185"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1027527.1027549"},{"volume-title":"Proceedings of the International Conference on Spoken Language Processing (ICSLP).","author":"Matrouf D.","key":"e_1_2_1_28_1","unstructured":"Matrouf, D., Adda-Decker, M., Lamel, L. F., and Gauvain, J.-L. 1998. Language identification incorporating lexical information. In Proceedings of the International Conference on Spoken Language Processing (ICSLP)."},{"volume-title":"Proceedings of the 5th International Symposium\/Conf. of Music Information Retrieval (ISMIR).","author":"Nwe T. L.","key":"e_1_2_1_29_1","unstructured":"Nwe, T. L. and Wang, Y. 2004. Automatic detection of vocal segments in popular songs. In Proceedings of the 5th International Symposium\/Conf. of Music Information Retrieval (ISMIR)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.18626"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","unstructured":"Rabiner L. R. and Juang B. H. 1993. Fundamentals of Speech Recognition. 
Prentice-Hall.","DOI":"10.5555\/153687"},{"key":"e_1_2_1_32_1","first-page":"73","article-title":"Robust text-independent speaker identification using Gaussian mixture speaker models","volume":"3","author":"Reynolds D. A.","year":"1995","unstructured":"Reynolds, D. A. and Rose, R. C. 1995. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 3, 73--83.","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"e_1_2_1_33_1","unstructured":"Rossing T. D. Moore F. R. and Wheeler P. A. 2001. The Science of Sound 3rd Ed. Addison Wesley."},{"volume-title":"The Associated Board of the Royal Schools of Music","author":"Royal Schools","key":"e_1_2_1_34_1","unstructured":"Royal Schools of Music. 1949. Rudiments and Theory of Music, The Associated Board of the Royal Schools of Music, London."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2006.1598089"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.421129"},{"volume-title":"Proceedings of the International Conference on Music Information Retrieval (ISMIR).","author":"Schwenninger J.","key":"e_1_2_1_37_1","unstructured":"Schwenninger, J., Brueckner, R., Willett, D., and Hennecke, M. 2006. Language Identification in vocal music. In Proceedings of the International Conference on Music Information Retrieval (ISMIR)."},{"volume-title":"Proceedings of the 8th European Conference on Speech and Communication and Technology (Eurospeech).","author":"Singer E.","key":"e_1_2_1_38_1","unstructured":"Singer, E., Torres-Carrasquillo, P. A., Gleason, T. P., Campbell, W. M., and Reynolds, D. A. 2003. Acoustic, phonetic and discriminative approaches to automatic language recognition. 
In Proceedings of the 8th European Conference on Speech and Communication and Technology (Eurospeech)."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1915893"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/1170742.1171074"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1914347"},{"volume-title":"Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP).","author":"Thyme-Gobbel A. E.","key":"e_1_2_1_42_1","unstructured":"Thyme-Gobbel, A. E. and Hutchins, S. E. 1996. On using prosodic cues in automatic language identification. In Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.21437\/ICSLP.2002-74"},{"volume-title":"Proceedings of the International Conference on Music Information Retrieval (ISMIR).","author":"Tsai W.-H.","key":"e_1_2_1_44_1","unstructured":"Tsai, W.-H. and Wang, H. M. 2004. Towards automatic identification of singing language in popular music recordings. In Proceedings of the International Conference on Music Information Retrieval (ISMIR)."},{"volume-title":"Proceedings of the International Conference on Music Information Retrieval (ISMIR).","author":"Typke R.","key":"e_1_2_1_45_1","unstructured":"Typke, R., Wiering, F., and Veltkamp, R. 2005. A survey of music information retrieval systems. In Proceedings of the International Conference on Music Information Retrieval (ISMIR)."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.880085"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1907344"},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Xiao Z. Dellandrea E. Dou W. and Chen L. 2008. What is the best segment duration for music mood analysis? In Proceedings of the 6th International Workshop on Content-Based Multimedia Indexing (CBMI). 
17--24.","DOI":"10.1109\/CBMI.2008.4564922"},{"volume-title":"The HTK Book Version 3.4. Department of Engineering","author":"Young S.","key":"e_1_2_1_49_1","unstructured":"Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P. 2006. The HTK Book Version 3.4. Department of Engineering, University of Cambridge."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMMC.2005.56"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.1996.481450"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2043612.2043615","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2043612.2043615","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:54:18Z","timestamp":1750240458000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2043612.2043615"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,11]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2011,11]]}},"alternative-id":["10.1145\/2043612.2043615"],"URL":"https:\/\/doi.org\/10.1145\/2043612.2043615","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2011,11]]},"assertion":[{"value":"2010-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-03-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2011-12-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}