{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:38:10Z","timestamp":1750307890139,"version":"3.41.0"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2010,8,1]],"date-time":"2010-08-01T00:00:00Z","timestamp":1280620800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2010,8]]},"abstract":"<jats:p>We propose a signal-based approach instead of the commonly used model-based approach, to automatically align vocal music with text lyrics at the word level. In this approach, we use a text-to-speech system to synthesize the singing voice according to the lyrics. In this way, aligning the music signal with the corresponding text lyrics becomes the alignment of two audio signals. This study uses the results of music information modeling and singing voice synthesis. In music information modeling, we study different music representation strategies for music segmentation, music region indexing and region content descriptions; in singing voice synthesis, we generate singing voice by making use of music knowledge to approximate the target vocal line in terms of tempo. The experimental results on a 20-song database show 26.3% and 36.1% word level alignment error rates at eighth note and sixteenth note alignment tolerances respectively. 
The proposed approach presents an alternative and effective solution to music-lyrics alignment which may require less training dataset.<\/jats:p>","DOI":"10.1145\/1823746.1823753","type":"journal-article","created":{"date-parts":[[2010,8,31]],"date-time":"2010-08-31T13:05:55Z","timestamp":1283259955000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Word level automatic alignment of music and lyrics using vocal synthesis"],"prefix":"10.1145","volume":"6","author":[{"given":"Namunu C.","family":"Maddage","sequence":"first","affiliation":[{"name":"Royal Melbourne Institute of Technology (RMIT), Melbourne, Australia"}]},{"given":"Khe Chai","family":"Sim","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), Singapore"}]},{"given":"Haizhou","family":"Li","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), Singapore"}]}],"member":"320","published-online":{"date-parts":[[2010,8,27]]},"reference":[{"volume-title":"Rudiments and Theory of Music. The associated board of the Royal Schools of Music","author":"Royal Schools","key":"e_1_2_1_1_1","unstructured":"Royal Schools of Music. 1949. Rudiments and Theory of Music. The associated board of the Royal Schools of Music. London."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2003.822637"},{"volume-title":"Proceedings of IEEE Workshop on Applications of Signal processing to Audio and Acoustics (WASPAA).","author":"Berenzweig A. L.","key":"e_1_2_1_3_1","unstructured":"Berenzweig, A. L. and Ellis, D. P.W. 2001. 
Location singing voice segments within music signals. In Proceedings of IEEE Workshop on Applications of Signal processing to Audio and Acoustics (WASPAA)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.404385"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Chen. K. Gao S. Zhu Y. and Sun Q. 2006. Popular song and lyrics synchronization and its application to music information retrieval. In Proceeding of Multimedia Networking and Computing.","DOI":"10.1117\/12.652096"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Computer Music Conference. 193--198","author":"Dannenberg R. B.","year":"1984","unstructured":"Dannenberg, R. B. 1984. An on-line algorithm for real-time accompaniment. In Proceedings of the International Computer Music Conference. 193--198."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.394387"},{"volume-title":"Proceedings of the International Conference on Digital Audio Effects (DAFx).","author":"Duxburg C.","key":"e_1_2_1_8_1","unstructured":"Duxburg, C., Sandler, M. and Davies, M. 2002. A hybrid approach to musical note onset detection. In Proceedings of the International Conference on Digital Audio Effects (DAFx)."},{"volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","author":"Ellis D. P. W.","key":"e_1_2_1_9_1","unstructured":"Ellis, D. P. W. and Poliner, G. E. 2006. 
Identifying \u2018cover songs\u2019 with chroma features and dynamic programming beat tracking. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1915565"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISM.2006.38"},{"volume-title":"Proceedings of the IEEE Consumer Communications & Networking.","author":"Furini M.","key":"e_1_2_1_12_1","unstructured":"Furini, M. and Alboresi, L. 2004. Audio-text synchronization inside MP3 files: A new approach and its implementation. In Proceedings of the IEEE Consumer Communications & Networking."},{"volume-title":"Proceedings of the International Computer Music Conference (ICMC). 301--308","author":"Grubb L.","key":"e_1_2_1_13_1","unstructured":"Grubb, L. and Dannenberg, R. B. 1997. A stochastic method of tracking a vocal performer. In Proceedings of the International Computer Music Conference (ICMC). 301--308."},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics Speech Signal Processing (ICASSP). 238--241","author":"Hamon C.","key":"e_1_2_1_14_1","unstructured":"Hamon, C. Mouline, E., and Charpentier, F. 1989. A diphone synthesis system based on time-domain prosodic modifications of speech. 
In Proceedings of the IEEE International Conference on Acoustics Speech Signal Processing (ICASSP). 238--241."},{"volume-title":"Proceedings of the IEEE Workshop on Application of Signal Processing to Audio and Acoustics.","author":"Hu N.","key":"e_1_2_1_15_1","unstructured":"Hu, N., Dannenberg, R.B. and Tzanetakis, G. 2003. Polyphonic audio matching and alignment for music retrieval. In Proceedings of the IEEE Workshop on Application of Signal Processing to Audio and Acoustics."},{"volume-title":"Proceedings of the International Computer Music Conference (ICMC). 70--77","author":"Inoue W.","key":"e_1_2_1_16_1","unstructured":"Inoue, W., Hashimoto, S. and Ohteru, S. 1994. Adaptive karaoke system-human singing accompaniment based on speech recognition. In Proceedings of the International Computer Music Conference (ICMC). 70--77."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1180639.1180777"},{"key":"e_1_2_1_18_1","unstructured":"John R. D. John H. L. and John G. P. 1999. Discrete-Time Processing of Speech Signals. IEEE Press."},{"volume-title":"The Brain, and Ecstasy: How Music Captures Our Imagination","author":"Jourdain R.","key":"e_1_2_1_19_1","unstructured":"Jourdain, R. 1997. 
Music, The Brain, and Ecstasy: How Music Captures Our Imagination. HarperCollins."},{"volume-title":"Proceedings of the International Computer Music Conference (ICMC), 138--145","author":"Katayose H.","key":"e_1_2_1_20_1","unstructured":"Katayose, H., Kanomori, T., Kamei, K., Nagashima, Y., Sato, K., Inokuchi, S., and Simura, S. 1993. Virtual performer. In Proceedings of the International Computer Music Conference (ICMC), 138--145."},{"volume-title":"Proceedings of Philips Symposium on Intelligent Algorithms.","author":"Korst J.","key":"e_1_2_1_22_1","unstructured":"Korst, J., and Geleijnse, G. 2006. Efficient lyrics retrieval and alignment. In Proceedings of Philips Symposium on Intelligent Algorithms."},{"volume-title":"Proceedings of the International Computer Music Conference (ICMC).","author":"Loscos A.","key":"e_1_2_1_23_1","unstructured":"Loscos, A., Cano, P., and Bonada, J. 1999. Low-delay singing voice alignment to text. In Proceedings of the International Computer Music Conference (ICMC)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148185"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1975.9792"},{"volume-title":"Proceedings of the 5th International Symposium\/Conference of Music Information Retrieval (ISMIR).","author":"Nwe T. L.","key":"e_1_2_1_26_1","unstructured":"Nwe, T. L. 
and Wang, Y. 2004. Automatic detection of vocal segments in popular songs. In Proceedings of the 5th International Symposium\/Conference of Music Information Retrieval (ISMIR)."},{"key":"e_1_2_1_27_1","unstructured":"Rabiner L. R. and Juang B. H. 1993. Fundamentals of Speech Recognition. Prentice-Hall."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1978.1163055"},{"volume-title":"Proceedings of the International Conference on Music Information (ISMIR).","author":"Sheh A.","key":"e_1_2_1_29_1","unstructured":"Sheh, A. and Ellis, D. P. W. 2003. Chord segmentation and recognition using EM-trained hidden Markov models. In Proceedings of the International Conference on Music Information (ISMIR)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1915893"},{"volume-title":"Proceedings of the 3rd International Workshop on Speech Synthesis.","author":"Taylor P. A.","key":"e_1_2_1_31_1","unstructured":"Taylor, P. A., Black, A. W., and Caley, R. J. 1998. The architecture of the festival speech synthesis system. In Proceedings of the 3rd International Workshop on Speech Synthesis."},{"volume-title":"Proceedings of the International Symposium of Music Information Retrieval (ISMIR).","author":"Tsai W. H.","key":"e_1_2_1_32_1","unstructured":"Tsai, W. H., Wang, H. M., Rodgers, D.
, Cheng, S. S. and Yu, H. M. 2004. Blind clustering of popular music recordings based on singer voice characteristics. In Proceedings of the International Symposium of Music Information Retrieval (ISMIR)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1027527.1027576"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Wong C. H. Szeto W. M. and Wong K. H. 2006. Automatic lyrics alignment for Cantonese popular music. In Multimedia Systems.","DOI":"10.1007\/s00530-006-0055-8"},{"key":"e_1_2_1_35_1","unstructured":"Young S. Evermann G. Gales M. Hain T. Kershaw D. Liu X. Moore G. Odell J. Ollason D. Povey D. Valtchev V. and Woodland P. 2006. The HTK Book Version 3.4. 
Department of Engineering University of Cambridge."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1823746.1823753","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1823746.1823753","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:47:17Z","timestamp":1750258037000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1823746.1823753"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,8]]},"references-count":34,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2010,8]]}},"alternative-id":["10.1145\/1823746.1823753"],"URL":"https:\/\/doi.org\/10.1145\/1823746.1823753","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2010,8]]},"assertion":[{"value":"2008-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-08-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}