{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T20:57:08Z","timestamp":1778101028185,"version":"3.51.4"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2021,2,28]],"date-time":"2021-02-28T00:00:00Z","timestamp":1614470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Institute of Informatics (NII), Tokyo"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2021,2,28]]},"abstract":"<jats:p>Melody generation from lyrics has been a challenging research issue in the field of artificial intelligence and music, which enables us to learn and discover latent relationships between interesting lyrics and accompanying melodies. Unfortunately, the limited availability of a paired lyrics\u2013melody dataset with alignment information has hindered the research progress. To address this problem, we create a large dataset consisting of 12,197 MIDI songs each with paired lyrics and melody alignment through leveraging different music sources where alignment relationship between syllables and music attributes is extracted. Most importantly, we propose a novel deep generative model, conditional Long Short-Term Memory (LSTM)\u2013Generative Adversarial Network for melody generation from lyrics, which contains a deep LSTM generator and a deep LSTM discriminator both conditioned on lyrics. In particular, lyrics-conditioned melody and alignment relationship between syllables of given lyrics and notes of predicted melody are generated simultaneously. 
Extensive experimental results have proved the effectiveness of our proposed lyrics-to-melody generative model, where plausible and tuneful sequences can be inferred from lyrics.<\/jats:p>","DOI":"10.1145\/3424116","type":"journal-article","created":{"date-parts":[[2021,4,16]],"date-time":"2021-04-16T12:42:08Z","timestamp":1618576928000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":91,"title":["Conditional LSTM-GAN for Melody Generation from Lyrics"],"prefix":"10.1145","volume":"17","author":[{"given":"Yi","family":"Yu","sequence":"first","affiliation":[{"name":"Digital Content and Media Sciences Research Division, National Institute of Informatics, Japan"}]},{"given":"Abhishek","family":"Srivastava","sequence":"additional","affiliation":[{"name":"Multimodal Digital Media Analysis Lab, Indraprastha Institute of Information Technology Delhi, India"}]},{"given":"Simon","family":"Canales","sequence":"additional","affiliation":[{"name":"Institut de g\u00e9nie \u00e9lectrique et \u00e9lectronique, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Switzerland"}]}],"member":"320","published-online":{"date-parts":[[2021,4,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"A preliminary framework for description, analysis and comparison of creative systems. J. Knowl. Based Syst. 19, 7","author":"Wiggins Geraint A.","year":"2006","unstructured":"Geraint A. Wiggins. 2006. A preliminary framework for description, analysis and comparison of creative systems. J. Knowl. Based Syst. 19, 7 (2006), 449\u2013458."},{"key":"e_1_2_1_2_1","first-page":"3","article-title":"Musical composition with a High-Speed digital computer","volume":"6","author":"Hiller L. A.","year":"1958","unstructured":"L. A. Hiller and L. M. Isaacson. 1958.
Musical composition with a High-Speed digital computer. J. Aud. Eng. Soc. 6, 3 (1958), 154\u2013160.","journal-title":"J. Aud. Eng. Soc."},{"key":"e_1_2_1_3_1","first-page":"2","article-title":"Statistical learning of harmonic movement","volume":"28","author":"Ponsford D.","year":"1999","unstructured":"D. Ponsford, G. Wiggins, and C. Mellish. 1999. Statistical learning of harmonic movement. J. New Mus. Res. 28, 2 (1999), 150\u2013177.","journal-title":"J. New Mus. Res."},{"key":"e_1_2_1_4_1","unstructured":"Jean-Pierre Briot and Fran\u00e7ois Pachet. 2017. Music generation by deep learning\u2014Challenges and directions. arxiv:1712.04371. Retrieved from http:\/\/arxiv.org\/abs\/1712.04371."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3281746","article-title":"Deep cross-modal correlation learning for audio and lyrics in music retrieval","volume":"15","author":"Yu Y.","year":"2019","unstructured":"Y. Yu, S. Tang, F. Raposo, and L. Chen. Deep cross-modal correlation learning for audio and lyrics in music retrieval. ACM Trans. Multimedia Comput. Commun. Appl. 15, 1, Article 20 (2019), 1\u201316.","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"key":"e_1_2_1_6_1","unstructured":"Wikipedia. Melody.
Retrieved from https:\/\/en.wikipedia.org\/wiki\/Melody."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 6th International Conference on Computational Creativity. 204\u2013211","author":"Scirea Marco","year":"2015","unstructured":"Marco Scirea, Gabriella A. B. Barros, Noor Shaker, and Julian Togelius. 2015. SMUG: Scientific music generator. In Proceedings of the 6th International Conference on Computational Creativity. 204\u2013211."},{"key":"e_1_2_1_8_1","unstructured":"Margareta Ackerman and David Loker. 2016. Algorithmic songwriting with ALYSIA. arxiv:1612.01058. Retrieved from http:\/\/arxiv.org\/abs\/1612.01058."},{"key":"e_1_2_1_9_1","unstructured":"Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, and Ming Zhou. 2018. Neural melody composition from lyrics. arxiv:1809.04318. Retrieved from http:\/\/arxiv.org\/abs\/1809.04318."},{"key":"e_1_2_1_10_1","unstructured":"I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative adversarial networks. arxiv:stat.ML\/1406.2661.
Retrieved from https:\/\/arxiv.org\/abs\/1406.2661."},{"key":"e_1_2_1_11_1","first-page":"11","article-title":"Bridge-GAN: Interpretable representation learning for text-to-image synthesis","volume":"30","author":"Yuan M.","year":"2020","unstructured":"M. Yuan and Y. Peng. 2020. Bridge-GAN: Interpretable representation learning for text-to-image synthesis. IEEE Trans. Circ. Syst. Vid. Technol. 30, 11 (2020), 4258\u20134268.","journal-title":"IEEE Trans. Circ. Syst. Vid. Technol."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/307"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)","author":"Nie Weili","unstructured":"Weili Nie, Nina Narodytska, and Ankit Patel. RelGAN: Relational generative adversarial networks for text generation. In Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)."},{"key":"e_1_2_1_14_1","volume-title":"Vico","author":"Fern\u00e1ndez Rodriguez Jose David","year":"2014","unstructured":"Jose David Fern\u00e1ndez Rodriguez and Francisco J. Vico. 2014. AI methods in algorithmic composition: A comprehensive survey. CoRR abs\/1402.0585 (2014)."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978802.1978809"},{"key":"e_1_2_1_16_1","volume-title":"Inmamusys: Intelligent multiagent music system. Expert Syst. Appl.
36, 3, Part 1","author":"Delgado Miguel","year":"2009","unstructured":"Miguel Delgado, Waldo Fajardo, and Miguel Molina-Solana. 2009. Inmamusys: Intelligent multiagent music system. Expert Syst. Appl. 36, 3, Part 1 (2009), 4574\u20134580."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the SAISB Symposium on Artificial Intelligence and Creativity in the Arts and Sciences. 30\u201335","author":"Conklin Darrell","year":"2003","unstructured":"Darrell Conklin. 2003. Music generation from statistical models. In Proceedings of the SAISB Symposium on Artificial Intelligence and Creativity in the Arts and Sciences. 30\u201335."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the International Conference on Computational Creativity. 16\u201325","author":"Eigenfeldt Arne","year":"2010","unstructured":"Arne Eigenfeldt and Philippe Pasquier. 2010. Realtime generation of harmonic progressions using controlled Markov selection. In Proceedings of the International Conference on Computational Creativity. 16\u201325."},{"key":"e_1_2_1_19_1","volume-title":"Computer Models of Musical Creativity","author":"Cope David","unstructured":"David Cope. 2005. Computer Models of Musical Creativity. The MIT Press."},{"key":"e_1_2_1_20_1","unstructured":"Jian Wu, Changran Hu, Yulong Wang, Xiaolin Hu, and Jun Zhu. 2017.
A hierarchical recurrent neural network for symbolic melody generation. arxiv:1712.05274. Retrieved from https:\/\/arxiv.org\/abs\/1712.05274."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-55750-2_9"},{"key":"e_1_2_1_22_1","unstructured":"Olof Mogren. 2016. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. arxiv:1611.09904. Retrieved from https:\/\/arxiv.org\/abs\/1611.09904."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the International Conference of Sound and Music Computing. 299\u2013302","author":"Fukayama Satoru","year":"2010","unstructured":"Satoru Fukayama, Kei Nakatsuma, Shinji Sako, Takuya Nishimoto, and Shigeki Sagayama. 2010. Automatic song composition from the lyrics exploiting prosody of the Japanese language. In Proceedings of the International Conference of Sound and Music Computing. 299\u2013302."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 3rd International Conference on Computational Creativity","author":"Monteith Kristine","year":"2012","unstructured":"Kristine Monteith, Tony R. Martinez, and Dan Ventura.
2012. Automatic generation of melodic accompaniments for lyrics. In Proceedings of the 3rd International Conference on Computational Creativity, 2012. 87\u201394."},{"key":"e_1_2_1_25_1","unstructured":"Retrieved from http:\/\/www.musiccrashcourses.com\/lessons\/pitch.html\/."},{"key":"e_1_2_1_26_1","unstructured":"Wikipedia. Duration. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Duration_(music)\/."},{"key":"e_1_2_1_27_1","unstructured":"Wikipedia. Rest. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Rest_(music)\/."},{"key":"e_1_2_1_28_1","unstructured":"Wikipedia. Syllable. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Syllable."},{"key":"e_1_2_1_29_1","unstructured":"Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781. Retrieved from https:\/\/arxiv.org\/abs\/1301.3781."},{"key":"e_1_2_1_30_1","volume-title":"Long short-term memory. Neural Comput. 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural Comput. 9, 8 (1997), 1735\u20131780. DOI:http:\/\/dx.doi.org\/10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_31_1","unstructured":"Mehdi Mirza and Simon Osindero. 2014.
Conditional generative adversarial nets. arxiv:1411.1784. Retrieved from http:\/\/arxiv.org\/abs\/1411.1784."},{"key":"e_1_2_1_32_1","unstructured":"Retrieved from https:\/\/colinraffel.com\/projects\/lmd\/."},{"key":"e_1_2_1_33_1","unstructured":"Retrieved from https:\/\/www.reddit.com\/r\/datasets\/."},{"key":"e_1_2_1_34_1","volume-title":"A Hilbert space embedding for distributions","author":"Smola Alex","unstructured":"Alex Smola, Arthur Gretton, Le Song, and Bernhard Sch\u00f6lkopf. 2007. A Hilbert space embedding for distributions. In Algorithmic Learning Theory, Marcus Hutter, Rocco A. Servedio, and Eiji Takimoto (Eds.). Springer, Berlin."},{"key":"e_1_2_1_35_1","unstructured":"Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, and Arthur Gretton. 2015. A test of relative similarity for model selection in generative models. arxiv:1511.04581. Retrieved from https:\/\/arxiv.org\/abs\/1511.04581."},{"key":"e_1_2_1_36_1","volume-title":"On the decreasing power of kernel and distance based nonparametric hypothesis tests in high dimensions","author":"Reddi Sashank J.","year":"2015","unstructured":"Sashank J. Reddi, Aaditya Ramdas, Barnabas Poczos, Aarti Singh, and Larry Wasserman.
On the decreasing power of kernel and distance based nonparametric hypothesis tests in high dimensions. 2015. In Proc. AAAI. 3571\u20133577."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations).","author":"Lee Hsin-Pei","unstructured":"Hsin-Pei Lee, Jhih-Sheng Fang, and Wei-Yun Ma. iComposer: An automatic songwriting system for Chinese popular music. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)."},{"key":"e_1_2_1_38_1","unstructured":"Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2016. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. arxiv:cs.LG\/1609.05473. Retrieved from https:\/\/arxiv.org\/abs\/1609.05473."},{"key":"e_1_2_1_39_1","unstructured":"https:\/\/synthesizerv.com\/en\/. ([n.d.])."},{"key":"e_1_2_1_40_1","unstructured":"Eric Jang, Shixiang Gu, and Ben Poole. 2016. Categorical reparameterization with Gumbel-Softmax. arxiv:stat.ML\/1611.01144.
Retrieved from https:\/\/arxiv.org\/abs\/1611.01144."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3424116","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3424116","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:51Z","timestamp":1750197711000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3424116"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,28]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,2,28]]}},"alternative-id":["10.1145\/3424116"],"URL":"https:\/\/doi.org\/10.1145\/3424116","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,28]]},"assertion":[{"value":"2020-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}