{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T08:03:26Z","timestamp":1761897806164,"version":"3.41.0"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,2,27]],"date-time":"2023-02-27T00:00:00Z","timestamp":1677456000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2023,5,31]]},"abstract":"<jats:p>Melody generation aims to learn the distribution of real melodies to generate new melodies conditioned on lyrics, which has been a very interesting topic in the area of artificial intelligence and music. However, a challenging issue still limits the quality and reliability of melody generation conditioned on lyrics: how to enhance the interpretability between the input lyrics and generated melodies so humans can understand their relationships. To solve this issue, in this article, we propose a model for melody generation from lyrics with local interpretability, which contains two significant contributions: (i) Mutual information between input lyrics and generated melody is exploited to instruct the training of the network, which avoids the loss of content consistency during the training stage. (ii) Transformer is explored to efficiently extract semantic features from lyrics sequences, which provides more interpretable correlations between different syllables in lyrics. Experiments on a large-scale dataset with paired lyrics-melodies demonstrate that the proposed approach can generate higher-quality melodies from lyrics compared with existing methods.<\/jats:p>\n          <jats:p\/>","DOI":"10.1145\/3572031","type":"journal-article","created":{"date-parts":[[2022,11,29]],"date-time":"2022-11-29T12:09:40Z","timestamp":1669723780000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Melody Generation from Lyrics with Local Interpretability"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9007-1667","authenticated-orcid":false,"given":"Wei","family":"Duan","sequence":"first","affiliation":[{"name":"Digital Content and Media Sciences Research Division, National Institute of Informatics, SOKENDAI, Chiyoda-ku, Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0294-6620","authenticated-orcid":false,"given":"Yi","family":"Yu","sequence":"additional","affiliation":[{"name":"Digital Content and Media Sciences Research Division, National Institute of Informatics, SOKENDAI, Chiyoda-ku, Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7005-992X","authenticated-orcid":false,"given":"Xulong","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5784-8411","authenticated-orcid":false,"given":"Suhua","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Computer and Network Engineering, Graduate School of Informatics and Engineering, The University of Electro-Communications, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4486-8341","authenticated-orcid":false,"given":"Wei","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4385-8798","authenticated-orcid":false,"given":"Keizo","family":"Oyama","sequence":"additional","affiliation":[{"name":"Digital Content and Media Sciences Research Division, National Institute of Informatics, SOKENDAI, Chiyoda-ku, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2023,2,27]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-55750-2_1"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-32233-5_39"},{"key":"e_1_3_1_4_2","volume-title":"4th International Conference on Learning Representations","author":"Bounliphone Wacha","year":"2016","unstructured":"Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, and Arthur Gretton. 2016. A test of relative similarity for model selection in generative models. In 4th International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1511.04581."},{"key":"e_1_3_1_5_2","first-page":"2172","volume-title":"29th International Conference on Neural Information Processing Systems","author":"Chen Xi","year":"2016","unstructured":"Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In 29th International Conference on Neural Information Processing Systems. 2172\u20132180."},{"key":"e_1_3_1_6_2","article-title":"Jukebox: A generative model for music","volume":"2005","author":"Dhariwal Prafulla","year":"2020","unstructured":"Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, and Ilya Sutskever. 2020. Jukebox: A generative model for music. CoRR abs\/2005.00341 (2020).","journal-title":"CoRR"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11312"},{"key":"e_1_3_1_8_2","article-title":"Generative adversarial networks","volume":"1406","author":"Goodfellow Ian J.","year":"2014","unstructured":"Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative adversarial networks. CoRR abs\/1406.2661 (2014).","journal-title":"CoRR"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_10_2","volume-title":"7th International Conference on Learning Representations","author":"Huang Cheng-Zhi Anna","year":"2019","unstructured":"Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, and Douglas Eck. 2019. Music transformer: Generating music with long-term structure. In 7th International Conference on Learning Representations."},{"key":"e_1_3_1_11_2","volume-title":"7th International Conference on Learning Representations","author":"Jolicoeur-Martineau Alexia","year":"2019","unstructured":"Alexia Jolicoeur-Martineau. 2019. The relativistic discriminator: A key element missing from standard GAN. In 7th International Conference on Learning Representations."},{"key":"e_1_3_1_12_2","article-title":"TeleMelody: Lyric-to-melody generation with a template-based two-stage method","volume":"2109","author":"Ju Zeqian","year":"2021","unstructured":"Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, and Tie-Yan Liu. 2021. TeleMelody: Lyric-to-melody generation with a template-based two-stage method. CoRR abs\/2109.09617 (2021).","journal-title":"CoRR"},{"key":"e_1_3_1_13_2","article-title":"GANS for sequences of discrete elements with the Gumbel-Softmax distribution","volume":"1611","author":"Kusner Matt J.","year":"2016","unstructured":"Matt J. Kusner and Jos\u00e9 Miguel Hern\u00e1ndez-Lobato. 2016. GANS for sequences of discrete elements with the Gumbel-Softmax distribution. CoRR abs\/1611.04051 (2016).","journal-title":"CoRR"},{"key":"e_1_3_1_14_2","first-page":"84","volume-title":"Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Lee Hsin-Pei","year":"2019","unstructured":"Hsin-Pei Lee, Jhih-Sheng Fang, and Wei-Yun Ma. 2019. iComposer: An automatic songwriting system for Chinese popular music. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 84\u201388."},{"key":"e_1_3_1_15_2","article-title":"Automatic neural lyrics and melody composition","volume":"2011","author":"Madhumani Gurunath Reddy","year":"2020","unstructured":"Gurunath Reddy Madhumani, Yi Yu, Florian Harsco\u00ebt, Simon Canales, and Suhua Tang. 2020. Automatic neural lyrics and melody composition. CoRR abs\/2011.06380 (2020).","journal-title":"CoRR"},{"key":"e_1_3_1_16_2","volume-title":"1st International Conference on Learning Representations, Workshop Track Proceedings","author":"Mikolov Tom\u00e1s","year":"2013","unstructured":"Tom\u00e1s Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations, Workshop Track Proceedings."},{"key":"e_1_3_1_17_2","first-page":"87","volume-title":"3rd International Conference on Computational Creativity","author":"Monteith Kristine","year":"2012","unstructured":"Kristine Monteith, Tony R. Martinez, and Dan Ventura. 2012. Automatic generation of melodic accompaniments for lyrics. In 3rd International Conference on Computational Creativity. 87\u201394."},{"key":"e_1_3_1_18_2","volume-title":"35th International Conference on International Computer Music Conference","author":"Nichols Eric","year":"2009","unstructured":"Eric Nichols. 2009. Lyric-based rhythm suggestion. In 35th International Conference on International Computer Music Conference."},{"key":"e_1_3_1_19_2","first-page":"311","volume-title":"40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In 40th Annual Meeting of the Association for Computational Linguistics. ACL, 311\u2013318."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413721"},{"key":"e_1_3_1_21_2","article-title":"Interpretable machine learning: Fundamental principles and 10 grand challenges","volume":"2103","author":"Rudin Cynthia","year":"2021","unstructured":"Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong. 2021. Interpretable machine learning: Fundamental principles and 10 grand challenges. CoRR abs\/2103.11251 (2021).","journal-title":"CoRR"},{"key":"e_1_3_1_22_2","first-page":"204","volume-title":"6th International Conference on Computational Creativity","author":"Scirea Marco","year":"2015","unstructured":"Marco Scirea, Gabriella A. B. Barros, Noor Shaker, and Julian Togelius. 2015. SMUG: Scientific music generator. In 6th International Conference on Computational Creativity. 204\u2013211."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i15.17626"},{"key":"e_1_3_1_24_2","first-page":"87","volume-title":"4th International Conference on Computational Creativity","author":"Toivanen Jukka M.","year":"2013","unstructured":"Jukka M. Toivanen, Hannu Toivonen, and Alessandro Valitutti. 2013. Automatical composition of lyrical songs. In 4th International Conference on Computational Creativity. 87\u201391."},{"key":"e_1_3_1_25_2","first-page":"5998","volume-title":"International Conference on Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In International Conference on Advances in Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3424116"},{"key":"e_1_3_1_27_2","article-title":"Conditional hybrid GAN for sequence generation","volume":"2009","author":"Yu Yi","year":"2020","unstructured":"Yi Yu, Abhishek Srivastava, and Rajiv Ratn Shah. 2020. Conditional hybrid GAN for sequence generation. CoRR abs\/2009.08616 (2020).","journal-title":"CoRR"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220105"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3572031","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3572031","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:14Z","timestamp":1750182674000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3572031"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,27]]},"references-count":27,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,5,31]]}},"alternative-id":["10.1145\/3572031"],"URL":"https:\/\/doi.org\/10.1145\/3572031","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2023,2,27]]},"assertion":[{"value":"2022-06-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-06","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}